- Log in to jupyter.marlboro.college. On that system, create a "hello world" program in two ways: in a jupyter notebook hello.ipynb (which includes a markdown title cell), and in a python file hello.py. Create the file with the built-in text editor (notice how it does color formatting once you give the file a .py name), and run it from a terminal.
- Using the CSV data file vernon_1850.csv, write a python program (in a jupyter notebook) that calculates and prints the average age of the people listed.
- Read chapter 1 in the textbook, and come to class ready to discuss.
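A minimal sketch of the averaging step, assuming vernon_1850.csv has a header row with a column named "age" (check the file for the actual column name):

```python
import csv

def average_age(path, column="age"):
    # read the CSV and average the given numeric column;
    # the column name "age" is a guess -- match it to the real header
    with open(path, newline="") as f:
        ages = [float(row[column]) for row in csv.DictReader(f)]
    return sum(ages) / len(ages)
```

Then `print(average_age("vernon_1850.csv"))` in a notebook cell shows the result.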

- Read chapters 3 and 4 in the text, and explore the book's code and examples.
- Your mission this week is to start working with some data, massaging it and graphing it.
- See my Jan 31 notes for the details :
- Grab some csv data from kaggle or elsewhere.
- Play with it : put it into buckets or combine some columns.
- Make some plots to visualize what it's all about.
- Do this in a jupyter notebook, explaining what you've done.

- If time allows, do this twice, with a second dataset.
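As a sketch of the "buckets" idea, here's a tiny pandas example on made-up stand-in data (replace the literal frame with `pd.read_csv` on whatever file you grab, and adjust the column names to match):

```python
import pandas as pd

# stand-in for a real dataset loaded with pd.read_csv("your_file.csv")
df = pd.DataFrame({"age": [5, 17, 22, 30, 41, 70],
                   "income": [0, 1, 30, 45, 52, 20]})

# put a numeric column into labeled buckets
df["age_bucket"] = pd.cut(df["age"], bins=[0, 18, 35, 65, 120],
                          labels=["child", "young", "middle", "senior"])
counts = df["age_bucket"].value_counts().sort_index()
print(counts)
# in a notebook, counts.plot(kind="bar") gives a quick bar chart
```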

- Read chapters 5 and 6 in the text (statistics and probability).
- Using the Iris data and doing something like what's in my jupyter notebook, make a histogram of the virginica sepal lengths.
- Find their mean and standard deviation.
- Superimpose a plot of the normal distribution which has the same mean and standard deviation. Is it a reasonable fit?

- Find the probability that one of these flowers has a length greater than 8 cm.
- Confirm with a scatter plot and correlation coefficient that there is not much of a relation between the versicolor and virginica sepal length data.
- Check to see if there is a correlation between the versicolor sepal and petal lengths with a scatter plot and the coefficient. What do you find?
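For the tail-probability step, the normal CDF can be computed with the standard library's error function. The mean and standard deviation below are rough stand-ins; compute the real virginica numbers from the data:

```python
import math

def normal_cdf(x, mu, sigma):
    # Phi((x - mu) / sigma), via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# placeholder values -- replace with the mean/std you compute from the iris data
mu, sigma = 6.59, 0.64
p_over_8 = 1 - normal_cdf(8.0, mu, sigma)
print(f"P(sepal length > 8 cm) is about {p_over_8:.4f}")
```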

- Choose a few pages (at least) of text from a well known book.
- Write a python program to find
- the probability P(w) of its words,
- the conditional probabilities P(2nd=word_j | 1st=word_i) for consecutive (1st_word, 2nd_word) pairs.
- (You may find this code to count the words in Moby Dick to be helpful.)
- What are the most common words? Given one of those, what are the most common words that follow it?
- Show by direct calculation that Bayes' theorem holds for one pair, i.e. that P(1st|2nd) = P(2nd|1st) P(1st) / P(2nd).
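The word and bigram probabilities can be sketched with collections.Counter; the text string here is a tiny placeholder for your chosen book pages:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the fox"  # placeholder text
words = text.lower().split()
p = {w: c / len(words) for w, c in Counter(words).items()}    # P(w)

pairs = list(zip(words, words[1:]))            # consecutive (1st, 2nd) pairs
pair_counts = Counter(pairs)
first_counts = Counter(w for w, _ in pairs)    # marginal counts of 1st words
second_counts = Counter(w for _, w in pairs)   # marginal counts of 2nd words

def p_second_given_first(second, first):
    return pair_counts[(first, second)] / first_counts[first]

def p_first_given_second(first, second):
    return pair_counts[(first, second)] / second_counts[second]

# Bayes' theorem check, using the pair-based marginals:
lhs = p_first_given_second("the", "fox")
rhs = (p_second_given_first("fox", "the")
       * (first_counts["the"] / len(pairs))
       / (second_counts["fox"] / len(pairs)))
print(lhs, rhs)   # the two sides should agree
```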

- Is this coin fair? (It gives a random 't' or 'h' with each page load.) Make and discuss an explicit hypothesis test to decide, in two cases : with 10 coin flips, and with 5000 coin flips.
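One reasonable version of the test: under the fair-coin null hypothesis, compute a two-sided binomial p-value for the head count you observe, and reject if it falls below your chosen significance level. The head counts below are made up for illustration:

```python
from math import comb

def two_sided_p(n, k):
    # probability, under a fair coin, of a head count at least as far
    # from n/2 as the observed k (exact binomial, two-sided)
    d = abs(k - n / 2)
    total = sum(comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= d)
    return total / 2**n

print(two_sided_p(10, 7))       # e.g. 7 heads in 10 flips
print(two_sided_p(5000, 2600))  # e.g. 2600 heads in 5000 flips
```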

- Create a jupyter notebook that uses the k-nearest neighbors algorithm, as described in our textbook and in class, on one of these datasets.
- Come to class Friday ready to describe what you did and how it worked.
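The core of k-nearest neighbors fits in a few lines; this sketch uses Euclidean distance and a simple majority vote (the textbook's version also breaks ties, which this skips):

```python
import math
from collections import Counter

def knn_predict(k, points, labels, query):
    # indices of training points sorted by distance to the query
    nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    votes = [labels[i] for i in nearest[:k]]    # labels of the k closest
    return Counter(votes).most_common(1)[0][0]  # majority vote

# toy data: two clusters labeled "a" and "b"
points = [(1, 1), (1, 2), (5, 5), (6, 5)]
labels = ["a", "a", "b", "b"]
print(knn_predict(3, points, labels, (1.5, 1.5)))  # prints a
```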

- Apply the naive bayes text classification method to either
- a tiny example of your own, like the one I worked through in class on Tuesday
- a text classification dataset like this one

- Please don't use the black-box routines from scikit-learn - the point here is to work through the calculation "from scratch".
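A from-scratch sketch with Laplace (add-one) smoothing, on a tiny made-up training set (swap in your own labeled texts):

```python
import math
from collections import Counter, defaultdict

# tiny placeholder training data: (label, text) pairs
train = [("spam", "free money now"),
         ("spam", "free offer win money"),
         ("ham",  "meeting at noon"),
         ("ham",  "lunch meeting tomorrow")]

class_counts = Counter(label for label, _ in train)
word_counts = defaultdict(Counter)        # word_counts[label][word]
vocab = set()
for label, text in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def log_posterior(label, text):
    # log P(label) + sum of log P(word | label), with add-one smoothing
    n = sum(word_counts[label].values())
    lp = math.log(class_counts[label] / len(train))
    for w in text.split():
        lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
    return lp

def classify(text):
    return max(class_counts, key=lambda label: log_posterior(label, text))

print(classify("free money"))        # spam
print(classify("meeting tomorrow"))  # ham
```

Working in log space avoids multiplying many tiny probabilities together, and the add-one smoothing keeps unseen words from zeroing out a class.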

- As we gear up to do this online, please drop me a note here to let me know how you're doing.
- What timezone are you in?
- How is your access to the internet?
- Do you have questions or concerns?

- Do something like what I showed in class on Friday in this linear regression with gradient descent notebook.
- See my Friday notes for the details.
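As a from-scratch sketch of the same idea: batch gradient descent on mean squared error for a line y = m·x + b, using tiny made-up data that lies exactly on y = 2x + 1:

```python
# toy data on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, b = 0.0, 0.0          # start from a bad guess
learning_rate = 0.01
for _ in range(20000):
    # gradients of mean((m*x + b - y)^2) with respect to m and b
    grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(m, b)   # should be close to 2 and 1
```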

- Read chapter 18 in the text, "neural nets", and play with that code.
- Check out the related blog post by the same author, Fizz Buzz in Tensorflow.
- ... a more specific coding piece for this may be coming ...
- Decide what data you want to work with for your final project, and what sorts of investigations you want to do on it. (Presentations will be in a month, Tue May 5. Expect a "how is it going" update due in about two weeks.)
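Related to the chapter 18 reading: the classic demonstration that a two-layer network can compute XOR, which a single neuron cannot. A sketch with step-function neurons and hand-picked weights:

```python
def step(x):
    return 1 if x >= 0 else 0

def neuron(weights, bias, inputs):
    # fire if the weighted sum of the inputs plus bias is non-negative
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)

def xor(a, b):
    h1 = neuron([1, 1], -0.5, [a, b])      # OR gate
    h2 = neuron([-1, -1], 1.5, [a, b])     # NAND gate
    return neuron([1, 1], -1.5, [h1, h2])  # AND of the hidden layer

print([xor(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```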

- Work on your projects. Describe what you've done.
- Come to class Tuesday with something to show - data loaded into a jupyter notebook and a plot, for example.
- Read chapter 19, on deep learning, and/or check out some of the articles I posted in the class notes. In a jupyter notebook, try running one of the "from scratch" examples or a tutorial from tensorflow or pytorch. (Both should work on jupyter.marlboro.)

- Continue work on your projects.
- Be ready to give another project status update in class on Tuesday.
- Read chapter 17 on decision trees, my notes from Tuesday, and/or explore the "for further exploration" at the end of the chapter.
- Also check out chapter 20, on clustering.
- Optional : try a decision tree model on any data you choose, using scikit-learn's decision tree, the textbook code, or another library. Or try a clustering algorithm from the text or elsewhere. (scikit-learn has several.)
- Tell me here what you did this week.
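If you dig into the decision-tree chapter, the key quantity is the entropy of a set of labels; a small sketch:

```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum of p_i * log2(p_i) over the class proportions p_i
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["a", "a", "b", "b"]))  # 1.0 -- maximally mixed, two classes
```

The textbook's tree-builder works by choosing, at each step, the split that most reduces the weighted entropy of the resulting partitions.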

- Present your data analysis projects to the rest of the class on our last meeting.

- Turn in a jupyter notebook of your final project data analysis.
- Include
- your data sources
- a bibliography of other similar or related work that helped you along
- a description of your exploratory investigation, with plots
- questions that your work tries to answer
- any machine learning models that you developed and applied
- whatever conclusions or thoughts you ended with

- a place for Jim to leave end of term comments

- Please give me any feedback you have about how the class went - what you liked and didn't like.
- What would have improved it?
- What worked for you?

- How did the last month online work? What would have helped it go better?