Some of your work.

Notice the complexities of using scikit-learn for this stuff : lots of buzzwords from probability and statistics that you need to wade through, and lots of technical jargon in the API that you need to stare at before you can fit their (powerful, general, professional) tools to your particular problem.

Here are a few notebooks on Kaggle using their sample dataset and MultinomialNB from scikit-learn :

- O Pathak's Spam Filtering
- Mohit's Simple Spam Filter
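To see what MultinomialNB is computing underneath the jargon, here is a minimal from-scratch sketch of a multinomial naive Bayes spam classifier with add-one (Laplace) smoothing, which corresponds to MultinomialNB's default `alpha=1.0`. The tokenizer, the function names, and the six training messages are hypothetical stand-ins, not taken from the Kaggle notebooks or their dataset.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()

def train(messages):
    """messages: list of (text, label) pairs, label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def log_score(text, label, model, alpha=1.0):
    word_counts, doc_counts, vocab = model
    # Log prior: fraction of training messages with this label.
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    total_words = sum(word_counts[label].values())
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training are ignored
        # Smoothed per-label word probability, added in log space.
        count = word_counts[label][word]
        score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
    return score

def classify(text, model):
    # Pick whichever label gives the higher log posterior (up to a constant).
    return max(["spam", "ham"], key=lambda label: log_score(text, label, model))

# Hypothetical toy training data.
train_data = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("claim your free prize", "spam"),
    ("meeting at noon", "ham"),
    ("lunch tomorrow at noon", "ham"),
    ("notes from class meeting", "ham"),
]
model = train(train_data)
print(classify("free money", model))
print(classify("meeting at noon", model))
```

In scikit-learn the counting step is usually done by `CountVectorizer` and the smoothing and scoring by `MultinomialNB`; summing logs instead of multiplying probabilities avoids numerical underflow on long messages.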

And some of Paul Graham's articles on this topic :

- A Plan for Spam (2002)
- Better Bayesian Filtering (2003)
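The heart of Graham's 2002 approach is a rule for combining per-token spam probabilities: for token probabilities p1, p2, ..., pn he computes the product of the p's divided by (that product plus the product of the (1 - p)'s), using only the most "interesting" tokens, i.e. those farthest from a neutral 0.5 (fifteen in the essay). The sketch below assumes the per-token probabilities have already been estimated from spam and non-spam corpora; the function names are hypothetical.

```python
def combine(probs):
    """Combine per-token spam probabilities Graham-style."""
    prod_spam = 1.0
    prod_ham = 1.0
    for p in probs:
        prod_spam *= p
        prod_ham *= 1.0 - p
    return prod_spam / (prod_spam + prod_ham)

def spam_probability(token_probs, n_interesting=15):
    # Keep the tokens whose probabilities are farthest from 0.5.
    interesting = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)
    return combine(interesting[:n_interesting])

print(combine([0.9, 0.2]))  # one strongly spammy token, one mildly hammy one
print(spam_probability([0.5, 0.5, 0.99, 0.95, 0.3]))
```

Neutral tokens (probability 0.5) leave the result unchanged, which is why dropping them in favor of the most extreme tokens costs little.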

I'm not quite sure what we should spend our time on after break. We'll have about five weeks, and I'd like you to spend a good chunk of that time on your own projects.

Here are the machine learning ideas covered in the text :

- regression : chaps 14 (simple linear regression), 15 (multiple regression), and 8 (gradient descent, i.e. calculus done with code)
- decision trees : chap 17
- neural nets & deep learning : chaps 18 and 19 (powerful, popular, tricky)
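As a small taste of how chapters 8 and 14 fit together, here is a sketch that fits a line y = beta * x + alpha by gradient descent on mean squared error and checks it against the closed-form least-squares answer. The function names, learning rate, step count, and the noise-free toy data are assumptions for illustration, not the text's own code.

```python
def predict(alpha, beta, x):
    return beta * x + alpha

def gradient_step(alpha, beta, xs, ys, learning_rate):
    n = len(xs)
    # Partial derivatives of mean squared error with respect to alpha and beta.
    errors = [predict(alpha, beta, x) - y for x, y in zip(xs, ys)]
    grad_alpha = sum(2 * e for e in errors) / n
    grad_beta = sum(2 * e * x for e, x in zip(errors, xs)) / n
    # Move a small step downhill.
    return alpha - learning_rate * grad_alpha, beta - learning_rate * grad_beta

def fit_gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    alpha, beta = 0.0, 0.0
    for _ in range(steps):
        alpha, beta = gradient_step(alpha, beta, xs, ys, learning_rate)
    return alpha, beta

def fit_least_squares(xs, ys):
    # Closed form: beta = cov(x, y) / var(x), alpha = mean_y - beta * mean_x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    beta = cov / var
    return mean_y - beta * mean_x, beta

# Hypothetical noise-free data on the line y = 3x + 2.
xs = list(range(10))
ys = [3 * x + 2 for x in xs]
print(fit_gradient_descent(xs, ys))
print(fit_least_squares(xs, ys))
```

Simple and multiple regression both have closed-form solutions, but models such as neural nets do not, which is where gradient descent earns its place. The learning rate matters: much larger than this and the updates diverge, much smaller and many more steps are needed.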

Definitions and summaries of these are at :

... already too much to do in the time remaining. Your thoughts?

https://cs.marlboro.college/cours/spring2020/data/notes/mar13

last modified Tue October 15 2024 6:00 am