Oct 31
aside
Want $50,000?
homework
discuss; make sure Bayes' Theorem is clear.
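A quick numeric check of Bayes' Theorem, P(H|E) = P(E|H) P(H) / P(E), may help here. This is only a sketch: the function name and all the numbers below (40% of mail is spam, the word "free" shows up in 30% of spam and 2% of non-spam) are made up for illustration, not taken from the text or from my data.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(H|E) given P(H), P(E|H), and P(E|not H)."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: H = "message is spam", E = "message contains 'free'"
p_spam_given_free = posterior(prior=0.4, likelihood=0.3, likelihood_given_not=0.02)
print(round(p_spam_given_free, 3))  # 0.909
```

So with these invented numbers, seeing "free" moves the spam probability from 0.4 up to about 0.91.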
where we are
Talking about Machine Learning
- general ideas, but not every math detail in the text
- two particular examples to focus on
I've asked you to read
- chap 13, on probability and Bayes Theorem
- chap 18, on the basic notions of Machine Learning
- agent algorithm
- decision trees
- regression analysis
- neural networks
- chap 20, on probabilistic learning
- Naive Bayes models
- ... and lots more complicated versions
The text goes into much more detail of the math
and variations on models than we will. Focus
on the two examples, and use sources outside
the text to get the gist as needed.
We're going to do Naive Bayes first, moving on to neural nets soon.
So: hit the books and resources.
spam classification
This is the archetypal Bayesian learning system.
Walk through the algorithm with my email example.
(Though I haven't uploaded my raw data, for
privacy reasons.)
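Since the raw email isn't available, here is a rough sketch of the gist of the algorithm on a toy data set. Everything in it is an assumption for illustration: the four training "messages" are invented, the function names (tokenize, train, classify) are hypothetical, and the add-one (Laplace) smoothing and the use of log probabilities are common choices rather than necessarily what my Perl code does.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """labeled_docs: iterable of (label, text). Returns the counts we need."""
    doc_counts = Counter()   # label -> number of documents
    word_counts = {}         # label -> Counter of word occurrences
    vocab = set()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the largest log P(label) + sum of log P(word|label)."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing so an unseen (word, label) pair is not probability 0.
            p = (word_counts[label][word] + 1) / (n_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training set, standing in for real email folders.
training = [
    ("spam", "win cash now"),
    ("spam", "free cash prize"),
    ("ham", "meeting notes for class"),
    ("ham", "homework due thursday"),
]
model = train(training)
print(classify("free cash", model))       # spam
print(classify("homework notes", model))  # ham
```

The "naive" part is the inner loop: each word contributes its own factor P(word | label), as if the words were independent given the label.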
Other resources :
Walk through NLP-NaiveBayes.pdf .
Look at my code (perl; sorry; old code) from
my email, and discuss how the gist of it works.
I have not uploaded the 1/2 Gig of my email and spam messages
for privacy and space reasons, though I may show the format
of these files and folders in class.
I also have a bayes.lisp example that may (or may not) be helpful.
Your assignment for next week (start now, and we can discuss on Thursday):
- find and/or invent some text documents to classify
- ... or get the numbers from the UCI spambase if that seems like too much work
- write some lisp or python code to implement this algorithm.
- remember that counting is done with hashes or dictionaries
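As a starting point for the counting step, here is a minimal sketch using plain Python dictionaries. The function name count_words and the two sample documents are invented for illustration; in Lisp the same idea would use a hash table.

```python
def count_words(docs):
    """docs: list of (label, text). Returns {label: {word: count}}."""
    counts = {}
    for label, text in docs:
        per_label = counts.setdefault(label, {})
        for word in text.lower().split():
            # dict.get with a default of 0 handles the first sighting of a word
            per_label[word] = per_label.get(word, 0) + 1
    return counts


# Invented sample documents
docs = [("spam", "cash cash now"), ("ham", "class notes now")]
counts = count_words(docs)
print(counts["spam"]["cash"])         # 2
print(counts["ham"].get("cash", 0))   # 0
```

Dividing these counts by the total number of words per label gives the estimates of P(word | label) that the classifier needs.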