Oct 13
Machine learning, statistical approaches, and all that.
resources
homework & textbook discussion
Topics
- conditional probability
- learning: general principles
  - training set
  - noise
  - pattern recognition
  - typically a "low-level" technique, not "high-level" logic
  - usually ends up with a "machine" characterized by numbers whose specific values have no simple interpretation
- naive Bayes model
- neural nets
  - many variations
  - example: character recognition
probability
- http://en.wikipedia.org/wiki/Bayesian_probability
- random variables
  - boolean: verdict = (true, false)
  - discrete: weather = (sun, clouds, rain, snow)
  - continuous: -10 < x < 10
- probability interpreted as either
  - frequency (typical in the hard sciences)
  - agent knowledge, i.e. degree of belief (typical in AI)
- Bayesian probability is the "belief" interpretation of probability: every new fact can change the probabilities of other propositions.
Example: What is Matt Ollis doing right now, expressed as a probability? If I now tell you that he's downtown, what are the probabilities now?
Terminology:
P(A) is "the probability of A".
P(A|B) is "the probability of A given that we know B".
P(a,b,c) is the "joint distribution": the probability of a, b, and c occurring together.
Example from the text, p. 475:

               toothache          !toothache
            catch   !catch      catch   !catch
cavity      0.108   0.012       0.072   0.008
!cavity     0.016   0.064       0.144   0.576
What are:
- P(cavity)?
- P(cavity or toothache)?
- P(cavity | catch)?
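All three questions can be answered by summing entries of the table. The sketch below assumes the table is stored as a Python dict keyed by (cavity, toothache, catch) booleans; the names JOINT and prob are hypothetical helpers, not from the text. prob(event) adds up every cell where the event holds, and the conditional probability uses P(x|y) = P(x and y) / P(y).

```python
# Full joint distribution P(cavity, toothache, catch) from the table above.
# Keys are (cavity, toothache, catch) booleans.
JOINT = {
    (True, True, True): 0.108,
    (True, True, False): 0.012,
    (True, False, True): 0.072,
    (True, False, False): 0.008,
    (False, True, True): 0.016,
    (False, True, False): 0.064,
    (False, False, True): 0.144,
    (False, False, False): 0.576,
}

def prob(event):
    """Sum the joint entries of every world where event(cavity, toothache, catch) is true."""
    return sum(p for world, p in JOINT.items() if event(*world))

p_cavity = prob(lambda cav, tooth, catch: cav)
p_cavity_or_toothache = prob(lambda cav, tooth, catch: cav or tooth)
# Conditional probability: P(cavity | catch) = P(cavity and catch) / P(catch)
p_cavity_given_catch = (prob(lambda cav, tooth, catch: cav and catch)
                        / prob(lambda cav, tooth, catch: catch))

print(round(p_cavity, 3), round(p_cavity_or_toothache, 3), round(p_cavity_given_catch, 3))
# prints 0.2 0.28 0.529
```

So P(cavity) = 0.2, P(cavity or toothache) = 0.28, and P(cavity | catch) = 0.18 / 0.34, about 0.529.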
Rules:
P(x|y) * P(y) = P(x and y)
or
P(x|y) = P(x and y) / P(y)
summing over all possible values of y (marginalization):
P(x) = sum_y P(x|y) * P(y)
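A numeric check of both rules, as a sketch using the dentist table: here x is cavity and y is catch, and the joint P(cavity, catch) is obtained by summing the table over toothache. The dict P_CAV_CATCH and the helper functions are hypothetical names chosen for illustration.

```python
# Joint P(cavity, catch), keys (cavity, catch), from summing the dentist table over toothache.
P_CAV_CATCH = {
    (True, True): 0.108 + 0.072,    # 0.18
    (True, False): 0.012 + 0.008,   # 0.02
    (False, True): 0.016 + 0.144,   # 0.16
    (False, False): 0.064 + 0.576,  # 0.64
}

def p_catch(c):
    """P(catch = c), summing the joint over cavity."""
    return sum(p for (cav, catch), p in P_CAV_CATCH.items() if catch == c)

def p_cavity_given_catch(cav, c):
    """P(x|y) = P(x and y) / P(y)."""
    return P_CAV_CATCH[(cav, c)] / p_catch(c)

# Product rule: P(x|y) * P(y) recovers P(x and y) for every combination.
for (cav, c), joint in P_CAV_CATCH.items():
    assert abs(p_cavity_given_catch(cav, c) * p_catch(c) - joint) < 1e-12

# Marginalization: P(x) = sum_y P(x|y) * P(y), summing over catch in {True, False}.
p_cavity = sum(p_cavity_given_catch(True, c) * p_catch(c) for c in (True, False))
print(round(p_cavity, 3))  # prints 0.2
```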
if A and B are independent, then
P(A|B) = P(A)
P(B|A) = P(B)
P(A and B) = P(A) * P(B)
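Independence can be tested directly against a joint distribution by checking P(A and B) = P(A) * P(B). The sketch below does this for the dentist table; JOINT, prob, both, independent and cond_independent are hypothetical helper names. It also checks the weaker conditional independence P(A and B | C) = P(A|C) * P(B|C), which is the kind of assumption naive Bayes makes: in this table toothache and catch are dependent, yet independent once we know whether there is a cavity.

```python
# Full joint P(cavity, toothache, catch) from the table; keys are (cavity, toothache, catch).
JOINT = {
    (True, True, True): 0.108, (True, True, False): 0.012,
    (True, False, True): 0.072, (True, False, False): 0.008,
    (False, True, True): 0.016, (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def prob(event):
    """P(event): sum of joint entries for worlds where event(cavity, toothache, catch) holds."""
    return sum(p for world, p in JOINT.items() if event(*world))

def both(a, b):
    """The event 'a and b'."""
    return lambda *w: a(*w) and b(*w)

def independent(a, b, tol=1e-9):
    """Test P(A and B) == P(A) * P(B), within floating-point tolerance."""
    return abs(prob(both(a, b)) - prob(a) * prob(b)) < tol

def cond_independent(a, b, c, tol=1e-9):
    """Test P(A and B | C) == P(A | C) * P(B | C)."""
    pc = prob(c)
    lhs = prob(both(both(a, b), c)) / pc
    rhs = (prob(both(a, c)) / pc) * (prob(both(b, c)) / pc)
    return abs(lhs - rhs) < tol

cavity = lambda cav, tooth, catch: cav
no_cavity = lambda cav, tooth, catch: not cav
has_toothache = lambda cav, tooth, catch: tooth
has_catch = lambda cav, tooth, catch: catch

print(independent(cavity, has_catch))         # False: 0.18 vs 0.2 * 0.34
print(independent(has_toothache, has_catch))  # False: 0.124 vs 0.2 * 0.34
print(cond_independent(has_toothache, has_catch, cavity),
      cond_independent(has_toothache, has_catch, no_cavity))  # True True
```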
Bayes' rule:
P(M and N) = P(M|N) * P(N) = P(N|M) * P(M)
therefore
P(M|N) = P(N|M) * P(M) / P(N)
"This simple equation underlies all modern AI systems for probabilistic inference."
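As a small sketch, Bayes' rule can recover P(cavity | catch) from the "causal" direction P(catch | cavity), using numbers read off the dentist table earlier in these notes. The function name bayes is a hypothetical helper.

```python
def bayes(p_n_given_m, p_m, p_n):
    """Bayes' rule: P(M|N) = P(N|M) * P(M) / P(N)."""
    return p_n_given_m * p_m / p_n

# Values read off the dentist table (M = cavity, N = catch):
p_cavity = 0.2              # 0.108 + 0.012 + 0.072 + 0.008
p_catch = 0.34              # 0.108 + 0.072 + 0.016 + 0.144
p_catch_given_cavity = 0.9  # (0.108 + 0.072) / 0.2

result = bayes(p_catch_given_cavity, p_cavity, p_catch)
print(round(result, 3))  # prints 0.529
```

This matches P(cavity and catch) / P(catch) = 0.18 / 0.34 computed directly from the table.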
cancer example
1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
"What do you think the answer is? If you haven't encountered this kind of problem before, please take a moment to come up with your own answer before continuing."
Here's the actual Bayes formula. (Note that the denominator is P(X).)
P(A|X) = P(X|A) * P(A) / [ P(X|A) * P(A) + P(X|~A) * P(~A) ]    (~A means "not A")
Given some phenomenon A that we want to investigate, and an observation X that is evidence about A (in the previous example, A is breast cancer and X is a positive mammography), Bayes' theorem tells us how to update our probability of A in light of the new evidence X.
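Plugging the cancer example into the formula is a good arithmetic check. In this sketch (the function name posterior is hypothetical), P(A) = 0.01, P(X|A) = 0.80 and P(X|~A) = 0.096, all taken from the problem statement; the denominator is P(X), the overall rate of positive mammographies.

```python
def posterior(p_x_given_a, p_x_given_not_a, p_a):
    """P(A|X) = P(X|A) P(A) / [P(X|A) P(A) + P(X|~A) P(~A)]."""
    p_not_a = 1.0 - p_a
    p_x = p_x_given_a * p_a + p_x_given_not_a * p_not_a  # denominator is P(X)
    return p_x_given_a * p_a / p_x

# A = breast cancer, X = positive mammography.
p = posterior(p_x_given_a=0.80, p_x_given_not_a=0.096, p_a=0.01)
print(f"{p:.4f}")  # prints 0.0776
```

The answer is about 7.8%: the 0.008 true positives are swamped by the 0.09504 false positives from the much larger cancer-free group.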
Once we get to learning, we'll apply this rule to training spam filters...
naive Bayes spam filtering
AIMA 13.22 Text categorization is the task of assigning a given document to one of a fixed set of categories on the basis of the text it contains. Naive Bayes models are often used for this task. In these models, the query variable is the document category, and the “effect” variables are the presence or absence of each word in the language; the assumption is that words occur independently in documents, with frequencies determined by the document category.
a. Explain precisely how such a model can be constructed, given as “training data” a set of documents that have been assigned to categories.
b. Explain precisely how to categorize a new document.
c. Is the conditional independence assumption reasonable? Discuss.
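One way to make parts (a) and (b) concrete is the sketch below. It is not the textbook's answer. It assumes a Bernoulli-style model that matches the exercise wording (each vocabulary word is a present/absent "effect" variable), lowercase whitespace tokenization, add-one (Laplace) smoothing so no estimate is exactly 0 or 1, and sums of logs instead of products to avoid floating-point underflow. The functions train and classify and the toy training set are hypothetical, for illustration only.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, category). Returns (priors, word_probs, vocab).

    priors[c]        = P(category = c), the fraction of training documents in c.
    word_probs[c][w] = P(word w present | c), from document counts with add-one smoothing.
    """
    vocab = set()
    cat_counts = Counter()
    doc_freq = {}  # category -> Counter: number of documents in that category containing each word
    for text, cat in docs:
        words = set(text.lower().split())
        vocab |= words
        cat_counts[cat] += 1
        doc_freq.setdefault(cat, Counter()).update(words)
    total = sum(cat_counts.values())
    priors = {c: n / total for c, n in cat_counts.items()}
    word_probs = {
        c: {w: (doc_freq[c][w] + 1) / (cat_counts[c] + 2) for w in vocab}
        for c in cat_counts
    }
    return priors, word_probs, vocab

def classify(text, priors, word_probs, vocab):
    """Pick the c maximizing log P(c) + sum over vocab of log P(w present or absent | c)."""
    present = set(text.lower().split())
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w in vocab:  # words never seen in training are simply ignored
            p = word_probs[c][w]
            score += math.log(p if w in present else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy training set.
training = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting notes for monday", "ham"),
    ("lecture notes on bayes rule", "ham"),
]
priors, word_probs, vocab = train(training)
print(classify("buy pills now", priors, word_probs, vocab))           # prints spam
print(classify("notes from the lecture", priors, word_probs, vocab))  # prints ham
```

Training is just counting, which is why naive Bayes is cheap to build; the multiplication of per-word probabilities in classify is exactly where the conditional independence assumption from part (c) enters.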