Feb 23 - conditional probability
context
We're talking about probability, working through the
material in chapter 2, putting together the theory
behind the math models used to build the statistical
tests that we're heading towards later in the term.
Today we'll discuss an idea called "conditional probability":
what is the probability of some thing A given that you
already know some other thing B? The notation is P(A|B),
which you would say as "the probability of A given B".
You will *not* be tested on this material. It's cool stuff,
but a bit sideways to what you absolutely need to know
in order to do the statistical tests that we're heading
towards. Interesting and useful in certain situations, but optional
for the core material of intro stats.
We already talked about "independent" things a bit.
If two things A and B are independent, then P(A|B) = P(A).
In other words, knowing about B doesn't change A's probability.
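A quick way to see independence is to list every outcome and count.
This is only a sketch: the two fair coin flips are an assumed
illustration (not one of the examples in these notes), and it uses
the definition P(A|B) = P(A & B) / P(B).

```python
from fractions import Fraction
from itertools import product

# Assumed setup: two fair coin flips, all four outcomes equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event = fraction of outcomes where it is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == "H"   # first flip is heads
B = lambda o: o[1] == "H"   # second flip is heads

p_A = prob(A)
p_A_given_B = prob(lambda o: A(o) and B(o)) / prob(B)

print(p_A, p_A_given_B)   # both are 1/2
```

Knowing the second flip tells you nothing about the first, so the
two numbers match.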
I think the best way to see this material is with examples.
All the ones here have two variables, each with two possible
values - this is the simplest setup that shows how this stuff works.
example 1 - medical test
Suppose that you are worried that you might have a rare disease. You
decide to get tested, and suppose that the testing methods for this
disease are correct 99 percent of the time (in other words, if you
have the disease, it shows that you do with 99 percent probability,
and if you don't have the disease, it shows that you do not with 99
percent probability). Suppose this disease is actually quite rare,
occurring randomly in the general population in only one of every
10,000 people. (From
https://www.math.hmc.edu/funfacts/ffiles/30002.6.shtml )
Discuss & work through these concepts :
- joint probability table (table of P(A & B) ; has all information)
- tree diagram (same data, different display; best for sequences)
- marginal (i.e. "sums in margins") probability calculations
- conditional probability definition & calculation, i.e. P(sick|positive test)
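The same steps can be sketched in Python. This is one possible way
to lay it out, not the only one; the variable names are made up for
illustration, and exact fractions are used so there's no rounding.

```python
from fractions import Fraction

p_sick = Fraction(1, 10000)     # 1 in 10,000 people has the disease
p_correct = Fraction(99, 100)   # test is right 99% of the time, sick or not

# Joint probability table: P(health status & test result).
joint = {
    ("sick", "positive"):    p_sick * p_correct,
    ("sick", "negative"):    p_sick * (1 - p_correct),
    ("healthy", "positive"): (1 - p_sick) * (1 - p_correct),
    ("healthy", "negative"): (1 - p_sick) * p_correct,
}

# Marginal: add up the joint entries that have a positive test.
p_positive = sum(p for (status, result), p in joint.items()
                 if result == "positive")

# Conditional: P(sick | positive) = P(sick & positive) / P(positive)
p_sick_given_positive = joint[("sick", "positive")] / p_positive

print(float(p_positive))             # about 0.0101
print(float(p_sick_given_positive))  # about 0.0098, i.e. roughly 1%
```

Even with a positive result from a 99 percent accurate test, the
chance of being sick is only about 1 in 100, because healthy people
with false positives far outnumber sick people.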
example 2 - parents & teens college or not
A look at teens who did or didn't go to college (variable A)
from families with a parent who did or didn't go
to college (variable B). (From our textbook, pg 88.)
The counts of people (from which we can work out probabilities)
are :
count(yes parent & yes teen) = 231
count(yes parent & no teen) = 49
count(no parent & yes teen) = 214
count(no parent & no teen) = 298
- again, work through all these ideas.
- P(yes teen | yes parent) = ?
- P(yes parent | yes teen) = ?
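To check your answers, here is a small sketch that works straight
from the counts. The dictionary layout and names are assumptions
for illustration; each conditional probability is just one cell
divided by the right marginal total.

```python
from fractions import Fraction

# Counts from the notes, keyed by (parent went, teen went).
counts = {
    ("yes", "yes"): 231,
    ("yes", "no"):  49,
    ("no",  "yes"): 214,
    ("no",  "no"):  298,
}

# Marginal totals ("sums in margins").
parent_yes = sum(n for (parent, teen), n in counts.items() if parent == "yes")
teen_yes = sum(n for (parent, teen), n in counts.items() if teen == "yes")

# Conditional = joint count / marginal count of the thing you were given.
p_teen_given_parent = Fraction(counts[("yes", "yes")], parent_yes)
p_parent_given_teen = Fraction(counts[("yes", "yes")], teen_yes)

print(float(p_teen_given_parent))
print(float(p_parent_given_teen))
```

Notice the two answers share a numerator but not a denominator,
which is why P(A|B) and P(B|A) are generally different numbers.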
theory aside
Bayes' Theorem ... lets you flip a conditional probability around.
Since P(A & B) = P(A) * P(B|A) = P(B) * P(A|B)
Then P(A|B) = P(A) * P(B|A) / P(B)
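As a sanity check, the flip can be tried on the example 2 counts.
This sketch (names are illustrative) takes A = "yes teen" and
B = "yes parent", computes P(B|A) from P(A|B) with the same formula
with the roles of A and B swapped, and compares it with counting
directly.

```python
from fractions import Fraction

total = 231 + 49 + 214 + 298

p_parent = Fraction(231 + 49, total)             # P(B)
p_teen = Fraction(231 + 214, total)              # P(A)
p_teen_given_parent = Fraction(231, 231 + 49)    # P(A|B)

# Bayes: P(B|A) = P(B) * P(A|B) / P(A)
p_parent_given_teen = p_parent * p_teen_given_parent / p_teen

# Direct count: of the teens who went, how many had a parent who went?
direct = Fraction(231, 231 + 214)

print(p_parent_given_teen == direct)   # True
```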
Spam filter example :
Measure P(word[i] | spam) by counting how often each word[i]
appears in a collection of emails labeled as spam or not spam.
Then use Bayes' theorem on a new mail message:
see what words are in it, and find P(spam | word[i]).
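Here is a toy sketch of that idea for a single word. The six
"emails" are invented purely for illustration (not real data), the
function names are hypothetical, and a real filter would combine
many words rather than look at one at a time.

```python
from fractions import Fraction

# Made-up training set: (message text, is it spam?)
emails = [
    ("win money now", True),
    ("free money offer", True),
    ("win a free prize", True),
    ("meeting at noon", False),
    ("lunch money for friday", False),
    ("notes from the meeting", False),
]

def p_word_given_label(word, is_spam):
    """Fraction of messages with this label that contain the word."""
    group = [text for text, spam in emails if spam == is_spam]
    hits = sum(1 for text in group if word in text.split())
    return Fraction(hits, len(group))

def p_spam_given_word(word):
    """Bayes: P(spam|word) = P(spam) * P(word|spam) / P(word).

    Raises ZeroDivisionError if the word never appears in any message.
    """
    p_spam = Fraction(sum(1 for _, spam in emails if spam), len(emails))
    p_word = Fraction(sum(1 for text, _ in emails if word in text.split()),
                      len(emails))
    return p_spam * p_word_given_label(word, True) / p_word

print(p_spam_given_word("money"))   # 2/3
print(p_spam_given_word("free"))    # 1
```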
example 3 - taxicab witness
A cab was involved in a hit and run accident at night.
Two cab companies, the Green and the Blue, operate in the city.
You are given the following data:
- 85% of the cabs in the city are Green and 15% are Blue.
- A witness identified the cab as Blue.
- The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.
What is the probability that the cab involved in the accident was Blue rather than Green?
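One way to work this out is with Bayes' theorem. The sketch below
assumes that, before hearing the witness, the chance the cab was
Blue is just Blue's share of the city's cabs (15%); the variable
names are made up for illustration.

```python
from fractions import Fraction

p_blue = Fraction(15, 100)            # prior: share of Blue cabs
p_green = Fraction(85, 100)           # prior: share of Green cabs
p_witness_right = Fraction(80, 100)   # witness names the true color

# Marginal: the witness says "Blue" either correctly (cab is Blue)
# or by mistake (cab is Green).
p_says_blue = p_blue * p_witness_right + p_green * (1 - p_witness_right)

# Bayes: P(Blue | says Blue) = P(Blue) * P(says Blue | Blue) / P(says Blue)
p_blue_given_says_blue = p_blue * p_witness_right / p_says_blue

print(float(p_blue_given_says_blue))   # about 0.41
```

So even with an 80% reliable witness, under this assumption the cab
is still more likely Green than Blue, because Green cabs are so
much more common.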