-------------------------------

All of these refer to 40-year-old women.

* 1% of women who participate in routine screening have breast cancer.
* 80% of women with breast cancer will get positive mammograms.
* 9.6% of women without breast cancer will also get positive mammograms.
* Given that a woman has a positive mammogram in a routine screening,
  what is the probability that she actually has breast cancer?

------------------------------

The same statements in formal language, using the notation

  cancer  = has breast cancer
  !cancer = doesn't have breast cancer
  pos     = mammogram test is positive
  !pos    = mammogram test is negative

* P(cancer) = 0.01
* P(pos|cancer) = 0.80
* P(pos|!cancer) = 0.096
* P(cancer|pos) = ?

------------------------------

First, by counting. Suppose N = 1000 women. Then:

* N(cancer) = P(cancer)*N = 10
    * 8 (i.e. 80% of 10) of these will test positive.
    * 2 of these will test negative.
* N(!cancer) = 990
    * 95 (i.e. 0.096*990 = 95.04, rounded) of these will test positive.
    * 895 of these will test negative.
The total number of women who test positive is 8 + 95 = 103, so
P(cancer|pos) = 8/103 ≈ 0.078 = 7.8%.

------------------------------

The joint probability distribution is

P(cancer, pos)   =   8/1000 = 0.008
P(cancer, !pos)  =   2/1000 = 0.002
P(!cancer, pos)  =  95/1000 = 0.095
P(!cancer, !pos) = 895/1000 = 0.895

which lets us calculate any other probability directly from the definitions:

P(cancer)  = 0.008 + 0.002 = 0.01
P(!cancer) = 0.095 + 0.895 = 0.99
P(pos)     = 0.008 + 0.095 = 0.103
P(!pos)    = 0.002 + 0.895 = 0.897

P(cancer|pos) = P(cancer, pos) / P(pos) = 0.008/0.103 ≈ 0.078

------------------------------

Bayes' formula is

P(a|b) = P(b|a) * P(a) / P(b)
       = P(b|a) * P(a) / sum_x { P(b|x) * P(x) }

where the sum runs over every possible value x of a (here, cancer and !cancer).
In this case:

                               P(pos|cancer) * P(cancer)
P(cancer|pos) = -------------------------------------------------------
                P(pos|cancer) * P(cancer) + P(pos|!cancer) * P(!cancer)

                        0.80 * 0.01
              = --------------------------
                0.80 * 0.01 + 0.096 * 0.99

              ≈ 0.008/(0.008 + 0.095)

              = 0.008/0.103

              ≈ 0.078

-------------------------------
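The three calculations above (counting, joint distribution, Bayes' formula) can be cross-checked with a short Python sketch. The function names `by_counting`, `joint_distribution`, `posterior_from_joint` and `bayes`, and the use of dictionaries for the tables, are illustrative choices and not part of the original notes. One assumption to flag: the hand calculations round 0.096*990 = 95.04 down to 95, while this sketch keeps the unrounded value unless `round_counts=True` is passed, so it reports about 0.0776 instead of 8/103 ≈ 0.0777.

```python
"""Three ways to compute P(cancer|pos) for the mammography example (a sketch)."""

# Inputs taken from the problem statement.
P_CANCER = 0.01                # P(cancer)
P_POS_GIVEN_CANCER = 0.80      # P(pos|cancer)
P_POS_GIVEN_NO_CANCER = 0.096  # P(pos|!cancer)


def by_counting(n=1000, round_counts=False):
    """Method 1: convert the rates into head counts for n women.

    With round_counts=True the false-positive count 95.04 is rounded to 95,
    matching the hand calculation 8/103.
    """
    n_cancer = P_CANCER * n                          # 10 women with cancer
    n_no_cancer = n - n_cancer                       # 990 women without
    true_pos = P_POS_GIVEN_CANCER * n_cancer         # 8 true positives
    false_pos = P_POS_GIVEN_NO_CANCER * n_no_cancer  # 95.04 false positives
    if round_counts:
        true_pos, false_pos = round(true_pos), round(false_pos)
    return true_pos / (true_pos + false_pos)


def joint_distribution():
    """Method 2: the joint table, keyed by (has_cancer, is_pos)."""
    joint = {}
    for has_cancer in (True, False):
        p_c = P_CANCER if has_cancer else 1 - P_CANCER
        p_pos = P_POS_GIVEN_CANCER if has_cancer else P_POS_GIVEN_NO_CANCER
        joint[(has_cancer, True)] = p_c * p_pos         # P(c, pos) = P(pos|c) * P(c)
        joint[(has_cancer, False)] = p_c * (1 - p_pos)  # P(c, !pos)
    return joint


def posterior_from_joint(joint):
    """P(cancer|pos) = P(cancer, pos) / P(pos), with P(pos) summed from the table."""
    p_pos = joint[(True, True)] + joint[(False, True)]
    return joint[(True, True)] / p_pos


def bayes(prior, likelihood):
    """Method 3: P(x|b) = P(b|x) P(x) / sum_x' P(b|x') P(x') for every hypothesis x.

    prior maps each hypothesis x to P(x); likelihood maps x to P(b|x).
    """
    evidence = sum(likelihood[x] * prior[x] for x in prior)  # P(b)
    return {x: likelihood[x] * prior[x] / evidence for x in prior}


if __name__ == "__main__":
    posterior = bayes(
        prior={"cancer": P_CANCER, "!cancer": 1 - P_CANCER},
        likelihood={"cancer": P_POS_GIVEN_CANCER, "!cancer": P_POS_GIVEN_NO_CANCER},
    )
    results = [
        ("counting (rounded)", by_counting(round_counts=True)),
        ("counting (exact)", by_counting()),
        ("joint table", posterior_from_joint(joint_distribution())),
        ("Bayes' formula", posterior["cancer"]),
    ]
    for name, p in results:
        print(f"{name:<20} P(cancer|pos) = {p:.4f}")
```

All four results round to 0.078. The `bayes` function takes the prior and likelihood as dictionaries so that the denominator is literally the sum_x over all hypotheses, which makes it reusable when there are more than two hypotheses.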