Tues Oct 7 - Statistics Lecture Notes

• go over homework
• 5-31: C(35,3) = (35*34*33)/(3*2*1)
• 5-32: one person hurt p=0.2. With n=7 people, probability that at least one is hurt is P(x>=1) = 1 - P(0) = 1 - (0.8)^7 = 0.79
• 6-21: type numbers into Excel, make p(x) column by dividing, find mean = sum(x*p), variance = sum(x^2 * p) - mean^2
(This one was more tedious than I first expected.)
• class experiment 2: "=RAND()" in Excel; fill it into a large rectangular block. Paste "=AVERAGE(drag row)" into each entry of a nearby column, then make a histogram of those numbers. Discuss what's going on: RAND() has a uniform probability distribution, but an average of many RAND()'s tends towards a normal distribution.
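The class experiment can also be tried outside a spreadsheet. The following is a minimal sketch, not the in-class procedure: it assumes Python's random.random() as a stand-in for RAND(), and the block size (2000 rows of 30 numbers), the seed, and the function names row_averages and text_histogram are all made up for illustration.

```python
import random
import statistics

def row_averages(n_rows=2000, n_cols=30, seed=0):
    """Average n_cols uniform random numbers in each of n_rows rows."""
    rng = random.Random(seed)  # seeded so reruns give the same numbers
    return [sum(rng.random() for _ in range(n_cols)) / n_cols
            for _ in range(n_rows)]

def text_histogram(values, n_bins=10, lo=0.0, hi=1.0, width=50):
    """Return one text line per bin, with a '#' bar scaled to the tallest bin."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / (hi - lo) * n_bins)
        i = max(0, min(i, n_bins - 1))  # values outside [lo, hi) go in the end bins
        counts[i] += 1
    tallest = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + i * (hi - lo) / n_bins
        lines.append(f"{left_edge:4.2f} {'#' * round(width * c / tallest)}")
    return lines

averages = row_averages()
print("mean of the row averages:", round(statistics.fmean(averages), 3))
print("std of the row averages: ", round(statistics.stdev(averages), 3))
print("\n".join(text_histogram(averages, n_bins=20, lo=0.3, hi=0.7)))
```

Setting n_cols=1 should give a roughly flat histogram (use lo=0.0, hi=1.0 to see it), while larger n_cols should give a narrower, bell-shaped one.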

• Discuss upcoming 2nd test next week, chap 5-7, pick a day. Also point out next assignment.

• Extra topic: average and standard deviation of a sum of random variables
• Derive this for those who want to see the math.
• mean(x+y) = mean(x)+mean(y)
• variance(x+y) = variance(x)+variance(y)
(IF x,y independent)
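For those who want to see the math, one way to write the derivation is sketched here. The notation is an assumption, not from the textbook: E[.] stands for the mean (expected value), and mu_x, mu_y are the means of x and y.

```latex
\begin{align*}
\mathrm{mean}(x+y) &= E[x+y] = E[x] + E[y] = \mu_x + \mu_y \\[4pt]
\mathrm{variance}(x+y) &= E\big[(x + y - \mu_x - \mu_y)^2\big] \\
  &= E\big[(x-\mu_x)^2\big] + E\big[(y-\mu_y)^2\big]
     + 2\,E\big[(x-\mu_x)(y-\mu_y)\big] \\
  &= \mathrm{variance}(x) + \mathrm{variance}(y)
     + 2\,E\big[(x-\mu_x)(y-\mu_y)\big] \\[4pt]
\text{if $x,y$ independent:}\quad
E\big[(x-\mu_x)(y-\mu_y)\big] &= E[x-\mu_x]\,E[y-\mu_y] = 0 \cdot 0 = 0
\end{align*}
```

The rule for the mean needs no independence; only the cross term in the variance does.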

• Consequence for binomial :
• n=1: mean = p, variance = pq = p * (1-p)
• n=N: mean = Np, variance = sigma^2 = Npq = N * p * (1-p)
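A quick numerical check of these formulas, as a sketch: it computes mean = sum(x*p) and variance = sum(x^2 * p) - mean^2 directly from the exact binomial probabilities and compares with Np and Npq. The function names and the sample values N=24, p=0.25 are chosen only for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_mean_variance(n, p):
    """Mean and variance computed straight from the probability table."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    variance = sum(k**2 * pk for k, pk in enumerate(probs)) - mean**2
    return mean, variance

n, p = 24, 0.25  # illustrative values
mean, variance = binomial_mean_variance(n, p)
print("mean from table:    ", mean, "  N*p   =", n * p)
print("variance from table:", variance, "  N*p*q =", n * p * (1 - p))
```

The two columns should agree up to floating-point rounding.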

• which leads to this IMPORTANT RULE OF THUMB :
• As you vary N, sigma/mean = constant / sqrt(N)
• true almost whenever you're doing many trials and averaging the results
• particularly noteworthy when you want to get better results by doing more trials, or doing a larger survey
• example: Say you do an experiment 5 times and find answer = 2.3 +- .4, and the error is due to random factors.
Doing it more times will give you a better answer: the randomness will tend to cancel out.
How many times must you run it to get accuracy of +- .04 ?
Answer: a 10 times smaller sigma means 100 times bigger N, so you'd need N=500.
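The arithmetic in this example can be sketched as a small function. It assumes the error shrinks exactly as 1/sqrt(N), which is the rule of thumb above rather than a guarantee for any particular experiment; the name trials_needed is made up here.

```python
import math

def trials_needed(n_old, sigma_old, sigma_target):
    """Trials needed to shrink the error from sigma_old to sigma_target,
    assuming the error scales as 1/sqrt(N)."""
    ratio = sigma_old / sigma_target      # how many times smaller sigma must get
    # N grows as the square of that ratio; the tiny offset guards against
    # floating-point noise pushing an exact answer up to the next integer.
    return math.ceil(n_old * ratio**2 - 1e-9)

print(trials_needed(5, 0.4, 0.04))  # 500
```

Halving the error instead (0.4 to 0.2) would take 4 times as many trials, so N=20.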

• chap 7: binomial approximated as Normal
• very roughly, you need Np > 5 (mean big enough) and Npq > 5 (variance big enough) for this approximation to be reasonable.
• procedure is
• Given N, p
• Find mean, sigma
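• Given discrete raw scores, convert to continuous z-scores: z = (x - mean)/sigma, with x moved by 0.5 to the edge of the discrete value (e.g. "18" becomes "17.5 to 18.5")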
• Do standard normal distribution lookup (table D-4) as before to get probabilities
• example: 7-15, pg 189: P(Bob wins) = 1/4. Find the probability that he wins 18 or more times out of 24 games.
• N=24, p=0.25, q=0.75
• This implies mean=N*p=6, sigma=sqrt(N*p*q)= 2.1
• Treating discrete "18" as continuous "17.5 to 18.5", "at least 18" is "continuous raw score > 17.5", which is z > (17.5 - 6)/2.1 = 5.48
• total probability above 5.48 is off the table D-4, p < 0.00003
• exact version uses binomial with P(18)+P(19)+...+P(24)
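To compare the two routes for this example, here is a sketch that computes both the normal approximation (with the 0.5 continuity correction) and the exact binomial sum. It uses math.erfc in place of the table D-4 lookup, and it keeps sigma unrounded, so its z comes out slightly different from the hand calculation with sigma = 2.1. The function names are invented for illustration.

```python
from math import comb, erfc, sqrt

def normal_upper_tail(z):
    """P(Z > z) for a standard normal, replacing the table lookup."""
    return 0.5 * erfc(z / sqrt(2))

def normal_approx_at_least(k_min, n, p):
    """Normal approximation to P(x >= k_min) with continuity correction."""
    mean = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (k_min - 0.5 - mean) / sigma   # "at least 18" -> raw score > 17.5
    return normal_upper_tail(z)

def binomial_at_least(k_min, n, p):
    """Exact P(x >= k_min) = P(k_min) + P(k_min + 1) + ... + P(n)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

n, p = 24, 0.25
print("normal approximation:", normal_approx_at_least(18, n, p))
print("exact binomial:      ", binomial_at_least(18, n, p))
```

Both numbers should come out far below the 0.00003 limit of the table, though this deep in the tail they need not agree closely with each other.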
Statistics Fall 2003