Tues Oct 7 - Statistics Lecture Notes
- go over homework
- 5-31: C(35,3) = (35*34*33)/(3*2*1) = 6545
- 5-32: probability that one person is hurt is p=0.2. With n=7 people, the probability that
at least one is hurt is P(x>=1) = 1-P(0) = 1-(0.8)^7 = 0.79
- 6-21: type numbers into Excel, make p(x) column by dividing,
find mean = sum(x*p), variance = sum(x^2 * p) - mean^2
(This one was more tedious than I first expected.)
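The three homework calculations above can be checked with a short script. This is only a sketch: the table used for the 6-21 part is made up for illustration (the real numbers are in the textbook), and it assumes "dividing" means dividing each count by the total.

```python
from math import comb

# 5-31: number of ways to choose 3 out of 35
ways = comb(35, 3)

# 5-32: P(at least one of 7 is hurt) = 1 - P(nobody is hurt)
p_hurt = 0.2
n_people = 7
p_at_least_one = 1 - (1 - p_hurt) ** n_people

# 6-21 style: HYPOTHETICAL data, not the textbook's numbers
xs = [0, 1, 2, 3]
counts = [10, 20, 15, 5]

# p(x) column: each count divided by the total (assumed meaning of "dividing")
total = sum(counts)
probs = [c / total for c in counts]

# mean = sum(x*p), variance = sum(x^2 * p) - mean^2
mean = sum(x * p for x, p in zip(xs, probs))
variance = sum(x * x * p for x, p in zip(xs, probs)) - mean ** 2

print("C(35,3) =", ways)
print("P(x>=1) =", round(p_at_least_one, 2))
print("mean =", round(mean, 4), "variance =", round(variance, 4))
```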
- class experiment 2: type "=RAND()" in Excel; fill it into a large rectangular
block. In a nearby column, put "=AVERAGE(...)" next to each row (drag across the row
to select it), then make a histogram of those numbers.
Discuss what's going on: RAND() draws from a uniform probability distribution
between 0 and 1, but an average of many RAND()'s tends towards a normal
distribution (this is the central limit theorem).
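For anyone without Excel, the same experiment can be sketched in Python. The block size (2000 rows by 30 columns), the seed, and the function names are arbitrary choices for illustration, not what was used in class.

```python
import random
import statistics

def row_averages(n_rows, n_cols, seed=0):
    """Average each row of an n_rows x n_cols block of uniform random numbers."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() for _ in range(n_cols)) / n_cols
            for _ in range(n_rows)]

def bin_counts(values, n_bins=10):
    """Count how many values fall into each of n_bins equal bins on [0, 1)."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

averages = row_averages(2000, 30)
counts = bin_counts(averages, n_bins=20)

# Crude text histogram: one '#' per 20 rows in the bin
for i, c in enumerate(counts):
    print(f"{i / 20:.2f} {'#' * (c // 20)}")

print("mean of averages :", statistics.mean(averages))
print("stdev of averages:", statistics.stdev(averages))
```

The histogram should pile up around 0.5 in a bell shape, even though each individual number is uniform. The spread should be close to sqrt(1/12)/sqrt(30), since a single uniform number on 0 to 1 has variance 1/12.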
- Discuss upcoming 2nd test next week, chap 5-7,
pick a day. Also point out next assignment.
- Extra topic: average and standard deviation of a sum of random variables
- Derive this for those who want to see the math.
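- mean(x+y) = mean(x)+mean(y)
(always true)
- variance(x+y) = variance(x)+variance(y)
(ONLY IF x,y independent)
- so standard deviations add in quadrature: sigma(x+y) = sqrt(sigma(x)^2 + sigma(y)^2)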
- Consequence for binomial :
- n=1: mean = p, variance = pq = p * (1-p)
- n=N: mean = Np, variance = sigma^2 = Npq = N * p * (1-p)
- which leads to this IMPORTANT RULE OF THUMB :
- As you vary N, sigma/mean = constant / sqrt(N)
(for the binomial, sigma/mean = sqrt(Npq)/(Np), so the constant is sqrt(q/p))
- this holds almost any time you do many independent trials and average the results
- particularly noteworthy when you want to get better results
by doing more trials, or doing a larger survey
- example: Say you do an experiment 5 times and find answer = 2.3 +- 0.4,
and the error is due to random factors.
Doing it more times will give you a better answer:
the randomness will tend to cancel out.
How many times must you run it to get an accuracy of +- 0.04?
Answer: a 10 times smaller sigma means a 100 times bigger N, so
you'd need N = 100 * 5 = 500.
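A quick numerical check of the rule of thumb and of the example, as a sketch. The function names and the choice p = 0.25 are just for illustration.

```python
from math import sqrt

def binomial_mean_sigma(n, p):
    """Mean and standard deviation of a binomial with n trials."""
    return n * p, sqrt(n * p * (1 - p))

def relative_spread(n, p):
    """sigma / mean for a binomial with n trials."""
    mean, sigma = binomial_mean_sigma(n, p)
    return sigma / mean

def trials_needed(n_now, sigma_now, sigma_target):
    """Trials needed to reach sigma_target, assuming sigma scales as 1/sqrt(N)."""
    return n_now * (sigma_now / sigma_target) ** 2

p = 0.25  # arbitrary illustrative value
for n in (10, 100, 1000, 10000):
    ratio = relative_spread(n, p)
    # ratio * sqrt(n) should stay fixed at sqrt(q/p) as n changes
    print(n, round(ratio, 5), round(ratio * sqrt(n), 5))

# The example above: 5 runs gave +- 0.4, target is +- 0.04
print("trials needed:", round(trials_needed(5, 0.4, 0.04)))
```

The last column stays the same while sigma/mean shrinks, which is the 1/sqrt(N) behavior. Note that trials_needed only makes sense when the error really is random; a systematic error will not average away.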
- chap 7: binomial approximated as Normal
- very roughly, you need Np > 5 (mean big enough) and Npq > 5
(variance big enough) for this approximation to be reasonable.
- procedure is
- Given N, p
- Find mean, sigma
- Given discrete raw scores, convert to continuous ones (k becomes "k-0.5 to k+0.5"),
then to z-scores with z = (x - mean)/sigma
- Do standard normal distribution lookup (table D-4) as before
to get probabilities
- example: 7-15, pg 189: P(Bob wins)=1/4. Find the probability that
he wins 18 or more times out of 24 games.
- N=24, p=0.25, q=0.75
- This implies mean=N*p=6, sigma=sqrt(N*p*q)= 2.1
- Treating discrete "18" as continuous "17.5 to 18.5",
"at least 18" is "continuous raw score > 17.5", which is
z > (17.5 - 6)/2.1 = 5.48
- z = 5.48 is past the end of table D-4, so the total probability above it is p < 0.00003
- exact version uses binomial with P(18)+P(19)+...+P(24)
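The procedure and the Bob example can be sketched in Python, comparing the normal approximation with the exact binomial sum. The function names are made up for illustration, and the normal tail is computed with math.erfc in place of the table D-4 lookup.

```python
from math import comb, erfc, sqrt

def normal_tail(z):
    """P(Z > z) for a standard normal variable (replaces the table lookup)."""
    return 0.5 * erfc(z / sqrt(2))

def binomial_tail_normal(n, p, k):
    """Normal approximation to P(X >= k), treating k as 'k-0.5 to k+0.5'."""
    mean = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sigma
    return z, normal_tail(z)

def binomial_tail_exact(n, p, k):
    """Exact P(X >= k) = P(k) + P(k+1) + ... + P(n)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

n, p, k = 24, 0.25, 18

# Rule-of-thumb check: want both Np > 5 and Npq > 5
np_value = n * p
npq_value = n * p * (1 - p)
print("Np =", np_value, " Npq =", npq_value)

z, approx = binomial_tail_normal(n, p, k)
exact = binomial_tail_exact(n, p, k)
print("z =", round(z, 2))
print("normal approximation:", approx)
print("exact binomial      :", exact)
```

Using the unrounded sigma gives a z slightly below the 5.48 above, which used sigma = 2.1. Both answers come out far below 0.00003, but they need not agree closely with each other: Npq = 4.5 here, which is just under the rough Npq > 5 guideline, and this far out in the tail the approximation is at its weakest.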