Statistics Lecture Notes

Tues Oct 7 - Statistics Lecture Notes

go over homework

5-31: C(35,3) = (35*34*33)/(3*2*1) = 6545
5-32: probability that one person is hurt is p=0.2. With n=7 people (assumed independent), the probability that at least one is hurt is P(x>=1) = 1-P(0) = 1-(0.8)⁷ = 0.79
6-21: type numbers into Excel, make p(x) column by dividing, find mean = sum(x*p), variance = sum(x² p) - mean²
(This one was more tedious than I first expected.)
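The homework arithmetic above can be checked with a few lines of Python. This is only a sketch: the counts used for the 6-21 style calculation are made up for illustration (they are not the textbook data), and "dividing" is assumed to mean dividing each count by the total. `math.comb` needs Python 3.8 or later.

```python
import math

# 5-31: ways to choose 3 items out of 35 when order does not matter.
ways = math.comb(35, 3)
print(ways)  # 6545

# 5-32: chance that at least one of n people is hurt,
# assuming each is hurt independently with probability p.
p, n = 0.2, 7
p_at_least_one = 1 - (1 - p) ** n
print(round(p_at_least_one, 2))  # 0.79

# 6-21 style: mean and variance from a table of counts.
# HYPOTHETICAL counts (value x -> times observed), not the homework data.
counts = {0: 2, 1: 5, 2: 8, 3: 4, 4: 1}
total = sum(counts.values())

# p(x) column: each count divided by the total.
prob = {x: c / total for x, c in counts.items()}

mean = sum(x * px for x, px in prob.items())
variance = sum(x**2 * px for x, px in prob.items()) - mean**2
print(round(mean, 4), round(variance, 4))  # 1.85 1.0275
```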
class experiment 2: type "=RAND()" in Excel and fill it into a large rectangular block. Paste "=AVERAGE(drag row)" into each entry of a nearby column, then make a histogram of those numbers. Discuss what's going on: RAND() is a uniform probability distribution, but an average of many RAND()'s tends towards a normal distribution.
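The same experiment can be sketched in Python instead of a spreadsheet. The block size (2000 rows of 30 cells) and the histogram range are assumptions chosen for illustration; the notes only say "large". `random.random()` plays the role of RAND().

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

ROWS, COLS = 2000, 30  # assumed block size

# Each row mimics one spreadsheet row of =RAND() cells;
# row_means mimics the nearby =AVERAGE(...) column.
row_means = []
for _ in range(ROWS):
    row = [random.random() for _ in range(COLS)]
    row_means.append(sum(row) / COLS)

# Crude text histogram of the row averages: 10 bins from 0.3 to 0.7.
bins = [0] * 10
for m in row_means:
    if 0.3 <= m < 0.7:
        bins[min(int((m - 0.3) / 0.04), 9)] += 1

for i, count in enumerate(bins):
    print(f"{0.3 + 0.04 * i:.2f} {'#' * (count // 20)}")

# Summary of the averages: centered near 0.5, much narrower than one RAND().
mean_of_means = sum(row_means) / ROWS
sd_of_means = (sum((m - mean_of_means) ** 2 for m in row_means) / ROWS) ** 0.5
print(mean_of_means, sd_of_means)
```

The printed histogram should look bell shaped and peaked near 0.5, even though each individual number is uniform on 0 to 1.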

Discuss upcoming 2nd test next week, chap 5-7, pick a day. Also point out next assignment.
Extra topic: average and standard deviation of a sum of random variables
- Derive this for those who want to see the math.
- mean(x+y) = mean(x)+mean(y)
- variance(x+y) = variance(x)+variance(y)
  (IF x,y independent)
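A quick numerical check of these two rules, as a sketch rather than a derivation. The choice of x (a fair die roll) and y (a uniform number on 0 to 1) is an assumption made only for illustration; the helper names `mean` and `variance` are likewise just for this example.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
N = 100_000

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# x and y are drawn independently of each other.
xs = [random.randint(1, 6) for _ in range(N)]
ys = [random.random() for _ in range(N)]
sums = [x + y for x, y in zip(xs, ys)]

print(mean(xs) + mean(ys), mean(sums))              # these agree
print(variance(xs) + variance(ys), variance(sums))  # these nearly agree

# Counter-example: y = x is completely dependent on x,
# so variance(x + y) = 4 * variance(x), not 2 * variance(x).
doubled = [x + x for x in xs]
print(2 * variance(xs), variance(doubled))
```

The mean rule holds whether or not x and y are independent; the variance rule only holds when they are, which is what the last two printed numbers illustrate.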
Consequence for binomial:
- n=1: mean = p, variance = pq = p * (1-p)
- n=N: mean = Np, variance = sigma²= Npq = N * p * (1-p)
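These formulas can be checked against the binomial distribution itself by summing over every possible outcome. The values N=10, p=0.3 are arbitrary illustrative choices, and `binomial_pmf` is a helper written for this sketch.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

N, p = 10, 0.3  # illustrative values only
q = 1 - p

# mean = sum(x * p(x)), variance = sum(x^2 * p(x)) - mean^2
mean = sum(k * binomial_pmf(k, N, p) for k in range(N + 1))
variance = sum(k**2 * binomial_pmf(k, N, p) for k in range(N + 1)) - mean**2

print(mean, N * p)          # both close to 3.0
print(variance, N * p * q)  # both close to 2.1
```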
which leads to this IMPORTANT RULE OF THUMB:
- As you vary N, sigma/mean = constant / sqrt(N)
  (for the binomial: sigma/mean = sqrt(Npq)/(Np) = sqrt(q/p) / sqrt(N))
- holds almost any time you're doing many trials and averaging the results
- particularly noteworthy when you want to get better results by doing more trials, or doing a larger survey
- example: Say you do an experiment 5 times and find answer = 2.3 +- 0.4, where the error is due to random factors.
  Doing it more times will give you a better answer: the randomness will tend to cancel out.
  How many times must you run it to get accuracy of +- 0.04 ?
  Answer: sigma goes like 1/sqrt(N), so a 10 times smaller sigma means 100 times bigger N: you'd need N = 5*100 = 500.
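The rule of thumb and the worked example can be put into a short sketch. The function name `trials_needed` and the value p=0.25 in the loop are assumptions for illustration; the only thing taken from the notes is that the error shrinks like 1/sqrt(N).

```python
import math

def trials_needed(n_now, sigma_now, sigma_target):
    """Trials needed to reach sigma_target, if error scales like 1/sqrt(N)."""
    return n_now * (sigma_now / sigma_target) ** 2

# Example from the notes: 5 runs give +- 0.4; how many for +- 0.04?
print(round(trials_needed(5, 0.4, 0.04)))  # 500

# Binomial check: sigma/mean times sqrt(N) stays constant as N varies.
p = 0.25  # illustrative value
for n in (10, 100, 1000):
    ratio = math.sqrt(n * p * (1 - p)) / (n * p)
    print(n, ratio, ratio * math.sqrt(n))
```

The last column printed by the loop is the same for every N (it equals sqrt(q/p)), which is the "constant" in the rule.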
chap 7: binomial approximated as Normal
- very roughly, you need Np > 5 (mean big enough) and Npq > 5 (variance big enough) for this approximation to be reasonable.
- procedure is
  - Given N, p
  - Find mean, sigma
  - Given discrete raw scores, convert to continuous z-scores (discrete "k" becomes continuous "k-0.5 to k+0.5")
  - Do standard normal distribution lookup (table D-4) as before to get probabilities
- example: 7-15, pg 189: P(Bob wins)=1/4. Find the probability that he wins 18 or more times out of 24 games.
  - N=24, p=0.25, q=0.75
  - This implies mean=N*p=6, sigma=sqrt(N*p*q)= 2.1
  - Treating discrete "18" as continuous "17.5 to 18.5", "at least 18" is "continuous raw score > 17.5", which is z > (17.5 - 6)/2.1 = 5.48
  - z = 5.48 is past the end of table D-4, so the total probability above it is p < 0.00003
  - exact version uses binomial with P(18)+P(19)+...+P(24)
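The procedure and the example above can be sketched in Python, with the table lookup replaced by a formula. Using `math.erfc` in place of table D-4 is a substitution made for this sketch, and the helper names are made up here. Note that the code keeps sigma unrounded, so it gets z of about 5.42 rather than the 5.48 obtained above with sigma rounded to 2.1.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal (stands in for the table lookup)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Given N, p
N, p = 24, 0.25
q = 1 - p

# Find mean, sigma
mean = N * p
sigma = math.sqrt(N * p * q)

# "18 or more" becomes "continuous raw score > 17.5"; convert to a z-score.
z = (17.5 - mean) / sigma

# Normal approximation versus the exact sum P(18) + P(19) + ... + P(24).
approx = normal_upper_tail(z)
exact = sum(binomial_pmf(k, N, p) for k in range(18, N + 1))

print(f"z = {z:.2f}")
print(f"normal approximation: {approx:.2e}")
print(f"exact binomial:       {exact:.2e}")
```

Both numbers come out far below 0.00003, consistent with the table bound. They do not agree closely with each other, which is not surprising this far out in the tail and with Npq = 4.5 just under the rough threshold of 5.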
