Tues Oct 7 - Statistics Lecture Notes
- go over homework
    - 5-31: C(35,3) = (35*34*33)/(3*2*1) = 6545
    - 5-32: one person hurt p=0.2. With n=7 people, the probability that
      at least one is hurt is P(x>=1) = 1 - P(0) = 1 - (0.8)^7 = 0.79
    - 6-21: type the numbers into Excel, make a p(x) column by dividing
      each count by the total, then find mean = sum(x*p) and
      variance = sum(x^2 * p) - mean^2
      (This one was more tedious than I first expected.)
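The three homework calculations can be double-checked outside Excel. The following is a minimal Python sketch, not the method used in class; the helper name `mean_and_variance` and the sample counts are made up for illustration and are not the numbers from problem 6-21.

```python
from math import comb

# 5-31: number of ways to choose 3 items out of 35
ways = comb(35, 3)

# 5-32: P(at least one of 7 people is hurt) when each is hurt with p = 0.2
p_at_least_one = 1 - (1 - 0.2) ** 7


def mean_and_variance(xs, counts):
    """Mean and variance of a discrete distribution given raw counts.

    Hypothetical helper: mirrors the spreadsheet steps for 6-21.
    """
    total = sum(counts)
    probs = [c / total for c in counts]            # the p(x) column
    mean = sum(x * p for x, p in zip(xs, probs))   # sum(x*p)
    variance = sum(x * x * p for x, p in zip(xs, probs)) - mean ** 2
    return mean, variance


# Hypothetical data, NOT the numbers from problem 6-21
xs = [0, 1, 2, 3]
counts = [1, 3, 3, 1]
mean, variance = mean_and_variance(xs, counts)

print("5-31:", ways)
print("5-32:", round(p_at_least_one, 2))
print("6-21 style:", mean, variance)
```

The variance line uses the same shortcut as the notes, sum(x^2 * p) - mean^2, rather than summing squared deviations directly.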
- class experiment 2: enter "=RAND()" in Excel and fill it into a large
      rectangular block.  Paste "=AVERAGE(drag row)" into each entry of a
      nearby column, then make a histogram of those numbers.
      Discuss what's going on: RAND() is a uniform probability distribution,
      but an average of many RAND()'s tends towards a normal distribution.
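The same experiment can be sketched in Python instead of a spreadsheet. This is only an illustration under assumed sizes (2000 rows of 20 random numbers each); the function names `row_averages` and `text_histogram` are invented here, and `random.random()` stands in for RAND().

```python
import random
import statistics


def row_averages(n_rows, n_cols, seed=0):
    """Average each row of an n_rows x n_cols block of uniform(0,1) numbers."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.random() for _ in range(n_cols)])
        for _ in range(n_rows)
    ]


def text_histogram(values, n_bins=10, width=50):
    """Count values in equal bins on [0, 1) and draw one text bar per bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    peak = max(counts)
    lines = [
        f"{i / n_bins:.1f}-{(i + 1) / n_bins:.1f} " + "#" * round(width * c / peak)
        for i, c in enumerate(counts)
    ]
    return counts, lines


averages = row_averages(2000, 20)
counts, lines = text_histogram(averages)
print("\n".join(lines))
print("mean of averages:", statistics.fmean(averages))
print("std dev of averages:", statistics.pstdev(averages))
```

A single uniform number has variance 1/12, so the averages of 20 should have a standard deviation near sqrt(1/(12*20)), and the bars should pile up around 0.5 instead of being flat.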
  
- Discuss upcoming 2nd test next week, chap 5-7, 
    pick a day.  Also point out next assignment.
   
- Extra topic: average and standard deviation of a sum of random variables
    - Derive this for those who want to see the math.
    - mean(x+y) = mean(x) + mean(y)
    - variance(x+y) = variance(x) + variance(y)  (IF x,y independent)
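For those who want to see the math, one standard way to derive the two rules is sketched below in LaTeX. The notation (E for expected value, mu_x and mu_y for the means) is an assumption and may differ from the textbook's.

```latex
\begin{align*}
\mu_x &= E[x], \qquad \mu_y = E[y] \\
E[x+y] &= E[x] + E[y] = \mu_x + \mu_y \\
\operatorname{Var}(x+y)
  &= E\big[(x + y - \mu_x - \mu_y)^2\big] \\
  &= E\big[(x-\mu_x)^2\big] + E\big[(y-\mu_y)^2\big]
     + 2\,E\big[(x-\mu_x)(y-\mu_y)\big] \\
  &= \operatorname{Var}(x) + \operatorname{Var}(y)
     + 2\,E[x-\mu_x]\,E[y-\mu_y]
     \qquad \text{(x, y independent)} \\
  &= \operatorname{Var}(x) + \operatorname{Var}(y)
     \qquad \text{since } E[x-\mu_x] = 0
\end{align*}
```

The mean rule needs no independence; only the step that splits the cross term into a product of two expected values does.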
 
   
- Consequence for binomial:
    - n=1: mean = p, variance = pq = p * (1-p)
    - n=N (a sum of N independent n=1 trials): mean = Np,
      variance = sigma^2 = Npq = N * p * (1-p)
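The binomial formulas can be checked numerically by summing over the probability table directly. This is a minimal sketch; `binomial_pmf` and `moments_from_pmf` are illustrative names, and N=24, p=0.25 are just sample values.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success chance p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def moments_from_pmf(n, p):
    """Mean and variance computed as sum(x*p) and sum(x^2*p) - mean^2."""
    ks = range(n + 1)
    mean = sum(k * binomial_pmf(k, n, p) for k in ks)
    variance = sum(k * k * binomial_pmf(k, n, p) for k in ks) - mean ** 2
    return mean, variance


n, p = 24, 0.25  # sample values chosen for illustration
mean, variance = moments_from_pmf(n, p)
print("mean from table:", mean, " formula Np:", n * p)
print("variance from table:", variance, " formula Npq:", n * p * (1 - p))
```

Up to floating-point rounding, the table sums should agree with Np and Npq.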
 
   
- which leads to this IMPORTANT RULE OF THUMB:
    - As you vary N, sigma/mean = constant / sqrt(N)
      (for the binomial, sqrt(Npq)/(Np) = sqrt(q/p) / sqrt(N))
    - true almost whenever you're doing many trials and averaging the results
    - particularly noteworthy when you want to get better results
      by doing more trials, or doing a larger survey
    - example: Say you do an experiment 5 times and find answer = 2.3 +- .4,
      and the error is due to random factors.
      Doing it more times will give you a better answer:
      the randomness will tend to cancel out.
      How many times must you run it to get an accuracy of +- .04?
      Answer: a 10 times smaller sigma means a 100 times bigger N, so
      you'd need N=500.
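The rule of thumb turns into a one-line calculation. The sketch below assumes the error really does shrink as 1/sqrt(N); the function names `trials_needed` and `relative_spread` are invented for illustration.

```python
import math


def trials_needed(n_now, sigma_now, sigma_target):
    """Trials needed to reach sigma_target, assuming sigma ~ 1/sqrt(N)."""
    n = n_now * (sigma_now / sigma_target) ** 2
    # Subtract a tiny tolerance so floating-point noise (e.g. 500.0000000001)
    # does not push the answer up to the next whole number.
    return math.ceil(n - 1e-9)


def relative_spread(n, p):
    """sigma/mean for a binomial: sqrt(Npq) / (Np)."""
    return math.sqrt(n * p * (1 - p)) / (n * p)


# The example from the notes: 5 runs gave +- .4, the goal is +- .04
print(trials_needed(5, 0.4, 0.04))

# 100 times more trials should shrink sigma/mean by a factor of 10
print(relative_spread(100, 0.25) / relative_spread(10000, 0.25))
```

The first line reproduces the N=500 answer; the second shows the same factor of 10 for a binomial with p=0.25.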
 
   
- chap 7: binomial approximated as Normal
    - very roughly, you need Np > 5 (mean big enough) and Npq > 5
      (variance big enough) for this approximation to be reasonable.
    - procedure is
        - Given N, p
        - Find mean, sigma
        - Given discrete raw scores, convert to continuous z-scores
        - Do standard normal distribution lookup (table D-4) as before
          to get probabilities
        
 
    - example: 7-15, pg 189: P(Bob wins) = 1/4.  Find the probability that
      he wins 18 or more times out of 24 games.
        - N=24, p=0.25, q=0.75
        - This implies mean = N*p = 6, sigma = sqrt(N*p*q) = 2.1
        - Treating discrete "18" as continuous "17.5 to 18.5",
          "at least 18" is "continuous raw score > 17.5", which is
          z > (17.5 - 6)/2.1 = 5.48
        - total probability above z = 5.48 is off the table D-4, p < 0.00003
        - exact version uses binomial with P(18)+P(19)+...+P(24)
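The Bob example can be run both ways, normal approximation and exact binomial sum, in a few lines. This is a minimal sketch: the function names are illustrative, and `math.erfc` replaces the table D-4 lookup. Because it keeps sigma unrounded (about 2.12 rather than 2.1), z comes out near 5.42 instead of 5.48.

```python
from math import comb, erfc, sqrt


def normal_tail(z):
    """P(Z > z) for a standard normal variable (replaces the table lookup)."""
    return 0.5 * erfc(z / sqrt(2))


def approx_at_least(k, n, p):
    """Normal approximation to P(X >= k), treating k as starting at k - 0.5."""
    mean = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sigma
    return z, normal_tail(z)


def exact_at_least(k, n, p):
    """Exact binomial P(X >= k) = P(k) + P(k+1) + ... + P(n)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


z, p_normal = approx_at_least(18, 24, 0.25)
p_exact = exact_at_least(18, 24, 0.25)
print("z =", round(z, 2))
print("normal approximation:", p_normal)
print("exact binomial:", p_exact)
```

Both probabilities fall far below the 0.00003 limit of the table. So far out in the tail the two values need not agree closely with each other, which is worth keeping in mind since Npq = 4.5 here is slightly under the rough Npq > 5 guideline.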
          
 
 
Statistics Fall 2003