Statistics

Spring 2016

April 19

aside

schedule for the rest of the term

If we can, I would like to cover:
... but there isn't much time left in the term. We'll see.
Tue Apr 19    quiz review. Also chapter 5 stuff if time allows : paired data, t distribution
Thu Apr 21    quiz 2. Finish chap 5 discussion
Tue Apr 26    project check in ; ANOVA & Chi squared
Thu Apr 28    linear regression & term review
Tue May 3     share projects - writeups due Fri May 6
Final exam : take home exam, emailed to you noon Sun May 8, due noon Mon May 9.

quiz 2 topics

1. Normal distribution :
2. Estimating mean of population from a sample of size N
3. Hypothesis testing :
The R functions we care about are
  1. qnorm(p) = z, the inverse function of pnorm (i.e. the z for which x <= z has probability p)
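A small sketch of how pnorm and qnorm undo each other, using the standard normal distribution (mean 0, sd 1, which are R's defaults). The particular z value and the variable names are chosen here only for illustration.

```r
# pnorm(z): probability that a standard normal x is <= z
# qnorm(p): the z for which pnorm(z) equals p
z = 1.96
p = pnorm(z)                 # about 0.975
z_back = qnorm(p)            # recovers 1.96, up to rounding error

# critical z values that go with a 0.05 significance level
z_one_sided = qnorm(0.95)    # about 1.645
z_two_sided = qnorm(0.975)   # about 1.960

print(c(p = p, z_back = z_back,
        z_one_sided = z_one_sided, z_two_sided = z_two_sided))
```

So pnorm goes from a z score to a probability, and qnorm goes from a probability back to a z score.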
R recipe given some data values :
Say you are trying to see if some population mean is larger than 4 (i.e. a one sided test).
> data = c(2, 3, 4, 10, 9, 7, 6, 5, 2, 3, 3, 7)  # sample
> H0 = 4                      # null hypothesis
> alpha = 0.05                # significance level for the one sided test
> result = mean(data)         # sample mean = 5.08 = estimate of population mean
> N = length(data)            # number of data points (12)
> error = sd(data)/sqrt(N)    # standard error = 0.78 = estimate of sigma of sample mean
> z = (result - H0)/error     # z score, i.e. how far result is from H0 = 1.38
> pvalue = 1 - pnorm(z)       # one sided "greater than" pvalue = 0.0832
Here we fail to reject the null hypothesis since 0.08 is not smaller than 0.05. That is, while the sample mean is bigger than 4, a result this large is not unlikely enough under the null hypothesis to rule out random chance.
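The same conclusion can be reached by comparing the z score to a critical value from qnorm instead of comparing the pvalue to 0.05. This is only a sketch of that equivalent check for the sample above; the names alpha, z_critical and reject are chosen here and are not part of the recipe.

```r
data = c(2, 3, 4, 10, 9, 7, 6, 5, 2, 3, 3, 7)   # same sample as in the recipe
H0 = 4                                           # null hypothesis
alpha = 0.05                                     # significance level (name assumed here)

error = sd(data) / sqrt(length(data))            # standard error of the sample mean
z = (mean(data) - H0) / error                    # about 1.38
z_critical = qnorm(1 - alpha)                    # about 1.645 for a one sided test

reject = z > z_critical                          # FALSE, so we fail to reject H0
print(c(z = z, z_critical = z_critical))
print(reject)
```

The two checks always agree: z > qnorm(1 - alpha) exactly when 1 - pnorm(z) < alpha.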

alcohol.csv

(I'm not going to put anything this complex on the exam.)
Here's the R recipe that works for me :
data = read.csv("alcohol.csv")               # data frame
yes = subset(data, data$alcohol == "yes")    # drunk people data frame
no = subset(data, data$alcohol == "no")      # sober people data frame
mem_yes = yes$memory                         # drunk memory sample values
mem_no = no$memory                           # sober memory sample values
result = mean(mem_yes) - mean(mem_no)        # observed result = -1.5
H0 = 0.0                                     # null hypothesis
error_yes = sd( mem_yes )/sqrt( length(mem_yes) )   # standard error of drunk mean
error_no = sd( mem_no )/sqrt( length(mem_no) )      # standard error of sober mean
# combine to get overall error by formula for diff of two random variables :
error = sqrt( error_yes^2 + error_no^2 )     # result standard error
z = ( result - H0 ) / error                  # = - 0.6
pvalue = pnorm(z)                            # one sided "less than" pvalue = 0.26
We cannot reject the null hypothesis, since the pvalue (0.26) is not less than 0.05.
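The steps of this recipe can be wrapped in a small function, which makes it harder to mix up the two samples when computing the standard errors. This is a sketch only: two_sample_z is a hypothetical helper, not something from the course or from R itself, and the numbers below are made up for illustration and are not the alcohol.csv data.

```r
# Hypothetical helper: one sided "less than" z test for the
# difference of two sample means, mean(a) - mean(b).
two_sample_z = function(a, b, H0 = 0.0) {
  result = mean(a) - mean(b)               # observed difference
  error_a = sd(a) / sqrt(length(a))        # standard error of mean(a)
  error_b = sd(b) / sqrt(length(b))        # standard error of mean(b)
  error = sqrt(error_a^2 + error_b^2)      # standard error of the difference
  z = (result - H0) / error
  list(result = result, error = error, z = z, pvalue = pnorm(z))
}

# Made-up samples, NOT the alcohol.csv values.
a = c(4, 6, 5, 7, 3, 5)
b = c(6, 7, 5, 8, 6, 4)
out = two_sample_z(a, b)
print(out)                                 # z is about -1.22, pvalue above 0.05
```

With samples this small, the t distribution from chapter 5 would be the more careful choice than pnorm.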

chap 5 complications

If time allows :
http://cs.marlboro.edu/courses/spring2016/statistics/notes/April_19
last modified Tuesday April 19 2016 8:38 am EDT