Statistics

Spring 2016

April 12

where we are

So: homework 3, on the material in chapter 4 of the text (hypothesis tests & confidence intervals), is due today.
I would like to have our 2nd in-class quiz next Tuesday, on the chapter 4 stuff and the normal distribution stuff from chapter 3.
Today I would like to go over the homework and talk about the material in sections 5.1-5.3: difference of means, paired data, and the t-distribution. These are all, I think, only small additions to what we've already done.
I would then like to pause to catch our breath for a few days, while we (a) review for the 2nd quiz, and (b) start in on your projects. We will do as much review as you'd like on Thursday, and then discuss what you've found out so far about data and questions for your projects.
So please do bring whatever you have found out so far to class on Thursday. We can all do some brainstorming and googling to think about where to go with your term project.

go over homework 3

I've posted my version of homework 3 in this solutions folder.
Let's go over it, shall we?

textbook material from chapter 5.1 - 5.3

Finally, depending on time, let's discuss
difference of means
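If we have two samples (a, b) that we want to compare, we often want to know whether their means differ. The typical thing is to look at mean(a) - mean(b). If the two samples are independent, this new variable has standard error sigma = sqrt(sigma_a**2 + sigma_b**2), where sigma_a and sigma_b are the standard errors of mean(a) and mean(b) (each sample's standard deviation divided by the square root of its size).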
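A minimal R sketch of the difference of means, using two small made-up samples (the values in a and b are purely illustrative, not real data). Each sigma is taken to be the standard error of a sample mean, estimated as sd/sqrt(n), and for independent samples their squares add. The last lines compare the result with the statistic that R's t.test reports by default (the Welch version), which is built on this same standard error.

```r
# Two hypothetical independent samples (made-up numbers for illustration)
a <- c(5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 4.7, 5.3)
b <- c(4.2, 4.9, 4.4, 5.0, 3.9, 4.6, 4.1)

# Standard error of each sample mean: sd / sqrt(n)
se_a <- sd(a) / sqrt(length(a))
se_b <- sd(b) / sqrt(length(b))

# For independent samples the variances add, so the standard error of
# mean(a) - mean(b) is the square root of the sum of the squares
diff_means <- mean(a) - mean(b)
se_diff <- sqrt(se_a^2 + se_b^2)

# How many standard errors apart the two means are
score <- diff_means / se_diff
print(c(difference = diff_means, se = se_diff, score = score))

# t.test's default (Welch) statistic uses this same standard error
welch <- t.test(a, b)
print(unname(welch$statistic))
```

With small samples like these, the score should be compared against a t distribution rather than the normal, which is where section 5.3 comes in.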
paired data
However, if you have two sets of data where each a[i] is clearly paired with a b[i] (such as "after" and "before" measurements on the same subject i), then you shouldn't use the difference of means approach. Instead, compute each difference a[i] - b[i] and treat those differences as one data set, not two.
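A short R sketch of the paired approach, on hypothetical before/after numbers for six subjects (made up for illustration). The differences d are treated as a single sample; R's t.test with paired = TRUE does the same thing, while the unpaired call ignores the pairing.

```r
# Hypothetical paired measurements: the same 6 subjects before and after
before <- c(72, 80, 65, 90, 77, 84)
after  <- c(70, 76, 66, 85, 73, 80)

# Paired approach: one data set of per-subject differences
d <- after - before
se_d <- sd(d) / sqrt(length(d))
paired_score <- mean(d) / se_d

# R's paired t-test computes the same statistic from d
paired <- t.test(after, before, paired = TRUE)

# Treating them as two independent samples ignores the pairing
unpaired <- t.test(after, before)
print(c(paired = paired_score, unpaired = unname(unpaired$statistic)))
```

With these numbers the paired score is much larger in magnitude than the unpaired one: the large subject-to-subject spread cancels out in d, leaving only the consistent drop of about 3 points.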
Student's t distribution
In the last chapter we looked at mean(sample) as a random variable and said that, for large N (where N is the size of the sample), it follows a normal distribution. It turns out that when N is small, and we also have to estimate the standard deviation from the sample itself, the standardized mean follows a wider distribution with heavier tails, so results far from the center are more likely. This is related to the "if you're behind, take a bigger risk" notion we came across in our gaming. The upshot is that everything works as it did before, but we use a different distribution, labeled by the sample size. Instead of labeling it by N, though, we use N-1 and call it the "degrees of freedom" ... go figure.
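To see what "wider" means in practice, this R sketch compares the cutoff for a 95% two-sided interval under the t distribution (qt) with the normal cutoff (qnorm). The sample sizes in N are arbitrary choices for illustration; the t cutoffs should be larger than the normal value of about 1.96 and shrink toward it as N grows.

```r
# Sample sizes N and the matching degrees of freedom N - 1
N <- c(3, 6, 11, 31, 101, 1001)
df <- N - 1

# Value that cuts off the top 2.5% (used for a 95% two-sided interval)
t_crit <- qt(0.975, df)
z_crit <- qnorm(0.975)

print(data.frame(N = N, df = df, t_crit = round(t_crit, 3)))
print(round(z_crit, 3))
```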
There are some graphs of it at https://en.wikipedia.org/wiki/Student's_t-distribution .
In R, the t distribution functions are:
# https://stat.ethz.ch/R-manual/R-devel/library/stats/html/TDist.html
# df = degrees of freedom
# These are just like dnorm, pnorm, qnorm, rnorm,
# except that there are no mean or sd arguments: the distribution
# is always centered at zero with unit scale.
# So you must first turn your variable into a standardized score
# (a z-score, computed with the sample's standard error) before using this.
# As df gets larger, this turns into the standard normal distribution.
dt(x, df)   # density of the t distribution (i.e. the distribution itself)
pt(z, df)   # probability of a result <= z (i.e. area to left of z)
qt(p, df)   # inverse of pt: finds z such that the probability is p
rt(n, df)   # generate n random samples from a t distribution
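A quick R check of the claims in those comments, with df = 4 and the test values 1.5 and 2 chosen arbitrarily: pt and qt undo each other, the t distribution puts more probability in its tails than the normal, and for very large df the density is nearly the normal one.

```r
df <- 4

# pt and qt undo each other, just like pnorm and qnorm
p <- pt(1.5, df)
z_back <- qt(p, df)

# Probability of a result at least 2 units from zero, both tails
tail_t <- 2 * pt(-2, df)
tail_norm <- 2 * pnorm(-2)

# With many degrees of freedom the t density is nearly the normal density
gap <- abs(dt(1, df = 10000) - dnorm(1))

# rt draws random samples, as rnorm does
set.seed(1)
draws <- rt(5, df)

print(c(z_back = z_back, tail_t = tail_t, tail_norm = tail_norm, gap = gap))
print(draws)
```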
(We could use dt() to generate the same pictures as in the Wikipedia article.)
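One way to draw such a picture in base R is sketched below. The particular df values, colors, and plotting range are arbitrary choices, not taken from the Wikipedia figures. The black curve is the standard normal; the smaller df is, the lower the peak and the fatter the tails.

```r
# Grid of x values on which to evaluate the densities
x <- seq(-4, 4, length.out = 401)

# Standard normal for reference, then t densities for a few df values
plot(x, dnorm(x), type = "l", lwd = 2, ylim = c(0, 0.4),
     xlab = "x", ylab = "density", main = "Student's t vs. normal")
dfs <- c(1, 2, 5, 30)
cols <- c("red", "orange", "blue", "darkgreen")
for (i in seq_along(dfs)) {
  lines(x, dt(x, dfs[i]), col = cols[i])
}
legend("topright", legend = c("normal", paste("df =", dfs)),
       col = c("black", cols), lwd = c(2, rep(1, length(dfs))))
```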
We'll do an example or two. It's basically what we've been doing but with a slightly different function.
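Here is a sketch of one such example in R, using a hypothetical sample of 8 made-up values and a hypothesized mean mu0 = 10 (both chosen only for illustration). The confidence interval and two-sided p-value follow the chapter 4 recipe, with qt and pt in place of qnorm and pnorm; R's built-in t.test is used at the end as a check.

```r
# Hypothetical small sample (made-up values) and a hypothesized mean mu0
x <- c(9.8, 10.6, 10.1, 11.2, 10.9, 9.5, 10.7, 11.0)
mu0 <- 10
N <- length(x)
df <- N - 1

# Standard error, estimated from the sample itself
se <- sd(x) / sqrt(N)

# 95% confidence interval: same recipe as before, but qt replaces qnorm
t_crit <- qt(0.975, df)
ci <- mean(x) + c(-1, 1) * t_crit * se

# Two-sided hypothesis test of mean = mu0: pt replaces pnorm
t_score <- (mean(x) - mu0) / se
p_value <- 2 * pt(-abs(t_score), df)

print(ci)
print(c(t = t_score, p = p_value))

# R's built-in one-sample t-test does the same calculation
check <- t.test(x, mu = mu0)
print(check$conf.int)
print(check$p.value)
```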
http://cs.marlboro.edu/courses/spring2016/statistics/notes/April_12
last modified Tuesday April 12 2016 8:50 am EDT