March 29

Welcome back!

asides

old business : homework

It's been several weeks, so let's remember where we are :

Started with looking at data visualizations & plots.
Then learned some of the fundamentals of probability.
Just before break talked about the "normal distribution"

... and I asked you to do two exercises on it from chapter 3.

I've posted homework solutions - let's go over them.

for the good of the order : projects

It's time to start thinking about a term project.

For next Tuesday, I want you each to propose some sort of statistical project to work on over the next month, and present to the class at the end of April. So start thinking about what you would like to investigate.

I'll have more to say about this on Thursday - I'm still working up more specific guidelines.

new business : chap 4 - hypothesis tests, p-values, and all that

This week and probably next we'll be looking at the material in chapter 4, which covers hypothesis testing.

Please start reading sections 4.1 through 4.3.

The ideas here are central to the notion of statistical testing, which is much of what doing experimental science is all about.

 4.1 "point estimates" - sampling to estimate a value (e.g. mean)
 4.2 "confidence intervals" - a range of plausible values around a point estimate
 4.3 "hypothesis testing" - p-values and all that (often confusing)

I'll give an overview of some of this material today, using the slides from the website to get you primed for the reading.

we won't get through all this ... but it's where we're headed

First, confidence intervals.

(1) We can estimate a population parameter (e.g. mean) by sampling.

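(2) If we use a sample of size N to estimate the population mean, the uncertainty of that estimate is standard_error = \( \frac{\sigma}{\sqrt{N}} \), where \( \sigma \) is the standard deviation of the population (in practice estimated by the sample's standard deviation). (This is not obvious. The book doesn't explain it. I may try to justify it - we'll see.)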

(3) The estimate of the mean will be approximately normally distributed for large N, even if the original distribution isn't normal. (!) This is the central limit theorem.

(4) We can use that to give a "confidence interval" of what the population mean probably is, at (say) the 95% confidence level.
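Points (2) through (4) can be checked numerically. Here is a minimal sketch, not taken from the book: it assumes a skewed (exponential) population with mean 1 and standard deviation 1, and the sample size, trial count, and seed are arbitrary choices made for illustration. It draws many samples, compares the spread of their means with \( \sigma / \sqrt{N} \), and then builds a 95% confidence interval from a single sample.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable (arbitrary choice)

N = 50         # sample size (assumed for illustration)
TRIALS = 4000  # number of repeated samples (assumed for illustration)
SIGMA = 1.0    # standard deviation of an exponential population with rate 1

def draw_sample(n):
    """n draws from a skewed, clearly non-normal population."""
    return [random.expovariate(1.0) for _ in range(n)]

# Point (2): the spread of many sample means should be close to sigma/sqrt(N).
means = [statistics.fmean(draw_sample(N)) for _ in range(TRIALS)]
observed_se = statistics.stdev(means)
predicted_se = SIGMA / N ** 0.5
print(f"spread of sample means: {observed_se:.4f}")
print(f"sigma / sqrt(N):        {predicted_se:.4f}")

# Point (4): a 95% confidence interval from ONE sample, with sigma
# estimated by that sample's own standard deviation.
sample = draw_sample(N)
xbar = statistics.fmean(sample)
se = statistics.stdev(sample) / N ** 0.5
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
ci = (xbar - z * se, xbar + z * se)
print(f"sample mean {xbar:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

A histogram of `means` would also show point (3): the bell shape appears even though the population itself is lopsided.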

Second, hypothesis testing.

This is a general approach which in this chapter we apply to the confidence interval stuff. In later chapters we develop other tests using this same framework.

 H0 = null hypothesis (No result: data is due to randomness.)
 HA = alternative explanation (Typically what we want to show.)
 
 alpha   = significance level = 1 - confidence (from confidence level)
         = chosen threshold of "yes we have a non-random result", fixed before experiment
 
 p-value = probability of a result at least as extreme as the one measured, if H0 (null hypothesis) is true.
         < alpha = 0.05 (typical) means reject H0 ... source of much confusion and error in science
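To make the p-value concrete, here is a small sketch of a two-sided test on a sample mean using the normal distribution. The numbers (null mean 100, known sigma 15, sample of 100 with mean 103) are hypothetical values invented for this example, and `two_sided_p_value` is just a helper name chosen here, not something from the book.

```python
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """Probability, if H0 is true, of a sample mean at least this far from null_mean."""
    se = sigma / n ** 0.5                  # standard error of the mean
    z = (sample_mean - null_mean) / se     # distance from H0 in standard errors
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # significance level, fixed before looking at the data

# Hypothetical data: H0 says the mean is 100; we measured 103 with n = 100.
p = two_sided_p_value(sample_mean=103.0, null_mean=100.0, sigma=15.0, n=100)

print(f"p-value = {p:.4f}")  # z = 2.0, so the p-value is about 0.0455
print("reject H0" if p < alpha else "do not reject H0")
```

Note what the p-value is not: it is not the probability that H0 is true. It only says how surprising the data would be if H0 were true.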

Then the possibilities are given in a 2x2 table :

 reality / decision |  do_not_reject_H0                    reject_H0
                    ---------------------------------------------------------------------------------
       H0           |  OK : no effect                      type 1 error (find effect incorrectly)
                    |
   not H0           |  type 2 error (miss real effect)     OK : find real effect
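The two error cells of the table can be estimated by simulation. This is a rough sketch under assumptions of my own choosing: a normal population with known sigma = 1, samples of size 25, H0 stating that the mean is 0, and a "real effect" of 0.5 for the second row. When H0 is true, the test should reject about alpha of the time; how often it misses a real effect depends on the effect size and N.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable (arbitrary choice)

ALPHA = 0.05    # significance level
N = 25          # sample size (assumed for illustration)
SIGMA = 1.0     # known population standard deviation (assumed)
TRIALS = 2000   # simulated experiments per row of the table
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96

def rejects_h0(true_mean):
    """Run one experiment and test H0: mean = 0 with a two-sided z test."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = fmean(sample) / (SIGMA / N ** 0.5)
    return abs(z) > Z_CRIT

# Row "H0": no real effect, so every rejection is a type 1 error.
type1_rate = sum(rejects_h0(0.0) for _ in range(TRIALS)) / TRIALS

# Row "not H0": true mean is 0.5, so every non-rejection is a type 2 error.
type2_rate = sum(not rejects_h0(0.5) for _ in range(TRIALS)) / TRIALS

print(f"type 1 error rate: {type1_rate:.3f}  (should be near alpha = {ALPHA})")
print(f"type 2 error rate: {type2_rate:.3f}  (depends on effect size and N)")
```

Try making N larger or the effect bigger: the type 2 rate drops, while the type 1 rate stays near alpha because we chose it.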

Are we having fun yet?

http://cs.marlboro.edu/courses/spring2016/statistics/notes/March_29
last modified Monday March 28 2016 11:21 pm EDT
