March 29
Welcome back!
asides
old business : homework
It's been several weeks, so let's remember where we are :
- Started with looking at data visualizations & plots.
- Then learned some of the fundamentals of probability.
- Just before break talked about the "normal distribution"
... and I asked you to do two exercises on it from chapter 3.
for the good of the order : projects
It's time to start thinking about a term project.
For next Tuesday, I want you each to propose some
sort of statistical project to work on over the
next month, and present to the class at the end
of April. So start thinking about what you would
like to investigate.
I'll have more to say about this on Thursday -
I'm still working up more specific guidelines.
new business : chap 4 - hypothesis tests, p-values, and all that
This week and probably next we'll be looking at the
material in chapter 4, which covers hypothesis testing.
Please start reading sections 4.1 through 4.3.
The ideas here are central to the notion of
statistical testing, which is much of what doing
experimental science is all about.
4.1 "point estimates" - sampling to estimate a value (e.g. mean)
4.2 "confidence intervals" - estimating a plausible range around a point estimate
4.3 "hypothesis testing" - p-values and all that (often confusing)
I'll give an overview of some of this material today, using
the slides from the website to get you primed for the reading.
we won't get through all this ... but it's where we're headed
First, confidence intervals.
(1) We can estimate a population parameter (e.g. mean) by sampling.
(2) If we use a sample of size N to estimate the population mean, the uncertainty of that estimate is standard_error = \( \frac{\sigma}{\sqrt{N}} \), where \( \sigma \) is the population standard deviation. (This is not obvious. The book doesn't explain it. I may try to justify it - we'll see.)
(3) The estimate of the mean will be normally distributed for large N, even if the original distribution isn't normal. (!)
(4) We can use that to give a "confidence interval" of what the population mean probably is, at (say) the 95% confidence level.
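Points (1) through (4) can be checked numerically. What follows is only a sketch under assumptions that are mine, not the book's: the population is exponential with mean 1 and sigma 1 (chosen because it is clearly not normal), and the sample size N = 50 and the number of repeated samples are arbitrary.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

N = 50          # sample size (arbitrary choice for illustration)
TRIALS = 2000   # how many times we repeat the sampling

# Assumed population: exponential with rate 1, so mean = 1 and sigma = 1.
# It is strongly skewed, i.e. clearly not normal.
POP_MEAN = 1.0
POP_SIGMA = 1.0


def draw_sample(n):
    """Draw n independent values from the assumed population."""
    return [random.expovariate(1.0) for _ in range(n)]


# (1)-(2): the spread of many sample means should be near sigma / sqrt(N).
sample_means = [statistics.mean(draw_sample(N)) for _ in range(TRIALS)]
observed_se = statistics.stdev(sample_means)
predicted_se = POP_SIGMA / sqrt(N)

# (3): if the sample means are roughly normal, about 95% of them should
# land within z standard errors of the population mean.
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
within = sum(abs(m - POP_MEAN) <= z * predicted_se for m in sample_means) / TRIALS

# (4): a 95% confidence interval from ONE sample, using the sample
# standard deviation in place of the unknown sigma.
sample = draw_sample(N)
xbar = statistics.mean(sample)
se = statistics.stdev(sample) / sqrt(N)
ci = (xbar - z * se, xbar + z * se)

print(f"predicted SE = {predicted_se:.4f}, observed SE = {observed_se:.4f}")
print(f"fraction of sample means within z*SE of the true mean = {within:.3f}")
print(f"sample mean = {xbar:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The observed and predicted standard errors should come out close to each other, and the fraction in the second line should be near 0.95 even though the underlying population is skewed.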
Second, hypothesis testing.
This is a general approach which in this chapter
we apply to the confidence interval stuff. In later chapters we develop
other tests using this same framework.
H0 = null hypothesis (No result: data is due to randomness.)
HA = alternative explanation (Typically what we want to show.)
alpha = significance level = 1 - confidence (from confidence level)
= chosen threshold of "yes we have a non-random result", fixed before experiment
= 0.05 (typical)
p-value = probability of a result at least as extreme as the one measured, if H0 (null hypothesis) is true.
= compared with alpha : reject H0 when p-value < alpha ... source of much confusion and error in science
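To make the p-value concrete, here is a minimal sketch of a two-sided test on a sample mean, using the normal approximation from the confidence interval discussion. The function name `two_sided_p_value` and all of the numbers are made up for illustration, and sigma is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(xbar, mu0, sigma, n):
    """Probability of a sample mean at least this far from mu0,
    assuming H0 (population mean is mu0) is true."""
    se = sigma / sqrt(n)          # standard error of the mean
    z = (xbar - mu0) / se         # how many standard errors from mu0
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # significance level, chosen before looking at the data

# Hypothetical measurement: sample mean 10.5 from n = 100 values,
# with H0 claiming a population mean of 10.0 and sigma = 2.0.
p = two_sided_p_value(xbar=10.5, mu0=10.0, sigma=2.0, n=100)

print(f"p-value = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

Note that the p-value is a statement about the data given H0, not the probability that H0 itself is true.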
Then the possibilities are given in a 2x2 table :
reality / decision | do_not_reject_H0                | reject_H0
-------------------+---------------------------------+----------------------------------------
H0                 | OK : no effect                  | type 1 error (find effect incorrectly)
not H0             | type 2 error (miss real effect) | OK : find real effect
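The two error cells of the table can be estimated by simulation. This is a sketch under assumptions of my own choosing: normally distributed data with known sigma, a two-sided test on the mean, and a hypothetical real effect of 0.5 for the "not H0" row.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

ALPHA = 0.05       # significance level
N = 30             # sample size per experiment (arbitrary)
SIGMA = 1.0        # assumed known population standard deviation
MU0 = 0.0          # population mean claimed by H0
TRUE_SHIFT = 0.5   # hypothetical real effect for the "not H0" row
TRIALS = 4000      # number of simulated experiments per row

# Reject H0 when the sample mean is more than Z_CRIT standard errors from MU0.
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)


def rejects_h0(true_mean):
    """Run one simulated experiment and report whether it rejects H0."""
    sample = [random.gauss(true_mean, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU0) / (SIGMA / sqrt(N))
    return abs(z) > Z_CRIT


# Row "H0": reality matches H0, so every rejection is a type 1 error.
type1_rate = sum(rejects_h0(MU0) for _ in range(TRIALS)) / TRIALS

# Row "not H0": there is a real effect, so every non-rejection is a type 2 error.
type2_rate = sum(not rejects_h0(MU0 + TRUE_SHIFT) for _ in range(TRIALS)) / TRIALS

print(f"type 1 error rate = {type1_rate:.3f} (alpha = {ALPHA})")
print(f"type 2 error rate = {type2_rate:.3f}")
```

The type 1 rate should land near alpha by construction. The type 2 rate depends on the size of the real effect and on N, so it is not fixed by alpha alone.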
Are we having fun yet?