April 7
where we are
Today :
- discuss projects
- discuss p-values and their use in science, and issues with experimental design
- differences of means : combining standard error
- more examples of hypothesis tests, for practice
For next week :
- 3rd homework will be posted today or tomorrow - I'll send an email.
- read about "Student t" in next chapter. We'll do more practice next week.
- google things like "open data sources" and explore what's available for your project.
project discussion
articles & reading from last time
Talk briefly about ... experimental design and p-values
- DO : 1) design the experiment (H0, sigma, ...); 2) analyze data & conclude something
- DON'T : 1) get data, 2) look at data, 3) then design experiment, 4) then do analysis on that same data
But what is wrong with looking at the data? Answer: you *should* look at some data. But you *should not* do a numerical experiment on *that* data. Otherwise there is a strong risk of cherry picking the question: the data that suggested the hypothesis will tend to "confirm" it.
Instead: if you get data from an outside source, divide it into parts. Stare at one part, make plots, form conjectures. Then design the experiment, and perform the experiment on a *different part*.
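A minimal R sketch of that kind of split, just to make the idea concrete. The data frame `dat` here is a made-up stand-in for whatever you download, and the half-and-half split, the seed, and the names `explore` and `confirm` are all my own assumptions, not a required recipe.

```r
set.seed(42)   # arbitrary seed, only so the split is reproducible

# Hypothetical stand-in for data from an outside source.
dat <- data.frame(x = rnorm(100), y = rnorm(100))

# Randomly choose half of the rows for exploring.
explore_rows <- sample(nrow(dat), size = nrow(dat) %/% 2)

explore <- dat[explore_rows, ]    # stare at this part, plot it, form conjectures
confirm <- dat[-explore_rows, ]   # leave this part alone until the test is designed

# Exploration uses only the first part.
summary(explore)
```

The hypothesis test itself is then run once, on `confirm` only.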
Example of problem :
- Pick 10 cards from deck. Notice that there are (say) more spades. Then form hypothesis "this deck has more spades." Then do hypothesis test on those same 10 cards ... OOPS.
- Instead: pick 10 cards. Form hypothesis about deck. Then use a different set of 10 cards to confirm or reject hypothesis.
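The card example can be tried as a small simulation in R. This is a rough sketch under my own assumptions: the deck is fair, each card is put back and the deck reshuffled before the next draw (so a binomial test applies), the cutoff is 0.05, and there are 2000 repetitions. The helper name `p_more_of_suit` is made up for this illustration.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
suits  <- c("spades", "hearts", "diamonds", "clubs")
n_sims <- 2000
alpha  <- 0.05

# One-sided test of H0: P(suit) = 1/4 against "more of this suit than expected".
p_more_of_suit <- function(cards, suit) {
  binom.test(sum(cards == suit), length(cards), p = 1/4,
             alternative = "greater")$p.value
}

same_cards  <- logical(n_sims)
fresh_cards <- logical(n_sims)
for (i in seq_len(n_sims)) {
  first  <- sample(suits, 10, replace = TRUE)   # the cards we look at
  second <- sample(suits, 10, replace = TRUE)   # a different set of cards
  guess  <- names(which.max(table(first)))      # the suit that "looks" too common
  same_cards[i]  <- p_more_of_suit(first,  guess) < alpha   # the OOPS version
  fresh_cards[i] <- p_more_of_suit(second, guess) < alpha   # the honest version
}

mean(same_cards)    # how often a fair deck is "caught" using the same cards
mean(fresh_cards)   # how often a fair deck is "caught" using fresh cards
```

The deck is fair in every run, so every rejection is a false alarm. The first rate should come out noticeably larger than the second, because the suit was chosen after seeing which one happened to be most common.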
The "dark chocolate and weight loss" paper (which was doing bad science on purpose) gives an example of the problem.
The p-value is an important idea ... but not the only idea, and not the only check
on whether the conclusion is valid.
practice
Textbook :
R practice with the attached data for this question :
Does drinking affect memory?
The data gives experimental subjects, one line per subject,
who have been given a memory test after being given a glass
of fluid with or without alcohol.
The task is to set up a hypothesis test and see what conclusion
can be drawn from the data.
(The memory.R source file is what I used to generate the data
and do some visualization and analysis. The comments describe
what it is and how it can be used. The functions illustrate
some data manipulation and graphing with R.)
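One possible way to set up the test, as a sketch only. This is not the contents of memory.R, and the numbers below are simulated stand-ins, not the attached data. The column names `alcohol` and `score`, the group sizes, and the two-sided normal test are assumptions made for illustration. H0 is "the mean memory score is the same with and without alcohol", and the standard errors of the two groups are combined as sqrt(s1^2/n1 + s2^2/n2).

```r
set.seed(7)   # arbitrary seed, only for reproducibility

# Simulated stand-in for the attached data; column names are hypothetical.
memory <- data.frame(
  alcohol = rep(c(TRUE, FALSE), each = 40),
  score   = c(rnorm(40, mean = 18, sd = 4), rnorm(40, mean = 20, sd = 4))
)

with_alc    <- memory$score[memory$alcohol]
without_alc <- memory$score[!memory$alcohol]

# Difference of the two sample means.
diff_means <- mean(with_alc) - mean(without_alc)

# Combined standard error: sqrt(s1^2/n1 + s2^2/n2).
se_diff <- sqrt(var(with_alc) / length(with_alc) +
                var(without_alc) / length(without_alc))

# Test statistic and two-sided p-value from the normal distribution.
z       <- diff_means / se_diff
p_value <- 2 * pnorm(-abs(z))

c(diff = diff_means, se = se_diff, z = z, p = p_value)
```

With the real file, replace the simulated `memory` data frame with the attached data and adjust the column names to match.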