Feb 2 - data & visualization
Context: where we are & where we're going.
Questions about anything?
visualization
First, I'd like to have an open discussion of the assigned
readings and the first part of chapter 1,
on data and visualizing it: not so much the nitty-gritty
of how we make the plots, but what they mean and what
makes a good or bad one.
(I've added some additional sources to the "outside resources" page, including some video
links that I may assign in the future.)
Here's a graphic I particularly like (though it will take some explaining) :
And I guess appropriate for today is the Washington Post's
data
Start going over the first half of chapter 1 material. (Some of
this may feel like vocabulary more than substance,
but it is good background to have.)
1. data & variables
- stents and stroke : "summary statistic", "statistical significance"
- data types:
- numbers (continuous, discrete) vs categories (ordered, unordered)
- also "id" (a marker to describe it) vs "value" (what is measured) - more on this later, in the R analysis
- relationships between variables :
- correlation (trend in x-y scatter graph) - *not* causation
- several mostly-the-same vocabularies : "independent" , "associated"
2. data collection
- anecdotes (i.e. *not* data, in spite of this SMBC comic)
- population & sampling
- observation vs experiment
- experimental design practice & vocabulary : "blind", "random", "placebo", "replicate", "control"
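To make "population & sampling" and "random" assignment concrete, here is a minimal sketch. It is in Python rather than the R we'll use for the course analysis, and the population size, sample size, ids, and seed are all made up for illustration; nothing here comes from the text.

```python
import random

# Hypothetical population of 200 subject ids (an assumption, not course data).
population = [f"subject_{i:03d}" for i in range(1, 201)]

# Fixed seed so the example gives the same split every time it is run.
rng = random.Random(2)

# Sampling: draw 20 subjects from the population without replacement.
sample = rng.sample(population, k=20)

# Random assignment: shuffle the sample, then split it in half.
shuffled = sample[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
treatment = shuffled[:half]   # would receive the treatment
control = shuffled[half:]     # would receive the placebo

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

The point of letting the random number generator pick the groups is that neither the researcher nor the subjects decide who gets the treatment, so the two groups should differ only by chance.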
And on the topic of data and gathering it ...
what's coming next
If we get through all that, then on Thursday we'll continue with the second half of chapter 1 in the text :
- some popular plots
- scatterplot (is there a relation between these?)
- histogram (how are the values of one variable distributed?)
- box-and-whiskers (is one of these different from the others?)
- descriptive statistics
- median
- mean
- standard deviation
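
As a preview of the descriptive statistics above, here is a small sketch using Python's standard `statistics` module (we'll do the real work in R). The numbers are invented for illustration, with one deliberately large value to show how the mean and median can disagree.

```python
import statistics

# Made-up measurements (an assumption for illustration); 25 is an outlier.
values = [2, 4, 4, 5, 7, 9, 25]

median = statistics.median(values)  # middle value once the data are sorted
mean = statistics.mean(values)      # sum of the values divided by the count
sd = statistics.stdev(values)       # sample standard deviation (n - 1 denominator)

print("median:", median)
print("mean:  ", mean)
print("sd:    ", round(sd, 2))
```

Here the median is 5 while the mean is 8: the single large value pulls the mean upward but leaves the median where it was. Note that `statistics.stdev` is the sample version; `statistics.pstdev` would divide by n instead.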