Feb 4
asides
Iowa caucuses and coin flips
Is truncating the Y-axis misleading?
context
Where we are:
- working through chap 1: data, graphs & visualization, background, getting some practice with R, mean & std dev
Today we'll look at the second half of chapter 1, more math-y and specific.
Questions about anything so far?
mean and standard deviation
Discuss and define. (Math on the whiteboard.)
These are crucial ideas for the statistical
tests that come later in the course.
You should (a) have an intuition for what they are,
(b) be able to compute them by hand, and (c)
be able to use software like R to compute them for you.
In addition to the basic formulas,
I will also mention the "frequency" approach:
scores = c(95, 95, 90, 90, 90)
mean = (95 + 95 + 90 + 90 + 90)/5 = sum/how_many
But we can also write this as
mean = (2*95 + 3*90)/(2 + 3)
where the 2 and the 3 count how many times we got 95 and 90.
In fact, if we list all the scores from (say) 90 to 95
and call f(t) the "frequency" or count of how many
test scores equal to t there were, we would have
t f(t)
--------
90 3
91 0
92 0
93 0
94 0
95 2
and we could write
mean = (3*90 + 0*91 + 0*92 + 0*93 + 0*94 + 2*95)/(3+0+0+0+0+2)
= sum(t * f(t)) / sum(f(t)) for all values of t
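Here is a small R sketch of the same calculation, so you can check that both approaches agree. The variable names (t_values, f, mean_direct, mean_freq) are just ones I picked for illustration; this is one way to do the counting, not the only one.

```r
scores <- c(95, 95, 90, 90, 90)

# The direct way: add them up and divide by how many.
mean_direct <- sum(scores) / length(scores)

# The frequency way: list every value t from 90 to 95
# and count how many scores equal each t.
t_values <- 90:95
f <- as.vector(table(factor(scores, levels = t_values)))

# mean = sum(t * f(t)) / sum(f(t))
mean_freq <- sum(t_values * f) / sum(f)

print(f)             # the counts for 90, 91, ..., 95
print(mean_direct)
print(mean_freq)
print(mean(scores))  # R's built-in mean, for comparison
```

All three means should come out the same, since the frequency formula only regroups the terms of the original sum.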
Well, you might say that this is much more complicated, so why bother
doing it this way?
The answer is that for many situations it's more convenient
and intuitive. One example is the average position (i.e. center of mass)
of a physical object that has a density that varies from place to place.
But for this class, the simpler "add-em-up-and-divide-by-how-many"
should work fine.
I will also discuss the difference between two different formulas
for standard deviation which differ by a factor of sqrt(n/(n-1))
and why there are two versions, sometimes called "sample standard
deviation" \( s \) vs the "population standard deviation" \( \sigma \).
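To see the two versions side by side, here is a short R sketch using the same five scores as above. R's built-in sd() divides by n - 1 (the sample version); the population version below is written out by hand, and the names sigma_pop and s_sample are my own choices for illustration.

```r
scores <- c(95, 95, 90, 90, 90)
n <- length(scores)

# Squared deviations from the mean.
sq_dev <- (scores - mean(scores))^2

# Population standard deviation: divide by n.
sigma_pop <- sqrt(sum(sq_dev) / n)

# Sample standard deviation: divide by n - 1.
s_sample <- sqrt(sum(sq_dev) / (n - 1))

print(sigma_pop)
print(s_sample)
print(sd(scores))            # matches s_sample
print(s_sample / sigma_pop)  # the ratio of the two
print(sqrt(n / (n - 1)))     # matches the ratio
```

The sample version is always a bit larger, and the gap shrinks as n grows because sqrt(n/(n-1)) gets closer to 1.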
R and making graphs
Work through some of the "Intro to Data lab"
The graphs can be done in either of two ways:
using R's "standard graphics" (which is what
the lab describes) or using ggplot (which
is nice looking and perhaps simpler).
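As a rough comparison of the two styles, here is the same histogram drawn both ways. This is only a sketch: it uses R's built-in faithful dataset as a stand-in (not the lab's data), and the ggplot part assumes the ggplot2 package is installed, so it is skipped if the package is missing.

```r
# Standard graphics: one function call draws the plot.
# hist() also returns the bin counts, saved here in h.
h <- hist(faithful$waiting,
          main = "Waiting time between eruptions",
          xlab = "minutes")

# ggplot2: build the plot up in layers (data, aesthetics, geometry, labels).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(faithful, ggplot2::aes(x = waiting)) +
    ggplot2::geom_histogram(bins = 20) +
    ggplot2::labs(title = "Waiting time between eruptions",
                  x = "minutes", y = "count")
  print(p)
}
```

If you run this as a script rather than interactively, R writes the plots to a file (Rplots.pdf) instead of opening a window.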
There are lots of examples at
which is listed on the "outside resources" page.
Also see
Coming up next: some homework and practice.