Statistics Sampler

Jim Mahoney, September 2003

•Setup

Mathematica Basics

•Looking around

•Spreadsheet tables

To enter a table from the keyboard, use control-comma to add another column and control-return to add another row. The grouping symbols at the left and right sides are just parentheses, which grow as needed. Use tab to move to the next entry and control-space to move the cursor out of the table. Mathematica stores this data as a list of lists, one inner list per row. Non-numeric data must be enclosed in double quotes.

data = ( "height"   "weight"   "age"   "eyecolor"
         70.        150.       22      "blue"
         72.        200.       30      "blue"
         60.        160.       20      "green"
         65.        220.       21      "brown"
         72.        190.       19      "blue"  )

{{height, weight, age, eyecolor}, {70.`, 150.`, 22, blue}, {72.`, 200.`, 30, blue}, {60.`, 160.`, 20, green}, {65.`, 220.`, 21, brown}, {72.`, 190.`, 19, blue}}
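Since the table is just a list of lists, the same data can also be typed on one line. A sketch, assuming a current Mathematica version, that builds the table in that linear form and uses Part (the [[ ]] brackets) to pick out rows and single entries:

```mathematica
(* The same table typed as a list of lists: one inner list per row *)
data = {
   {"height", "weight", "age", "eyecolor"},
   {70., 150., 22, "blue"},
   {72., 200., 30, "blue"},
   {60., 160., 20, "green"},
   {65., 220., 21, "brown"},
   {72., 190., 19, "blue"}};

Dimensions[data]   (* {6, 4}: six rows (header included), four columns *)
data[[2]]          (* second row: {70., 150., 22, "blue"} *)
data[[3, 4]]       (* row 3, column 4: "blue" *)
```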

We can save this table to a .csv (comma-separated values) file. Like most computer applications, Mathematica has the notion of a "current directory" where files are read and written; Directory[] shows it, and SetDirectory["name"] changes it.
To read the saved data back in, we'd type data = Import["sampleData.csv"].

Directory[]

/Users/mahoney

Export["sampleData.csv", data]

sampleData.csv
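A sketch of the full save-and-reload round trip, assuming a current Mathematica version. The file is written to $TemporaryDirectory (a built-in symbol) instead of the current directory, and the names file and data2 are only illustrative:

```mathematica
(* The sample table, as a list of lists *)
data = {{"height", "weight", "age", "eyecolor"},
   {70., 150., 22, "blue"}, {72., 200., 30, "blue"},
   {60., 160., 20, "green"}, {65., 220., 21, "brown"},
   {72., 190., 19, "blue"}};

(* Write into the temporary directory so the current directory is untouched *)
file = FileNameJoin[{$TemporaryDirectory, "sampleData.csv"}];
Export[file, data, "CSV"];

(* Import gives back a list of lists, one inner list per line of the file *)
data2 = Import[file, "CSV"];
Dimensions[data2]
```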

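Here's how we'd extract each column into its own list. DropNonNumeric removes the non-numeric header from the numeric columns, and Rest drops the header from the eye-color column.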

heights = DropNonNumeric[Column[data, 1]];  (* column 1, header removed *)
weights = DropNonNumeric[Column[data, 2]];
ages = DropNonNumeric[Column[data, 3]];
colors = Rest[Column[data, 4]]  (* Rest drops the "eyecolor" header *)

{blue, blue, green, brown, blue}
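Column and DropNonNumeric appear to come from an add-on statistics package (presumably loaded in the Setup section); in current versions Column is instead a display function. A sketch of the same extraction with built-in functions only, assuming data is the table above: data[[All, k]] takes all of column k, and Select with NumericQ keeps only the numbers.

```mathematica
data = {{"height", "weight", "age", "eyecolor"},
   {70., 150., 22, "blue"}, {72., 200., 30, "blue"},
   {60., 160., 20, "green"}, {65., 220., 21, "brown"},
   {72., 190., 19, "blue"}};

(* data[[All, k]] is column k, header included *)
heights = Select[data[[All, 1]], NumericQ];  (* drops the "height" header *)
weights = Select[data[[All, 2]], NumericQ];
ages    = Select[data[[All, 3]], NumericQ];
colors  = Rest[data[[All, 4]]]               (* drops the "eyecolor" header *)
```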

There are also many ways to make pictures from the data. Here are a couple of simple examples.

Frequencies[colors]

{{3, blue}, {1, brown}, {1, green}}
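In current Mathematica versions the built-in Tally does the same counting, but it returns {value, count} pairs in order of first appearance. A sketch, assuming colors is the list above, that also rearranges the result into the sorted {count, value} layout shown by Frequencies:

```mathematica
colors = {"blue", "blue", "green", "brown", "blue"};

(* Tally pairs each distinct value with its count, in order of first appearance *)
Tally[colors]
(* {{"blue", 3}, {"green", 1}, {"brown", 1}} *)

(* Sort by value, then flip each pair to {count, value} *)
Reverse /@ Sort[Tally[colors]]
(* {{3, "blue"}, {1, "brown"}, {1, "green"}} *)
```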

PieChart[Frequencies[colors]]

[Graphics: pie chart of the eye-color frequencies]

This next one shows how options are added: each option is written as name -> value after the data. The arrow is typed as a hyphen followed by a greater-than sign, ->.

BarChart[heights, weights, BarOrientation -> Horizontal, Frame -> True]

[Graphics: framed horizontal bar chart of the heights and weights]
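Each option is a Rule: Frame -> True is just another way of writing Rule[Frame, True]. A small sketch using the built-in ListPlot (chosen here only because its options Joined, Frame and PlotLabel are standard in current versions) shows several options passed after the data; opt and g are illustrative names:

```mathematica
(* An option is a rule, name -> value; the arrow is shorthand for Rule *)
opt = Frame -> True;
FullForm[opt]            (* Rule[Frame, True] *)

(* Options go after the data argument, in any order *)
heights = {70., 72., 60., 65., 72.};
g = ListPlot[heights, Joined -> True, Frame -> True, PlotLabel -> "heights"];
Head[g]                  (* Graphics *)
```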

And we can evaluate various statistical formulas.

Mean[heights]

67.8`
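As a hand check of this value, the mean is the sum of the five heights divided by n = 5:

```latex
\bar{x} = \frac{70 + 72 + 60 + 65 + 72}{5} = \frac{339}{5} = 67.8
```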

Mathematica's names for the formulas in the text are verbose and not always obvious. The one for s, which estimates a population's standard deviation from a sample, is called StandardDeviation. The formula it actually calculates is

s = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n-1)}}

s = StandardDeviation[heights]

5.215361924162119`
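The value can be checked by hand from the formula for s, using the five heights 70, 72, 60, 65, 72 and n = 5:

```latex
\begin{aligned}
\sum x_i &= 339, \qquad \sum x_i^2 = 4900 + 5184 + 3600 + 4225 + 5184 = 23093 \\
n \sum x_i^2 - \left(\sum x_i\right)^2 &= 5(23093) - 339^2 = 115465 - 114921 = 544 \\
s &= \sqrt{\frac{544}{5 \cdot 4}} = \sqrt{27.2} \approx 5.2154
\end{aligned}
```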

If, on the other hand, we want the standard deviation of the list itself, treating it as the entire population, the formula is slightly different: the denominator is n^2 instead of n(n - 1).

\sigma = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n^2}}

σ = StandardDeviationMLE[heights]

4.664761515876241`
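Both formulas can also be typed in directly. A sketch assuming a current Mathematica version, where the built-in StandardDeviation divides by n - 1 and StandardDeviationMLE may not be available; the names num, sFormula and sigmaFormula are only illustrative. Because the two formulas share a numerator (544 for these heights), σ is s scaled by Sqrt[(n - 1)/n], which here gives sqrt(544/25) = sqrt(21.76) ≈ 4.6648.

```mathematica
heights = {70., 72., 60., 65., 72.};
n = Length[heights];

(* Shared numerator: n Σx_i^2 - (Σx_i)^2 *)
num = n Total[heights^2] - Total[heights]^2;

sFormula = Sqrt[num/(n (n - 1))]   (* sample estimate s: divide by n(n - 1) *)
sigmaFormula = Sqrt[num/n^2]       (* population σ: divide by n^2 *)

(* Compare with the built-in function and with the scaling relation *)
{sFormula - StandardDeviation[heights], sigmaFormula - sFormula Sqrt[(n - 1)/n]}
```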

To type the symbol σ (sigma) from the keyboard, use the escape key: type esc-s-esc or esc-sigma-esc. Likewise, the symbol Σ (capital sigma) is typed as esc-S-esc or esc-Sigma-esc.

Here's a summary of several descriptive statistics.

LocationReport[heights]

{Mean -> 67.8`, HarmonicMean -> 67.46293245469523`, Median -> 70.`}

DispersionReport[heights]

{Variance -> 27.200000000000003`, StandardDeviation -> 5.215361924162119`, SampleRange -> 12.`, MeanDeviation -> 4.24`, MedianDeviation -> 2.`, QuartileDeviation -> 4.125`}
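Several of these report entries have simple definitions that can be checked directly. A sketch using only basic built-ins, assuming a current Mathematica version; the variable names are illustrative, and QuartileDeviation is left out because it depends on which quartile convention is used.

```mathematica
heights = {70., 72., 60., 65., 72.};

var = Variance[heights]                             (* sample variance, s^2 *)
range = Max[heights] - Min[heights]                 (* SampleRange *)
meanDev = Mean[Abs[heights - Mean[heights]]]        (* mean of |x_i - mean| *)
medianDev = Median[Abs[heights - Median[heights]]]  (* median of |x_i - median| *)
hMean = Length[heights]/Total[1/heights]            (* harmonic mean, n / Σ(1/x_i) *)
```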

One quirk of Mathematica is that it keeps exact numbers, such as integers and fractions, exact instead of converting them to decimals. The ages in the table above were entered as integers, without a decimal point, so Mean returns an exact fraction.

Mean[ages]

112/5

To see a decimal value for an expression like this, append //N (N for numerical value) to the end; expr // N is the same as N[expr].

Mean[ages] // N

22.4`
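A short sketch of the ways to get a decimal here, assuming ages is the integer list from the table: apply N afterwards, or make the data inexact first so the arithmetic is done in decimals from the start.

```mathematica
ages = {22, 30, 20, 21, 19};

Mean[ages]          (* exact rational: 112/5 *)
Mean[ages] // N     (* same as N[Mean[ages]]: 22.4 *)
Mean[N[ages]]       (* converting the data first also gives 22.4 *)
N[Mean[ages], 20]   (* N also accepts a requested number of digits *)
```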


Converted by Mathematica  (September 11, 2003)