Hypothesis Tests 1

Jim Mahoney, October 2003

Overview

This is a summary of the material in chapters 8, 9, and 10 of our Understanding Statistics text, which describe the basic idea behind hypothesis tests as well as several specific tests based directly on the binomial distribution, the normal distribution, and the z-score that we've worked with in earlier chapters.

Coming up next are several other, more sophisticated tests: the Student's t-test, the Chi-Square test, and the ANOVA test.  Each is appropriate for slightly different circumstances.  But first, the three situations we want to look at are
    one sample tests of percentages (chap 8)
    two sample tests of percentages (chap 9)
    comparing means of large samples (chap 10)

The appropriate formulas in each case come from the same underlying principles, which I'm going to try to summarize here in one place.  

The overall procedure for all these hypothesis tests is roughly the same, as described in the text near the end of chapter 8.  Typically you come up with a null hypothesis H_0 which fits the model of one of the tests, with a corresponding alternate hypothesis H_a which you are trying to show.  You then decide upon the data you'll collect, and a critical value of some parameter based on a chosen level of significance α.  The various kinds of tests have different formulas for these critical values, though the underlying idea is always based on a probability distribution.  Then you collect your data, compare your results with the critical value, and either reject the null hypothesis or not.  As discussed in the text and in class, there are two ways to come up with the wrong answer, depending on whether H_0 is really true or not; the probabilities of these Type-I and Type-II errors are named α and β.
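To make the procedure concrete, here's a short Python sketch of a one sample test of a percentage (chapter 8).  The specific numbers (p0 = 0.5, α = 0.05, and the survey results) are made up for illustration, not taken from the text.

```python
# A sketch of the overall hypothesis-test procedure, using a one sample
# test of a percentage as the example.  All numbers here are hypothetical.
from statistics import NormalDist
from math import sqrt

def one_sample_percentage_test(successes, trials, p0, alpha=0.05):
    """Test H_0: p = p0 against H_a: p != p0 (two-tailed),
    using the large-sample normal approximation to the binomial."""
    p_hat = successes / trials
    sigma_p = sqrt(p0 * (1 - p0) / trials)   # sigma of the sample percentage under H_0
    z = (p_hat - p0) / sigma_p               # convert the result to a z-score
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # critical value from alpha
    return z, z_critical, abs(z) > z_critical          # True means "reject H_0"

# hypothetical survey: 60 successes in 100 trials, H_0 says p = 0.5
z, z_crit, reject = one_sample_percentage_test(successes=60, trials=100, p0=0.5)
```

Here z = 2.0, which is beyond the critical value of about 1.96, so at the α = 0.05 level we would reject H_0.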

Formulas

mean and standard deviation

Any random variable x with a probability distribution P_x has a mean μ and standard deviation σ given by

μ_x = Σ x P_x = 〈x〉
σ_x = 〈(x - μ_x)^2〉^(1/2) = ((Σ x^2 P_x) - μ_x^2)^(1/2)
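As a quick check of these definitions, here's a short Python computation of μ and σ for a discrete distribution.  The fair six-sided die is just an illustrative example, not from the text.

```python
# Mean and standard deviation of a discrete random variable x with
# probabilities P_x, straight from the definitions above.
from math import sqrt

def mean_sd(values, probs):
    mu = sum(x * p for x, p in zip(values, probs))                  # Σ x P_x
    sigma = sqrt(sum(x * x * p for x, p in zip(values, probs)) - mu ** 2)
    return mu, sigma

# example: a fair six-sided die
mu, sigma = mean_sd([1, 2, 3, 4, 5, 6], [1/6] * 6)
```

For the die this gives μ = 3.5 and σ = (35/12)^(1/2) ≈ 1.71.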

We can add or subtract two such random variables to create a third, and express the new mean and standard deviation in terms of the old ones.

y = x_1 ± x_2
μ_y = μ_x_1 ± μ_x_2
σ_y = (σ_x_1^2 + σ_x_2^2)^(1/2)

These results assume that the two random variables x_1 and x_2 are independent.  They follow directly from the definitions of the mean and standard deviation and the fact that P_y = P_x_1 P_x_2.
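We can verify this rule numerically by enumerating the joint distribution of two independent random variables.  Two fair dice are used here as a made-up example: the sum y = x_1 + x_2 should have μ_y = 2 μ_x and σ_y = 2^(1/2) σ_x.

```python
# Check that for y = x1 + x2 (independent), mu_y = mu_1 + mu_2 and
# sigma_y = sqrt(sigma_1^2 + sigma_2^2), by brute-force enumeration.
from itertools import product
from math import sqrt, isclose

def mean_sd(values, probs):
    mu = sum(x * p for x, p in zip(values, probs))
    sigma = sqrt(sum(x * x * p for x, p in zip(values, probs)) - mu ** 2)
    return mu, sigma

xs, px = [1, 2, 3, 4, 5, 6], [1/6] * 6          # one fair die
mu_x, sigma_x = mean_sd(xs, px)

# joint distribution of two independent dice: P_y = P_x1 * P_x2
pairs = [(x1 + x2, p1 * p2)
         for (x1, p1), (x2, p2) in product(zip(xs, px), repeat=2)]
mu_y, sigma_y = mean_sd([y for y, _ in pairs], [p for _, p in pairs])
```

The enumeration confirms μ_y = 7 = 2 μ_x and σ_y = 2^(1/2) σ_x.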

binomial

A binomial random variable x has possible values 1 (success) and 0 (failure).  
We call the probability of success p and the probability of failure q = 1 - p.
In N trials the number of successes has mean and standard deviation

μ_N = N p
σ_N = (N p q)^(1/2)
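These shortcut formulas can be cross-checked against the full binomial distribution using the definitions of μ and σ from above.  The values N = 20 and p = 0.3 are arbitrary examples.

```python
# Check mu_N = N p and sigma_N = sqrt(N p q) against the full
# binomial distribution P(k) = C(N,k) p^k q^(N-k).
from math import sqrt, comb, isclose

N, p = 20, 0.3        # arbitrary example values
q = 1 - p
mu_N, sigma_N = N * p, sqrt(N * p * q)

probs = [comb(N, k) * p**k * q**(N - k) for k in range(N + 1)]
mu_check = sum(k * pk for k, pk in enumerate(probs))
sigma_check = sqrt(sum(k * k * pk for k, pk in enumerate(probs)) - mu_check**2)
```

Both routes give μ = 6 and σ = 4.2^(1/2) ≈ 2.05.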

binomial with N=1

Using a subscript "1" to emphasise that we're talking about one trial (N = 1), we have

μ_1 = p
σ_1 = (p q)^(1/2)

percentage

Here's where it can get a bit confusing.  
Often we would prefer to use the percentage (or fraction) of successes rather than the actual number of successes.  All we need do to get this percentage is divide by the number of trials N.  And since this is just a change of scale, the same division by N applies to the mean and standard deviation of this percentage.

μ_p = μ_N/N = p = μ_1
σ_p = σ_N/N = ((p q)/N)^(1/2) = σ_1/N^(1/2)
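A quick numeric check that dividing by N rescales the standard deviation the same way it rescales the values; N = 400 and p = 0.25 are made-up example numbers.

```python
# Check that sigma_p = sqrt(p q / N) is just sigma_N / N.
from math import sqrt, isclose

N, p = 400, 0.25      # hypothetical survey size and success probability
q = 1 - p
sigma_N = sqrt(N * p * q)   # sd of the number of successes
sigma_p = sqrt(p * q / N)   # sd of the percentage of successes
```

For these numbers σ_N ≈ 8.66 successes while σ_p ≈ 0.0217, i.e. about 2.2 percentage points.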

difference of two percentages

Now we just apply the results from above.  If we do two surveys, with numbers of trials N_1 and N_2 respectively, and find S_1 and S_2 successes, then the mean and standard deviation of the difference of the two percentages looks like this.

p_1 = S_1/N_1  ;  p_2 = S_2/N_2
dp = p_1 - p_2
σ_dp = (σ_p_1^2 + σ_p_2^2)^(1/2) = ((p_1 q_1)/N_1 + (p_2 q_2)/N_2)^(1/2)
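In Python the two-survey calculation looks like this; the survey counts are hypothetical numbers chosen only to show the arithmetic.

```python
# Difference of two percentages and its standard deviation.
# Survey results below are hypothetical.
from math import sqrt, isclose

S1, N1 = 120, 400     # survey 1: 120 successes in 400 trials
S2, N2 = 90, 360      # survey 2: 90 successes in 360 trials
p1, p2 = S1 / N1, S2 / N2
q1, q2 = 1 - p1, 1 - p2

dp = p1 - p2
sigma_dp = sqrt(p1 * q1 / N1 + p2 * q2 / N2)
```

Here dp = 0.05 with σ_dp ≈ 0.032, so the observed difference is only about 1.5 standard deviations from zero.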

standard deviation of the estimate of the mean

If we use a sample size of N to find m, which estimates the true population mean, then a different sample will give a different m.  Thus m is also a random variable.  The mean value of m is μ, the true population mean, and the standard deviation is again smaller by the inverse of the square root of N, for the same reasons.  (However, when N is small and we don't know the true population standard deviation, then it's better to use the Student's t distribution than this σ and the normal distribution.)

μ_m = μ
σ_m = σ_1/N^(1/2)

standard deviation of the difference of means

Likewise, if we have two populations whose means we want to compare, the difference of their m's estimated from samples of size N_1 and N_2 follows the same kind of logic.

μ_dm = μ_1 - μ_2
s_dm = (s_m_1^2 + s_m_2^2)^(1/2) = (s_1^2/N_1 + s_2^2/N_2)^(1/2)
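Here's the same calculation in Python for two hypothetical samples; the sample sizes, means, and standard deviations are invented for illustration.

```python
# Difference of two sample means and its standard deviation, then
# the z-score for H_0: mu_1 = mu_2.  Sample summaries are hypothetical.
from math import sqrt, isclose

N1, m1, s1 = 50, 102.0, 12.0   # sample 1: size, mean, sd
N2, m2, s2 = 60,  98.0, 10.0   # sample 2: size, mean, sd

dm = m1 - m2
s_dm = sqrt(s1**2 / N1 + s2**2 / N2)
z = dm / s_dm                  # z-score of the observed difference
```

For these numbers dm = 4.0 with s_dm ≈ 2.13, so z ≈ 1.88, which would not quite reject H_0 at the α = 0.05 two-tailed level.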

z-score

For all of these, we can find probabilities by looking up the appropriate values for a normal distribution.  To do this from the table in appendix D-4, we convert to a standard normal distribution z, in which μ_z = 0 and σ_z = 1.  Remember that this is only a reasonable approximation to the binomial if μ > 5 and σ^2 > 5.

x = μ + z σ
z = (x - μ)/σ
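The same conversion and table lookup can be done in Python; `statistics.NormalDist` plays the role of the appendix D-4 table.  The μ and σ below are the binomial example values from earlier (N = 20, p = 0.3), and x = 9 is an arbitrary point.

```python
# Convert x to a z-score, then look up the standard normal probability
# (this replaces the table in appendix D-4).  Example numbers are the
# binomial with N = 20, p = 0.3.
from statistics import NormalDist
from math import sqrt

mu, sigma = 6.0, sqrt(4.2)      # mu = N p, sigma = sqrt(N p q)
x = 9
z = (x - mu) / sigma            # z = (x - mu) / sigma
prob = NormalDist().cdf(z)      # P(value <= x), normal approximation
```

Here z ≈ 1.46, and the table (or `cdf`) gives a probability of about 0.93 of seeing 9 or fewer successes.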

in class 10/28


Created by Mathematica  (October 28, 2003)