smoke
female  male
yes     no
no      no
no      yes
yes     no
no      yes
no      no
no      no
no      no

Summary
            female        male
no          30            30
yes         14            11
p           0.31818182    0.26829268
N           44            41
sigma_p     0.07021754    0.06919603

sigma of p1-p2     =SQRT(F9^2 +G9^2)          0.09858293
two tail cutoff    =NORMINV(0.995, 0, 0.1)    0.25758313

Is there a difference in male / female smoking habits?

The data from our survey are shown at the left; the summary is above.
(I used the =COUNTIF(range,text) Excel function to do the counting.
Or you could sort the entries and subtract row numbers.)
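
For anyone doing the counting outside of Excel, here is a minimal Python sketch of the same idea. The list below is only the first eight entries of the first data column, typed in by hand for illustration; the variable names are made up and not part of the spreadsheet.

```python
from collections import Counter

# Illustrative excerpt only: the first eight entries of the first data column.
answers = ["yes", "no", "no", "yes", "no", "no", "no", "no"]

counts = Counter(answers)      # tallies each distinct answer, like COUNTIF
n_yes = counts["yes"]          # number of "yes" answers
n_total = len(answers)         # number of answers in this excerpt
fraction_yes = n_yes / n_total

print(n_yes, n_total, fraction_yes)
```

Running the same tally over a whole column should reproduce the yes/no counts in the summary.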

Procedure: difference of percentages.
p1 = fraction of females who smoke
p2 = fraction of males who smoke

H0: the percentage of smokers is the same for males and females.
p1 = p2

Ha: the percentages are different, p1 != p2. (two tail)

Standard deviation of a percentage is sqrt(p*(1-p)/N)
which gives sigma_p1 = 0.0702, sigma_p2 = 0.0692.

Standard deviation of the difference is sqrt( sigma_p1^2 + sigma_p2^2 ) = 0.0986, or about 0.10
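
As a cross-check on the spreadsheet cells, the two standard deviations and their combination can be sketched in Python. The counts (14 of 44 and 11 of 41) come from the summary table; the function and variable names are my own.

```python
import math

def sigma_p(p, n):
    """Standard deviation of a sample fraction p measured on n responses."""
    return math.sqrt(p * (1 - p) / n)

# Counts taken from the summary table: 14 "yes" of 44, and 11 "yes" of 41.
s1 = sigma_p(14 / 44, 44)
s2 = sigma_p(11 / 41, 41)

# Independent errors add in quadrature: sqrt(s1^2 + s2^2).
s_diff = math.hypot(s1, s2)

print(round(s1, 4), round(s2, 4), round(s_diff, 4))
```

These should agree with the sigma_p and "sigma of p1-p2" cells in the summary.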

Choosing significance level alpha=0.01, our two-tail cutoff is
given by NORMINV(0.995, 0, 0.10) = 0.26. In other words,
a normal variable with mean=0 and sigma=0.1 is in the
range (-0.26 < x < 0.26) 99% of the time.
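
The same cutoff can be sketched without Excel using Python's standard library, where NormalDist.inv_cdf plays the role of NORMINV. The variable names are mine, and sigma is rounded to 0.1 as in the text.

```python
from statistics import NormalDist

alpha = 0.01        # significance level chosen above
sigma_diff = 0.1    # standard deviation of p1-p2, rounded as in the text

dist = NormalDist(mu=0, sigma=sigma_diff)

# Two tails: put alpha/2 in each tail, so look up the 1 - alpha/2 quantile.
cutoff = dist.inv_cdf(1 - alpha / 2)

# Fraction of the distribution that falls between -cutoff and +cutoff.
coverage = dist.cdf(cutoff) - dist.cdf(-cutoff)

print(round(cutoff, 2), round(coverage, 2))
```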

Our decision rule is to reject the null hypothesis
if p1-p2 is >0.26 or p1-p2 is < -0.26.

The result is that p1-p2 = 0.32 - 0.27 = 0.05,
which is well inside the range (-0.26, 0.26),
and so we fail to reject the null hypothesis.
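
The decision rule itself is only a comparison. Here is a small sketch, using the counts from the summary table and the rounded cutoff of 0.26; the names are my own.

```python
# Sample fractions from the summary table.
p1 = 14 / 44
p2 = 11 / 41

cutoff = 0.26       # two-tail cutoff at alpha = 0.01, rounded as in the text

diff = p1 - p2
# Two-tail rule: reject H0 only if the difference falls outside (-cutoff, cutoff).
reject_h0 = abs(diff) > cutoff

print(round(diff, 2), reject_h0)
```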

Therefore from this data we can *not* conclude that
there is a difference in the smoking habits
of male and female Marlboro students.
(At least, not at the alpha=0.01 significance level, i.e. 99% confidence.)

The confidence intervals give the same intuition;
at 2sigma (95%), p1=0.32+-0.14; p2=0.27+-0.14.
These ranges overlap, and so the two don't look all that different.
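
The two intervals and the overlap check can be sketched as follows, assuming the same counts as in the summary table and a half-width of 2 sigma. The helper name is made up. Keep in mind that overlapping intervals are only a rough guide and are not the same thing as the hypothesis test above.

```python
import math

def interval(p, n, k=2.0):
    """Return (low, high) for p plus or minus k standard deviations."""
    half_width = k * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low1, high1 = interval(14 / 44, 44)
low2, high2 = interval(11 / 41, 41)

# Two ranges overlap if each one starts before the other one ends.
overlap = low1 <= high2 and low2 <= high1

print((round(low1, 2), round(high1, 2)), (round(low2, 2), round(high2, 2)), overlap)
```
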
no yes
yes no
no no
no no
yes no
no yes
no no
no yes
no no
no yes
no no
no no
yes no
no yes
yes no
no yes
no no
no no
yes no
no yes
yes no
no yes
no yes
yes no
yes no
no no
no no
no no
no no
yes no
yes no
no no
yes no
no
no
yes