Feb 18 - inference and hypothesis testing

In [6]:
from scratch.statistics import mean, standard_deviation
from random import randint
import urllib.request

The null hypothesis: the coin is fair. Let's simulate a fair coin and see how many heads are likely in 100 flips.

In [5]:
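count = 1000  # number of simulated experiments
# Each experiment: flip a fair coin 100 times (1 = heads, 0 = tails) and count the heads.
simulations = [sum(randint(0, 1) for _ in range(100)) for _ in range(count)]
print(mean(simulations))
print(standard_deviation(simulations))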
49.913
4.9988418078010195

So if we flip a coin 100 times, we would expect 50 heads on average, with a standard deviation of about 5.
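
The simulated numbers can be checked against the binomial formulas: for n flips with heads probability p, the mean is n*p and the standard deviation is sqrt(n*p*(1-p)). A minimal sketch of that check, using only the standard library (the function name `binomial_mean_sd` is made up for this illustration and is not part of `scratch.statistics`):

```python
from math import sqrt

def binomial_mean_sd(n, p=0.5):
    """Theoretical mean and standard deviation of the number of heads in n flips."""
    return n * p, sqrt(n * p * (1 - p))

mu, sigma = binomial_mean_sd(100)
print(mu, sigma)   # 50.0 5.0
```

These match the simulated 49.913 and 4.9988 closely, which is what we would hope for with 1000 simulations.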

For a normal distribution, about 95% of the values are within 2 standard deviations of the mean.

So at a significance level of 5%, we should get between 50 - 2*5 and 50 + 2*5 heads, that is, between 40 and 60.

If the number of heads falls outside that range, we reject the null hypothesis.
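
The decision rule can be written as a small function, and the "95%" claim can be checked against the exact binomial distribution instead of the normal approximation. This is only a sketch: the names `rejects_fair_coin` and `prob_in_range` are invented here, and the defaults assume 100 flips with the 40 to 60 range from above.

```python
from math import comb

def rejects_fair_coin(heads, low=40, high=60):
    """True if the head count falls outside the acceptance range [low, high]."""
    return heads < low or heads > high

def prob_in_range(n=100, low=40, high=60):
    """Exact probability that a fair coin gives low..high heads (inclusive) in n flips."""
    return sum(comb(n, k) for k in range(low, high + 1)) / 2 ** n

print(rejects_fair_coin(55))   # False
print(rejects_fair_coin(17))   # True
print(prob_in_range())         # a bit above 0.95
```

The exact probability comes out slightly higher than 95%, so the 40 to 60 rule is a little more conservative than a strict 5% test.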

In [25]:
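# Fetch the flips from the coin flip engine.
url = 'https://cs.marlboro.college/tools/coins/c.cgi'
flips = urllib.request.urlopen(url).read()
# Decode the bytes and split the text into individual flips.
flips = flips.decode().strip().split(' ')
# Count the heads ('h').
len([f for f in flips if f == 'h'])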
Out[25]:
17

The "c.cgi" coin flip engine gave 17 heads, which is not between 40 and 60, so we reject the null hypothesis and conclude that this coin is unfair.
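
To see how far 17 is from what a fair coin would give, we can express it in standard deviations and also compute the exact chance of a fair coin doing at least this badly. This sketch assumes the engine returned 100 flips (the cell above does not print `len(flips)`) and reuses the mean of 50 and standard deviation of 5 from the simulation; `z_score` and `lower_tail` are illustrative names.

```python
from math import comb

def z_score(heads, mean=50, sd=5):
    """Number of standard deviations between the head count and the expected mean."""
    return (heads - mean) / sd

def lower_tail(heads, n=100):
    """Exact probability that a fair coin gives at most `heads` heads in n flips."""
    return sum(comb(n, k) for k in range(heads + 1)) / 2 ** n

print(z_score(17))      # -6.6
print(lower_tail(17))   # a very small probability
```

A result more than 6 standard deviations below the mean is far outside the 2 standard deviation range, so the rejection is not a close call.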