Exercises for Section 3.5 Two-sample tests#
This notebook contains the solutions to the exercises from Section 3.5 Two-sample tests in the No Bullshit Guide to Statistics.
Notebooks setup#
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Estimator functions defined in Section 3.1#
def mean(sample):
return sum(sample) / len(sample)
def var(sample):
xbar = mean(sample)
sumsqdevs = sum([(xi-xbar)**2 for xi in sample])
return sumsqdevs / (len(sample)-1)
def std(sample):
s2 = var(sample)
return np.sqrt(s2)
def dmeans(xsample, ysample):
dhat = mean(xsample) - mean(ysample)
return dhat
Exercises#
E1. Permutation test for the sleep score sin the doctors#
doctors = pd.read_csv("../datasets/doctors.csv")
scoresU = doctors[doctors["loc"]=="urb"]["score"]
scoresR = doctors[doctors["loc"]=="rur"]["score"]
# observed difference between means
dhat = dmeans(scoresR, scoresU)
dhat
6.992885375494076
from ministats import permutation_test_dmeans
pvalue = permutation_test_dmeans(scoresR, scoresU)
pvalue
0.0539
# ALT. use the ttest_ind with permutations argument
from scipy.stats import ttest_ind
ttest_ind(scoresR, scoresU, permutations=10000).pvalue
0.045795420457954206
E2. Sleep scores using t-test#
E3. Example 6T with pooled variance#
Redo Example 6T but this time run the two-sample \(t\)-test with pooled variance.
students = pd.read_csv("../datasets/students.csv")
scoresD = students[students["curriculum"]=="debate"]["score"]
scoresL = students[students["curriculum"]=="lecture"]["score"]
res = ttest_ind(scoresD, scoresL, equal_var=True)
res.pvalue
0.10917234443214315
###