Exercises for Section 3.5 Two-sample tests#

This notebook contains the solutions to the exercises from Section 3.5 Two-sample tests in the No Bullshit Guide to Statistics.

Notebook setup#

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Estimator functions defined in Section 3.1#

def mean(sample):
    return sum(sample) / len(sample)

def var(sample):
    xbar = mean(sample)
    sumsqdevs = sum([(xi-xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample)-1)

def std(sample):
    s2 = var(sample)
    return np.sqrt(s2)

def dmeans(xsample, ysample):
    dhat = mean(xsample) - mean(ysample)
    return dhat
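
As a quick sanity check, the estimator functions above can be tried on small made-up samples and compared against NumPy's built-ins. The lists `xs` and `ys` below are illustrative values chosen for easy arithmetic, not data from the book.

```python
import numpy as np

def mean(sample):
    return sum(sample) / len(sample)

def var(sample):
    xbar = mean(sample)
    sumsqdevs = sum([(xi - xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample) - 1)

def std(sample):
    return np.sqrt(var(sample))

def dmeans(xsample, ysample):
    return mean(xsample) - mean(ysample)

# made-up samples for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0]

print(mean(xs))        # 2.5
print(dmeans(xs, ys))  # -1.5

# var and std divide by n-1, which corresponds to ddof=1 in NumPy
assert np.isclose(var(xs), np.var(xs, ddof=1))
assert np.isclose(std(xs), np.std(xs, ddof=1))
```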

Exercises#

E1. Permutation test for the sleep scores in the doctors dataset#

# load the doctors dataset and split the sleep scores by location (urban vs. rural)
doctors = pd.read_csv("../datasets/doctors.csv")
scoresU = doctors[doctors["loc"]=="urb"]["score"]
scoresR = doctors[doctors["loc"]=="rur"]["score"]

# observed difference between means
dhat = dmeans(scoresR, scoresU)
dhat
6.992885375494076

from ministats import permutation_test_dmeans
pvalue = permutation_test_dmeans(scoresR, scoresU)
pvalue
0.0539

# ALT. use SciPy's ttest_ind with the permutations argument
from scipy.stats import ttest_ind
ttest_ind(scoresR, scoresU, permutations=10000).pvalue
0.045795420457954206
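
To make the idea behind the permutation test concrete, here is a minimal sketch of a two-sided permutation test for the difference between means. The function name `perm_test_dmeans_sketch`, its parameters `P` and `seed`, and the toy samples are all assumptions made for illustration; this is not necessarily how `permutation_test_dmeans` in `ministats` is implemented.

```python
import numpy as np

def perm_test_dmeans_sketch(xsample, ysample, P=10000, seed=42):
    """Two-sided permutation test for the difference between means (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(xsample, dtype=float)
    y = np.asarray(ysample, dtype=float)
    n = len(x)

    # observed difference between means
    obsdhat = x.mean() - y.mean()

    # under H0 the group labels are exchangeable, so pool the two samples
    pooled = np.concatenate([x, y])

    count = 0
    for _ in range(P):
        # shuffle the pooled values and split them into two fake groups
        shuffled = rng.permutation(pooled)
        pdhat = shuffled[:n].mean() - shuffled[n:].mean()
        # count permuted differences at least as extreme as the observed one
        if abs(pdhat) >= abs(obsdhat):
            count += 1

    return count / P

# toy samples (made up): clearly separated groups give a small p-value
xtoy = [10, 11, 12, 13, 14, 15, 16, 17]
ytoy = [1, 2, 3, 4, 5, 6, 7, 8]
print(perm_test_dmeans_sketch(xtoy, ytoy))
```

Because the permutations are drawn at random, the p-value changes slightly from run to run unless the seed is fixed.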

E2. Sleep scores using t-test#

E3. Example 6T with pooled variance#

Redo Example 6T but this time run the two-sample \(t\)-test with pooled variance.

students = pd.read_csv("../datasets/students.csv")
scoresD = students[students["curriculum"]=="debate"]["score"]
scoresL = students[students["curriculum"]=="lecture"]["score"]
res = ttest_ind(scoresD, scoresL, equal_var=True)
res.pvalue
0.10917234443214315
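
For reference, the pooled-variance calculation that `equal_var=True` requests can be sketched by hand: combine the two sample variances into a pooled variance with n + m - 2 degrees of freedom, then compare the standardized difference between means to Student's t-distribution. The helper name `pooled_ttest_sketch` and the randomly generated samples are assumptions for illustration only, not the students data.

```python
import numpy as np
from scipy.stats import t as tdist, ttest_ind

def pooled_ttest_sketch(xsample, ysample):
    """Two-sided two-sample t-test with pooled variance (sketch)."""
    x = np.asarray(xsample, dtype=float)
    y = np.asarray(ysample, dtype=float)
    n, m = len(x), len(y)
    df = n + m - 2

    # pooled variance: weighted average of the two sample variances
    s2x = x.var(ddof=1)
    s2y = y.var(ddof=1)
    s2p = ((n - 1) * s2x + (m - 1) * s2y) / df

    # standard error of the difference between means
    se = np.sqrt(s2p * (1 / n + 1 / m))

    tstat = (x.mean() - y.mean()) / se
    pvalue = 2 * tdist.sf(abs(tstat), df)
    return tstat, pvalue

# synthetic samples, generated only to compare against SciPy
rng = np.random.default_rng(0)
xdemo = rng.normal(loc=75, scale=8, size=12)
ydemo = rng.normal(loc=70, scale=8, size=15)

tstat, pvalue = pooled_ttest_sketch(xdemo, ydemo)
res_demo = ttest_ind(xdemo, ydemo, equal_var=True)
print(np.isclose(tstat, res_demo.statistic), np.isclose(pvalue, res_demo.pvalue))
```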