Exercises for Section 3.5 Two-sample tests
This notebook contains the solutions to the exercises from Section 3.5 Two-sample tests in the No Bullshit Guide to Statistics.
Notebook setup
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Estimator functions defined in Section 3.1
def mean(sample):
    # sample mean
    return sum(sample) / len(sample)

def var(sample):
    # sample variance, with n-1 in the denominator
    xbar = mean(sample)
    sumsqdevs = sum([(xi-xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample)-1)

def std(sample):
    # sample standard deviation
    s2 = var(sample)
    return np.sqrt(s2)

def dmeans(xsample, ysample):
    # difference between the sample means of two groups
    dhat = mean(xsample) - mean(ysample)
    return dhat
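As a quick sanity check, the hand-written estimators can be compared with NumPy's built-ins on a small made-up sample (illustrative values only, not from any dataset). Because `var` divides by n-1, it matches `np.var(..., ddof=1)`. NumPy's default `ddof=0` divides by n and returns a smaller value.

```python
import numpy as np

# Estimator functions from above, repeated so this cell runs on its own
def mean(sample):
    return sum(sample) / len(sample)

def var(sample):
    xbar = mean(sample)
    sumsqdevs = sum([(xi - xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample) - 1)

def std(sample):
    return np.sqrt(var(sample))

# Made-up sample, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean(sample), np.mean(sample))          # same sample mean
print(var(sample), np.var(sample, ddof=1))    # same n-1 sample variance
print(np.var(sample))                         # NumPy default (ddof=0) divides by n
print(std(sample), np.std(sample, ddof=1))    # same sample standard deviation
```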
Exercises
E1. Permutation test for the sleep scores of the doctors
doctors = pd.read_csv("../datasets/doctors.csv")
# sleep scores of urban and rural doctors
scoresU = doctors[doctors["loc"]=="urb"]["score"]
scoresR = doctors[doctors["loc"]=="rur"]["score"]
# observed difference between means
dhat = dmeans(scoresR, scoresU)
dhat
6.992885375494076
from ministats import permutation_test_dmeans
pvalue = permutation_test_dmeans(scoresR, scoresU)
pvalue
0.051
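The internals of `permutation_test_dmeans` from the `ministats` package are not shown here. The following is a sketch of the general technique, not necessarily how `ministats` implements it. A two-sided permutation test for the difference between means pools both samples and repeatedly shuffles the pooled values. Each shuffle is split into groups of the original sizes, and the p-value is the proportion of shuffles whose difference is at least as extreme as the observed one. The helper name `permutation_test_dmeans_sketch` and the sample values are hypothetical.

```python
import numpy as np

def permutation_test_dmeans_sketch(xsample, ysample, P=10000, seed=42):
    """Hypothetical helper: two-sided permutation test for mean(x) - mean(y)."""
    rng = np.random.default_rng(seed)
    xs = np.asarray(xsample, dtype=float)
    ys = np.asarray(ysample, dtype=float)
    n = len(xs)
    dhat = xs.mean() - ys.mean()              # observed difference
    pooled = np.concatenate([xs, ys])
    pdhats = np.empty(P)
    for i in range(P):
        shuffled = rng.permutation(pooled)    # random relabelling of observations
        pdhats[i] = shuffled[:n].mean() - shuffled[n:].mean()
    # proportion of permuted differences at least as extreme as the observed one
    return np.mean(np.abs(pdhats) >= np.abs(dhat))

# Made-up samples, not the doctors dataset
xs = [71, 75, 68, 80, 77, 74]
ys = [65, 70, 62, 69, 66, 72]
print(permutation_test_dmeans_sketch(xs, ys))
```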
# ALT. use ttest_ind with a permutation method; recent SciPy versions
# no longer accept the `permutations` keyword, so pass a
# PermutationMethod object through the `method` argument instead
from scipy.stats import ttest_ind, PermutationMethod
ttest_ind(scoresR, scoresU, method=PermutationMethod(n_resamples=10000)).pvalue
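A third route is `scipy.stats.permutation_test`, which runs a permutation test for any user-supplied statistic and does not depend on the `permutations` keyword of `ttest_ind`. The sketch below uses a difference-of-means statistic, playing the role of `dmeans`, on made-up samples. With the real data, the first argument would be `(scoresR, scoresU)`. When `n_resamples` is at least the number of distinct ways to split the pooled data into the two groups, SciPy enumerates all of them and the p-value is exact. The name `dmeans_stat` is hypothetical.

```python
import numpy as np
from scipy.stats import permutation_test

def dmeans_stat(x, y, axis):
    # difference between group means, computed along `axis` (vectorized)
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

# Made-up samples, not the doctors dataset
xs = np.array([71, 75, 68, 80, 77, 74])
ys = np.array([65, 70, 62, 69, 66, 72])

res = permutation_test((xs, ys), dmeans_stat,
                       permutation_type="independent",
                       vectorized=True,
                       n_resamples=10000,
                       alternative="two-sided")
print(res.statistic, res.pvalue)
```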
E2. Sleep scores using a t-test
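One possible approach, assuming unequal group variances, is Welch's two-sample \(t\)-test. SciPy runs this test as `ttest_ind(scoresR, scoresU, equal_var=False)`. The sketch below shows what that call computes. It works through the test statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value on small made-up samples (stand-ins, not the doctors data), then checks the result against SciPy. The helper `welch_ttest` is hypothetical.

```python
import numpy as np
from scipy.stats import t as tdist, ttest_ind

def welch_ttest(xs, ys):
    """Hypothetical helper: two-sided Welch t-test computed from the formulas."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    nx, ny = len(xs), len(ys)
    vx, vy = xs.var(ddof=1), ys.var(ddof=1)    # sample variances (n-1 denominator)
    se = np.sqrt(vx/nx + vy/ny)                # standard error of the difference
    tobs = (xs.mean() - ys.mean()) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx/nx + vy/ny)**2 / ((vx/nx)**2/(nx-1) + (vy/ny)**2/(ny-1))
    pvalue = 2 * tdist.sf(abs(tobs), df)       # two-sided p-value
    return tobs, df, pvalue

# Made-up samples, not the doctors dataset
xs = [71, 75, 68, 80, 77, 74]
ys = [65, 70, 62, 69, 66, 72, 60]
print(welch_ttest(xs, ys))
print(ttest_ind(xs, ys, equal_var=False))
```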
E3. Example 6T with pooled variance
Redo Example 6T, but this time run the two-sample \(t\)-test with pooled variance.
students = pd.read_csv("../datasets/students.csv")
scoresD = students[students["curriculum"]=="debate"]["score"]
scoresL = students[students["curriculum"]=="lecture"]["score"]
# equal_var=True selects the pooled-variance (Student's) t-test
res = ttest_ind(scoresD, scoresL, equal_var=True)
res.pvalue
0.10917234443214315
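With `equal_var=True`, the two sample variances are combined into the pooled estimate \(s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}\). The test statistic is \(t = \frac{\overline{x} - \overline{y}}{s_p\sqrt{1/n_x + 1/n_y}}\), compared against a \(t\)-distribution with \(n_x+n_y-2\) degrees of freedom. The sketch below computes these quantities on made-up samples (not the students data) and checks them against `ttest_ind`. The helper `pooled_ttest` is hypothetical.

```python
import numpy as np
from scipy.stats import t as tdist, ttest_ind

def pooled_ttest(xs, ys):
    """Hypothetical helper: two-sided two-sample t-test with pooled variance."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    nx, ny = len(xs), len(ys)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((nx-1)*xs.var(ddof=1) + (ny-1)*ys.var(ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1/nx + 1/ny))          # standard error of the difference
    tobs = (xs.mean() - ys.mean()) / se
    df = nx + ny - 2
    pvalue = 2 * tdist.sf(abs(tobs), df)       # two-sided p-value
    return tobs, df, pvalue

# Made-up samples, not the students dataset
xs = [84, 79, 91, 76, 88, 82]
ys = [75, 80, 72, 78, 70, 81, 74]
print(pooled_ttest(xs, ys))
print(ttest_ind(xs, ys, equal_var=True))
```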