Exercises for Section 3.5 Two-sample tests#

This notebook contains the solutions to the exercises from Section 3.5 Two-sample tests in the No Bullshit Guide to Statistics.

Notebook setup#

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Estimator functions defined in Section 3.1#

def mean(sample):
    return sum(sample) / len(sample)

def var(sample):
    xbar = mean(sample)
    sumsqdevs = sum([(xi-xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample)-1)

def std(sample):
    s2 = var(sample)
    return np.sqrt(s2)

def dmeans(xsample, ysample):
    dhat = mean(xsample) - mean(ysample)
    return dhat

Exercises#

E1. Permutation test for the sleep scores in the doctors dataset#

doctors = pd.read_csv("../datasets/doctors.csv")
scoresU = doctors[doctors["loc"]=="urb"]["score"]
scoresR = doctors[doctors["loc"]=="rur"]["score"]

# observed difference between means
dhat = dmeans(scoresR, scoresU)
dhat
6.992885375494076
from ministats import permutation_test_dmeans
pvalue = permutation_test_dmeans(scoresR, scoresU)
pvalue
0.051
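The idea behind a permutation test for the difference between means can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily how `permutation_test_dmeans` in `ministats` is implemented: the function name `perm_test_dmeans_sketch`, its parameters `P` and `seed`, and the synthetic samples are all hypothetical and are not the doctors data.

```python
import numpy as np

def perm_test_dmeans_sketch(xsample, ysample, P=10000, seed=42):
    """Two-sided permutation test for the difference between means (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(xsample, dtype=float)
    y = np.asarray(ysample, dtype=float)
    obs = x.mean() - y.mean()               # observed difference between means
    pooled = np.concatenate([x, y])         # under H0 the group labels are exchangeable
    n = len(x)
    count = 0
    for _ in range(P):
        shuffled = rng.permutation(pooled)  # reassign group labels at random
        d = shuffled[:n].mean() - shuffled[n:].mean()
        if abs(d) >= abs(obs):              # at least as extreme as the observed value
            count += 1
    return count / P

# Synthetic placeholder samples (NOT the doctors dataset)
rng = np.random.default_rng(0)
groupA = rng.normal(loc=10, scale=2, size=30)
groupB = rng.normal(loc=10, scale=2, size=30)   # same population mean as groupA
groupC = rng.normal(loc=15, scale=2, size=30)   # clearly shifted mean

p_same = perm_test_dmeans_sketch(groupA, groupB)
p_diff = perm_test_dmeans_sketch(groupA, groupC)
print(p_same, p_diff)
```

The p-value is the proportion of shuffled differences that are at least as extreme as the observed one, so a large true shift between the groups should give a p-value near zero.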
# ALT. use ttest_ind with a permutation-based p-value
# (recent SciPy releases no longer accept the `permutations` argument;
#  pass a PermutationMethod through the `method` argument instead)
from scipy.stats import ttest_ind, PermutationMethod
ttest_ind(scoresR, scoresU, method=PermutationMethod(n_resamples=10000)).pvalue

E2. Sleep scores using t-test#
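A minimal sketch of how this exercise could be approached, assuming it asks for a two-sample \(t\)-test on the same rural and urban sleep scores as in E1 and that Welch's version (unequal variances) is intended. The helper name `welch_ttest_sketch` and the synthetic samples are hypothetical placeholders, not the doctors data. The manual calculation is compared against `ttest_ind(..., equal_var=False)`.

```python
import numpy as np
from scipy.stats import ttest_ind, t as tdist

def welch_ttest_sketch(x, y):
    """Welch's two-sample t-test computed by hand (two-sided)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    vx, vy = x.var(ddof=1), y.var(ddof=1)   # sample variances
    se = np.sqrt(vx / n + vy / m)           # standard error of the difference
    tstat = (x.mean() - y.mean()) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (vx / n + vy / m) ** 2 / ((vx / n) ** 2 / (n - 1) + (vy / m) ** 2 / (m - 1))
    pvalue = 2 * tdist.sf(abs(tstat), df)   # two-sided p-value
    return tstat, df, pvalue

# Synthetic placeholder samples (NOT the doctors dataset)
rng = np.random.default_rng(1)
xs = rng.normal(loc=50, scale=10, size=40)
ys = rng.normal(loc=45, scale=15, size=25)

t_manual, df_manual, p_manual = welch_ttest_sketch(xs, ys)
res_welch = ttest_ind(xs, ys, equal_var=False)
print(t_manual, p_manual, res_welch.pvalue)
```

With the real data, the call would presumably be `ttest_ind(scoresR, scoresU, equal_var=False)`, using the `scoresR` and `scoresU` series defined in E1.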

E3. Example 6T with pooled variance#

Redo Example 6T but this time run the two-sample \(t\)-test with pooled variance.

students = pd.read_csv("../datasets/students.csv")
scoresD = students[students["curriculum"]=="debate"]["score"]
scoresL = students[students["curriculum"]=="lecture"]["score"]
res = ttest_ind(scoresD, scoresL, equal_var=True)
res.pvalue
0.10917234443214315
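To make explicit what `equal_var=True` does, the pooled-variance calculation can be sketched by hand and checked against `ttest_ind`. This is an illustrative sketch: the helper name `pooled_ttest_sketch` and the synthetic samples are hypothetical and are not the students data.

```python
import numpy as np
from scipy.stats import ttest_ind, t as tdist

def pooled_ttest_sketch(x, y):
    """Two-sample t-test with pooled variance computed by hand (two-sided)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    vx, vy = x.var(ddof=1), y.var(ddof=1)             # sample variances
    # pooled variance: average of the two variances weighted by degrees of freedom
    sp2 = ((n - 1) * vx + (m - 1) * vy) / (n + m - 2)
    se = np.sqrt(sp2 * (1 / n + 1 / m))               # standard error of the difference
    tstat = (x.mean() - y.mean()) / se
    df = n + m - 2
    pvalue = 2 * tdist.sf(abs(tstat), df)             # two-sided p-value
    return tstat, df, pvalue

# Synthetic placeholder samples (NOT the students dataset)
rng = np.random.default_rng(2)
xs = rng.normal(loc=75, scale=8, size=8)
ys = rng.normal(loc=70, scale=8, size=7)

t_pooled, df_pooled, p_pooled = pooled_ttest_sketch(xs, ys)
res_pooled = ttest_ind(xs, ys, equal_var=True)
print(t_pooled, p_pooled, res_pooled.pvalue)
```

The pooled test assumes both populations share a common variance, which is why it uses \(n + m - 2\) degrees of freedom instead of the Welch approximation.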