Permutation tests for comparing two groups#

Notebook setup#

# Install stats library
%pip install --quiet ministats 
Note: you may need to restart the kernel to use updated packages.
# Figures setup
import matplotlib.pyplot as plt
import seaborn as sns

plt.clf()  # needed otherwise `sns.set_theme` doesn't work
sns.set_theme(
    style="whitegrid",
    rc={'figure.figsize': (7, 2)},
)
# High-resolution figures please
%config InlineBackend.figure_format = 'retina'

def savefig(fig, filename):
    fig.tight_layout()
    fig.savefig(filename, dpi=300, bbox_inches="tight", pad_inches=0)
<Figure size 640x480 with 0 Axes>

Introduction#

Permutation test#

TODO: import from blog post and slides

Suppose we have a obtained samples from group of students who took the smart drug treated, and a similar group who didn’t take the smart drug controls.

# data

treated = [92.69, 117.15, 124.79, 100.57, 104.27, 121.56, 104.18,
           122.43, 98.85, 104.26, 118.56, 138.98, 101.33, 118.57, 
           123.37, 105.9, 121.75, 123.26, 118.58, 80.03, 121.15, 
           122.06, 112.31, 108.67, 75.44, 110.27, 115.25, 125.57,
           114.57, 98.09, 91.15, 112.52, 100.12, 115.2, 95.32, 
           121.37, 100.09, 113.8, 101.73, 124.9, 87.83, 106.22, 
           99.97, 107.51, 83.99, 98.03, 71.91, 109.99, 90.83, 105.48]

controls = [85.1, 84.05, 90.43, 115.92, 97.64, 116.41, 68.88, 110.51, 
            125.12, 94.04, 134.86, 85.0, 91.61, 69.95, 94.51, 81.16, 
            130.61, 108.93, 123.38, 127.69, 83.36, 76.97, 124.87, 86.36, 
            105.71, 93.01, 101.58, 93.58, 106.51, 91.67, 112.93, 88.74, 
            114.05, 80.32, 92.91, 85.34, 104.01, 91.47, 109.2, 104.04, 
            86.1, 91.52, 98.5, 94.62, 101.27, 107.41, 100.68, 114.94, 
            88.8, 121.8]

To compare the two groups, we’ll subtract the average score computed from each group.

def mean(sample):
    return sum(sample) / len(sample)

def dmeans(xsample, ysample):
    dhat = mean(xsample) - mean(ysample)
    return dhat

# Calculate the observed difference between means
dscore = dmeans(treated, controls)
dscore
7.886999999999972

Statistical question?#

Are the two groups the same? This is equivalent to saying the smart drug had no effect.

Disproving the skeptical colleague#

We’ll now use the 10000 permutations of the original data to obtain sampling distribution of the difference between means under the null hypothesis.

import numpy as np
np.random.seed(43)


pdhats = []
for i in range(0, 10000):
    all_iqs = np.concatenate((treated, controls))
    pall_iqs = np.random.permutation(all_iqs)
    ptreated = pall_iqs[0:len(treated)]
    pcontrols = pall_iqs[len(treated):]
    pdhat = dmeans(ptreated, pcontrols)
    pdhats.append(pdhat)

Compute the p-value of the observed difference between means dprice under the null hypothesis.

tails = [d for d in pdhats if abs(d) > dscore]
pvalue = len(tails) / len(pdhats)
pvalue
0.0101
bins = np.arange(-11, 11, 0.3)

# plot the sampling distribution in blue
ax = sns.histplot(pdhats, bins=bins)

# plot red line for the observed statistic
plt.axvline(dscore, color="red")

# plot the values that are equal or more extreme in red
sns.histplot(tails, ax=ax, bins=bins, color="red")
ax.set_xlabel("$d$")
ax.set_ylabel("$f_{D}$")

savefig(plt.gcf(), "figures/pvalue_viz_permutation_test_iqs.png")

Alternative using formula#

from ministats import ttest_dmeans
ttest_dmeans(treated, controls)
0.01016361165213775

Conclusion#