Exercises for Section 3.2 Confidence intervals#

This notebook contains the solutions to the exercises from Section 3.2 Confidence intervals in the No Bullshit Guide to Statistics.

Notebook setup#

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Estimator functions defined in Section 3.1#

def mean(sample):
    return sum(sample) / len(sample)

def var(sample):
    xbar = mean(sample)
    sumsqdevs = sum([(xi-xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample)-1)

def std(sample):
    s2 = var(sample)
    return np.sqrt(s2)

def dmeans(xsample, ysample):
    dhat = mean(xsample) - mean(ysample)
    return dhat

Exercises#

Exercise 3.17#

Compute a 90% confidence interval for the population mean based on the sample from Batch 04 of the kombucha dataset.

kombucha = pd.read_csv("datasets/kombucha.csv")
ksample04 = kombucha[kombucha["batch"]==4]["volume"]

a) analytical approximation#

from scipy.stats import t as tdist

n04 = ... 
kbar04 = ...
seKbar04 = ...

t_l = ...
t_u = ...

# construct confidence interval
# [... + t_l*...,  ... + t_u*...]
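
The analytical recipe above can be illustrated on made-up numbers. The sketch below is not the solution for Batch 04: the `sample` array is hypothetical, and the code assumes the population is roughly normal so that Student's t-distribution with n-1 degrees of freedom applies.

```python
import numpy as np
from scipy.stats import t as tdist

# Hypothetical volumes in ml (made-up numbers, NOT the kombucha Batch 04 data)
sample = np.array([1001.2, 998.7, 1003.4, 999.1, 1000.8, 1002.3, 997.9, 1001.6])

n = len(sample)
xbar = sample.mean()
sehat = sample.std(ddof=1) / np.sqrt(n)   # estimated standard error of the mean

# 5th and 95th percentiles of Student's t with n-1 degrees of freedom
rvT = tdist(df=n-1)
t_l = rvT.ppf(0.05)
t_u = rvT.ppf(0.95)

# 90% confidence interval for the population mean
ci90 = [xbar + t_l*sehat, xbar + t_u*sehat]
print(ci90)
```

Because the t-distribution is symmetric, `t_l` equals `-t_u`, so the interval is centred on the sample mean.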

b) bootstrap estimation#

from ministats import gen_boot_dist

# obtain bootstrap sampling distribution 
kbars04_boot = ...

# construct confidence interval
# [np.percentile(...), np.percentile(...)]
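
The percentile bootstrap idea can be sketched without any helper library. The function `boot_dist` below is a hypothetical stand-in written for this illustration; it is not the `gen_boot_dist` function from `ministats`, whose interface may differ. The `sample` values are made up.

```python
import numpy as np

def boot_dist(sample, estfunc, B=5000, seed=42):
    """Hypothetical helper: approximate the sampling distribution of
    `estfunc` by resampling `sample` with replacement B times."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    stats = [estfunc(rng.choice(sample, size=n, replace=True)) for _ in range(B)]
    return np.array(stats)

# Hypothetical volumes in ml (made-up numbers, NOT the kombucha data)
sample = [1001.2, 998.7, 1003.4, 999.1, 1000.8, 1002.3, 997.9, 1001.6]

# bootstrap sampling distribution of the mean
xbars_boot = boot_dist(sample, np.mean)

# 90% percentile interval: cut off 5% in each tail
ci90_boot = [np.percentile(xbars_boot, 5), np.percentile(xbars_boot, 95)]
print(ci90_boot)
```

Passing a different estimator (for example, a sample variance function) as `estfunc` gives a bootstrap interval for that quantity instead.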

Exercise 3.18#

Calculate a 90% confidence interval for the population variance based on the sample from Batch 05 of the kombucha dataset.

kombucha = pd.read_csv("datasets/kombucha.csv")
ksample05 = ... # load sample from Batch 5

a) analytical approximation#

n05 = ...
kvar05 = ...

from scipy.stats import chi2
x2_l = ...
x2_u = ...

# construct confidence interval
...
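
For reference, the chi-square approach to a variance interval can be sketched on made-up data. The code below assumes a normally distributed population, in which case (n-1)s²/σ² follows a chi-square distribution with n-1 degrees of freedom. The `sample` array is hypothetical, not Batch 05.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical volumes in ml (made-up numbers, NOT the kombucha Batch 05 data)
sample = np.array([1001.2, 998.7, 1003.4, 999.1, 1000.8, 1002.3, 997.9, 1001.6])

n = len(sample)
s2 = sample.var(ddof=1)   # sample variance with n-1 in the denominator

# 5th and 95th percentiles of the chi-square distribution with n-1 dof
rvX2 = chi2(df=n-1)
x2_l = rvX2.ppf(0.05)
x2_u = rvX2.ppf(0.95)

# 90% confidence interval for the population variance;
# the larger quantile gives the lower limit because it sits in the denominator
ci90_var = [(n-1)*s2/x2_u, (n-1)*s2/x2_l]
print(ci90_var)
```

Unlike the interval for the mean, this interval is not symmetric around the point estimate, since the chi-square distribution is skewed.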

b) bootstrap estimation#

kvars05_boot = ...

# construct confidence interval
...

Exercise 3.19#

Compute a 95% confidence interval for the difference between rural and city sleep scores in the doctors dataset. a) Use the analytical approximation formula in terms of Student’s \(t\)-distribution. b) Use bootstrap estimation.

Hint: Use the code doctors[doctors["loc"]=="rur"] to select the subset of the doctors working in a rural location.

doctors = pd.read_csv("datasets/doctors.csv")
scoresR = doctors[doctors["loc"]=="rur"]["score"]
scoresU = doctors[doctors["loc"]=="urb"]["score"]

# observed difference between scores
dscores = ...

a) analytical approximation#

# obtain the sample sizes and stds of the two groups
nR, stdR = ..., ...
nU, stdU = ..., ...

# standard error of the difference between group means
seDscores = ...

# calculate the degrees of freedom
from ministats import calcdf
# df = ...

# Student's t-distribution with df degrees of freedom
from scipy.stats import t as tdist
t_l = ...
t_u = ...

# construct confidence interval
...
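
The steps in part a) can be sketched on two made-up groups. The function `welch_df` below is a hypothetical helper that implements the Welch–Satterthwaite approximation for the degrees of freedom; it is written for this illustration and is not the `calcdf` function from `ministats`. The arrays `groupA` and `groupB` are invented and unrelated to the doctors dataset.

```python
import numpy as np
from scipy.stats import t as tdist

def welch_df(s1, n1, s2, n2):
    """Hypothetical helper: Welch-Satterthwaite degrees of freedom
    for two groups with unequal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1))

# Made-up scores for two groups (NOT the doctors data)
groupA = np.array([52.0, 61.5, 47.3, 58.8, 66.1, 49.9, 55.4])
groupB = np.array([43.2, 50.7, 39.8, 46.5, 54.0, 41.1, 48.3, 44.9, 51.6])

dhat = groupA.mean() - groupB.mean()       # observed difference between means
nA, stdA = len(groupA), groupA.std(ddof=1)
nB, stdB = len(groupB), groupB.std(ddof=1)

# standard error of the difference between group means
seD = np.sqrt(stdA**2/nA + stdB**2/nB)

# Student's t-distribution with Welch's degrees of freedom
df = welch_df(stdA, nA, stdB, nB)
rvT = tdist(df=df)
t_l = rvT.ppf(0.025)
t_u = rvT.ppf(0.975)

# 95% confidence interval for the difference between population means
ci95 = [dhat + t_l*seD, dhat + t_u*seD]
print(df, ci95)
```

The degrees of freedom produced by this formula are generally not an integer, which `scipy.stats.t` accepts.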

b) bootstrap estimation#

# compute bootstrap estimates for mean in each group
np.random.seed(43)
meanR_boot = ...
meanU_boot = ...

# compute the difference between means of bootstrap samples
dmeans_boot = ...

# construct confidence interval
...
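
A bootstrap interval for a difference between means can be sketched by resampling each group independently and subtracting the resulting means. The version below is a minimal NumPy-only illustration on made-up data; the names `groupA`, `groupB`, and `B` are assumptions, and the resampling is vectorized rather than done through `ministats`.

```python
import numpy as np

rng = np.random.default_rng(43)
B = 5000   # number of bootstrap resamples (an arbitrary choice)

# Made-up scores for two groups (NOT the doctors data)
groupA = np.array([52.0, 61.5, 47.3, 58.8, 66.1, 49.9, 55.4])
groupB = np.array([43.2, 50.7, 39.8, 46.5, 54.0, 41.1, 48.3, 44.9, 51.6])

# each row is one bootstrap resample of the group; take the mean of each row
meansA_boot = rng.choice(groupA, size=(B, len(groupA)), replace=True).mean(axis=1)
meansB_boot = rng.choice(groupB, size=(B, len(groupB)), replace=True).mean(axis=1)

# difference between means of bootstrap samples
dmeans_boot = meansA_boot - meansB_boot

# 95% percentile interval: cut off 2.5% in each tail
ci95_boot = [np.percentile(dmeans_boot, 2.5), np.percentile(dmeans_boot, 97.5)]
print(ci95_boot)
```

Each group is resampled at its own size, so unequal group sizes need no special handling.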

Exercise 3.20#

Calculate an 80% confidence interval for the difference between the debate and lecture groups in the students dataset.

Hint: Use the code students[students["curriculum"]=="debate"] to select the subset of the students who had the debate curriculum.

students = pd.read_csv("datasets/students.csv")
scoresD = ... # select student scores for students with curriculum = debate
scoresL = ... # select student scores for students with curriculum = lecture

# observed difference between scores
dhat = ...

a) analytical approximation#

# obtain the sample sizes and stds of the two groups
nD, stdD = ..., ...
nL, stdL = ..., ...

# standard error of the difference between group means
seDscores = ...

# calculate the degrees of freedom
from ministats import calcdf
df = ...

# Student's t-distribution with df degrees of freedom
from scipy.stats import t as tdist
t_l = ...
t_u = ...

# construct confidence interval
...

b) bootstrap estimation#

np.random.seed(42)
meanD_boot = ...
meanL_boot = ...

# compute the difference between means of bootstrap samples
dmeans_boot = ...

# construct confidence interval
...

Exercise 3.21#

As part of a lab experiment, sixty-four two-week-old rats were given a vitamin D supplement for a period of one month, and their weights were recorded at the end of the month (30 days). The sample mean was \(89.60\) g with standard deviation \(12.96\) g. Calculate a 95% confidence interval for the mean weight of rats undergoing this treatment based on: a) The normal model. b) Student’s \(t\)-distribution. c) Compare your answers in a) and b) and comment on the relevance of using Student’s \(t\)-distribution in this case.

n = 64
xbar = 89.60
xstd = 12.96

# estimated standard error
sehat = ...

a) Using normal approximation#

from scipy.stats import norm
z_l = ...
z_u = ...

# construct confidence interval
...

b) Using Student’s \(t\)-distribution#

from scipy.stats import t as tdist
t_l = ...
t_u = ...

# construct confidence interval
...
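
One possible way to carry out the comparison in part c) is sketched below, using only the summary statistics stated in the exercise. It assumes the estimated standard error is the sample standard deviation divided by the square root of n, and that the t-distribution has n-1 = 63 degrees of freedom. The variable names `ci_z`, `ci_t`, `width_z`, and `width_t` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import norm
from scipy.stats import t as tdist

n = 64
xbar = 89.60
xstd = 12.96

# estimated standard error: 12.96 / sqrt(64) = 1.62
sehat = xstd / np.sqrt(n)

# a) quantiles of the standard normal
z_l = norm.ppf(0.025)
z_u = norm.ppf(0.975)
ci_z = [xbar + z_l*sehat, xbar + z_u*sehat]

# b) quantiles of Student's t with n-1 degrees of freedom
t_l = tdist(df=n-1).ppf(0.025)
t_u = tdist(df=n-1).ppf(0.975)
ci_t = [xbar + t_l*sehat, xbar + t_u*sehat]

# c) compare the widths of the two intervals
width_z = ci_z[1] - ci_z[0]
width_t = ci_t[1] - ci_t[0]
print(ci_z, ci_t, width_t / width_z)
```

With 63 degrees of freedom the t quantiles are only slightly larger in magnitude than the normal quantiles, so the t-based interval is only marginally wider.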