Exercises for Section 3.2 Confidence intervals
This notebook contains the solutions to the exercises from Section 3.2 Confidence intervals in the No Bullshit Guide to Statistics.
Notebook setup
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Estimator functions defined in Section 3.1
def mean(sample):
    return sum(sample) / len(sample)
def var(sample):
    xbar = mean(sample)
    sumsqdevs = sum([(xi-xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample)-1)
def std(sample):
    s2 = var(sample)
    return np.sqrt(s2)
def dmeans(xsample, ysample):
    dhat = mean(xsample) - mean(ysample)
    return dhat
Exercises
Exercise 3.17
Compute a 90% confidence interval for the population mean based on the sample from Batch 04 of the kombucha dataset.
kombucha = pd.read_csv("datasets/kombucha.csv")
ksample04 = kombucha[kombucha["batch"]==4]["volume"]
a) analytical approximation
from scipy.stats import t as tdist
n04 = ...
kbar04 = ...
seKbar04 = ...
t_l = ...
t_u = ...
# construct confidence interval
# [... + t_l*..., ... + t_u*...]
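For reference, the analytical interval above follows the usual recipe \(\overline{x} + t_{\ell}\cdot s/\sqrt{n}\) and \(\overline{x} + t_{u}\cdot s/\sqrt{n}\), with the quantiles taken from Student's \(t\)-distribution with \(n-1\) degrees of freedom. The sketch below wraps that recipe in a hypothetical helper `t_ci_mean` and runs it on made-up synthetic data; it is an illustration of the technique under those assumptions, not the notebook's own solution.

```python
import numpy as np
from scipy.stats import t as tdist

def t_ci_mean(sample, conf=0.90):
    """Hypothetical helper: t-based confidence interval for a population mean."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    xbar = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)   # estimated standard error of the mean
    alpha = 1 - conf
    t_l = tdist(df=n-1).ppf(alpha/2)       # lower quantile (negative)
    t_u = tdist(df=n-1).ppf(1 - alpha/2)   # upper quantile (positive)
    return [xbar + t_l*se, xbar + t_u*se]

# synthetic sample with made-up parameters (not the kombucha data)
rng = np.random.default_rng(0)
demo = rng.normal(loc=1000, scale=10, size=40)
ci = t_ci_mean(demo, conf=0.90)
print(ci)
```

The returned list has the same `[... + t_l*..., ... + t_u*...]` shape as the template in the cell above.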
b) bootstrap estimation
from ministats import gen_boot_dist
# obtain bootstrap sampling distribution
kbars04_boot = ...
# construct confidence interval
# [np.percentile(...), np.percentile(...)]
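As a point of comparison, the percentile bootstrap can be written with plain NumPy. This sketch assumes the standard recipe: draw `B` resamples with replacement, apply the estimator to each, then take the \(\alpha/2\) and \(1-\alpha/2\) percentiles. The helper `boot_dist` is hypothetical and written only for illustration. It is not the `gen_boot_dist` function from `ministats`, whose exact signature is not shown here.

```python
import numpy as np

def boot_dist(sample, estfunc, B=5000, rng=None):
    """Hypothetical helper: bootstrap sampling distribution of estfunc."""
    rng = np.random.default_rng() if rng is None else rng
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    # draw B bootstrap samples of size n (with replacement) and apply estfunc to each
    return np.array([estfunc(rng.choice(sample, size=n, replace=True))
                     for _ in range(B)])

rng = np.random.default_rng(42)
demo = rng.normal(loc=1000, scale=10, size=40)   # synthetic data, not the kombucha sample
xbars_boot = boot_dist(demo, np.mean, B=5000, rng=rng)

conf = 0.90
alpha = 1 - conf
ci = [np.percentile(xbars_boot, 100 * alpha / 2),
      np.percentile(xbars_boot, 100 * (1 - alpha / 2))]
print(ci)
```

Passing `lambda s: np.var(s, ddof=1)` as `estfunc` gives the bootstrap distribution of the sample variance in the same way.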
Exercise 3.18
Calculate a 90% confidence interval for the population variance based on the sample from Batch 05 of the kombucha dataset.
kombucha = pd.read_csv("datasets/kombucha.csv")
ksample05 = ... # load sample from Batch 5
a) analytical approximation
n05 = ...
kvar05 = ...
from scipy.stats import chi2
x2_l = ...
x2_u = ...
# construct confidence interval
...
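The analytical interval for a variance relies on the fact that, for normally distributed data, \((n-1)s^2/\sigma^2\) follows a chi-square distribution with \(n-1\) degrees of freedom. Inverting that relation gives \(\left[\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{\alpha/2}}\right]\), where \(\chi^2_q\) denotes the \(q\)-quantile. The larger quantile therefore produces the lower endpoint. Here is a minimal sketch under the normality assumption, with a hypothetical helper `chi2_ci_var` and synthetic data:

```python
import numpy as np
from scipy.stats import chi2

def chi2_ci_var(sample, conf=0.90):
    """Hypothetical helper: chi-square CI for a population variance.
    Assumes the population is (approximately) normal."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    s2 = sample.var(ddof=1)                  # sample variance
    alpha = 1 - conf
    x2_l = chi2(df=n-1).ppf(alpha/2)         # lower chi-square quantile
    x2_u = chi2(df=n-1).ppf(1 - alpha/2)     # upper chi-square quantile
    # dividing by the larger quantile gives the smaller endpoint
    return [(n-1)*s2 / x2_u, (n-1)*s2 / x2_l]

rng = np.random.default_rng(1)
demo = rng.normal(loc=1000, scale=10, size=40)   # synthetic data, not Batch 05
ci = chi2_ci_var(demo, conf=0.90)
print(ci)
```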
b) bootstrap estimation
kvars05_boot = ...
# construct confidence interval
...
Exercise 3.19
Compute a 95% confidence interval for the difference between rural and city sleep scores in the doctors dataset. a) Use the analytical approximation formula in terms of Student’s \(t\)-distribution. b) Use bootstrap estimation.
Hint: Use the code doctors[doctors["loc"]=="rur"] to select the subset of the doctors working in a rural location.
doctors = pd.read_csv("datasets/doctors.csv")
scoresR = doctors[doctors["loc"]=="rur"]["score"]
scoresU = doctors[doctors["loc"]=="urb"]["score"]
# observed difference between scores
dscores = ...
a) analytical approximation
# obtain the sample sizes and stds of the two groups
nR, stdR = ..., ...
nU, stdU = ..., ...
# standard error of the difference between group means
seDscores = ...
# calculate the degrees of freedom
from ministats import calcdf
# df = ...
# Student's t-distribution with df degrees of freedom
from scipy.stats import t as tdist
t_l = ...
t_u = ...
# construct confidence interval
...
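When the two groups may have unequal variances, the degrees of freedom are commonly approximated with the Welch-Satterthwaite formula \(\nu = \frac{(s_R^2/n_R + s_U^2/n_U)^2}{\frac{(s_R^2/n_R)^2}{n_R-1} + \frac{(s_U^2/n_U)^2}{n_U-1}}\). The sketch below implements that formula independently of `ministats`, so whether it matches `calcdf` exactly is an assumption. The helpers `welch_df` and `welch_ci` are hypothetical, and the data are synthetic.

```python
import numpy as np
from scipy.stats import t as tdist

def welch_df(stdX, nX, stdY, nY):
    """Hypothetical helper: Welch-Satterthwaite degrees of freedom."""
    vX, vY = stdX**2 / nX, stdY**2 / nY      # squared standard errors of each mean
    return (vX + vY)**2 / (vX**2 / (nX - 1) + vY**2 / (nY - 1))

def welch_ci(xsample, ysample, conf=0.95):
    """Hypothetical helper: t-based CI for mean(X) - mean(Y), unequal variances."""
    x = np.asarray(xsample, dtype=float)
    y = np.asarray(ysample, dtype=float)
    nX, nY = len(x), len(y)
    stdX, stdY = x.std(ddof=1), y.std(ddof=1)
    dhat = x.mean() - y.mean()
    se = np.sqrt(stdX**2 / nX + stdY**2 / nY)   # standard error of the difference
    df = welch_df(stdX, nX, stdY, nY)            # non-integer df is allowed
    alpha = 1 - conf
    t_l = tdist(df=df).ppf(alpha/2)
    t_u = tdist(df=df).ppf(1 - alpha/2)
    return [dhat + t_l*se, dhat + t_u*se]

rng = np.random.default_rng(2)
xdemo = rng.normal(loc=70, scale=8, size=30)     # synthetic group X
ydemo = rng.normal(loc=65, scale=12, size=25)    # synthetic group Y
print(welch_ci(xdemo, ydemo, conf=0.95))
```

With equal standard deviations and equal group sizes, the formula reduces to \(n_X + n_Y - 2\).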
b) bootstrap estimation
# compute bootstrap estimates for mean in each group
np.random.seed(43)
meanR_boot = ...
meanU_boot = ...
# compute the difference between means of bootstrap samples
dmeans_boot = ...
# construct confidence interval
...
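For the bootstrap version, each group is resampled independently at its original size, and the difference of the resampled means is recorded. This mirrors the cell structure, which builds the two bootstrap mean arrays separately before subtracting them. A minimal NumPy sketch under those assumptions follows. The helper `boot_dmeans` is hypothetical and the data are synthetic.

```python
import numpy as np

def boot_dmeans(xsample, ysample, B=5000, rng=None):
    """Hypothetical helper: bootstrap distribution of mean(X) - mean(Y)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(xsample, dtype=float)
    y = np.asarray(ysample, dtype=float)
    # resample each group separately (with replacement), keeping its size
    xbars = np.array([rng.choice(x, size=len(x)).mean() for _ in range(B)])
    ybars = np.array([rng.choice(y, size=len(y)).mean() for _ in range(B)])
    return xbars - ybars

rng = np.random.default_rng(43)
xdemo = rng.normal(loc=70, scale=8, size=30)     # synthetic group X
ydemo = rng.normal(loc=65, scale=12, size=25)    # synthetic group Y
dmeans_boot = boot_dmeans(xdemo, ydemo, B=5000, rng=rng)
ci = [np.percentile(dmeans_boot, 2.5), np.percentile(dmeans_boot, 97.5)]
print(ci)
```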
Exercise 3.20
Calculate an 80% confidence interval for the difference between the debate and lecture groups in the students dataset.
Hint: Use the code students[students["curriculum"]=="debate"] to select the subset of the students who had the debate curriculum.
students = pd.read_csv("datasets/students.csv")
scoresD = ... # select student scores for students with curriculum = debate
scoresL = ... # select student scores for students with curriculum = lecture
# observed difference between scores
dhat = ...
a) analytical approximation
# obtain the sample sizes and stds of the two groups
nD, stdD = ..., ...
nL, stdL = ..., ...
# standard error of the difference between group means
seDscores = ...
# calculate the degrees of freedom
from ministats import calcdf
df = ...
# Student's t-distribution with df degrees of freedom
from scipy.stats import t as tdist
t_l = ...
t_u = ...
# construct confidence interval
...
b) bootstrap estimation
np.random.seed(42)
meanD_boot = ...
meanL_boot = ...
# compute the difference between means of bootstrap samples
dmeans_boot = ...
# construct confidence interval
...
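Compared with the 95% intervals of Exercise 3.19, only the quantile levels change. For an 80% interval, \(\alpha = 0.2\), so the analytical version uses the 0.10 and 0.90 quantiles and the bootstrap version uses the 10th and 90th percentiles. The sketch below uses a hypothetical helper, `ci_levels`, to make that mapping explicit. The value `df=30` is an arbitrary example, not the degrees of freedom for the students data.

```python
from scipy.stats import t as tdist

def ci_levels(conf):
    """Hypothetical helper: lower/upper quantile levels for a two-sided interval."""
    alpha = 1 - conf
    return alpha / 2, 1 - alpha / 2

q_l, q_u = ci_levels(0.80)
# analytical: pass the quantile levels to .ppf() (df=30 is an arbitrary example value)
t_l, t_u = tdist(df=30).ppf(q_l), tdist(df=30).ppf(q_u)
# bootstrap: np.percentile expects percentages, so scale by 100
pct_l, pct_u = 100 * q_l, 100 * q_u
print(t_l, t_u, pct_l, pct_u)
```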
Exercise 3.21
As part of a lab experiment, sixty-four two-week-old rats were given a vitamin D supplement for a period of one month, and their weights were recorded at the end of the month (30 days). The sample mean was \(89.60\) g with standard deviation \(12.96\) g. Calculate a 95% confidence interval for the mean weight of rats undergoing this treatment based on: a) The normal model. b) Student’s \(t\)-distribution. c) Compare your answers in a) and b) and comment on the relevance of using Student’s \(t\)-distribution in this case.
n = 64
xbar = 89.60
xstd = 12.96
# estimated standard error
sehat = ...
a) Using normal approximation
from scipy.stats import norm
z_l = ...
z_u = ...
# construct confidence interval
...
b) Using Student’s \(t\)-distribution
from scipy.stats import t as tdist
t_l = ...
t_u = ...
# construct confidence interval
...
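For part c), it helps to compare the two critical values directly. The sketch below tabulates the upper 97.5% quantile of the standard normal and of Student's \(t\) with \(n-1\) degrees of freedom for a few sample sizes. Only \(n = 64\) comes from the exercise; the other sizes are arbitrary illustration values. The sketch then builds both intervals from the given summary statistics. With \(n = 64\), the two critical values differ by only a few hundredths, so the intervals nearly coincide. The gap matters mainly for small samples.

```python
import numpy as np
from scipy.stats import norm, t as tdist

z_u = norm.ppf(0.975)                      # upper 97.5% quantile of N(0, 1)

# how much larger the t critical value is, for a few sample sizes
ratios = []
for n_demo in [5, 10, 30, 64, 200]:
    t_u = tdist(df=n_demo - 1).ppf(0.975)
    ratios.append(t_u / z_u)               # > 1 means the t interval is wider
    print(f"n={n_demo:4d}  t_u={t_u:.4f}  z_u={z_u:.4f}  ratio={t_u / z_u:.4f}")

# both intervals for the rat-weight summary statistics
n, xbar, xstd = 64, 89.60, 12.96
sehat = xstd / np.sqrt(n)
z_ci = [xbar + norm.ppf(0.025) * sehat, xbar + norm.ppf(0.975) * sehat]
t_ci = [xbar + tdist(df=n-1).ppf(0.025) * sehat, xbar + tdist(df=n-1).ppf(0.975) * sehat]
print("normal:", z_ci)
print("t:     ", t_ci)
```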