# Exercises for Section 3.2 Confidence intervals

This notebook contains the solutions to the exercises
from Section 3.2 Confidence intervals
in the **No Bullshit Guide to Statistics**.

## Notebooks setup

```
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

## Estimator functions defined in Section 3.1

```
def mean(sample):
    return sum(sample) / len(sample)

def var(sample):
    xbar = mean(sample)
    sumsqdevs = sum([(xi-xbar)**2 for xi in sample])
    return sumsqdevs / (len(sample)-1)

def std(sample):
    s2 = var(sample)
    return np.sqrt(s2)

def dmeans(xsample, ysample):
    dhat = mean(xsample) - mean(ysample)
    return dhat
```

## Exercises

### Exercise 3.17

Compute a 90% confidence interval for the population mean
based on the sample from Batch 04 of the `kombucha` dataset.

```
kombucha = pd.read_csv("datasets/kombucha.csv")
ksample04 = kombucha[kombucha["batch"]==4]["volume"]
```

#### a) analytical approximation

```
from scipy.stats import t as tdist
n04 = ...
kbar04 = ...
seKbar04 = ...
t_l = ...
t_u = ...
# construct confidence interval
# [... + t_l*..., ... + t_u*...]
```
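As a reference for the steps in the template above, here is a minimal sketch of a \(t\)-based confidence interval for a mean. The `sample` values are made up for illustration (they are not the kombucha data), and the variable names `se` and `ci_mean` are hypothetical.

```python
import numpy as np
from scipy.stats import t as tdist

# Hypothetical sample (made-up values, NOT the kombucha data)
sample = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7])

n = len(sample)
xbar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)  # estimated standard error of the mean

# quantiles of Student's t with n-1 degrees of freedom for a 90% interval
alpha = 0.10
t_l = tdist(df=n-1).ppf(alpha/2)
t_u = tdist(df=n-1).ppf(1 - alpha/2)

# t_l is negative, so adding t_l*se gives the lower limit
ci_mean = [xbar + t_l*se, xbar + t_u*se]
print(ci_mean)
```

Because the \(t\)-distribution is symmetric, `t_l` equals `-t_u` and the interval is centred on `xbar`.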

#### b) bootstrap estimation

```
from stats_helpers import gen_boot_dist
# obtain bootstrap sampling distribution
kbars04_boot = ...
# construct confidence interval
# [np.percentile(...), np.percentile(...)]
```
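The bootstrap procedure can be sketched with NumPy alone. The helper `boot_means` below is a hypothetical stand-in written for this illustration; it is not the `gen_boot_dist` function from `stats_helpers`, whose exact signature is not assumed here. The data values are made up.

```python
import numpy as np

def boot_means(sample, B=5000, seed=42):
    """Hypothetical helper: means of B bootstrap resamples of `sample`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    n = len(sample)
    # each row is one resample of size n, drawn with replacement
    resamples = rng.choice(sample, size=(B, n), replace=True)
    return resamples.mean(axis=1)

sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7]  # made-up values
xbars_boot = boot_means(sample)

# 90% percentile interval: cut off 5% in each tail
ci_boot = [np.percentile(xbars_boot, 5), np.percentile(xbars_boot, 95)]
print(ci_boot)
```

The percentile method reads the interval limits directly off the bootstrap distribution, so no distributional formula is needed.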

### Exercise 3.18

Calculate a 90% confidence interval for the population variance
based on the sample from Batch 05 of the `kombucha` dataset.

```
kombucha = pd.read_csv("datasets/kombucha.csv")
ksample05 = ... # load sample from Batch 5
```

#### a) analytical approximation

```
n05 = ...
kvar05 = ...
from scipy.stats import chi2
x2_l = ...
x2_u = ...
# construct confidence interval
...
```
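For reference, a sketch of the chi-square interval for a variance, which relies on the assumption that the population is approximately normal. It uses made-up data rather than the kombucha sample, and the names `s2` and `ci_var` are hypothetical. The interval is \(\left[\frac{(n-1)s^2}{\chi^2_u}, \frac{(n-1)s^2}{\chi^2_\ell}\right]\).

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical sample (made-up values, NOT the kombucha data)
sample = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7])

n = len(sample)
s2 = sample.var(ddof=1)  # sample variance with n-1 in the denominator

# quantiles of the chi-square distribution with n-1 degrees of freedom
alpha = 0.10
x2_l = chi2(df=n-1).ppf(alpha/2)
x2_u = chi2(df=n-1).ppf(1 - alpha/2)

# dividing by the LARGER quantile gives the LOWER limit
ci_var = [(n-1)*s2/x2_u, (n-1)*s2/x2_l]
print(ci_var)
```

Unlike the interval for the mean, this interval is not symmetric around `s2`, because the chi-square distribution is skewed.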

#### b) bootstrap estimation

```
kvars05_boot = ...
# construct confidence interval
...
```

### Exercise 3.19

Compute a 95% confidence interval for the difference between rural and city sleep scores in the doctors dataset. **a)** Use the analytical approximation formula in terms of Student’s \(t\)-distribution. **b)** Use bootstrap estimation.

Hint: Use the code `doctors[doctors["location"]=="rural"]` to select
the subset of the doctors working in a `rural` location.

```
doctors = pd.read_csv("datasets/doctors.csv")
scoresR = doctors[doctors["location"]=="rural"]["score"]
scoresU = doctors[doctors["location"]=="urban"]["score"]
# observed difference between scores
dscores = ...
```

#### a) analytical approximation

```
# obtain the sample sizes and stds of the two groups
nR, stdR = ..., ...
nU, stdU = ..., ...
# standard error of the difference between group means
seDscores = ...
# calculate the degrees of freedom
from stats_helpers import calcdf
# df = ...
# Student's t-distribution with df degrees of freedom
from scipy.stats import t as tdist
t_l = ...
t_u = ...
# construct confidence interval
...
```
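The sketch below illustrates the same sequence of steps on made-up data, assuming the unpooled (Welch) standard error and the Welch–Satterthwaite formula for the degrees of freedom. The function `welch_df` is a hypothetical helper written for this example; whether it matches the behaviour of `calcdf` from `stats_helpers` is an assumption, not something confirmed here.

```python
import numpy as np
from scipy.stats import t as tdist

def welch_df(stdX, n, stdY, m):
    """Hypothetical helper: Welch-Satterthwaite degrees of freedom."""
    vx, vy = stdX**2 / n, stdY**2 / m
    return (vx + vy)**2 / (vx**2/(n-1) + vy**2/(m-1))

# Made-up scores for two groups (NOT the doctors data)
xs = np.array([52.0, 61.0, 47.0, 58.0, 66.0, 55.0, 49.0])
ys = np.array([44.0, 50.0, 39.0, 57.0, 41.0, 48.0, 53.0, 46.0, 42.0])

dhat = xs.mean() - ys.mean()
n, m = len(xs), len(ys)
stdX, stdY = xs.std(ddof=1), ys.std(ddof=1)

# standard error of the difference between group means (unpooled)
se = np.sqrt(stdX**2/n + stdY**2/m)
df = welch_df(stdX, n, stdY, m)

# 95% interval from Student's t with df degrees of freedom
alpha = 0.05
t_l = tdist(df=df).ppf(alpha/2)
t_u = tdist(df=df).ppf(1 - alpha/2)
ci_diff = [dhat + t_l*se, dhat + t_u*se]
print(df, ci_diff)
```

The degrees of freedom computed this way are generally not an integer, and they always fall between `min(n, m) - 1` and `n + m - 2`.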

#### b) bootstrap estimation

```
# compute bootstrap estimates for mean in each group
np.random.seed(43)
meanR_boot = ...
meanU_boot = ...
# compute the difference between means of bootstrap samples
dmeans_boot = ...
# construct confidence interval
...
```

### Exercise 3.20

Calculate an 80% confidence interval for the difference between the debate and lecture groups in the `students` dataset.

Hint: Use the code `students[students["curriculum"]=="debate"]` to select
the subset of the students who had the `debate` curriculum.

```
students = pd.read_csv("datasets/students.csv")
scoresD = ... # select student scores for students with curriculum = debate
scoresL = ... # select student scores for students with curriculum = lecture
# observed difference between scores
dhat = ...
```

#### a) analytical approximation

```
# obtain the sample sizes and stds of the two groups
nD, stdD = ..., ...
nL, stdL = ..., ...
# standard error of the difference between group means
seDscores = ...
# calculate the degrees of freedom
from stats_helpers import calcdf
df = ...
# Student's t-distribution with df degrees of freedom
from scipy.stats import t as tdist
t_l = ...
t_u = ...
# construct confidence interval
...
```

#### b) bootstrap estimation

```
np.random.seed(42)
meanD_boot = ...
meanL_boot = ...
# compute the difference between means of bootstrap samples
dmeans_boot = ...
# construct confidence interval
...
```

### Exercise 3.21

As part of a lab experiment,
sixty-four two-week-old rats were given a vitamin D supplement for a period of one month,
and their weights were recorded at the end of the month (30 days).
The sample mean was \(89.60\) g with standard deviation \(12.96\) g.
Calculate a 95% confidence interval for the mean weight for rats undergoing this treatment based on: **a)** The normal model. **b)** Student’s \(t\)-distribution. **c)** Compare your answers in a) and b) and comment on the relevance of using Student’s \(t\)-distribution in this case.

```
n = 64
xbar = 89.60
xstd = 12.96
# estimated standard error
sehat = ...
```

#### a) Using normal approximation

```
from scipy.stats import norm
z_l = ...
z_u = ...
# construct confidence interval
...
```

#### b) Using Student’s \(t\)-distribution

```
from scipy.stats import t as tdist
t_l = ...
t_u = ...
# construct confidence interval
...
```
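For the comparison asked for in part c), one possible starting point is to look at the critical values themselves, since both intervals use the same centre and the same standard error. The sketch below only uses the sample size \(n = 64\) from the exercise statement; the variable name `ratio` is introduced here for illustration.

```python
from scipy.stats import norm, t as tdist

n = 64  # sample size given in the exercise

# upper critical values for a 95% interval under each model
z_u = norm.ppf(0.975)
t_u = tdist(df=n-1).ppf(0.975)

# ratio of interval widths (t-based width divided by normal-based width)
ratio = t_u / z_u
print(z_u, t_u, ratio)
```

With 63 degrees of freedom the \(t\)-based interval comes out roughly 2% wider than the normal-based one, so the two models give similar answers at this sample size.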
