Section 2.6 — Inventory of continuous distributions#
This notebook contains all the code examples from Section 2.6 Inventory of continuous distributions of the No Bullshit Guide to Statistics.
Notebook setup#
# load Python modules
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Figures setup
sns.set_theme(
context="paper",
style="whitegrid",
palette="colorblind",
rc={'figure.figsize': (7,4)},
)
%config InlineBackend.figure_format = 'retina'
# set random seed for repeatability
np.random.seed(42)
%pip install ministats
Requirement already satisfied: ministats in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (0.3.12)
Requirement already satisfied: matplotlib>=3.7 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (3.10.0)
Requirement already satisfied: numpy>=1.24 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (1.26.4)
Requirement already satisfied: scipy>=1.6 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (1.14.1)
Requirement already satisfied: pandas>=2 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (2.2.3)
Requirement already satisfied: seaborn>=0.13.2 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (0.13.2)
Requirement already satisfied: statsmodels>=0.14.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (0.14.4)
Requirement already satisfied: bambi in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (0.15.1.dev1+g089d8e9)
Requirement already satisfied: arviz in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from ministats) (0.20.0)
Requirement already satisfied: contourpy>=1.0.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (4.55.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (1.4.8)
Requirement already satisfied: packaging>=20.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (24.2)
Requirement already satisfied: pillow>=8 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (11.0.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (3.2.0)
Requirement already satisfied: python-dateutil>=2.7 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from matplotlib>=3.7->ministats) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pandas>=2->ministats) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pandas>=2->ministats) (2024.2)
Requirement already satisfied: patsy>=0.5.6 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from statsmodels>=0.14.1->ministats) (1.0.1)
Requirement already satisfied: setuptools>=60.0.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from arviz->ministats) (65.5.0)
Requirement already satisfied: xarray>=2022.6.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from arviz->ministats) (2024.11.0)
Requirement already satisfied: h5netcdf>=1.0.2 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from arviz->ministats) (1.4.1)
Requirement already satisfied: typing-extensions>=4.1.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from arviz->ministats) (4.12.2)
Requirement already satisfied: xarray-einstats>=0.3 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from arviz->ministats) (0.8.0)
Requirement already satisfied: formulae>=0.5.3 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from bambi->ministats) (0.5.4)
Requirement already satisfied: graphviz in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from bambi->ministats) (0.20.3)
Requirement already satisfied: pymc>=5.18.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from bambi->ministats) (5.19.1)
Requirement already satisfied: h5py in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from h5netcdf>=1.0.2->arviz->ministats) (3.12.1)
Requirement already satisfied: cachetools>=4.2.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pymc>=5.18.0->bambi->ministats) (5.5.0)
Requirement already satisfied: cloudpickle in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pymc>=5.18.0->bambi->ministats) (3.1.0)
Requirement already satisfied: pytensor<2.27,>=2.26.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pymc>=5.18.0->bambi->ministats) (2.26.4)
Requirement already satisfied: rich>=13.7.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pymc>=5.18.0->bambi->ministats) (13.9.4)
Requirement already satisfied: threadpoolctl<4.0.0,>=3.1.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pymc>=5.18.0->bambi->ministats) (3.5.0)
Requirement already satisfied: six>=1.5 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib>=3.7->ministats) (1.17.0)
Requirement already satisfied: filelock>=3.15 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pytensor<2.27,>=2.26.1->pymc>=5.18.0->bambi->ministats) (3.16.1)
Requirement already satisfied: etuples in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pytensor<2.27,>=2.26.1->pymc>=5.18.0->bambi->ministats) (0.3.9)
Requirement already satisfied: logical-unification in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pytensor<2.27,>=2.26.1->pymc>=5.18.0->bambi->ministats) (0.4.6)
Requirement already satisfied: miniKanren in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pytensor<2.27,>=2.26.1->pymc>=5.18.0->bambi->ministats) (1.0.3)
Requirement already satisfied: cons in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from pytensor<2.27,>=2.26.1->pymc>=5.18.0->bambi->ministats) (0.4.6)
Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from rich>=13.7.1->pymc>=5.18.0->bambi->ministats) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from rich>=13.7.1->pymc>=5.18.0->bambi->ministats) (2.18.0)
Requirement already satisfied: mdurl~=0.1 in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=13.7.1->pymc>=5.18.0->bambi->ministats) (0.1.2)
Requirement already satisfied: toolz in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from logical-unification->pytensor<2.27,>=2.26.1->pymc>=5.18.0->bambi->ministats) (1.0.0)
Requirement already satisfied: multipledispatch in /opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages (from logical-unification->pytensor<2.27,>=2.26.1->pymc>=5.18.0->bambi->ministats) (1.0.0)
Note: you may need to restart the kernel to use updated packages.
from ministats import plot_pdf
from ministats import plot_cdf
from ministats import plot_pdf_and_cdf
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Review of formulas#
Gamma function#
from scipy.special import gamma as gammaf
gammaf(1) # = 0! = 1
1.0
gammaf(2) # = 1! = 1
1.0
gammaf(3) # = 2! = 2*1
2.0
gammaf(4) # = 3! = 3*2*1
6.0
gammaf(5) # = 4! = 4*3*2*1
24.0
[gammaf(z) for z in [4, 4.1, 4.5, 4.9, 5]]
[6.0, 6.812622863016677, 11.63172839656745, 20.66738596185786, 24.0]
Continuous distribution reference#
Uniform distribution#
The uniform distribution \(\mathcal{U}(\alpha,\beta)\) is described by the following probability density function:
For a uniform distribution \(\mathcal{U}(\alpha,\beta)\), each \(x\) between \(\alpha\) and \(\beta\) is equally likely to occur, and values of \(x\) outside this range have zero probability of occurring.
from scipy.stats import uniform
alpha = 2
beta = 7
rvU = uniform(alpha, beta-alpha)
# draw 10 random samples from X
rvU.rvs(10)
array([3.87270059, 6.75357153, 5.65996971, 4.99329242, 2.7800932 ,
2.7799726 , 2.29041806, 6.33088073, 5.00557506, 5.54036289])
# # ALT. use sns.lineplot
# # plot the probability density function (pdf) of the random variable X
# xs = np.linspace(0, 10, 1000)
# fUs = rvU.pdf(xs)
# sns.lineplot(x=xs, y=fUs)
Cumulative distribution function#
Standard uniform distribution#
The standard uniform distribution \(U_s \sim \mathcal{U}(0,1)\) is described by the following probability density function:
where \(U\) is the name of the random variable and \(u\) are particular values it can take on.
The above equation describes tells you how likely it is to observe \(\{U_s=x\}\). For a uniform distribution \(\mathcal{U}(0,1)\), each \(x\) between 0 and 1 is equally likely to occur, and values of \(x\) outside this range have zero probability of occurring.
from scipy.stats import uniform
rvUs = uniform(0, 1)
# draw 10 random samples from X
rvUs.rvs(1)
array([0.02058449])
import random
random.seed(3)
random.random()
0.23796462709189137
random.uniform(0,1)
0.5442292252959519
import numpy as np
np.random.seed(42)
np.random.rand(10)
array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864,
0.15599452, 0.05808361, 0.86617615, 0.60111501, 0.70807258])
Simulating other random variables#
We can use the uniform random variable to generate random variables from other distributions.
For example,
suppose we want to generate observations of a coin toss random variable
which comes out heads
50% of the time and tails
50% of the time.
We can use the standard uniform random variables obtained from random.random()
and split the outcomes at the “halfway point” of the sample space,
to generate the 50-50 randomness of a coin toss.
The function flip_coin
defined below shows how to do this:
def flip_coin():
u = random.random() # random number in [0,1]
if u < 0.5:
return "heads"
else:
return "tails"
# simulate one coin toss
flip_coin()
'heads'
# simulate 10 coin tosses
[flip_coin() for i in range(0,10)]
['tails',
'tails',
'heads',
'heads',
'tails',
'heads',
'heads',
'tails',
'heads',
'tails']
Exponential#
from scipy.stats import expon
lam = 7
loc = 0
scale = 1/lam
rvE = expon(loc, scale)
The computer model expon
accepts as its first argument an optional “location” parameter,
which can shift the exponential distribution to the right,
but we want loc=0
to get the simple case,
that corresponds to the un-shifted distribution \(\textrm{Expon}(\lambda)\).
rvE.mean(), rvE.var()
(0.14285714285714285, 0.02040816326530612)
# math formulas for mean and var
1/lam, 1/lam**2
(0.14285714285714285, 0.02040816326530612)
## ALT. we can obtain mean and ver using the .stats() method
## The code below also computes the skewness and the kurtosis
# mean, var, skew, kurt = rvE.stats(moments='mvsk')
# mean, var, skew, kurt
# f_E(5) = pdf value at x=10
rvE.pdf(0.2)
1.7261787475912451
Normal#
A random variable \(N\) with a normal distribution \(\mathcal{N}(\mu,\sigma)\) is described by the probability density function:
The mean \(\mu\) and the standard deviation \(\sigma\) are called the parameters of the distribution. The math notation \(\mathcal{N}(\mu, \sigma)\) is used to describe the whole family of normal probability distributions.
from scipy.stats import norm
mu = 10 # = 𝜇 where is the centre?
sigma = 3 # = 𝜎 how spread out is it?
rvN = norm(mu, sigma)
rvN.mean(), rvN.var()
(10.0, 9.0)
# ALT. generate the plot manually
# create a normal random variable
from scipy.stats import norm
mean = 1000 # 𝜇 (mu) = where is its center?
std = 100 # 𝜎 (sigma) = how spread out is it?
rvN = norm(mean, std)
# plot its probability density function (pdf)
xs = np.linspace(300, 1700, 1000)
ys = rvN.pdf(xs)
ax = sns.lineplot(x=xs, y=ys)
Standard normal#
A standard normal is denoted \(Z\) with a normal distribution \(\mathcal{N}(\mu=0,\sigma=1)\) and described by the probability density function:
from scipy.stats import norm
rvZ = norm(0,1)
rvZ.mean(), rvZ.var()
(0.0, 1.0)
fig, ax = plt.subplots()
plot_pdf(rvZ, xlims=[-4,4], ax=ax, rv_name="Z")
Cumulative probabilities in the tails#
Probability of \(Z\) being smaller than \(-2.2\).
rvZ.cdf(-2.3)
0.010724110021675809
Probability of \(Z\) being greater than \(2.2\).
1 - rvZ.cdf(2.3)
0.010724110021675837
Probability of \(|Z| > 2.2\).
rvZ.cdf(-2.3) + (1-rvZ.cdf(2.3))
0.021448220043351646
norm.cdf(-2.3,0,1) + (1-norm.cdf(2.3,0,1))
0.021448220043351646
Inverse cumulative distribution calculations#
rvZ.ppf(0.05)
-1.6448536269514729
rvZ.ppf(0.95)
1.6448536269514722
rvZ.interval(0.9)
(-1.6448536269514729, 1.6448536269514722)
Student’s \(t\)-distribution#
This is a generalization of the standard normal with “heavy” tails.
from scipy.stats import t
df = 10
rvT = t(df)
rvT.mean(), rvT.var()
(0.0, 1.25)
# Kurtosis formula kurt(rvT) = 6/(df-4) for df>4
rvT.stats("k")
1.0
rvT.cdf(-2.3)
0.022127156642143552
rvT.ppf(0.05), rvT.ppf(0.95)
(-1.8124611228107341, 1.8124611228107335)
Snedecor’s \(F\)-distribution#
from scipy.stats import f
df1, df2 = 15, 10
rvF = f(df1, df2)
rvF.mean(), rvF.var()
(1.25, 0.7986111111111112)
Chi-squared distribution#
from scipy.stats import chi2
k = 10
rvX2 = chi2(k)
rvX2.mean(), rvX2.var()
(10.0, 20.0)
1 - rvX2.cdf(20)
0.02925268807696113
Gamma (optional)#
https://en.wikipedia.org/wiki/Gamma_distribution
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gamma.html
from scipy.stats import gamma as gammad
alpha = 4
loc = 0
lam = 2
beta = 1/lam
rvG = gammad(alpha, loc, beta)
rvG.mean(), rvG.var()
(2.0, 1.0)
Beta (optional)#
from scipy.stats import beta as betad
alpha = 3
beta = 7
rvB = betad(alpha, beta)
rvB.mean(), rvB.var()
(0.3, 0.019090909090909092)
Cauchy (optional)#
from scipy.stats import cauchy
x0 = 3
gamma = 5
rvC = cauchy(x0, gamma)
rvC.mean(), rvC.var()
(nan, nan)