Section 2.5 — Multiple continuous random variables#
This notebook contains all the code examples from Section 2.5 Multiple continuous random variables of the No Bullshit Guide to Statistics.
Notebook setup#
We’ll start by importing the Python modules we’ll need for this notebook.
# load Python modules
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plot_helpers import RCPARAMS
RCPARAMS.update({"figure.figsize": (7, 2)})
sns.set_theme(
context="paper",
style="whitegrid",
palette="colorblind",
rc=RCPARAMS,
)
%config InlineBackend.figure_format = 'retina'
%pip install -q ministats
Note: you may need to restart the kernel to use updated packages.
Definitions#
Random variables#
random variable \(X\): a quantity that can take on different values.
sample space \(\mathcal{X}\): describes the set of all possible outcomes of the random variable \(X\).
outcome: a particular value \(\{X = x\}\) that can occur as a result of observing the random variable \(X\).
event subset of the sample space \(\{a \leq X \leq b\}\) that can occur as a result of observing the random variable \(X\).
\(f_X\): the probability density function (PDF) is a function that assigns probabilities to the different outcome in the sample space of a random variable. The probability distribution function of the random variable \(X\) is a function of the form \(f_X: \mathcal{X} \to \mathbb{R}\).
Multiple random variables#
Example 3: bivariate normal distribution#
TODO: formula for general bivarate normal
from scipy.stats import multivariate_normal
# parameters
mu = [10, 5]
Sigma = [[ 3**2, 0.75*3*1],
[ 0.75*3*1, 1**2]]
# multivariate normal
rvXY = multivariate_normal(mu, Sigma)
rvXY.pdf((10,5))
0.08020655225672235
from scipy.integrate import dblquad
def fXY(x,y):
"""
Adapter function because `dblquad` expects the function
we're integrating to be of the form f(y,x), and not f(x,y).
"""
return rvXY.pdf([y,x])
dblquad(fXY, a=11, b=np.inf, gfun=6, hfun=np.inf)[0]
0.1372330649420418
1 - rvXY.cdf((11,np.inf)) - rvXY.cdf((np.inf,6)) + rvXY.cdf((11,6))
0.13723306482268627
Joint probability density functions#
TODO: formulas
from ministats import plot_joint_pdf_contourf
xlims = [3, 17]
ylims = [1.5, 8.5]
plot_joint_pdf_contourf(rvXY, xlims=xlims, ylims=ylims);
Marginal density functions#
TODO: formulas
from ministats.book.figures import plot_joint_pdf_and_marginals
plot_joint_pdf_and_marginals(rvXY, xlims=xlims, ylims=ylims);
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[10], line 1
----> 1 from ministats.book.figures import plot_joint_pdf_and_marginals
3 plot_joint_pdf_and_marginals(rvXY, xlims=xlims, ylims=ylims);
ModuleNotFoundError: No module named 'ministats.book'
Conditional probability density functions#
from ministats.book.figures import plot_slices_through_joint_pdf
xcuts = range(2, 15, 1)
plot_slices_through_joint_pdf(rvXY, xlims=[3,17], ylims=[0,10], xcuts=xcuts);
Examples#
Example ?: Multivariate normal#
# TODO
Useful probability formulas#
Multivariable expectation#
Mean#
rvXY.mean
array([10., 5.])
Covariance#
Sigma = rvXY.cov
Sigma
array([[9. , 2.25],
[2.25, 1. ]])
covXY = Sigma[0,1]
covXY
2.25
Correlation#
stdX = np.sqrt(Sigma[0,0])
stdY = np.sqrt(Sigma[1,1])
corrXY = covXY / (stdX * stdY)
corrXY
0.75
Independent, identically distributed random variabls#
TODO formulas