Section 2.5 — Multiple continuous random variables

This notebook contains all the code examples from Section 2.5, Multiple continuous random variables, of the No Bullshit Guide to Statistics.

Notebook setup#

We’ll start by importing the Python modules we’ll need for this notebook.

# load Python modules
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from plot_helpers import RCPARAMS
RCPARAMS.update({"figure.figsize": (7, 2)})
sns.set_theme(
    context="paper",
    style="whitegrid",
    palette="colorblind",
    rc=RCPARAMS,
)
%config InlineBackend.figure_format = 'retina'

%pip install -q ministats

Note: you may need to restart the kernel to use updated packages.

Definitions#

Random variables#

random variable \(X\): a quantity that can take on different values.
sample space \(\mathcal{X}\): describes the set of all possible outcomes of the random variable \(X\).
outcome: a particular value \(\{X = x\}\) that can occur as a result of observing the random variable \(X\).
event: a subset of the sample space, such as \(\{a \leq X \leq b\}\), that can occur as a result of observing the random variable \(X\).
\(f_X\): the probability density function (PDF) is a function that assigns probability densities to the different outcomes in the sample space of a random variable. The probability density function of the random variable \(X\) is a function of the form \(f_X: \mathcal{X} \to \mathbb{R}\). Probabilities of events are obtained by integrating \(f_X\) over the event.
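To make these definitions concrete, here is a minimal sketch for a single continuous random variable. The variable `rvX_demo` and the interval endpoints `a` and `b` are hypothetical values chosen for illustration; they are not part of the book's examples. The probability of the event \(\{a \leq X \leq b\}\) is the area under the PDF between `a` and `b`, which should agree with a difference of CDF values.

```python
from scipy.stats import norm
from scipy.integrate import quad

# hypothetical example: X ~ Normal(mean=10, std=3)
rvX_demo = norm(loc=10, scale=3)
a, b = 7, 13   # the event {a <= X <= b}, i.e. within one std of the mean

# probability of the event = area under the PDF between a and b
prob_area = quad(rvX_demo.pdf, a, b)[0]

# the same probability as a difference of CDF values
prob_cdf = rvX_demo.cdf(b) - rvX_demo.cdf(a)

print(prob_area, prob_cdf)   # both approximately 0.6827
```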

Multiple random variables#

Example 3: bivariate normal distribution#

TODO: formula for general bivariate normal
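For reference, the standard textbook form of the bivariate normal density is sketched below. The symbols \(\mu_X, \mu_Y, \sigma_X, \sigma_Y, \rho\) are notation assumed here, not taken from the notebook; in the code that follows they correspond to `mu = [10, 5]`, standard deviations 3 and 1, and correlation 0.75.

```latex
f_{XY}(x,y)
  = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}}
    \exp\!\left(
      -\frac{1}{2(1-\rho^2)}
      \left[
        \frac{(x-\mu_X)^2}{\sigma_X^2}
        - \frac{2\rho\,(x-\mu_X)(y-\mu_Y)}{\sigma_X \sigma_Y}
        + \frac{(y-\mu_Y)^2}{\sigma_Y^2}
      \right]
    \right),
\qquad
\Sigma =
\begin{bmatrix}
  \sigma_X^2 & \rho\,\sigma_X\sigma_Y \\
  \rho\,\sigma_X\sigma_Y & \sigma_Y^2
\end{bmatrix}.
```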

from scipy.stats import multivariate_normal

# parameters
mu = [10, 5]
Sigma = [[  3**2,     0.75*3*1],
         [  0.75*3*1,     1**2]]


# multivariate normal
rvXY = multivariate_normal(mu, Sigma)

# evaluate the joint density at the mean (x, y) = (10, 5)
rvXY.pdf((10,5))

0.08020655225672235
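As a sanity check on the value above, we can evaluate the standard closed-form bivariate normal density by hand and compare it with `scipy`. This is a sketch: the helper `bivariate_normal_pdf` and the names `muX`, `muY`, `sigmaX`, `sigmaY`, `rho`, `Sigma_check`, and `rv_check` are introduced here for illustration and are not part of the notebook or of `ministats`.

```python
import numpy as np
from scipy.stats import multivariate_normal

# same parameters as above, written as means, standard deviations, correlation
muX, muY = 10, 5
sigmaX, sigmaY, rho = 3, 1, 0.75

def bivariate_normal_pdf(x, y):
    """Closed-form bivariate normal density (illustrative helper)."""
    zx = (x - muX) / sigmaX
    zy = (y - muY) / sigmaY
    norm_const = 2 * np.pi * sigmaX * sigmaY * np.sqrt(1 - rho**2)
    quad_form = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return np.exp(-quad_form / 2) / norm_const

# rebuild the scipy model from the same parameters for comparison
Sigma_check = [[sigmaX**2, rho * sigmaX * sigmaY],
               [rho * sigmaX * sigmaY, sigmaY**2]]
rv_check = multivariate_normal([muX, muY], Sigma_check)

print(bivariate_normal_pdf(10, 5), rv_check.pdf((10, 5)))
print(bivariate_normal_pdf(12, 4), rv_check.pdf((12, 4)))
```

At the mean the exponent is zero, so the density reduces to the normalizing constant \(1 / (2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2})\).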

from scipy.integrate import dblquad

def fXY(y, x):
    """
    Adapter function because `dblquad` calls the integrand as f(y, x),
    with the inner variable y first, while `rvXY.pdf` expects [x, y].
    """
    return rvXY.pdf([x, y])

# Pr({X >= 11, Y >= 6}): x runs over [a, b], y runs over [gfun, hfun]
dblquad(fXY, a=11, b=np.inf, gfun=6, hfun=np.inf)[0]

0.1372330649420418

# same probability from the joint CDF, using inclusion-exclusion
1 - rvXY.cdf((11,np.inf)) - rvXY.cdf((np.inf,6)) + rvXY.cdf((11,6))

0.13723306482268627
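The two calculations above agree to about nine decimal places. A third, independent check is a Monte Carlo estimate: draw many samples from the joint distribution and count the fraction that land in the region \(\{X \geq 11, Y \geq 6\}\). The sketch below assumes a sample size of 200,000 and a fixed seed, both arbitrary choices; the names `rv_mc`, `samples_mc`, and `prob_mc` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# same model as rvXY, rebuilt so this cell runs on its own
rv_mc = multivariate_normal([10, 5], [[9, 2.25], [2.25, 1]])

# draw samples; each row is one (x, y) pair
samples_mc = rv_mc.rvs(size=200_000, random_state=42)

# fraction of samples with x >= 11 and y >= 6
in_region = (samples_mc[:, 0] >= 11) & (samples_mc[:, 1] >= 6)
prob_mc = in_region.mean()
prob_mc
```

The estimate should land close to 0.137, with a sampling error on the order of 0.001 at this sample size.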

Joint probability density functions#

TODO: formulas
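As a placeholder for the formulas, here is a sketch of the standard defining properties of a joint probability density function. The rectangle bounds \(a, b, c, d\) are notation assumed here for illustration.

```latex
f_{XY}(x,y) \geq 0,
\qquad
\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f_{XY}(x,y)\, dy\, dx = 1,
\qquad
\Pr\big(\{a \leq X \leq b,\; c \leq Y \leq d\}\big)
  = \int_{a}^{b}\!\int_{c}^{d} f_{XY}(x,y)\, dy\, dx .
```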

from ministats import plot_joint_pdf_contourf

xlims = [3, 17]
ylims = [1.5, 8.5]

plot_joint_pdf_contourf(rvXY, xlims=xlims, ylims=ylims);

../_images/60c7177a7b4e88a33251e3caae21c1cdc1aa027b679bdbe8194903bc9acae672.png

from ministats import plot_joint_pdf_surface

viewdict = dict(elev=60., azim=-110, roll=-16)
plot_joint_pdf_surface(rvXY, xlims=xlims, ylims=ylims, viewdict=viewdict);

../_images/5beae91798086a48c29e4d1c2b540389911af271b96eed59b2b3ff27f45680a5.png

Marginal density functions#

TODO: formulas

from ministats.book.figures import plot_joint_pdf_and_marginals

plot_joint_pdf_and_marginals(rvXY, xlims=xlims, ylims=ylims);

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[10], line 1
----> 1 from ministats.book.figures import plot_joint_pdf_and_marginals
      3 plot_joint_pdf_and_marginals(rvXY, xlims=xlims, ylims=ylims);

ModuleNotFoundError: No module named 'ministats.book'
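Since the plotting helper could not be imported in this run, here is a `scipy`-only sketch of the underlying calculation. The marginal density is obtained by integrating out the other variable, \(f_X(x) = \int f_{XY}(x,y)\,dy\). For a bivariate normal, the marginal of \(X\) is itself normal with mean 10 and standard deviation 3. The grid limits, grid spacing, and the names `rv_joint`, `xs_demo`, `ys_grid`, `fX_numeric`, and `fX_exact` are choices made here for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# same model as rvXY, rebuilt so this cell runs on its own
rv_joint = multivariate_normal([10, 5], [[9, 2.25], [2.25, 1]])

xs_demo = np.array([4.0, 7.0, 10.0, 13.0])   # x values where we want f_X(x)
ys_grid = np.linspace(-5, 15, 2001)          # wide y grid with spacing 0.01
dy = ys_grid[1] - ys_grid[0]

# evaluate the joint density on the (x, y) grid; result has shape (4, 2001)
X, Y = np.meshgrid(xs_demo, ys_grid, indexing="ij")
joint_vals = rv_joint.pdf(np.dstack((X, Y)))

# integrate out y with a Riemann sum to approximate f_X(x)
fX_numeric = joint_vals.sum(axis=1) * dy

# exact marginal for comparison: X ~ Normal(10, 3)
fX_exact = norm(10, 3).pdf(xs_demo)

print(fX_numeric)
print(fX_exact)
```

The marginal of \(Y\) can be approximated the same way by summing over an x grid instead.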

Conditional probability density functions#

from ministats.book.figures import plot_slices_through_joint_pdf

xcuts = range(2, 15, 1)
plot_slices_through_joint_pdf(rvXY, xlims=[3,17], ylims=[0,10], xcuts=xcuts);

../_images/57f26ed43cc0fa3b16b9a9c69e6909e9c5167cc4c807fb6efe737f2bdf24bcac.png

from ministats.book.figures import plot_conditional_fYgivenX

xcuts = range(6, 15, 1)
plot_conditional_fYgivenX(rvXY, xlims=[3,17], ylims=[0,10], xcuts=xcuts);

../_images/b2cb3c06c847265232f2154907ac11ca9b4f5585cab291ba2a14d57e238c8080.png
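The conditional density is the slice of the joint density at a fixed \(x\), rescaled by the marginal so that it integrates to one: \(f_{Y|X}(y|x) = f_{XY}(x,y) / f_X(x)\). For a bivariate normal, the result is again normal, with mean \(\mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X)\) and standard deviation \(\sigma_Y\sqrt{1-\rho^2}\). The sketch below checks this numerically at the arbitrary cut `x0 = 12`; all names in the block are introduced here for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

muX, muY = 10, 5
sigmaX, sigmaY, rho = 3, 1, 0.75
rv_joint = multivariate_normal(
    [muX, muY],
    [[sigmaX**2, rho * sigmaX * sigmaY],
     [rho * sigmaX * sigmaY, sigmaY**2]],
)

x0 = 12                              # the vertical cut X = x0
ys_cond = np.linspace(0, 10, 501)    # y values along the cut

# slice through the joint density, divided by the marginal f_X(x0)
points = np.column_stack((np.full_like(ys_cond, x0), ys_cond))
f_cond = rv_joint.pdf(points) / norm(muX, sigmaX).pdf(x0)

# closed-form conditional distribution of Y given X = x0
cond_mean = muY + rho * sigmaY / sigmaX * (x0 - muX)
cond_std = sigmaY * np.sqrt(1 - rho**2)
f_cond_exact = norm(cond_mean, cond_std).pdf(ys_cond)

print(cond_mean, cond_std)
print(np.max(np.abs(f_cond - f_cond_exact)))
```

Note that the conditional standard deviation does not depend on `x0`, so every rescaled slice has the same width and only its center shifts.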

Examples#

Example ?: Multivariate normal#

# TODO

Useful probability formulas#

Multivariable expectation#
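For reference, here is a sketch of the standard definitions used in the calculations in this part of the notebook. The function \(g\) and the symbols \(\mu_X, \mu_Y, \sigma_X, \sigma_Y\) are notation assumed here.

```latex
\mathbb{E}[g(X,Y)]
  = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\, dy\, dx,
\qquad
\operatorname{cov}(X,Y) = \mathbb{E}\big[(X-\mu_X)(Y-\mu_Y)\big],
\qquad
\operatorname{corr}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}.
```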

Mean#

rvXY.mean

array([10.,  5.])

Covariance#

Sigma = rvXY.cov
Sigma

array([[9.  , 2.25],
       [2.25, 1.  ]])

covXY = Sigma[0,1]
covXY

2.25

Correlation#

stdX = np.sqrt(Sigma[0,0])
stdY = np.sqrt(Sigma[1,1])

corrXY = covXY / (stdX * stdY)
corrXY

0.75
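The mean, covariance, and correlation above are read directly from the model parameters. As a cross-check, we can estimate the same quantities from simulated data. This is a sketch assuming 100,000 samples and a fixed seed, both arbitrary choices; the names `rv_est`, `samples_est`, `mean_hat`, `Sigma_hat`, and `corr_hat` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# same model as rvXY, rebuilt so this cell runs on its own
rv_est = multivariate_normal([10, 5], [[9, 2.25], [2.25, 1]])

# each row of samples_est is one (x, y) observation
samples_est = rv_est.rvs(size=100_000, random_state=1)

mean_hat = samples_est.mean(axis=0)                      # estimates [10, 5]
Sigma_hat = np.cov(samples_est, rowvar=False)            # estimates Sigma
corr_hat = np.corrcoef(samples_est, rowvar=False)[0, 1]  # estimates 0.75

print(mean_hat)
print(Sigma_hat)
print(corr_hat)
```

The estimates should be close to the model values but will not match them exactly, since they are computed from a finite random sample.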