Chapter 2 — Probability theory
Probability theory is a language for describing randomness, variability, and uncertainty. You need to know probability theory to understand statistics. This is why Part 1 of the book includes ~200 pages of formulas, explanations, figures, and code examples to get you up to speed on all the probability theory topics that you need to know.
Below, you’ll find the computational notebooks from Chapter 2 of the book. Going through these notebooks would be interesting even if you don’t have the book, since the notebooks are mostly self-contained.
Notebooks
Each notebook contains the code examples from the corresponding section. If you’re reading the book, you can follow along by running the commands in these notebooks and reproduce all the probability calculations for yourself.
Discrete random variables 21_discrete_random_vars.ipynb
Multiple random variables 22_multiple_random_vars.ipynb
Inventory of discrete distributions 23_inventory_discrete_dists.ipynb
Calculus prerequisites 24_calculus_prerequisites.ipynb
Continuous random variables 25_continuous_random_vars.ipynb
Inventory of continuous distributions 26_inventory_continuous_dists.ipynb
Random variable generation 27_random_var_generation.ipynb
Probability models for random samples 28_random_samples.ipynb
Exercises
Each section contains exercises to help you practice the probability calculations explained in that section.
Probability models for real world data
Here is a list of the different domains that can be usefully described using probability distributions:
math models for r.v. \(X\) (defined by a probability distribution function \(f_X\))
computer models like rvX, created from one of the model families in scipy.stats and initialized with appropriate parameters. The computer model rvX for the random variable \(X\) has methods like rvX.rvs(), rvX.cdf(b), rvX.pdf(x) (or rvX.pmf(x) for discrete models), and stats like rvX.mean(), rvX.median(), rvX.var(), rvX.std(), rvX.ppf(q), etc.
random draws from a generative process (computer simulation that generates random numbers, see Section 2.7 for examples)
random draws from a real world process (e.g. factory that produces a new item)
data for an entire population (census)
sample data from a population (the data type we learned about in Chapter 1)
synthetic data obtained by resampling:
bootstrap samples = sampling from the empirical distribution
permutation tests that forget group membership
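To make a few of the items in this list concrete, here is a minimal sketch of a computer model, some random draws from it, and a bootstrap sample. The choice of the normal family and the parameter values (mean 1000, standard deviation 100), the sample size, and the seed are assumptions made for illustration only; they do not come from the book.

```python
import numpy as np
from scipy.stats import norm

# Computer model rvX for a hypothetical random variable X ~ Normal(1000, 100).
# The family and the parameters are illustrative assumptions.
rvX = norm(loc=1000, scale=100)

# Summary statistics of the model
mean = rvX.mean()
median = rvX.median()
var = rvX.var()
std = rvX.std()

# Probability calculations
p_below = rvX.cdf(1000)   # Pr(X <= 1000)
q90 = rvX.ppf(0.9)        # inverse of the CDF: the value b such that Pr(X <= b) = 0.9
density = rvX.pdf(1000)   # density at x = 1000 (discrete models use .pmf instead)

# Random draws from the model (a simulated sample of size 20)
sample = rvX.rvs(size=20, random_state=42)

# Bootstrap sample: draw with replacement from the observed sample,
# which amounts to sampling from its empirical distribution
rng = np.random.default_rng(42)
bsample = rng.choice(sample, size=len(sample), replace=True)

print("mean:", mean, "std:", std)
print("Pr(X <= 1000):", p_below)
print("90th percentile:", q90)
print("first three draws:", sample[:3])
```

The same method names work for the other model families in scipy.stats, so you can swap norm for a different family and keep the rest of the calculations unchanged, apart from using pmf instead of pdf for discrete models.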