Chapter 2 — Probability theory

Probability theory is a language for describing randomness, variability, and uncertainty. You need to know probability theory to understand statistics. This is why Part 1 of the book includes ~200 pages of formulas, explanations, figures, and code examples to get you up to speed on all the probability theory topics that you need to know.

Below, you'll find the computational notebooks from Chapter 2 of the book. Going through these notebooks would be interesting even if you don't have the book, since they are mostly self-contained.

Notebooks

Each notebook contains the code examples from the corresponding section. If you're reading the book, you can follow along by running the commands in these notebooks and reproduce all the probability calculations for yourself.

Discrete random variables 21_discrete_random_vars.ipynb
Multiple random variables 22_multiple_random_vars.ipynb
Inventory of discrete distributions 23_inventory_discrete_dists.ipynb
Continuous random variables 24_continuous_random_vars.ipynb
Multiple continuous random variables 25_multiple_continuous_random_vars.ipynb
Inventory of continuous distributions 26_inventory_continuous_dists.ipynb
Random variable generation 27_simulations.ipynb
Probability models for random samples 28_random_samples.ipynb

Exercises

Each section contains exercises to help you practice the probability calculations explained in that section.

Probability models for real world data

Here is a list of the different domains that can be usefully described using probability distributions:

- math models for a random variable $X$ (defined by its probability distribution function $f_X$)
- computer models like rvX, created from one of the model families in scipy.stats and initialized with appropriate parameters. The computer model rvX for the random variable $X$ has methods like rvX.rvs(), rvX.cdf(b), rvX.pdf(x) or rvX.pmf(x), rvX.ppf(q), and statistics like rvX.mean(), rvX.median(), rvX.var(), rvX.std(), etc.
- random draws from a generative process (a computer simulation that generates random numbers; see Section 2.7 for examples)
- random draws from a real-world process (e.g. a factory that produces a new item)
- data for an entire population (census)
- sample data from a population (the data type we learned about in Chapter 1)
- synthetic data obtained by resampling:
  - bootstrap samples = sampling from the empirical distribution
  - permutation tests, which forget group membership
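To make the computer-model item in the list above concrete, here is a minimal sketch of how such an object could be created and queried with scipy.stats. The choice of the normal family and the parameter values (mean 100, standard deviation 15) are assumptions made purely for illustration, as are the variable names other than rvX. The last two lines show one possible way to draw a bootstrap sample with NumPy; the notebooks may use a different approach.

```python
import numpy as np
from scipy.stats import norm

# Computer model for a random variable X ~ Normal(mu=100, sigma=15).
# The family and parameter values are hypothetical, chosen for illustration.
rvX = norm(loc=100, scale=15)

# Summary statistics of the model
mean, median = rvX.mean(), rvX.median()
var, std = rvX.var(), rvX.std()

# Probability calculations
p_below_115 = rvX.cdf(115)      # Pr(X <= 115)
density_at_100 = rvX.pdf(100)   # f_X(100); discrete models use .pmf instead
q90 = rvX.ppf(0.9)              # inverse CDF: b such that Pr(X <= b) = 0.9

# Random draws from the model (a generative process)
sample = rvX.rvs(size=50, random_state=42)

# Bootstrap sample: resample with replacement from the empirical distribution
rng = np.random.default_rng(42)
bsample = rng.choice(sample, size=len(sample), replace=True)
```

The same method names work for the other model families in scipy.stats, except that discrete families expose .pmf in place of .pdf.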