Chapter 2 — Probability theory#

Probability theory is a language for describing uncertainty, variability, and randomness. You need to know probability theory to understand statistics. This is why the book includes ~200 pages of formulas, explanations, figures, and code examples to get you up to speed on all the probability theory topics that you need to know.


Each notebook contains the code examples from corresponding section in book. If you’re reading the book, you should follow along by running the commands in the these notebooks, to check all the probability calculations for yourself. You don’t need to copy-paste the code examples manually—I’ve already copy-pasted them for you!

Going through these notebooks would be an interesting even if you don’t have the book, since the notebooks are mostly self contained. In particular, in combinations with the video tutorials like the Stats overview series, you could probably learn a good bit of statistics without ever buying the No Bullshit Guide to Statistics.


Learning the Python code used for doing probability calculations will be very helpful for solving the exercises and problems in the book. I’ve prepared notebooks in the ../notebooks/ folder, that you can use to start working on the exercises and problems.

What is probability theory?#

Probability theory is a language for describing uncertainty in measurements, variability in observations, and randomness in outcomes.

  • Originally the study of “randomness” and “expectations” started to quantify random events in gambling.

  • Later extended to as a general purpose tool to model any process that contains uncertainty:

    • random variables = described by probability distribution (CDF, pmf/pdf) modelled as a math function with parameters \(\theta\)

    • noise = can be modelled as a random variable

    • sampling = variations due to random selection of a subset from the population

    • beliefs = can be described as probability distributions

  • Probability theory is an essential tool for statistics.

  • Probability theory is also a foundational subject that used in physics, machine learning, biology, optimization, algorithms, etc. (side note: in terms of usefulness, I’d say probability theory is up there with linear algebra—you need to know this shit!)

Random variables#

In probability theory, a random variable \(X\) is a tool for describing variability of some quantity. Unlike a regular variable \(x\) that is a placeholder for a single value, the random variable \(X\) can have different values outcomes every time it is observed. The random variables \(X\) is described by a probability distribution \(f_X\) (a function that assigns the probabilities to different possible outcomes of the random variable \(X\)). The probability distribution (probability model) \(f_X\) is usually described as \(f_X = \mathcal{M}(\theta)\), where \(\mathcal{M}\) is a probability model family (e.g. uniform, normal, exponential, Poisson, etc.), and \(\theta\) are the model parameters (denoted by Greek letters like \(\theta\), \(\mu\), \(\sigma\), etc.).

Probability models for real world data#

The main reason why you might wand to learn probability theory, is because it provides a unified language for talking about the uncertainty, variability, and randomness we observe in real-world data.

Here is a short list of the different domains that can be usefully described using probability distributions:

  • math models for r.v. \(X\) (defined as probability distribution function \(f_X\))

  • computer models like rvX created from one of the model families in scipy.stats initialized with appropriate parameters. The computer model rvX for the random variable \(X\) has methods like: rvX.rvs(), rvX.cdf(b), pdf/pmf, and stats like rvX.mean(), rvX.median(), rvX.var(), rvX.std(), rvQ.ppf(q), etc.

  • random draws form a generative process (computer simulation that generates random numbers, see Section 2.7 for examples)

  • random draws from a real world process (e.g. factory that produces a new item)

  • data for an entire population (census)

  • sample data from a population (the data type we learned about in Chapter 1)

  • synthetic data obtained by resampling:

    • bootstrap samples = sampling from the empirical distribution

    • permutation test that forget group membership