Introduction#

In this series of notebooks we’ll look at a complete example of a statistical analysis that touches on all the major topics described in the upcoming book No Bullshit Guide to Statistics. We’ll illustrate the statistics concepts using hands-on demonstrations based on Python code.

To follow along at home, click this button to launch a Jupyter instance in the cloud.

Links to book resources:#

The No Bullshit Guide to Statistics book outline
The statistics concept map
Source code of this notebook: minireference/noBSstats
Read the blog posts about the statistics curriculum and the book proposal

Example of a real-world use of statistics#

A startup founder, let’s call her Amy, has come up with a novel way to build a data science team at her company. Instead of hiring expensive data science specialists (business analysts, statisticians, machine learning experts), she has decided to offer stats training to all employees (both business and tech). She calls this the “everybody stats” policy, and plans to implement stats training of roughly 100 hours for all new hires.

Her theory is that new hires who complete the stats training will be able to “work smarter” and produce more value for the company. Salespeople will get better at sales by analyzing customer data, developers will be able to prioritize performance issues in their code, system administrators will be able to optimize infrastructure to reduce costs, operations will improve logistics, and project managers will have something useful to do instead of just asking for status updates.

Employee Lifetime Value (ELV) is an objectively measured score that captures an employee’s total contributions to the company after one year on the job. This single number measures contributions to business value, solving problems, helping others, contributing ideas, etc. We assume that Amy has some way of measuring ELV for all new hires.

Random assignment process#

To evaluate the effectiveness of this “everybody stats” initiative, Amy will offer the stats training to a randomly chosen half of the new hires, while the other half will not receive stats training and will serve as the control group:

Group NS (control): employees with no statistics training
Group S (intervention): employees who receive statistics training
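
As a rough illustration of what random assignment means in practice, here is a minimal sketch using only Python’s standard library. The function name `assign_groups`, the integer employee IDs, and the seed value are assumptions made for this example; they are not taken from Amy’s actual procedure.

```python
import random

def assign_groups(employee_ids, seed=None):
    """Randomly split employees into control (NS) and intervention (S) groups."""
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(employee_ids)  # copy, so the caller's sequence is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd number of employees, the control group gets the extra person.
    return {"S": shuffled[:half], "NS": shuffled[half:]}

# Hypothetical IDs 1..61 (the data set described later has 61 employees)
groups = assign_groups(range(1, 62), seed=42)
print(len(groups["NS"]), len(groups["S"]))
```

Because the shuffle decides who lands in which group, any pre-existing differences between employees should be spread across both groups by chance rather than by Amy’s choice.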

The random assignment process is described by the following illustration:

Let’s see the data…#

Amy has collected ELV data for both groups in the form of a spreadsheet, data/employee_lifetime_values_records.ods. In total, the spreadsheet contains data for 61 employees: 31 in the control group (no stats training) and 30 in the intervention group (received 100h of stats training).
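
One possible way to load this spreadsheet and check the group sizes is sketched below. It assumes pandas and the optional `odfpy` package are installed; the helper names `load_elv_records` and `group_sizes` are made up for this example, and the layout of the spreadsheet (column names in particular) is not specified here.

```python
from collections import Counter

def load_elv_records(path="data/employee_lifetime_values_records.ods"):
    """Read the ELV spreadsheet into a pandas DataFrame."""
    # Imported here so the helper below also works without pandas installed.
    import pandas as pd
    # Reading .ods files requires the optional `odfpy` package.
    return pd.read_excel(path, engine="odf")

def group_sizes(labels):
    """Count how many employees carry each group label."""
    return dict(Counter(labels))

# Sanity check on group labels alone (no ELV values): the counts stated above.
expected = group_sizes(["NS"] * 31 + ["S"] * 30)
print(expected)
```

After loading, something like `group_sizes(df["group"])` should reproduce these counts, where `"group"` is a hypothetical column name that may differ in the actual file.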

Research questions#

In the rest of the notebooks, we’ll see how Amy can compare the ELV values for the two groups using statistics. Specifically, our goal is to answer the following two questions:

Does statistical training make a difference in ELV?
How big is the increase in ELV that occurs thanks to stats training?
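
To make the second question concrete, the simplest estimate of the increase is the difference between the two group averages. The sketch below uses the standard library only; `diff_means` is a name chosen for this example, and the toy numbers are placeholders, not Amy’s data.

```python
from statistics import mean

def diff_means(elv_s, elv_ns):
    """Difference between average ELV of group S and average ELV of group NS."""
    return mean(elv_s) - mean(elv_ns)

# Toy placeholder values, NOT real ELV measurements
toy_s = [3, 4, 5]
toy_ns = [1, 2, 3]
print(diff_means(toy_s, toy_ns))
```

A positive value means group S scored higher on average. Whether such a difference could have arisen by chance alone is the first research question, which the later notebooks address.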

Plan for the next three videos#

Start with practical DATA handling and visualization (this video)
Take a math detour in the PROB notebook to learn the tools of probability theory (next video)
Look at the STATS techniques to answer the two research questions (third video)
Bonus topic: learn about LINEAR MODELS, which can describe the relationship between stats training and ELV (fourth video)

Let’s continue to 01_DATA.ipynb and get started…