Section 4.4 — Regression with categorical predictors#

This notebook contains the code examples from Section 4.4, Regression with categorical predictors, of the No Bullshit Guide to Statistics.

Notebook setup#

# Ensure required Python modules are installed
%pip install --quiet numpy scipy seaborn pandas statsmodels ministats
Note: you may need to restart the kernel to use updated packages.
# load Python modules
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
# Figures setup
plt.clf()  # needed otherwise `sns.set_theme` doesn't work
sns.set_theme(
    context="paper",
    style="whitegrid",
    palette="colorblind",
    rc={"font.family": "serif",
        "font.serif": ["Palatino", "DejaVu Serif", "serif"],
        "figure.figsize": (5, 1.6)},
)
%config InlineBackend.figure_format = 'retina'
<Figure size 640x480 with 0 Axes>
# Use the simple float __repr__ in outputs (NumPy >= 2.0 prints np.float64(...) by default)
if int(np.__version__.split(".")[0]) >= 2:
    np.set_printoptions(legacy='1.25')
# Download datasets/ directory if necessary
from ministats import ensure_datasets
ensure_datasets()
datasets/ directory already exists.

Definitions#

Design matrix for linear model lm1#

students = pd.read_csv("datasets/students.csv")
lm1 = smf.ols("score ~ 1 + effort", data=students).fit()
lm1.model.exog[0:3]
array([[ 1.  , 10.96],
       [ 1.  ,  8.69],
       [ 1.  ,  8.6 ]])
students["effort"].values[0:3]
array([10.96,  8.69,  8.6 ])
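In other words, the design matrix for lm1 is just a column of ones (the intercept) next to the raw effort values. A quick sanity check (a sketch, rebuilding the matrix with NumPy):

ones = np.ones(len(students))
X = np.column_stack([ones, students["effort"]])
np.allclose(X, lm1.model.exog)
True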

Design matrix for linear model lm2#

doctors = pd.read_csv("datasets/doctors.csv")
formula = "score ~ 1 + alc + weed + exrc"
lm2 = smf.ols(formula, data=doctors).fit()
lm2.model.exog[0:3]
array([[ 1. ,  0. ,  5. ,  0. ],
       [ 1. , 20. ,  0. ,  4.5],
       [ 1. ,  1. ,  0. ,  7. ]])
doctors[["alc","weed","exrc"]].values[0:3]
array([[ 0. ,  5. ,  0. ],
       [20. ,  0. ,  4.5],
       [ 1. ,  0. ,  7. ]])

Example 1: binary predictor variable#

import statsmodels.formula.api as smf
lmloc = smf.ols("score ~ 1 + C(loc)", data=doctors).fit()
lmloc.params
Intercept        52.956522
C(loc)[T.urb]    -6.992885
dtype: float64
from ministats.plots.figures import plot_lm_ttest

with plt.rc_context({"figure.figsize":(4.6,2.2)}):
    ax = plot_lm_ttest(doctors, x="loc", y="score")
    ax.set_ylim([25,90])    
    sns.move_legend(ax, "upper center")
rur_mean = doctors[doctors["loc"]=="rur"]["score"].mean()
urb_mean = doctors[doctors["loc"]=="urb"]["score"].mean()

rur_mean, urb_mean, urb_mean - rur_mean
(52.95652173913044, 45.96363636363636, -6.992885375494076)
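Note that the intercept estimate equals the mean score of the rural group, and the C(loc)[T.urb] coefficient equals the urban-minus-rural difference in group means.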

Encoding#

doctors["loc"][0:5]
0    rur
1    urb
2    urb
3    urb
4    rur
Name: loc, dtype: object
lmloc.model.exog[0:5]
array([[1., 0.],
       [1., 1.],
       [1., 1.],
       [1., 1.],
       [1., 0.]])
# ALT.
# dmatrix("1 + C(loc)", doctors)[0:5]
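The second column of the design matrix is just a 0/1 indicator for the urban location, which we can reproduce by hand (a quick sketch):

# Manual 0/1 encoding of the binary `loc` variable (1 = urb)
(doctors["loc"] == "urb").astype(int).values[0:5]
array([0, 1, 1, 1, 0])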

Dummy coding for categorical predictors#

cats = ["A", "B", "C", "C"]
catdf = pd.DataFrame({"cat":cats})
catdf
cat
0 A
1 B
2 C
3 C
from patsy import dmatrix

dmatrix("1 + C(cat)", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  C(cat)[T.B]  C(cat)[T.C]
          1            0            0
          1            1            0
          1            0            1
          1            0            1
  Terms:
    'Intercept' (column 0)
    'C(cat)' (columns 1:3)
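The first category A serves as the reference level: it is encoded as zeros in both dummy columns, while C(cat)[T.B] and C(cat)[T.C] indicate rows whose category is B or C, respectively.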

Example 2: predictors with three levels#

doctors = pd.read_csv("datasets/doctors.csv")
doctors["work"].head(5)
0    hos
1    cli
2    hos
3    eld
4    cli
Name: work, dtype: object
dmatrix("1 + C(work)", data=doctors)[0:5]
array([[1., 0., 1.],
       [1., 0., 0.],
       [1., 0., 1.],
       [1., 1., 0.],
       [1., 0., 0.]])
lmw = smf.ols("score ~ 1 + C(work)", data=doctors).fit()
lmw.params
Intercept         46.545455
C(work)[T.eld]     4.569930
C(work)[T.hos]     2.668831
dtype: float64

Visualization of the estimated parameters of the model lmw.
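We can draw this visualization using the plot_lm_anova helper (a sketch; the same call appears again in the one-way ANOVA subsection below):

from ministats.plots.figures import plot_lm_anova
plot_lm_anova(doctors, x="work", y="score");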

lmw.rsquared
0.0077217625749193
lmw.fvalue, lmw.f_pvalue
(0.5953116925291129, 0.5526627461285702)

Example 3: improved model for the sleep scores#

We can mix numerical and categorical predictors in the same model.

formula3 = "score ~ 1 + alc + weed + exrc + C(loc)"
lm3 = smf.ols(formula3, data=doctors).fit()
lm3.params
Intercept        63.606961
C(loc)[T.urb]    -5.002404
alc              -1.784915
weed             -0.840668
exrc              1.783107
dtype: float64
lm3.rsquared, lm3.aic
(0.8544615790287665, 1092.5985552344712)
from ministats import plot_partreg

plot_partreg(lm3, "alc");
lm3.summary()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  score   R-squared:                       0.854
Model:                            OLS   Adj. R-squared:                  0.851
Method:                 Least Squares   F-statistic:                     221.6
Date:                Thu, 11 Dec 2025   Prob (F-statistic):           4.18e-62
Time:                        18:41:51   Log-Likelihood:                -541.30
No. Observations:                 156   AIC:                             1093.
Df Residuals:                     151   BIC:                             1108.
Df Model:                           4
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept        63.6070      1.524     41.734      0.000      60.596      66.618
C(loc)[T.urb]    -5.0024      1.401     -3.572      0.000      -7.770      -2.235
alc              -1.7849      0.068    -26.424      0.000      -1.918      -1.651
weed             -0.8407      0.462     -1.821      0.071      -1.753       0.071
exrc              1.7831      0.133     13.400      0.000       1.520       2.046
==============================================================================
Omnibus:                        4.325   Durbin-Watson:                   1.823
Prob(Omnibus):                  0.115   Jarque-Bera (JB):                4.038
Skew:                           0.279   Prob(JB):                        0.133
Kurtosis:                       3.556   Cond. No.                         46.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Compare to the model without the loc predictor#

formula2 = "score ~ 1 + alc + weed + exrc"
lm2 = smf.ols(formula2, data=doctors).fit()
F, p, _ = lm3.compare_f_test(lm2)
F, p
(12.758115596295623, 0.0004759812308491828)
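The p-value is well below 0.05, so adding the loc predictor gives a statistically significant improvement in fit over lm2.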

Everything is a linear model#

One-sample t-test as a linear model#

kombucha = pd.read_csv("datasets/kombucha.csv")
ksample04 = kombucha[kombucha["batch"]==4]["volume"]
ksample04.mean()
1003.8335
from scipy.stats import ttest_1samp
resk = ttest_1samp(ksample04, popmean=1000)
resk.statistic, resk.pvalue
(3.087703149420272, 0.0037056653503329618)
# ALT. using the helper function from `ministats`
# from ministats import ttest_mean
# ttest_mean(ksample04, mu0=1000)
# Prepare zero-centered data (volume - 1000)
kdat04 = pd.DataFrame()
kdat04["zcvolume"] = ksample04 - 1000

# Fit linear model with only an intercept term
import statsmodels.formula.api as smf
lmk = smf.ols("zcvolume ~ 1", data=kdat04).fit()
lmk.params
Intercept    3.8335
dtype: float64
lmk.tvalues.iloc[0], lmk.pvalues.iloc[0]
(3.0877031494203044, 0.0037056653503326335)
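These match the ttest_1samp results above, as expected: for an intercept-only model, the t statistic of the intercept is the sample mean divided by the standard error of the mean. A sketch of the manual calculation:

n = kdat04["zcvolume"].count()
sehat = kdat04["zcvolume"].std() / np.sqrt(n)
kdat04["zcvolume"].mean() / sehat  # ≈ 3.0877, the same t statistic as above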

Two-sample t-test as a linear model#

East vs. West electricity prices#

eprices = pd.read_csv("datasets/eprices.csv")
pricesW = eprices[eprices["loc"]=="West"]["price"]
pricesE = eprices[eprices["loc"]=="East"]["price"]
pricesW.mean() - pricesE.mean()
3.000000000000001

Two-sample t-test with pooled variance#

from scipy.stats import ttest_ind
rese = ttest_ind(pricesW, pricesE, equal_var=True)
rese.statistic, rese.pvalue
(5.022875513276465, 0.00012497067987678488)
ci_Delta = rese.confidence_interval(confidence_level=0.9)
[ci_Delta.low, ci_Delta.high]
[1.957240525873166, 4.042759474126836]

Linear model approach#

lme = smf.ols("price ~ 1 + C(loc)", data=eprices).fit()
print(lme.params)
lme.tvalues.iloc[1], lme.pvalues.iloc[1]
Intercept         6.155556
C(loc)[T.West]    3.000000
dtype: float64
(5.02287551327646, 0.00012497067987678602)
lme.conf_int(alpha=0.1).iloc[1].values
array([1.95724053, 4.04275947])
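The t statistic, p-value, and 90% confidence interval for the C(loc)[T.West] coefficient are identical to the pooled-variance t-test results computed above.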
from ministats.plots.figures import plot_lm_ttest

with plt.rc_context({"figure.figsize":(4.2,3.1)}):
    ax = plot_lm_ttest(eprices, x="loc", y="price")
    ax.set_ylim([4.5,10])    
    sns.move_legend(ax, "upper left")
    ax.xaxis.set_label_coords(0.5, -0.07)

Example 1 (revisited): urban vs. rural doctors#

from scipy.stats import ttest_ind

scoresR = doctors[doctors["loc"]=="rur"]["score"]
scoresU = doctors[doctors["loc"]=="urb"]["score"]

resloc = ttest_ind(scoresU, scoresR, equal_var=True)
resloc.statistic, resloc.pvalue
(-1.9657612140164198, 0.05112460353979369)
lmloc = smf.ols("score ~ 1 + C(loc)", data=doctors).fit()
lmloc.tvalues.iloc[1], lmloc.pvalues.iloc[1]
(-1.9657612140164218, 0.051124603539793465)

One-way ANOVA as a linear model#

from scipy.stats import f_oneway

scoresH = doctors[doctors["work"]=="hos"]["score"]
scoresC = doctors[doctors["work"]=="cli"]["score"]
scoresE = doctors[doctors["work"]=="eld"]["score"]

resw = f_oneway(scoresH, scoresC, scoresE)
resw.statistic, resw.pvalue
(0.5953116925291182, 0.5526627461285608)
lmw = smf.ols("score ~ 1 + C(work)", data=doctors).fit()
print(lmw.params)
lmw.fvalue, lmw.f_pvalue
Intercept         46.545455
C(work)[T.eld]     4.569930
C(work)[T.hos]     2.668831
dtype: float64
(0.5953116925291129, 0.5526627461285702)
from ministats.plots.figures import plot_lm_anova

with plt.rc_context({"figure.figsize":(5,3)}):
    ax = plot_lm_anova(doctors, x="work", y="score")
    ax.set_ylim([31,69])    
    sns.move_legend(ax, "lower right")
# BONUS: print ANOVA table
# import statsmodels.api as sm
# sm.stats.anova_lm(lmw)

Nonparametric tests#

One-sample Wilcoxon signed-rank test#

kombucha = pd.read_csv("datasets/kombucha.csv")
ksample04 = kombucha[kombucha["batch"]==4]["volume"]

# Zero-centered volumes
zcksample04 = ksample04 - 1000

from scipy.stats import wilcoxon
reswil = wilcoxon(zcksample04)
reswil.pvalue
0.002770629538645153
# Create a new data frame with the signed ranks of the volumes
df_zcsr = pd.DataFrame()
df_zcsr["zcvolume_sr"] = np.sign(zcksample04) * zcksample04.abs().rank()

lmwil = smf.ols("zcvolume_sr ~ 1", data=df_zcsr).fit()
lmwil.pvalues.iloc[0]
0.0022841508459744237
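The two p-values are close but not identical: fitting an ordinary linear model to the signed ranks only approximates the exact Wilcoxon signed-rank test. The same caveat applies to the rank-based linear models for the Mann-Whitney and Kruskal-Wallis tests below.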

Mann-Whitney U-test#

scoresR = doctors[doctors["loc"]=="rur"]["score"]
scoresU = doctors[doctors["loc"]=="urb"]["score"]

from scipy.stats import mannwhitneyu
resmwu = mannwhitneyu(scoresU, scoresR)
resmwu.pvalue
0.050083369850737636
# Compute the (unsigned) ranks of the scores
doctors["score_r"] = doctors["score"].rank()

# Fit a linear model
lmmwu = smf.ols("score_r ~ 1 + C(loc)", data=doctors).fit()
lmmwu.pvalues.iloc[1]
0.049533887469988734

Kruskal-Wallis analysis of variance by ranks#

from scipy.stats import kruskal
reskw = kruskal(scoresH, scoresC, scoresE)
reskw.pvalue
0.4441051932875236
# Compute the (unsigned) ranks of the scores
doctors["score_r"] = doctors["score"].rank()  

lmkw = smf.ols("score_r ~ 1 + C(work)", data=doctors).fit()
lmkw.f_pvalue
0.44688872149660885

Explanations#

Dummy coding options#

dmatrix("cat", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  cat[T.B]  cat[T.C]
          1         0         0
          1         1         0
          1         0         1
          1         0         1
  Terms:
    'Intercept' (column 0)
    'cat' (columns 1:3)
dmatrix("C(cat)", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  C(cat)[T.B]  C(cat)[T.C]
          1            0            0
          1            1            0
          1            0            1
          1            0            1
  Terms:
    'Intercept' (column 0)
    'C(cat)' (columns 1:3)
dmatrix("C(cat, Treatment)", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  C(cat, Treatment)[T.B]  C(cat, Treatment)[T.C]
          1                       0                       0
          1                       1                       0
          1                       0                       1
          1                       0                       1
  Terms:
    'Intercept' (column 0)
    'C(cat, Treatment)' (columns 1:3)
dmatrix("C(cat, Treatment('B'))", data=catdf)
# ALT. dmatrix("C(cat, Treatment(1))", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  C(cat, Treatment('B'))[T.A]  C(cat, Treatment('B'))[T.C]
          1                            1                            0
          1                            0                            0
          1                            0                            1
          1                            0                            1
  Terms:
    'Intercept' (column 0)
    "C(cat, Treatment('B'))" (columns 1:3)

Avoiding perfect collinearity#

df_col = pd.DataFrame()
df_col["iscli"] = (doctors["work"] == "cli").astype(int)
df_col["iseld"] = (doctors["work"] == "eld").astype(int)
df_col["ishos"] = (doctors["work"] == "hos").astype(int)
df_col["score"] = doctors["score"]
formula_col = "score ~ 1 + iscli + iseld + ishos"
lm_col = smf.ols(formula_col, data=df_col).fit()
lm_col.params
Intercept    36.718781
iscli         9.826673
iseld        14.396603
ishos        12.495504
dtype: float64
lm_col.condition_number
4594083276761821.0
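The enormous condition number is a symptom of perfect collinearity: every doctor belongs to exactly one of the three workplaces, so the three indicator columns sum to one in every row, duplicating the intercept column. A quick check (sketch):

# Indicators of mutually exclusive, exhaustive categories sum to 1,
# which duplicates the intercept column
(df_col["iscli"] + df_col["iseld"] + df_col["ishos"]).unique()
array([1])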
lm2.params
Intercept    60.452901
alc          -1.800101
weed         -1.021552
exrc          1.768289
dtype: float64

Discussion#

Other coding strategies for categorical variables#

dmatrix("0 + C(cat)", data=catdf)
DesignMatrix with shape (4, 3)
  C(cat)[A]  C(cat)[B]  C(cat)[C]
          1          0          0
          0          1          0
          0          0          1
          0          0          1
  Terms:
    'C(cat)' (columns 0:3)
dmatrix("1 + C(cat, Sum)", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  C(cat, Sum)[S.A]  C(cat, Sum)[S.B]
          1                 1                 0
          1                 0                 1
          1                -1                -1
          1                -1                -1
  Terms:
    'Intercept' (column 0)
    'C(cat, Sum)' (columns 1:3)
dmatrix("1+C(cat, Diff)", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  C(cat, Diff)[D.A]  C(cat, Diff)[D.B]
          1           -0.66667           -0.33333
          1            0.33333           -0.33333
          1            0.33333            0.66667
          1            0.33333            0.66667
  Terms:
    'Intercept' (column 0)
    'C(cat, Diff)' (columns 1:3)
dmatrix("C(cat, Helmert)", data=catdf)
DesignMatrix with shape (4, 3)
  Intercept  C(cat, Helmert)[H.B]  C(cat, Helmert)[H.C]
          1                    -1                    -1
          1                     1                    -1
          1                     0                     2
          1                     0                     2
  Terms:
    'Intercept' (column 0)
    'C(cat, Helmert)' (columns 1:3)
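All of these coding schemes span the same column space, so a linear model fit with any of them produces identical predictions; each reparametrization simply gives the coefficients a different interpretation (differences from a reference level, deviations from the grand mean, comparisons with the previous level, and so on).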

Exercises#

EXX: redo the comparison of debate and lecture scores#

students = pd.read_csv("datasets/students.csv")
lmcu = smf.ols("score ~ 1 + C(curriculum)", data=students).fit()
lmcu.tvalues.iloc[1], lmcu.pvalues.iloc[1]
(-1.7197867420465667, 0.10917234443214385)
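For comparison, here is the equivalent pooled-variance two-sample t-test (a sketch, assuming the curriculum column contains the levels "lecture" and "debate"):

from scipy.stats import ttest_ind
scoresL = students[students["curriculum"]=="lecture"]["score"]
scoresD = students[students["curriculum"]=="debate"]["score"]
ttest_ind(scoresL, scoresD, equal_var=True)  # t statistic matches lmcu above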

EXX: model comparison#

formula3w = "score ~ 1 + alc + weed + exrc + C(loc) + C(work)"
lm3w = smf.ols(formula3w, data=doctors).fit()

formula3 = "score ~ 1 + alc + weed + exrc + C(loc)"
lm3 = smf.ols(formula3, data=doctors).fit()

lm3w.compare_f_test(lm3)
(1.5158185269522728, 0.22299549360240853, 2.0)

The result is not significant, which means that including the predictor C(work) in the model is not useful. The third element of the returned tuple (2.0) is the difference in degrees of freedom between the two models: C(work) adds two dummy columns.

EXX: run ANOVA test#

# Construct data as a pd.DataFrame
np.random.seed(42)
As = np.random.normal(0, 1, 20)
Bs = np.random.normal(-2, 1, 20)
Cs = np.random.normal(3, 1, 20)
Ds = np.random.normal(1.5, 1, 20)

dfABCD = pd.DataFrame()
dfABCD["group"] = ["A"]*20 + ["B"]*20 + ["C"]*20 + ["D"]*20
dfABCD["value"] = np.concatenate([As, Bs, Cs, Ds])

with plt.rc_context({"figure.figsize":(8,6)}):
    ax = plot_lm_anova(dfABCD, x="group", y="value")

from scipy.stats import f_oneway
resABCD = f_oneway(As, Bs, Cs, Ds)
print(resABCD.statistic, resABCD.pvalue)

lmabcd = smf.ols("value ~ C(group)", data=dfABCD).fit()
lmabcd.fvalue, lmabcd.f_pvalue
107.22963883851017 3.091116115443299e-27
(107.22963883851006, 3.091116115443401e-27)