Customizing Matplotlib
Matplotlib is a powerful graphics library for Python that allows the user to customize almost every aspect of a figure.
I’ll first cover some useful customizations for individual plots and then discuss several methods of building templates that are reusable across multiple projects.
But Why?
Matplotlib’s defaults are probably sufficient for exploratory data analysis, but if you’re designing figures for publication or presentation, it’s worth understanding the rich framework for customization that Matplotlib provides.
What About Seaborn?
You can get a lot of mileage out of seaborn, a high-level wrapper for Matplotlib designed for statistical visualization. Seaborn shines with exploratory data analysis and is especially helpful when building multi-plot grids for exploring relationships between several variables.
However, even with seaborn’s unified aesthetic and sane defaults, plots made with seaborn often still require customization. Fortunately, since seaborn is built on top of Matplotlib, most Matplotlib customizations also apply to seaborn.
Customizing Individual Plots
First import Matplotlib’s pyplot module, as well as NumPy to generate the sample data for the figures.
from matplotlib import pyplot as plt
import numpy as np
Note: while Matplotlib’s pyplot module is convenient for quickly prototyping figures, I’ll use Matplotlib’s object-oriented interface throughout this post. (Technically, I’m still using the pyplot module to create the figure canvases, but all plot commands will be via the object-oriented interface.)
Setting Tick Marks and Tick Labels
For the first plot, I’ll use several trigonometric functions as sample data.
π = np.pi
x = 2 * π * np.linspace(-1, 1, 1000)
_, ax = plt.subplots()
ax.plot(x, np.cos(x))
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x - π))
Using the defaults from the current version of Matplotlib (v. 3.0.2 at the time of this writing) yields the following figure:
Since there is more than one line, let’s add a legend. Here, I’m using mathtext, Matplotlib’s built-in TeX-like math renderer, for the mathematical symbols. (Matplotlib can also render text elements with an external LaTeX distribution; more on LaTeX below.)
ax.plot(x, np.cos(x), label=r"$ \cos \left( x \right) $")
ax.plot(x, np.sin(x), label=r"$ \sin \left( x \right) $")
ax.plot(x, np.cos(x - π), label=r"$ \cos \left( x - \pi \right) $")
ax.legend(loc="upper right")
It’s often helpful to set the ticks and labels in terms of \(\pi\) when dealing with trigonometric functions. (You can define a custom tick formatter, but for this plot it’s simpler and more readable to list the ticks and labels explicitly.)
xticks = π * np.arange(-2, 3, 1)
ax.set_xticks(xticks)
xlabels = [r"$-2 \pi$", r"$- \pi$", "0", r"$\pi$", r"$2 \pi$"]
ax.set_xticklabels(xlabels)
ax.set_yticks([-1, 0, 1])
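For plots with many ticks, the custom tick formatter mentioned above may scale better than an explicit list. The following is a minimal sketch, not the approach used for the figures in this post. The helper name pi_formatter is hypothetical, and it assumes the ticks fall on integer multiples of π, which MultipleLocator arranges here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator


def pi_formatter(value, position):
    """Format a tick value as an integer multiple of pi (hypothetical helper)."""
    n = int(np.round(value / np.pi))
    if n == 0:
        return "0"
    if n == 1:
        return r"$\pi$"
    if n == -1:
        return r"$-\pi$"
    return rf"${n} \pi$"


x = 2 * np.pi * np.linspace(-1, 1, 1000)
_, ax = plt.subplots()
ax.plot(x, np.cos(x))

# place a major tick at every multiple of pi, then label it with the formatter
ax.xaxis.set_major_locator(MultipleLocator(base=np.pi))
ax.xaxis.set_major_formatter(FuncFormatter(pi_formatter))
```

The locator decides where the ticks go and the formatter decides how they are labeled, so the same formatter can be reused on any axis whose ticks sit on multiples of π.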
I’ll come back to this plot when I introduce style sheets.
Adding Fills
Fills are useful for (among other things) visualizing where multiple distributions overlap.
Let’s start with several Gaussian distributions for illustration. The probability density for a Gaussian distribution is given by
\[p(x) = \frac{1}{ \sqrt{ 2 \pi \sigma^2 } } e^{ - \frac{ \left( x - \mu \right)^2 }{ 2 \sigma^2 } }\]where \(\mu\) is the mean, and \(\sigma\) is the standard deviation.
def gaussian(x, μ=0, σ=1, normalized=True):
    u = (x - μ) / σ
    g = np.exp(-u**2 / 2)
    if normalized:
        g /= np.sqrt(2 * π * σ**2)
    return g
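As a quick sanity check on the normalization, the density should integrate to approximately 1. The sketch below restates the function with ASCII parameter names (mu and sigma, chosen here only so the block stands alone) and approximates the integral with a simple Riemann sum over a range wide enough that the tails are negligible.

```python
import numpy as np


def gaussian(x, mu=0.0, sigma=1.0, normalized=True):
    """ASCII-named restatement of the function above, for a standalone check."""
    u = (x - mu) / sigma
    g = np.exp(-u**2 / 2)
    if normalized:
        g = g / np.sqrt(2 * np.pi * sigma**2)
    return g


z = np.linspace(-10, 10, 1000)
dz = z[1] - z[0]

# Riemann-sum approximation of the area under the normalized curve
area = np.sum(gaussian(z, mu=2, sigma=1.5)) * dz

# without normalization, the peak value at x = mu is exactly 1
peak = gaussian(0.0, mu=0.0, sigma=1.0, normalized=False)
```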
Let’s generate 3 Gaussian distributions for the plot.
z = np.linspace(-10, 10, 1000)
μ0, μ1, μ2 = -4, 0, 2
y0 = gaussian(z, μ=μ0, σ=1.25)
y1 = gaussian(z, μ=μ1, σ=1.0)
y2 = gaussian(z, μ=μ2, σ=1.5)
You can specify colors from Matplotlib’s color cycler with a “CN” color specification. Since this plot only has a few lines, it’s simpler to explicitly match the fill color to the color of the associated line. (If you have more than a few lines in your plot, iterate over the lines and use line.get_color() to set the fill color.)
_, ax = plt.subplots()
ax.plot(z, y0, label=r"$G_0$", color="C0")
ax.plot(z, y1, label=r"$G_1$", color="C1")
ax.plot(z, y2, label=r"$G_2$", color="C2")
ax.legend(loc="upper right")
Next, place tick marks at the mean of each distribution.
ax.set_xticks([μ0, μ1, μ2])
ax.set_xticklabels([r"$\mu_0$", r"$\mu_1$", r"$\mu_2$"])
ax.set_yticks([])
Now shade the area between each distribution and the x axis.
ax.fill_between(z, y0, 0, color="C0", alpha=0.2)
ax.fill_between(z, y1, 0, color="C1", alpha=0.2)
ax.fill_between(z, y2, 0, color="C2", alpha=0.2)
Finally, remove unnecessary axis spines.
ax.spines["top"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.spines["right"].set_visible(False)
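For plots with more than a few lines, the loop over line.get_color() mentioned earlier might look like the following sketch. The list of (mean, standard deviation) pairs is made up for illustration, and the Gaussian is restated with ASCII names so the block is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt


def gaussian(x, mu=0.0, sigma=1.0):
    """Normalized Gaussian density (ASCII-named restatement)."""
    u = (x - mu) / sigma
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi * sigma**2)


z = np.linspace(-10, 10, 1000)
params = [(-4, 1.25), (0, 1.0), (2, 1.5), (5, 0.8)]  # illustrative values only

_, ax = plt.subplots()
for i, (mu, sigma) in enumerate(params):
    # no explicit color: each line takes the next color from the cycler
    ax.plot(z, gaussian(z, mu, sigma), label=rf"$G_{i}$")

# shade under each line using whatever color the cycler assigned to it
for line in ax.get_lines():
    ax.fill_between(
        line.get_xdata(), line.get_ydata(), 0,
        color=line.get_color(), alpha=0.2
    )
ax.legend(loc="upper right")
```

Reading the color back from each line keeps the fills in sync with the lines even if the color cycle is later changed by a style sheet.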
Minimalist Bar Plots
Suppose we have collected data on the calorie content per 100 g of various foods.
data = {
    "French Fries": 607,
    "Potato Chips": 542,
    "Bacon": 533,
    "Pizza": 296,
    "Chili Dog": 260
}
Let’s start with Matplotlib’s default bar chart.
_, ax = plt.subplots()
ax.bar(
    x=list(data.keys()),
    height=list(data.values())
)
The default bar chart isn’t particularly impressive, but we can wrap Matplotlib’s bar chart function to emulate the style of Darkhorse Analytics’ minimalist bar chart [1].
def pretty_barplot(values, labels,
                   width_fraction=0.45, label_offset=0.1,
                   bar_color="gray", label_color="white"):
    """Draw a minimalist bar chart with each bar labeled by its value."""
    _, ax = plt.subplots()
    rectangles = ax.bar(
        x=labels,
        height=values,
        width=width_fraction,
        color=bar_color
    )
    # calculate label offsets in terms of the largest bar
    y_offset = label_offset * max(values)
    for rectangle in rectangles:
        # get dimensions of each rectangle
        x0 = rectangle.get_x()
        w0 = rectangle.get_width()
        h0 = rectangle.get_height()
        # label each rectangle with its value
        ax.text(
            x=x0 + w0 / 2,
            y=h0 - y_offset,
            s=h0,
            color=label_color,
            horizontalalignment="center"
        )
    # remove y axis ticks, since we're labeling the bars directly
    ax.set_yticks([])
    # remove the tick marks on the x axis
    ax.tick_params(bottom=False)
    # remove all four axis spines
    ax.spines["top"].set_visible(False)
    ax.spines["bottom"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["right"].set_visible(False)
    return ax
We can then call our bar chart function on the above data with our specified colors (mostly gray with bacon highlighted in red).
colors = ["gray"] * len(data)
colors[2] = "maroon"
ax = pretty_barplot(
    values=list(data.values()),
    labels=list(data.keys()),
    bar_color=colors
)
ax.set_title("Calories per 100 g")
Modifying the Defaults with rcParams
So far, our changes have been applied one plot at a time. Modifying Matplotlib’s global rc parameters allows us to easily apply changes to multiple plots.
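The rcParams can be modified dynamically. For example, you can set the default line width to 2 points and the default line color to black. (Note that lines drawn with plot() take their colors from axes.prop_cycle, so lines.color does not affect them.)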
import matplotlib
matplotlib.rcParams["lines.linewidth"] = 2
matplotlib.rcParams["lines.color"] = "k"
Matplotlib also provides an rc helper function that allows you to change multiple attributes at once.
matplotlib.rc("lines", linewidth=2, color="k")
Finally, if things go horribly wrong, you can always use
matplotlib.rcdefaults()
to restore Matplotlib’s default settings.
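To see the round trip in one place, here is a small sketch that changes a default with rc, confirms that a newly created line picks it up, and then resets everything. The variable names are illustrative. The comparison against matplotlib.rcParamsDefault assumes that rcdefaults() restores Matplotlib’s built-in defaults rather than the values from your matplotlibrc file.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
from matplotlib import pyplot as plt

# change the defaults for every line created from now on
matplotlib.rc("lines", linewidth=2, color="k")

_, ax = plt.subplots()
(line,) = ax.plot([0, 1], [0, 1])
width_after_rc = line.get_linewidth()  # reflects the new default

# restore Matplotlib's built-in defaults
matplotlib.rcdefaults()
width_after_reset = matplotlib.rcParams["lines.linewidth"]

plt.close("all")
```

Artists that already exist keep the properties they were created with; only the defaults for new artists are reset.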
The Configuration Directory
Matplotlib provides two methods of storing user-defined customizations - style sheets and the matplotlibrc file - and they both live in Matplotlib’s configuration directory.
The location of Matplotlib’s configuration directory on your system can be determined with
matplotlib.get_configdir()
By convention, the matplotlibrc file is stored in the root of the configuration directory, and user-defined style sheets are placed in a subdirectory called stylelib (you may have to create it).
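A short sketch for locating the stylelib subdirectory and checking whether it exists yet. The variable names are illustrative, and the line that creates the directory is left commented out so that running the snippet does not modify your configuration.

```python
from pathlib import Path

import matplotlib

config_dir = Path(matplotlib.get_configdir())
stylelib_dir = config_dir / "stylelib"

# Uncomment to create the directory if it is missing:
# stylelib_dir.mkdir(parents=True, exist_ok=True)

status = "exists" if stylelib_dir.is_dir() else "missing"
print(f"{stylelib_dir} ({status})")
```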
Using Style Sheets
Style sheets are useful for applying the same customizations to multiple plots.
Included Styles
Matplotlib comes with several pre-defined style sheets.
sorted(plt.style.available, key=str.lower)
will display an alphabetized list of available styles on your system.
For example, the ggplot style is designed to be visually similar to Hadley Wickham’s ggplot2 package from R (a popular open source statistical programming language). Placing
plt.style.use("ggplot")
at the beginning of a script will render all of the plots in the script with the ggplot style.
Rendering the trigonometric plot from the first example with the ggplot style produces the following figure:
Everything in Context
For more precise control, Matplotlib provides a context manager.
with plt.style.context("ggplot"):
    ...
This allows you to apply different styles to specific plots within the same script or notebook.
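As a concrete sketch, the snippet below creates one figure inside the context manager and one outside it, and records the axes.facecolor setting in each case to show that the style is only active inside the block. The sample data and variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)

# this figure is created while the ggplot style is active
with plt.style.context("ggplot"):
    _, ax_styled = plt.subplots()
    ax_styled.plot(x, np.sin(x))
    facecolor_inside = plt.rcParams["axes.facecolor"]

# the previous settings are restored when the block exits
_, ax_plain = plt.subplots()
ax_plain.plot(x, np.sin(x))
facecolor_outside = plt.rcParams["axes.facecolor"]
```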
Emulating XKCD
You can even emulate the visual style of Randall Munroe’s XKCD webcomic.
with plt.xkcd():
    ...
Rendering the trigonometric plot from the first example with the XKCD style produces
These sketch-style plots are not just for amusement. Chris Stucchio notes that the hand-drawn aesthetic of this style conveys a sense of the uncertainty in the model to a non-technical audience who might otherwise take the results as exact.
Writing Your Own Style Sheets
Recall that styles are invoked with plt.style.use(<stylename>).
Style files in Matplotlib have the form <style_name>.mplstyle. If you place your style files in the <mpl_configdir>/stylelib directory, Matplotlib will load them at runtime. (You can also pass the full file path or URL to the style sheet.)
Style sheets can be chained together, e.g.
plt.style.use([style1, style2])
This means you can have one style file to set the margins, another to define line properties, etc.
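To illustrate chaining, the sketch below writes two tiny style files to a temporary directory and applies both by file path. The file names and settings are made up for the example. It uses the context manager rather than plt.style.use so the change does not leak into the rest of the session.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
from matplotlib import pyplot as plt

# two single-purpose style files (hypothetical names and contents)
style_dir = Path(tempfile.mkdtemp())
lines_style = style_dir / "thick_lines.mplstyle"
grid_style = style_dir / "grid_on.mplstyle"
lines_style.write_text("lines.linewidth : 3\n")
grid_style.write_text("axes.grid : True\n")

# apply both styles, in order, for the duration of the block
with plt.style.context([str(lines_style), str(grid_style)]):
    chained_linewidth = plt.rcParams["lines.linewidth"]
    chained_grid = plt.rcParams["axes.grid"]
```

Inside the block, both settings are active at once, even though each file only knows about one of them.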
Rendering All Plot Elements with LaTeX
Let’s create an example style sheet. The following configuration will render all figure text (legend, axes labels, tick marks, etc.) with LaTeX’s Computer Modern font. This is useful for producing publication-quality figures.
# Use LaTeX's Computer Modern font for everything
font.family : 'serif'
font.serif : 'Computer Modern'
text.usetex : True
If you place the above code block in a file called LaTeX_everywhere.mplstyle (or similar) in the stylelib directory (see above), you can then invoke it with
with plt.style.context("LaTeX_everywhere"):
    ...
Rendering the trigonometric plot from the first example with our custom LaTeX style produces
Note that text.usetex : True requires a full LaTeX distribution to be installed on your system. (If you use macOS, install MacTeX; Windows and Linux users have several options.)
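If a script may run on machines without LaTeX, one option is to enable text.usetex only when a latex executable is on the PATH. This is a rough sketch under that assumption: finding the executable does not guarantee that every package or helper program Matplotlib needs is installed, so treat the check as necessary rather than sufficient. The style is passed as a dictionary here instead of a style file.

```python
import shutil

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
from matplotlib import pyplot as plt

# heuristic: only turn on LaTeX rendering if a latex binary can be found
latex_available = shutil.which("latex") is not None

style = {
    "font.family": "serif",
    "text.usetex": latex_available,
}

with plt.style.context(style):
    usetex_inside = plt.rcParams["text.usetex"]
    _, ax = plt.subplots()
    ax.set_title(r"$\cos \left( x \right)$")
```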
The matplotlibrc File
Matplotlib takes the first matplotlibrc file it finds. To display the path to the current matplotlibrc file, use
matplotlib.matplotlib_fname()
Matplotlib also provides a template matplotlibrc file. This is incredibly useful, not just for writing your own style sheets or matplotlibrc file, but also for understanding what customizations are available for individual plots.
Meta example: the matplotlibrc file I used to render the figures in this post was
savefig.format: svg
savefig.transparent: True
Then to render each figure as a transparent SVG, I could just use
plt.savefig("assets/figure1")
rather than
plt.savefig("assets/figure1.svg", transparent=True)
for each figure.
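The same behavior can be tried without editing a matplotlibrc file by setting the two parameters temporarily. The sketch below uses matplotlib.rc_context for that and saves to a temporary directory. The output location is illustrative, not the layout used for this post.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
from matplotlib import pyplot as plt

out_dir = tempfile.mkdtemp()

# temporarily apply the same two settings as the matplotlibrc file above
with matplotlib.rc_context({"savefig.format": "svg", "savefig.transparent": True}):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    # no extension given: the format comes from savefig.format
    fig.savefig(os.path.join(out_dir, "figure1"))

saved_files = sorted(os.listdir(out_dir))
plt.close("all")
```

Because the file name has no extension, Matplotlib appends the configured format, so the directory ends up containing figure1.svg.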
Recommendations
If you find yourself frequently overriding the default graphics backend or file format, specify your preferred backend or format in the matplotlibrc file.
If you find yourself making the same kinds of adjustments to multiple plots, it’s probably worth creating a style sheet.
Finally, remember that styles can be chained together, so it’s a good idea to make each style sheet atomic so your figure design can be modular.
[1] From Darkhorse Analytics’ marvelous Data Looks Better Naked series, which covers bar charts, tables, pie charts (tl;dr - just convert pie charts to bar charts), and choropleth maps.