Typography #

~15 minute read


If you love Comic Sans, you don’t know much about typography.

But if you hate Comic Sans, you really don’t know much about typography either.

  • Vincent Connare

I think the above quote encapsulates the selection and usage of fonts quite nicely. Basically, there is a time and place for each individual font, and understanding when and where to use each one is part of using fonts well.

Below, we are going to discuss typography and font selection. However, much of this page is really devoted to helping you understand why the conventions are the way they are. If you don’t care to read all this, then here is the TL;DR: use a standard sans-serif font and you will create a data visualization that is at least in line with what most people currently expect.

If you want to know why this is the right advice, let’s take a walk through history.

Why Sans-Serif? A Brief History of Typography in Data Visualizations #

Before discussing how to choose a font, we should understand why data visualizations look the way they do…

The earliest data visualizations were hand drawn. The “fonts” were handwritten scripts. There was really no other option at the time, and so a data visualization made in this very old handwritten style will itself look very old. One thing to note about these plots is that the lines and markers were also hand drawn, and so there are variations in line widths for the data as well.

A plot of astronomical bodies over time, created in 950 CE. The image is hand drawn, and so all the strokes—both for data and text—have huge amounts of variation.
Data visualization of the planetary movements, from 950 CE. Image taken from https://www.datavis.ca

As the production of books moved away from scribes and toward printers, graphics were included using engravings: the data visualization would be engraved into a metal plate or wood block and then printed onto the page. The inclusion of graphics was relatively rare, and one could invest a large amount of time into making them. Thus, one still sees flowing scripts, in part as artistic flourish and in part to mimic the form of the data visualizations that came before them.

A combined bar chart and line plot from 1786 CE. The lines are regular, but the labeling still mimics that of handwriting.
Data visualization of the price of wheat versus a worker’s wages. The visualization was made by William Playfair and was printed from an engraving in 1786 CE. Image taken from https://www.datavis.ca

In the 20th century, scientific journals began to include more and more graphics, enabled by the ability to reproduce photographs in printing. In the age before widespread computers, one still needed to draw out the data, and then take a picture of this drawing. At the time, most places doing large-scale science (universities and national laboratories) employed full-time draftspeople to produce these figures. To save time and make the lettering more uniform, stencils were used to write out the letters. To avoid having to worry about different kerning between all letter pairs, these stencil sets were largely based around monospaced fonts. Many of them also did not have serifs.

A scatter plot made in 1913. The lines are uniform, and now the writing is uniform and clearly stenciled.
Plot of “atomic number” versus the square root of x-ray frequency for the elements known in 1913 CE. “Atomic number” is in quotes here, because it was not yet accepted, though this plot played a large role in demonstrating its value. Image taken from https://www.researchgate.net/figure/Moseleys-graph-of-frequencies-in-X-ray-spectra-of-chemical-elements-Source-Moseley_fig13_7923211

The other option was to write out the labels using a typewriter, which typically used monospaced serif fonts.

A bar chart produced in 1927 CE. The labeling is done using conventional typewriter fonts, and so is a monospaced serif font.
An image of guild income for different products, created in ca. 1927. Image is taken from https://www.datavis.ca

This is why “old” (i.e., 20th century) data visualizations often use monospace (serif or sans-serif) fonts. Using such fonts in your own data visualizations will give them a “retro” feel because of this. However, if you want to completely reproduce the look, then pay attention to detail! These plots were either hand drawn using pen or pencil or produced using a typewriter. For the former, pens and pencils have rounded points, and so lines do not end with a square truncation but with rounded edges. Similarly, the capillary forces acting on the ink on the raised letters of a typewriter’s type blocks lead to rounded edges on the lettering. Additionally, there can be some very subtle variations in line width and opacity throughout. If you use fonts with crisp edges, the result won’t look like a mid-century graph, but like something closer to the millennium. The reason for this is explained next.

Towards the end of the 20th century, the production of data visualizations became computerized, and draftspeople (and practicing scientists) started making plots on computers, printing them out, and then photographing these printouts to be sent to the journal. During this time, sans-serif monospaced fonts continued to be used, I speculate for two reasons. First, computer displays were low resolution, and so the small serif features were hard to reproduce on the display. This explains why sans-serif fonts were used. Second, monospaced fonts were used because people were trying to emulate the products of the draftspeople before them. The difference here, of course, is that one could now produce lines with square edges, and the printing by machine could be done with more uniform lines. Thus, monospace fonts on graphics with square-edged features and uniform strokes give the feeling of these more modern, but still retro, data visualizations.

A scatterplot made completely on a computer in 1975. Since the image is completely digital, there are no artifacts of ink on paper, and the strokes of the font are uniform with sharp features.
Plot of information about the draft lottery in the USA, made on a computer in 1975 CE by William Cleveland. Image from https://www.datavis.ca

Finally, we get to the modern era, defined by higher resolution computer displays and journals accepting the data file of plots, rather than photographs. Additionally, the production of data visualizations was largely democratized, as programs like SigmaPlot, Origin, and even Excel enabled practicing scientists to make data visualizations with the press of a button. This combination meant that more fonts could be seen on the screen and (if set as a default) would be commonly used and seen. As computer display resolution improved, more standard fonts could be used, but there was still a period when sans-serif fonts looked better. For a long time “Arial” was the default option in Excel, and it is probably no surprise that Arial (or other similar sans-serif fonts) is a font that looks “standard” on modern data visualizations.

A scatterplot made in 2005. This was done using displays very similar to modern displays, and so the fonts are all high resolution.
A plot made on a computer using a high resolution display in 2005 CE. Part of the gapminder project at www.gapminder.org

The takeaway from this history is that there was almost no period in which proportional (non-monospaced) serif fonts were the standard, so they look out of place in a data visualization. For this reason, using a common sans-serif font will largely be a reasonable choice. Only deviate from this when you have a compelling and well-thought-out reason to do so.

A data visualization produced in 2025. It features non-monospaced sans-serif fonts.
Modern data visualization taken from The Journal of the American Chemical Society. Even though modern desktops can make advanced graphics, such as those shown to the right, a “standard” data visualization still has sans-serif fonts.

Special considerations for data visualizations #

Of course, there are still a large number of sans-serif fonts to choose from. Here is a practical checklist for choosing a good one.

1. Prioritize Tabular, Lining Numbers #

In typography, numbers are called “figures.” There are two different ways to broadly classify them, based on the horizontal distribution of the numbers and the vertical stretch of the numbers.

For horizontal distribution the classifications are:

  • Proportional: The ‘1’ is narrower than the ‘8’. This looks good in a paragraph but can lead to significant alignment issues in data visualizations (see figure below).
  • Tabular: Every number glyph occupies the same horizontal space. This is essential for vertically aligning numbers (for instance, as encountered on a $y$-axis).

Always choose a font with tabular numbers.

Two plots of the same data. The only difference is that the plot on the left used a font with proportional numbers, while the plot on the right used tabular numbers. The numbers on the y-axis are aligned in the right plot, but not in the left plot.
The plot on the left uses proportional figures (numbers), while that on the right uses tabular figures. You can see how the numbers don’t perfectly align on the left, but they do on the right. Note: the numbers are probably a bit larger than I would normally use, and there are certainly more tick marks, but this is done to help illustrate the point regarding tabular numbers.
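One place where tabular figures pay off is when labels are padded to a common width. Here is a minimal sketch of that idea in Python. The helper name `pad_tick_labels` is my own invention for illustration and is not part of any plotting library. It left-pads integer labels with the Unicode figure space (U+2007), which is intended to be as wide as a digit in fonts with tabular figures. The padding only gives true alignment when the font actually has tabular figures, which is the whole point of this checklist item.

```python
# Unicode FIGURE SPACE: as wide as a digit in fonts with tabular figures.
FIGURE_SPACE = "\u2007"


def pad_tick_labels(values):
    """Left-pad tick labels with figure spaces so all labels have equal length.

    Hypothetical helper for illustration; assumes the labels will be drawn
    in a font with tabular figures.
    """
    labels = [str(v) for v in values]
    width = max(len(label) for label in labels)
    return [label.rjust(width, FIGURE_SPACE) for label in labels]


labels = pad_tick_labels([0, 50, 100, 1500])
for label in labels:
    print(repr(label))
```

With a proportional-figure font, the same padded strings will still drift out of alignment, because a “1” and an “8” no longer occupy the same width.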

For vertical stretch of the numbers:

  • Lining numbers. Lining numbers have no ascending or descending features. Thus, the glyphs for numbers all stretch from the cap height to the baseline. This makes things look a bit cleaner.
  • Old-style numbers are not this way, and many numbers (such as a “9”) will break these lines. Most modern fonts use lining numbers, a result of the web and the fact that Google used to not display old-style numbers. Lining numbers are probably a fine choice for data visualizations as well.

2. Check your “Greeks and Vees” #

It is worth paying particular attention to the glyphs for Greek letters and Latin letters of similar shape. For instance, the Greek letter $\nu$ can look very similar to the lower case “v”, depending on the font. In the popular font Arial, they are essentially identical: v and ν. Can you tell which is which? On the other hand, some fonts do a good job distinguishing them. In Segoe UI, they are distinct: v and ν.

There are other examples as well. For instance, one might wish to ensure that the Latin “I” and the Greek iota are distinct.
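Since look-alike glyphs are hard to spot by eye, it can help to ask the computer which characters a label actually contains. The sketch below uses Python’s standard `unicodedata` module to print the Unicode name of each character. The function names `describe_characters` and `greek_letters` are hypothetical helpers of mine, and the name-prefix test for “Greek” is a simplification that I am assuming is good enough for ordinary axis labels.

```python
import unicodedata


def describe_characters(label):
    """Return (character, Unicode name) pairs for each non-space character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in label if not ch.isspace()]


def greek_letters(label):
    """Return the characters whose Unicode name starts with 'GREEK'."""
    return [ch for ch, name in describe_characters(label) if name.startswith("GREEK")]


axis_label = "v vs. \u03bd"  # Latin v, then Greek nu (U+03BD)
for ch, name in describe_characters(axis_label):
    print(ch, name)
```

This tells you which characters to inspect. Whether the chosen font draws them distinctly is still something you have to check visually.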

3. Use the Real Minus Sign #

Not all horizontal lines are created equal. There are a large number of typographic symbols that are horizontal lines, each with their own use:

  • The “minus” key on your keyboard is actually a hyphen (‐). The hyphen is for joining words.
  • The minus sign (−) is the correct symbol for the subtraction operation or indicating a negative number. It is designed to be at the same height as the plus symbol (+).
  • The figure dash (‒) is the horizontal line that is used to separate numbers (as in a phone number). For tabular figures, it is the same width as the individual numbers and so is useful when attempting to align numbers vertically.
  • The en-dash (–) is the conventional choice for indicating ranges of numbers. With a space on each side, it is also used to indicate parenthetical statements in text, largely in countries using British English.
  • The em-dash (—) is also used to indicate parenthetical statements, but is largely used in countries using American English.

Of course, though there are many horizontal lines, there is only a single key for them on most keyboards. So how do you get the correct symbol? There are many ways to do this, but the simplest is to Google these and then copy the character and paste it where you want, or come to this website and copy the correct symbol above.
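If you generate your labels in code, another option is to insert the symbols by their Unicode code points rather than copying and pasting. Below is a minimal Python sketch. The constant names and the helper `format_tick` are my own, made up for illustration. The code points themselves are standard Unicode: the keyboard key produces U+002D, and the minus sign is U+2212.

```python
HYPHEN_MINUS = "-"      # U+002D, the character the keyboard key produces
MINUS_SIGN = "\u2212"   # true minus sign
FIGURE_DASH = "\u2012"  # figure dash
EN_DASH = "\u2013"      # en dash
EM_DASH = "\u2014"      # em dash


def format_tick(value, decimals=1):
    """Format a number for a label, swapping the keyboard character for a true minus."""
    return f"{value:.{decimals}f}".replace(HYPHEN_MINUS, MINUS_SIGN)


print(format_tick(-2.5))
print(format_tick(3))
```

Some plotting libraries have their own setting for this, so check the documentation of whatever tool you use before post-processing labels by hand.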

Advanced Deep Dive: the Language of Type #

If you are enjoying the dive into type, and want to go deeper, this section is for you. Here is the terminology you will need:

  • character refers to the abstraction of the elements of written language. For instance, the idea of the lowercase “a” or the idea of the symbol ampersand “&”.
  • glyph refers to the individual rendering of a character. This is the real-world instantiation of the abstract character. There can be multiple forms of these glyphs for the same character. For instance, the lower case “a” can be represented as one-story (a), two-story (a), serif (a), sans-serif (a), bold (a), and italic (a), to name some options. Even within a single such style, glyphs can change drastically. Examples of sans-serif ampersands are: &, &, &, &, &, &, &, and &.
  • font is a collection of glyphs used to represent language. Thus Arial is one example of a font.
  • typeface (sometimes called “type”) is the collection of a series of related fonts. The Arial typeface contains the base Arial font, but also Arial Black, Arial Italics, Arial condensed, etc.

The Anatomy of Type #

There are a huge number of different attributes of a font, all of which contribute to the contrast between fonts, though we will ultimately focus on a few. The image below illustrates a subset of these attributes.

An image diagramming many aspects of font. Shown are examples taken from the Spectra font, where the heights of elements and the names of elements are shown.

Before we get to discussing contrast between fonts, let us think a bit about some of these attributes and how they lead to contrast between fonts.

  • baseline is the line on which almost all of the lower case and upper case letters sit. This is the strongest horizontal line in text, and so it is important to pay attention to this when thinking about alignment. Some glyphs have components that extend below the baseline. These aspects are called “descenders”.
  • cap-height is the height at which most of the capital letters end. Some glyphs have aspects that extend above this height. These aspects are called “ascenders”.
  • point is $\frac{1}{72}$ of an inch, and is the base unit in which font sizes are measured. This is the “point” in “12-point Times New Roman” that many of us had to use when producing high school papers.
  • font size is the size (in points) of a square that is able to encapsulate the glyph of capital letter “M” for a font. This is called the “em box”. Why is this the measure of size? Well, it all comes back to the days of physical movable type. Each glyph was on its own type block, and the one for the capital “M” was often the largest block. It is just slightly larger than the actual letter M, because there needed to be a bit of space on each side.
  • x-height is the height of the lower case ‘x’, which is also the height of most lower case letters. Two fonts that have very different x-heights will look different to us. Calibri has a low x-height, while Arial has a relatively high x-height.
  • weight is the size of the line (also called “stroke”) used to draw the glyph. Century Gothic has a relatively light stroke width, while the stroke width for Impact is relatively heavy.
  • stroke contrast refers to the degree to which the weight (or “stroke width”) varies throughout the glyph. High stroke contrast describes large changes, while low stroke contrast implies little (or no) variation. For instance, Arial has essentially no stroke contrast, while Bodoni has a relatively high stroke contrast.
  • stress refers to the orientation of the stroke contrast. It is most easily seen by considering the letter “o”. For fonts with non-zero stroke contrast, the orientation of the stress is found by drawing a line through the two regions of thinnest stroke. Fonts with no stroke contrast have no stress. Garamond has a vertical stress, while the stress in Adobe Devanagari is slanted. When stress is present, we are most used to seeing stress that is somewhat vertical. Fonts such as Playbill have horizontally running stress and look strange (or old-western) to us.
  • serifs are the “feet” at the bottom of some fonts. A common serif font is Times New Roman, while Calibri is a common sans-serif font.
  • spacing describes the average space that each letter occupies. Most fonts have proportional spacing, so that the letter “i” occupies less horizontal space than does the capital “M”. However, some fonts are “monospace” and each letter and number occupies the exact same space. This makes it easy to align text vertically between lines. Courier New is a common monospace font, while Arial is a proportional font.
  • kerning refers to the spacing between individual letter pairings. For instance, in the image showing the many aspects of fonts, the “n” and “l” glyphs are separated by a small amount of space, but the “v” and the “n” are separated by a smaller amount of space, while the “A” and the “v” have negative space between them (the “A” runs a bit below the “v”)! Different fonts can treat this separation, or kerning, differently. Monospace versus non-monospaced fonts are an extreme example of these differences, with monospace fonts, like Courier New, having uniform kerning, while the proportional font, Arial, has differences in the spacing between letters.
  • letter geometry describes how some fonts treat the fundamental shape of letters. Some fonts treat the “O” as an oval, while others treat it as a circle, and others as a square. Century Gothic is a font where the letter “O” is a circle, Agency is a font where it is sort of square-ish, and Tahoma is one where it is more oval.
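Going back to the definition of the point in the list above, the conversion from points to physical size is simple arithmetic. Here is a small Python sketch of it. The function names are hypothetical, and the 300 dpi value is just an example resolution I picked, not a recommendation from any particular journal.

```python
POINTS_PER_INCH = 72  # one point is 1/72 of an inch


def points_to_inches(points):
    """Convert a size in points to inches."""
    return points / POINTS_PER_INCH


def points_to_pixels(points, dpi):
    """Convert a size in points to pixels at a given resolution (dots per inch)."""
    return points * dpi / POINTS_PER_INCH


print(points_to_inches(12))       # one sixth of an inch
print(points_to_pixels(12, 300))  # 50.0
```

This is handy when you need to decide whether a font size will still be legible once a figure is scaled to its final printed width.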

There are, of course, other aspects of font that can give rise to contrast, but the above are the most obvious, from my perspective.

Concluding thoughts and resources #

That is all for typography. If you want to dig deeper, there are a number of interesting sources on the history and usage of type. My recommendations are:

But for now, if you simply select a sans-serif font with tabular numbers, you will have a nice-looking plot. And if you can also ensure you use minus signs and figure dashes, you will be a step above the majority of people.