Typography #

If you love Comic Sans, you don’t know much about typography. But if you hate Comic Sans, you really don’t know much about typography either.

Vincent Connare

I think the above quote encapsulates the selection and usage of font quite nicely. Basically, there is a time and place for each individual font, and understanding when and where to use them is part of being a good designer.

Below, we are going to discuss typography and font selection. However, much of this page is really devoted to helping you understand why the conventions are the way they are. If you don’t care to read all this, then here is the TLDR: use a standard sans-serif font and you will create a data visualization that is at least in line with what most people currently expect.

If you want to know more about why this is, and also a few more details on how to use font carefully, read on. Know, though, that the point of this page is not to make you an expert in font, which is a lifetime pursuit. I do think that, with a bit of reading, one can learn to understand why fonts look different, and so understand how to achieve [consistency and contrast] when using fonts. Additionally, one can begin to understand the conventions used in data visualizations, and select fonts to meet those.

Terminology #

Before diving deep into typography, it is worth disambiguating a few commonly used terms:

character refers to the abstraction of the elements of written language. For instance, the idea of the lowercase “a” or the idea of the symbol ampersand “&”.
glyph refers to the individual rendering of a character. This is the real-world instantiation of the abstract character. There can be multiple forms of these glyphs for the same character. For instance, the lower case “a” can be represented as one-story (a), two-story (a), serif (a), sans-serif (a), bold (a), and italic (a), to name some options. Even within a single such style, glyphs can change drastically. Examples of sans-serif ampersands are: &, &, &, &, &, &, &, and &.
font is the collection of glyphs. Thus Arial is one example of a font.
typeface (sometimes called “type”) is the collection of a series of related fonts. The Arial typeface contains the base Arial font, but also Arial Black, Arial Italic, Arial Condensed, etc.

Aspects of font used to realize contrast between fonts #

There are a huge number of different parts of a font, all of which contribute to realizing this contrast, though we will ultimately focus on a few.

Before we get to discussing contrast between fonts, let us think a bit about other common aspects.

baseline is the line on which almost all of the lower case and upper case letters sit. This is the strongest horizontal line in text, and so it is important to pay attention to this when thinking about [alignment]. Some glyphs have components that extend below the baseline. These aspects are called “descenders”.
cap-height is the height at which most of the capital letters end. Some glyphs have aspects that extend above this height. These aspects are called “ascenders”.
point is $\frac{1}{72}$ of an inch, and is the base unit on which font sizes are measured. This is the “point” in “12-point Times New Roman”.
font size is the size (in points) of a square that is able to encapsulate the capital letter “M”. This is called the “em box”. Why this? Well, it all comes back to the days of physical movable type. Each glyph was on its own type block, and the one for the capital “M” was often the largest block. It is just slightly larger than the actual letter M, because there needed to be a bit of space on each side.
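The point and em-box definitions above translate directly into arithmetic. As a minimal sketch (the function names and the 300 dpi resolution are illustrative assumptions, not part of any particular library), converting a font size in points to inches and to pixels might look like this:

```python
POINTS_PER_INCH = 72  # one point is 1/72 of an inch


def points_to_inches(points: float) -> float:
    """Convert a length in points to inches."""
    return points / POINTS_PER_INCH


def points_to_pixels(points: float, dpi: float) -> float:
    """Convert a length in points to pixels at a given resolution (dots per inch)."""
    return points * dpi / POINTS_PER_INCH


# The em box of a 12-point font is 12/72 = 1/6 of an inch on a side.
em_box_inches = points_to_inches(12)

# At an assumed output resolution of 300 dpi, that same em box spans 50 pixels.
em_box_pixels = points_to_pixels(12, dpi=300)
```

Keep in mind that this gives the size of the em box, not of any individual letter: as described above, the glyphs themselves are somewhat smaller than the box that contains them.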

Moving on to the aspects of font that lead to contrast, we will focus on the following:

x-height is the height of the lower case ‘x’, which is also the height of most lower case letters. Fonts that have very different x-heights look different to us. Calibri has a low x-height, while Arial has a relatively high x-height.
weight is the thickness of the line (also called “stroke”) used to draw the glyph. Century Gothic has a relatively light stroke width, while the stroke width for Impact is relatively heavy.
stroke contrast refers to the degree to which the weight (or “stroke width”) varies throughout the glyph. High stroke contrast implies large changes, while low stroke contrast implies little (or no) variation. For instance, Arial has essentially no stroke contrast, while Bodoni has a relatively high stroke contrast.
stress refers to the orientation of the stroke contrast. It is most easily seen by considering the letter “o”. For fonts with non-zero stroke contrast, the orientation of the stress is found by drawing a line through the two regions of thinnest stroke. Fonts with no stroke contrast have no stress. Garamond has a vertical stress, while the stress in Adobe Devanagari is slanted. When stress is present, we are most used to seeing stress that is somewhat vertical. Fonts such as Playbill have horizontally running stress and look strange (or old-western) to us.
serifs are the “feet” at the bottom of some fonts. A common serif font is Times New Roman, while Calibri is a common sans-serif font.
spacing describes the average space that each letter occupies. Most fonts have proportional spacing, so that the letter “i” occupies less horizontal space than does “M”. However, some fonts are “monospace” and each letter and number occupies the exact same space. This makes it easy to line text up vertically across lines. Courier New is a common monospace font, while Arial is a proportional font.
kerning refers to the spacing between individual letter pairings. For instance, in the image above, the “n” and “l” glyphs are separated by a small amount of space, but the “v” and the “n” are separated by a smaller amount of space, while the “A” and the “v” have negative space between them (the “A” runs a bit below the “v”)! Different fonts can treat this separation, or kerning, differently. Monospaced versus proportional fonts are an extreme example of these differences: monospace fonts, like Courier New, have uniform kerning, while the proportional font Arial has differences in the spacing between letters.
letter geometry describes how some fonts treat the fundamental shape of letters. Some fonts treat the “O” as oval, while others treat it as round, and others as square. Century Gothic is a font where the letter “O” is a circle, Agency is a font where it is sort of square-ish, and Tahoma is one where it is more oval.

There are, of course, other aspects of font that can give rise to contrast, but the above are the most common.

Historical and current uses of fonts in data visualizations #

Before considering how to choose fonts, we should also consider some of the historical uses of fonts in data visualizations.

The earliest data visualizations were hand drawn. The “fonts” used to label them looked like scripts—because they were! There were really no other choices at the time, and so making a data visualization with this very old handwritten style will make the data visualization also look very old. One thing to note about these plots is that the lines and markers of the plots were also hand-drawn, and so there are variations in line widths for the data as well.

Data visualization of the planetary movements, from 950 CE. Image taken from https://www.datavis.ca

As movable type was developed, and the production of books moved away from scribes to printers, inclusion of graphics was done using engravings. So, a data visualization would be engraved into a metal plate or wood block and then printed onto the page. The inclusion of graphics was relatively rare, and one could invest a large amount of time into making them. Thus, one still sees flowing scripts, in part as artistic flourish and in part to mimic the form of the data visualizations that came before them.

Data visualization of the price of wheat versus a worker’s wages. The visualization was made by William Playfair and was printed from an engraving in 1786 CE. Image taken from https://www.datavis.ca

In the 20th century, scientific journals began to include more and more graphics, enabled by the ability to reproduce photographs in printing. In the age before widespread computers, one still needed to draw out the data, and then take a picture of this drawing. At the time, most places doing large-scale science (universities and national laboratories) employed full-time draftspeople to produce these figures. To save time and make the appearance more uniform, stencils were used to write out the letters. So that the draftspeople did not need to worry about different kerning between all letter pairs, these stencil sets were largely based around monospaced fonts. Many of them also did not have serifs.

Plot of “atomic number” versus the square root of x-ray frequency for the elements known in 1913 CE. “Atomic number” is in quotes here, because it was not yet accepted, though this plot played a large role in demonstrating its value. Image taken from https://www.researchgate.net/figure/Moseleys-graph-of-frequencies-in-X-ray-spectra-of-chemical-elements-Source-Moseley_fig13_7923211

The other option was to write out the labels using a typewriter, in which case the spacing was also monospaced, but one could readily use a serif font.

An image of guild income for different products, created in ca. 1927. Image is taken from https://www.datavis.ca

This is why “old” (i.e., 20th century) data visualizations often use monospace (serif or sans-serif) fonts. Using such fonts in your own data visualizations will give them a “retro” feel because of this. However, if you want to completely reproduce the look, then pay attention to detail! These plots were hand drawn using pen or pencil. These instruments have rounded points, and so lines do not end with square truncation, but with rounded edges. Similarly, the capillary forces of the ink on the physical cutout letters of a typewriter lead to rounded edges on the lettering. Additionally, there can be some very subtle variations in line width and opacity throughout. If you want to completely mimic the old draftsperson style, make sure to pay attention to this.

However, towards the end of the 20th century, the production of data visualizations became computerized, and draftspeople (and practicing scientists) started making plots on computers, printing them out, and then photographing these printouts to be sent to the journal. During this time, sans-serif monospaced fonts continued to be used, I speculate for two reasons. First, computer displays were low resolution, and so the small serif features were hard to reproduce on the display. This explains why sans-serif fonts were used. Second, monospaced fonts were used because people were trying to emulate the products of the draftspeople before them. The difference here, of course, is that one could now produce lines with square edges, and printing by machine could be done with more uniform lines. Thus, data visualizations with monospace fonts on graphics with square-edged features and uniform lines give a feeling of these more modern, but still retro, data visualizations.

Plot of information about the draft lottery in the USA, made on a computer in 1975 CE by William Cleveland. Image from https://www.datavis.ca

Finally, we get to the modern era, defined by higher resolution computer displays and journals accepting the data file of plots, rather than photographs. Additionally, the production of data visualizations was largely democratized, as programs like Sigmaplot, Origin, and even Excel enabled practicing scientists to make data visualizations with the press of a button. This combination meant that more standard fonts could be seen on the screen and (if set to a default) would be commonly used and seen. As computer display resolution improved, more standard fonts could be used, but there was still a time that sans-serif fonts were better looking. For a long time “Arial” was the default option in Excel, and it is probably no surprise that Arial (or other similar sans-serif fonts) is a font that is ‘standard’ looking on modern data visualizations.

A plot made on a computer using a high resolution display in 2005 CE. Part of the gapminder project at www.gapminder.org

One thing that you will note from this discussion is that there was never really a time that people were using non-monospace serif fonts. For this reason, they largely look out of place. There can be, of course, a time and place for such fonts, and professional designers can easily make data visualizations where this font choice is not out of place. However, for those new to working with fonts, the end result is this: using a common sans-serif font will largely be a reasonable choice. Only deviate from this when you have a compelling and well-thought-out reason to do so.

Modern data visualization taken from The Journal of the American Chemical Society. Even though modern desktops can make advanced graphics, such as those shown to the right, a “standard” data visualization still has sans-serif fonts.

Of course, there are still a large number of sans serif fonts to choose from, and so it is worth some special considerations when thinking about how to select your sans serif font.

Special considerations for data visualizations #

Numbers #

Dealing with data means dealing with numbers, and so it is worth considering a few properties of them. It is also worth noting that, in the world of typography, the characters for numbers are also called “figures.”

Tabular versus proportional numbers. Tabular numbers are those for which each glyph occupies the same horizontal space. This makes it easy to align numbers when they are arranged vertically, such as in a spreadsheet or along the y-axis of a plot. Proportional numbers are those where “thinner” numbers occupy less horizontal space (e.g., a “1” occupies less space than an “8”). In some sense, you might consider this behavior similar to that of a monospace font, but the distinction is that tabular refers to just the numbers. One can then have a non-monospaced font that has tabular numbers. My advice is to use fonts with tabular numbers, because they help with alignment along the y-axis and when you annotate your data visualization with numbers.

The plot on the left uses proportional figures (numbers), while that on the right uses tabular figures. You can see how the numbers don’t perfectly align on the left, but they do on the right. Note: the numbers are probably a bit larger than I would normally use, and there are certainly more tick marks, but this is done to help illustrate the point regarding tabular numbers.
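To see why tabular figures line up while proportional figures do not, it can help to add up the widths of the digits in a few tick labels. The sketch below does this with made-up advance widths (in thousandths of an em); the specific numbers are assumptions chosen only for illustration and do not come from any real font.

```python
# Hypothetical advance widths, in thousandths of an em. These values are
# invented for illustration; a real font defines its own widths.
PROPORTIONAL_WIDTHS = {
    "0": 560, "1": 380, "2": 540, "3": 540, "4": 560,
    "5": 540, "6": 550, "7": 500, "8": 560, "9": 550,
}
TABULAR_WIDTH = 560  # every tabular digit occupies the same width


def proportional_width(label: str) -> int:
    """Total width of a label when each digit has its own width."""
    return sum(PROPORTIONAL_WIDTHS[digit] for digit in label)


def tabular_width(label: str) -> int:
    """Total width of a label when every digit has the same width."""
    return TABULAR_WIDTH * len(label)


tick_labels = ["100", "111", "800", "999"]
proportional = [proportional_width(label) for label in tick_labels]
tabular = [tabular_width(label) for label in tick_labels]

# With proportional figures the three-digit labels have different widths,
# so their digits cannot all line up in columns.
print(proportional)
# With tabular figures every three-digit label has the same width.
print(tabular)
```

Under these assumed widths, the label "111" comes out much narrower than "800" in the proportional case, which is the kind of misalignment visible along the y-axis of the left-hand plot.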

Lining versus old style numbers. Lining numbers have no ascending or descending features. Thus, the glyphs for numbers all stretch from the cap height to the baseline. This makes things look a bit cleaner. Old style numbers are not this way, and many numbers (such as a “9”) will break these lines. Most modern fonts use lining numbers—a result of the web and the fact that Google used to not display old style numbers. However, this is probably a fine choice for data visualizations as well.

Greek letters #

It is worth paying particular attention to the glyphs for Greek letters and Latin letters of similar shape. For instance, the Greek letter $\nu$ can look very similar to the lower case “v”, depending on font. In the popular font Arial, they are essentially identical: v and ν. Can you tell which is which? On the other hand, some fonts do a good job distinguishing them. In Segoe UI, they are distinct: v and ν. So, pay attention to the glyphs for Greek letters you will use.
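Even when two glyphs look identical on screen, the underlying characters are different, and this can be checked directly. As a small sketch using Python's standard `unicodedata` module (the helper name `describe` is just an illustrative choice), one can print the code point and official Unicode name of each character:

```python
import unicodedata


def describe(char: str) -> str:
    """Return the Unicode code point and official name of a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


# The Latin letter and the Greek letter, written as escapes so they cannot
# be confused in the source code itself.
latin_v = "\u0076"
greek_nu = "\u03bd"

print(describe(latin_v))   # U+0076 LATIN SMALL LETTER V
print(describe(greek_nu))  # U+03BD GREEK SMALL LETTER NU
```

This is a quick way to confirm which character actually ended up in an axis label when the font makes the two hard to tell apart.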

Horizontal lines #

Though this does not play into font selection as much as the above two considerations, it is worth mentioning that there are a myriad of horizontal lines. The most common include the hyphen, figure dash, minus sign, en dash, and em dash, which are represented as (respectively): ‐, ‒, −, –, and —. Each of these serves a purpose. The hyphen is used for breaking words across lines or forming compound words. The figure dash is designed for separating numbers (e.g., in a telephone number) and occupies the same horizontal space as a digit in tabular numbers. The minus sign is placed at the same height as the horizontal bar of the plus sign (+) and is designed to look appropriate when combined with this symbol. The en dash is nominally the width of the capital letter “N” and the em dash the width of the capital letter “M”. The en and em dashes are used to set off asides in text. American usage typically favors an em dash with no spaces between the words and the dash, while British usage often favors an en dash separated from the words by a space.

The main reason to worry about these dashes here is twofold. First, the default behavior of the “minus” key on your keyboard is actually to render a hyphen (strictly, the ambiguous “hyphen-minus” character). Second, if you are writing math or working with numbers, you might want to ensure you are using the minus sign and figure dash instead. There are many ways to do this, but the simplest is to search for these characters online and then copy and paste them where you want.
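If you build your labels in code, another option is to write these characters by their Unicode code points instead of copying and pasting them. The sketch below uses only the Python standard library; the helper names `with_minus_sign` and `with_figure_dashes` and the example phone number are illustrative assumptions, not functions from any plotting package.

```python
import unicodedata

# The horizontal lines discussed above, written as Unicode escapes.
DASHES = {
    "keyboard key": "\u002D",  # HYPHEN-MINUS, what the "minus" key types
    "hyphen": "\u2010",
    "figure dash": "\u2012",
    "minus sign": "\u2212",
    "en dash": "\u2013",
    "em dash": "\u2014",
}


def with_minus_sign(number: float) -> str:
    """Format a number, swapping the keyboard hyphen-minus for a true minus sign."""
    return str(number).replace(DASHES["keyboard key"], DASHES["minus sign"])


def with_figure_dashes(digits: str) -> str:
    """Swap keyboard hyphen-minus separators in a digit string for figure dashes."""
    return digits.replace(DASHES["keyboard key"], DASHES["figure dash"])


# List each character with its code point and official Unicode name.
for label, char in DASHES.items():
    print(f"{label}: U+{ord(char):04X} {unicodedata.name(char)}")

tick_label = with_minus_sign(-3.5)          # uses U+2212 instead of U+002D
phone = with_figure_dashes("555-0100")      # uses U+2012 between the digits
```

Whether the result looks right still depends on the font: not every font contains a glyph for the figure dash, so it is worth checking the rendered output.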

Conclusions #

That is all for typography. If you want to dig deeper, there are a number of interesting sources on the history and usage of type. My recommendations are:

Thinking with Type by Ellen Lupton. This book has some nice introductions to how designers think about font.
The Visual History of Type by Paul McNeil. This book has a very large number of type examples, with discussions of their history and the families of font that they fit into. Definitely a good read if you want to get a sense of how fonts evolved to serve different purposes.
Glyph: A Visual Exploration of Punctuation Marks and Other Typographic Symbols by Adriana Caneva and Shiro Nishimoto. A cute book that has a short history of many of the specialized glyphs we use. Just a fun read.
Shady Characters by Keith Houston. This is legitimately one of the best books I have read. A very interesting history of the characters we use in writing.

But for now, if you simply select a sans serif font with tabular numbers you will have a nice looking plot. And if you can also ensure you use minus signs and figure dashes, you will be a step above the majority of people.