Small multiples

A perennial problem in data visualization is having too much data to show. A perennial mistake in data visualization is trying to show all of that data at once.


More precisely, the mistake is trying to show all your data and make every piece of it distinguishable at once. But what do you do if you have lots of data, all of it is important, and you do want to show it all?

The incorrect approach is simply to plot all the data on a single set of axes and then try to make each element distinguishable. Consider this plot of median housing price, state by state, from 1940–2000.

In this image, it is nearly impossible to pick out any specific state. In part, this is because some lines overlap heavily. But there are also simply too many colors. For instance, consider the state with the largest increase. Matching the blue of that line to the correct blue in the key is a difficult, if not impossible, task. There is too much data to show in a single plot.

One of the most effective solutions to this problem is to use what are called “small multiples.” Implemented correctly, this can yield a plot like this:

Constructing small multiples #

The idea behind small multiples is rather simple. You make a plot that shows one set of data, and then for each other set of data you want to show, you make a plot in exactly the same style. Then you arrange all of these plots into a grid. This allows you to show a large amount of data without overwhelming the reader.
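The grid construction described above can be sketched in a few lines. This is a minimal illustration using matplotlib, not the code behind the figures in this article: the data is synthetic placeholder data, and the helper name `small_multiples` and the region names are assumptions made for the example.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic placeholder data (NOT the housing data discussed in the text):
# one series per hypothetical region, all sampled at the same years.
years = list(range(1940, 2001, 10))
series = {
    f"Region {i + 1}": [10 + (i + 1) * 0.5 * k ** 1.5 for k in range(len(years))]
    for i in range(12)
}


def small_multiples(series, x, ncols=4):
    """Draw one identically styled panel per series, arranged in a grid."""
    nrows = math.ceil(len(series) / ncols)
    fig, axes = plt.subplots(
        nrows, ncols, figsize=(2.5 * ncols, 2 * nrows), squeeze=False
    )
    flat = axes.ravel()
    for ax, (name, values) in zip(flat, series.items()):
        ax.plot(x, values, color="tab:blue")  # same style in every panel
        ax.set_title(name, fontsize=9)
    # Hide any grid cells left over when the series do not fill the grid.
    for ax in flat[len(series):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


fig, axes = small_multiples(series, years)
```

Calling `fig.savefig` with a filename of your choice writes the grid to disk. Keeping the plotting code inside one loop is what guarantees that every panel really does share the same style.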

Showing data separately #

In the simplest case, each small multiple shows one series of data and is labeled appropriately. This might yield a plot like the following:

One thing to note is that we have avoided clutter in the small multiples by removing much of the axis labeling. Since the axes are identical across all the plots in this case, we could restrict labeling to just the left column and bottom row. There are other ways to accomplish this as well. For instance, we could have labeled only the top-left plot and left the reader to assume that everything else is consistent with it.
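One way to get this reduced labeling, sketched here with matplotlib and synthetic placeholder data, is to let the panels share their axes. With `sharex=True` and `sharey=True`, `plt.subplots` keeps tick labels only on the outer panels, and the axis titles can then be set on the left column and bottom row alone. The label text and region names are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic placeholder data (NOT the housing data discussed in the text).
years = list(range(1940, 2001, 10))
series = {
    f"Region {i + 1}": [10 + (i + 1) * 0.5 * k ** 1.5 for k in range(len(years))]
    for i in range(12)
}

# Shared axes force identical limits in every panel and hide the
# tick labels on all panels except the left column and bottom row.
fig, axes = plt.subplots(3, 4, sharex=True, sharey=True, figsize=(10, 6))
for ax, (name, values) in zip(axes.ravel(), series.items()):
    ax.plot(years, values, color="tab:blue")
    ax.set_title(name, fontsize=9)

# Axis titles only where the reader needs them.
for ax in axes[:, 0]:
    ax.set_ylabel("Value (synthetic)")
for ax in axes[-1, :]:
    ax.set_xlabel("Year")

fig.tight_layout()
```

Sharing the axes matters for more than tidiness: stripping labels from inner panels is only honest if every panel really uses the same scale.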

This does allow all the data to be seen, but it doesn’t really provide context for each data set. One way to overcome this is to show all the data in each plot but highlight only one data series per plot, perhaps by drawing every trace in grey except the one that the small multiple is highlighting.
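The grey-context approach could look roughly like this in matplotlib. Again, this is a sketch on synthetic placeholder data; the colors, line widths, and region names are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic placeholder data (NOT the housing data discussed in the text).
years = list(range(1940, 2001, 10))
series = {
    f"Region {i + 1}": [10 + (i + 1) * 0.5 * k ** 1.5 for k in range(len(years))]
    for i in range(12)
}

fig, axes = plt.subplots(3, 4, sharex=True, sharey=True, figsize=(10, 6))
for ax, name in zip(axes.ravel(), series):
    # Context: every other series, drawn thin and grey in the background.
    for other, values in series.items():
        if other != name:
            ax.plot(years, values, color="lightgrey", linewidth=0.8, zorder=1)
    # Focus: this panel's own series, drawn last and on top in a strong color.
    ax.plot(years, series[name], color="tab:red", linewidth=2, zorder=2)
    ax.set_title(name, fontsize=9)

fig.tight_layout()
```

Each panel now contains every series, so the reader can judge the highlighted trace against the full spread without consulting a legend.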

If you have a particular data series you want to call attention to, you can use contrast to do so, perhaps by changing the color of that series' plot line.
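As a small sketch of that idea, the loop below draws every panel in a neutral grey except one featured series, which gets an accent color and a heavier line. The data is synthetic, and the choice of `"Region 7"` as the featured series is purely hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic placeholder data (NOT the housing data discussed in the text).
years = list(range(1940, 2001, 10))
series = {
    f"Region {i + 1}": [10 + (i + 1) * 0.5 * k ** 1.5 for k in range(len(years))]
    for i in range(12)
}
featured = "Region 7"  # hypothetical series to call attention to

fig, axes = plt.subplots(3, 4, sharex=True, sharey=True, figsize=(10, 6))
for ax, (name, values) in zip(axes.ravel(), series.items()):
    is_featured = name == featured
    ax.plot(
        years,
        values,
        color="tab:red" if is_featured else "0.45",  # "0.45" is a mid grey
        linewidth=2.5 if is_featured else 1.2,
    )
    ax.set_title(name, fontsize=9, fontweight="bold" if is_featured else "normal")

fig.tight_layout()
```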

The above is a pretty nice plot, but it could certainly be improved, and there are other design considerations as well. For instance, people sometimes make one of the small multiples larger and give it more labeling. However, even though there are many ways to approach a figure with small multiples, the idea is always the same: take a large amount of data and spread it out so that it is easier to parse.
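The enlarged-panel variant mentioned above could be laid out roughly as follows, using matplotlib's grid specification so that one panel spans a 2×2 block of cells and carries the full labeling while the rest stay small and unlabeled. The data is synthetic, and the grid size and the choice of which series to enlarge are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic placeholder data (NOT the housing data discussed in the text).
years = list(range(1940, 2001, 10))
series = {
    f"Region {i + 1}": [10 + (i + 1) * 0.5 * k ** 1.5 for k in range(len(years))]
    for i in range(12)
}
names = list(series)

fig = plt.figure(figsize=(10, 6))
gs = fig.add_gridspec(3, 5)

# The large, fully labeled panel occupies the top-left 2x2 block.
big = fig.add_subplot(gs[0:2, 0:2])
big.plot(years, series[names[0]], color="tab:blue")
big.set_title(names[0])
big.set_xlabel("Year")
big.set_ylabel("Value (synthetic)")

# The remaining 11 cells each hold one small, unlabeled panel that
# shares its scales with the large one.
cells = [(r, c) for r in range(3) for c in range(5) if not (r < 2 and c < 2)]
small_axes = []
for (r, c), name in zip(cells, names[1:]):
    ax = fig.add_subplot(gs[r, c], sharex=big, sharey=big)
    ax.plot(years, series[name], color="tab:blue")
    ax.set_title(name, fontsize=8)
    ax.tick_params(labelbottom=False, labelleft=False)  # keep small panels clean
    small_axes.append(ax)

fig.tight_layout()
```

The large panel doubles as a key: the reader learns the axes and units there once and then applies them to every small panel.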