Ordering

Ordering of objects


If you do nothing, your data will be in some sort of ‘default’ order.

Often, it is simply the order in which the data was entered into the spreadsheet. That order is effectively random, and it wastes an opportunity to convey meaning.

When you intentionally order your data, you create meaning: you turn a random list into a clear story. The good news is that, while datasets vary endlessly, the ways to logically order them do not. In fact, there is a surprisingly small number of possibilities, often called the “five hat racks.”$^\dagger$

Five hat racks

The idea is that, given a set of objects, there are only five logical ways to order them. To see this, consider a group of six athletes:

  1. Chronologically. We could arrange all the athletes by their age, or the date at which they started competing in their event.

    A pictogram with icons representing 6 athletes. Numbers representing their age are given below the icons and the icons are arranged from oldest to youngest.

  2. Alphabetically. We could arrange all the athletes by their last name, alphabetically.

    A pictogram with icons representing 6 athletes. The names associated with the people are given below the icons, and the icons are arranged in alphabetical order.

  3. By magnitude. We could arrange the athletes by height, weight, number of medals won, or distance travelled. As long as we can quantify something about them, we can order them by that quantity.

    A pictogram with icons representing 6 athletes. The icons are different sizes, and the icons are arranged from largest to smallest.

  4. Geographically. We could take a map and place each athlete at the location where they currently reside. Although it is laid out in two dimensions rather than along a line, this is still an ordering of the people.

    A pictogram with icons representing 6 athletes. The athletes are arranged on a map of the USA, with a line pointing to each one's place of current residence.

  5. Categorically. We could arrange them by some non-quantitative property: perhaps into male and female, by the type of medal they won, or by country of origin.

    A pictogram with icons representing 6 athletes. They are ordered into two groups by gender.
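The five orderings above can be sketched as sort keys on a list of records. The following Python is a minimal illustration, assuming a hypothetical `Athlete` record whose names and values are all made up; geographic order is represented as a map position rather than a sort, since a map places objects in two dimensions.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical athletes; every value here is invented for illustration.
@dataclass
class Athlete:
    last_name: str
    age: int
    height_cm: float
    medal: str   # category: "gold", "silver", or "bronze"
    lat: float   # latitude of (hypothetical) residence
    lon: float   # longitude of (hypothetical) residence

athletes = [
    Athlete("Okafor", 29, 182.0, "gold", 41.9, -87.6),
    Athlete("Brandt", 34, 175.5, "silver", 39.7, -105.0),
    Athlete("Nguyen", 22, 168.0, "gold", 47.6, -122.3),
    Athlete("Alvarez", 26, 190.2, "bronze", 25.8, -80.2),
]

# 1. Chronological: oldest to youngest.
by_age = sorted(athletes, key=lambda a: a.age, reverse=True)

# 2. Alphabetical: by last name.
by_name = sorted(athletes, key=lambda a: a.last_name)

# 3. Magnitude: tallest to shortest.
by_height = sorted(athletes, key=lambda a: a.height_cm, reverse=True)

# 4. Geographical: no single sort; each athlete gets an (x, y) map position.
map_positions = {a.last_name: (a.lon, a.lat) for a in athletes}

# 5. Categorical: group by medal type. groupby only merges adjacent items,
#    so the list is sorted by medal first.
medal_rank = {"gold": 0, "silver": 1, "bronze": 2}
by_medal = {
    medal: [a.last_name for a in group]
    for medal, group in groupby(
        sorted(athletes, key=lambda a: medal_rank[a.medal]),
        key=lambda a: a.medal,
    )
}
```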

Now, spend some time trying to think of other ways you could organize the athletes, and for each one you arrive at, ask whether that arrangement falls within one of the five hat racks. You will almost invariably find that it does. Almost (see below).

Five hat racks in data visualization

Consider in-state tuition at the universities belonging to the BIG10 conference in the USA. Imagine you wanted to make a bar chart, so you pulled the data and plotted it without considering the order. You would get a randomly ordered set of bars, much like the following.

Bar chart, showing the tuition costs associated with each of the universities in the BIG10 conference. The bars are in apparently random order.

This chart has all the information, but its default (random) order carries no meaning. It makes comparisons difficult and hides any potential story. So, let’s apply the five hat racks to tell different stories.

Chronological

One way to think about these schools is by when they were founded, or when they joined the BIG10. We can order the bars chronologically, by founding date, and we get the following:

Bar chart, showing the tuition costs associated with each of the universities in the BIG10 conference. The bars are ordered by the year the university was founded.
Now, one can look at this plot and determine if there is a pattern in terms of year founded. There clearly is not, but we can see this directly, in a way that we could not before.

By magnitude

Perhaps we think the most important aspect of the plot is showing which universities are the most and least expensive. The simplest way to do this is to arrange the universities by cost. Ordering by this magnitude results in the following:

Bar chart, showing the tuition costs associated with each of the universities in the BIG10 conference. The bars are ordered from most expensive to least expensive.
Whereas the two previous orderings made it hard to tell whether, for instance, Wisconsin was more expensive than Maryland, now we can read this directly from the chart!
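In code, ordering by magnitude usually means sorting the categories by value before plotting. A minimal sketch, assuming hypothetical school names, placeholder tuition figures (not real BIG10 data), and a hypothetical helper `order_by_magnitude`:

```python
# Hypothetical schools and tuition figures (placeholders, not real data).
tuition = {
    "School A": 11_200,
    "School B": 15_800,
    "School C": 9_900,
    "School D": 15_800,
    "School E": 12_450,
}

def order_by_magnitude(values: dict[str, float]) -> list[str]:
    """Return keys from largest to smallest value, ties broken alphabetically."""
    # Sorting on (-value, name): negating the value gives descending cost,
    # and the name makes the order deterministic when two costs are equal.
    return sorted(values, key=lambda name: (-values[name], name))

bar_order = order_by_magnitude(tuition)
```

Plotting libraries generally draw categorical bars in the order the categories are supplied, so passing `bar_order` (with the matching values) to the bar-chart call is usually enough. The name tie-break is a design choice: without it, schools with equal tuition would simply keep whatever order the input listed them in.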

Alphabetical

What if we thought that the person reading the data visualization would be most interested in a particular school, but we didn’t know ahead of time which one? Then the simplest way to find it might be by sorting them alphabetically.

Bar chart, showing the tuition costs associated with each of the universities in the BIG10 conference. The bars are ordered alphabetically by school name.

Now it is much easier to find the school of interest, but it is harder to tell which schools are more expensive than others.
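Alphabetical order raises a practical question for names like “University of Michigan”: which word do you sort on? One option, sketched below as an assumption rather than a rule, is a sort key that ignores a leading “The” and “University of”, so each school files under its distinctive word. The `sort_key` helper is hypothetical.

```python
import re

# Leading words we choose to ignore when alphabetizing (an editorial choice).
_PREFIX = re.compile(r"^(the\s+)?(university\s+of\s+)?", re.IGNORECASE)

def sort_key(name: str) -> str:
    """Strip 'The' and 'University of' so schools file under their distinctive word."""
    return _PREFIX.sub("", name).casefold()

schools = [
    "University of Michigan",
    "The Ohio State University",
    "Purdue University",
    "University of Iowa",
    "Northwestern University",
    "Michigan State University",
]

alphabetical = sorted(schools, key=sort_key)
```

With plain `sorted(schools)`, “The Ohio State University” would file under “T” and the two “University of” schools under “U”.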

Geographical

Of course, alphabetical order may not be the simplest way for someone to find the university of interest. For instance, how should the names be sorted? Should “The University of Indiana” be filed under “T”, “U”, or “I”? “I” is probably the most useful, but there is some ambiguity. By contrast, for someone with a reasonable knowledge of United States geography, there is no ambiguity about where Indiana is on the map. By placing each university over its location on the map, we let readers find it with perhaps even less thought.

A map in which the location of the BIG10 universities are indicated.
Such a map also shows the historical origins of the BIG10 among the public universities of the Midwest, so this ordering conveys additional information as well.

Categorical

For this specific case, categorical ordering may not be as useful as the others. However, the BIG10 contains both public and private universities, so one could make two plots: one that allows comparison among public universities, and one that allows comparison among private universities. This might look like the following:

Bar chart, showing the tuition costs associated with each of the public universities in the BIG10 conference.
Bar chart, showing the tuition costs associated with each of the private universities in the BIG10 conference.

This is a powerful and common use of categorical ordering: you use proximity and separation to group related items, or you “facet” the data into small multiples. This makes the comparisons within a category much clearer.
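Faceting by category amounts to grouping the rows and then ordering each group on its own. A minimal sketch, assuming hypothetical records with placeholder tuition values and a hypothetical `facet` helper; within each panel it sorts by magnitude, though any of the other hat racks would work there too.

```python
from collections import defaultdict

# Hypothetical records: (school, sector, tuition). Values are placeholders.
records = [
    ("School A", "public", 11_200),
    ("School B", "private", 61_000),
    ("School C", "public", 9_900),
    ("School D", "public", 15_800),
    ("School E", "private", 58_500),
]

def facet(rows):
    """Group rows by category, then order each group by tuition, highest first."""
    groups = defaultdict(list)
    for school, sector, cost in rows:
        groups[sector].append((school, cost))
    # Within each facet, sort by magnitude so comparisons inside a category are easy.
    return {
        sector: [school for school, _cost in sorted(items, key=lambda item: -item[1])]
        for sector, items in groups.items()
    }

panels = facet(records)
```

Each list in `panels` gives the bar order for one small multiple.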

A special sixth case for data visualizations

We’ve just established that the default random order is meaningless. However, there is one advanced case where intentional randomness is the correct and powerful choice: to solve overplotting.

For instance, imagine we plotted the tuition of the schools along a single line, just to see the spread. The default behavior for a plot might be to use the $x$-values to reflect cost and assign all points a common $y$-value, as shown below.

Dot plot showing the tuition for all the universities in the BIG10. Since many of the universities have similar tuition costs, there are many overlapping points.

Though it is a bit easier to see the clustering around values than it was for the bar charts, it is hard to see all the points. If, however, we add random displacement along the $y$-axis, we obtain the plot below.

Dot plot showing the tuition for all the universities in the BIG10. “Jitter” has been added to the y-axis, so that points with similar tuition can be seen.

In this “jitter plot,” we deliberately add random, meaningless displacement (a.k.a. “jitter”) to make all the points visible. The random values along the $y$-axis have no meaning, but they serve our communication of meaning by allowing all the data to be seen.

This isn’t the same as the default random order we started with. That was accidental meaninglessness. This is intentional meaninglessness, used as a tool to reveal a clearer pattern.
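Jitter is straightforward to generate: keep each point's $x$-value and draw its $y$-value at random from a narrow interval. The sketch below is one way to do it in Python, assuming uniform noise, a default half-width of 0.3, and placeholder tuition values; the `jitter` function is hypothetical.

```python
import random

def jitter(values, width=0.3, seed=0):
    """Pair each x value with a random y offset drawn uniformly from [-width, width].

    The y offsets carry no meaning; they only spread out points that would overlap.
    A fixed seed keeps the figure reproducible between runs.
    """
    rng = random.Random(seed)
    return [(x, rng.uniform(-width, width)) for x in values]

# Hypothetical tuition values (placeholders), several of them nearly identical.
costs = [11_200, 11_250, 11_300, 15_800, 15_800, 9_900]
points = jitter(costs)
```

Seeding a dedicated `random.Random` instance, rather than the global generator, keeps the jitter identical every time the figure is regenerated.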

The take-home message here is the same as above: do not accept the default ordering, but apply ordering to communicate meaning.

Concluding thoughts

Order is not a passive choice; it is your most powerful tool for telling a story.

The default order of your data is rarely the best one. By choosing an intentional order (by magnitude to build a clear hierarchy, alphabetically to aid lookup, or even intentionally random to reveal a pattern), you guide your viewer and make your message clear.

Always ask: “What story am I trying to tell?” and “Which of the ‘hat racks’ tells that story best?”


$^\dagger$ This is also sometimes called LATCH: Location, Alphabet, Time, Category, Hierarchy/Magnitude.