The line chart is a way to express correlation between two quantities and is therefore quite similar to a scatter plot. However, in a line chart, data points are connected by a line, implying a connection between data points or an ordering of data points that need not be implied in a scatter plot. Because of this implied connection, line charts are often used when charting a change over time. The line connecting the points implies the chronological ordering of the data: some data come before a particular point and some come after. You might find such relationships in plots of stock market performance over time.

This is not the only place, however, that line charts are used. They are common, for instance, in spectroscopy, where spectra are usually plotted as lines. However, this can lead to some confusion on the part of viewers who are not intimately familiar with the underlying means of measurement. In the case of a UV-visible spectrum, it would be much more accurate to represent the spectrum as a histogram, since what is really being reported is detected photons binned into discrete wavelengths. The use of lines masks this underlying truth. Understanding this is critical, however, if we wish to convert the x-axis from, say, wavelength to energy. To do so, one cannot simply rescale the x-axis. One must also scale the y-values depending on wavelength, using a Jacobian transformation. Failing to do so will yield incorrect representations of the data. The use of a line obscures this enough that there are articles written on this topic even for experts.
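
To make the Jacobian step concrete, here is a minimal sketch in plain Python. It assumes the intensities are reported per unit wavelength (per nm) and uses E = hc/λ with hc ≈ 1239.84 eV·nm. The function name `wavelength_to_energy` and the flat test spectrum are illustrative assumptions, not taken from any particular instrument or library.

```python
# Convert a spectrum from wavelength (nm) to photon energy (eV).
# Assumption: intensities are reported per unit wavelength (per nm).

HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def wavelength_to_energy(wavelengths_nm, intensities_per_nm):
    """Return (energies_eV, intensities_per_eV), sorted by increasing energy."""
    converted = []
    for lam, f_lam in zip(wavelengths_nm, intensities_per_nm):
        energy = HC_EV_NM / lam          # E = hc / lambda
        jacobian = lam ** 2 / HC_EV_NM   # |d(lambda)/dE| = lambda^2 / (hc)
        converted.append((energy, f_lam * jacobian))
    converted.sort()                     # energy increases as wavelength decreases
    energies = [e for e, _ in converted]
    intensities = [f for _, f in converted]
    return energies, intensities


# Hypothetical flat spectrum: equal intensity at every wavelength.
wavelengths = [400.0, 500.0, 600.0]
flat = [1.0, 1.0, 1.0]
energies, per_ev = wavelength_to_energy(wavelengths, flat)
```

A spectrum that is flat in wavelength is no longer flat once expressed per unit energy: the long-wavelength (low-energy) end is weighted more heavily. Simply relabeling the x-axis would miss this.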

Even so, the use of lines for spectra persists for many reasons. Using a histogram (the most accurate representation) could lead to visual clutter and would make it challenging to plot several spectra on the same chart. For many spectra, there are too many points to use just markers (which would at least show the discrete nature of the values). And, of course, there are now cultural expectations that spectra be plotted using lines.

All of this is to say that you will often find yourself using line charts, and so it is worth considering how to construct and design them.

How to construct/interpret a line chart #

The line chart shows the relationship between two quantitative variables. Thus, you need data with two quantitative values. When plotting such data, it is conventional to place the “independent” variable on the x-axis and the “dependent” variable on the y-axis. Here, “independent” and “dependent” imply a sort of causal relationship. For example, the number of games of golf I have played depends on the number of years I have been playing golf (and the frequency at which I play). There are philosophical discussions one could have regarding whether any such relationship truly exists, but in practical terms, most people accept this idea of “independent” and “dependent” variables, and so it is worth respecting it.

The above is shared by other plots, such as scatter plots. But line charts have another important feature: since each point in a line is connected to, at most, two other points, there is an ordering of these points that must be chosen when plotting. If I were to plot my golfing over the last 20 years, it would make no sense to plot the even years first and then the odd years. The years should come in strict chronological order.

Once you have the appropriate, ordered data, a line chart is made by drawing a line segment from one data point to the next. Though some programs give you the option to choose curved connections between data points, I strongly suggest only using straight line connections.
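
The ordering and the straight-segment connection can be sketched in a few lines of Python. The golf numbers below are invented purely for illustration. The trace is written as a plain dictionary in the shape Plotly uses for figure specifications, where `line.shape` set to `"linear"` requests straight segments.

```python
# Hypothetical, deliberately shuffled observations: (year, rounds of golf played).
observations = [(2021, 14), (2019, 9), (2022, 17), (2020, 11)]

# Sort by the x-value so the line visits the points in chronological order.
ordered = sorted(observations)
years = [year for year, _ in ordered]
rounds = [count for _, count in ordered]

# A Plotly-style trace written as a plain dict; "linear" asks for straight segments.
trace = {
    "type": "scatter",
    "mode": "lines",
    "x": years,
    "y": rounds,
    "line": {"shape": "linear"},
}
```

If Plotly is installed, `plotly.graph_objects.Figure(data=[trace]).show()` should render this trace. Without the sorting step, the line would be drawn in the shuffled order and would double back on itself.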

Doing so for education expenses, housing expenses, and income trends over time yields a plot like the following:

Design of line charts #

A default line chart, as exported by the plotting program Plotly, might look like the following:

This plot probably looks standard to many reading this, but I think there are actually a large number of things that can be improved.

Add a title #

Not every instance of line charts will let us add a title, but if we can, we should, and as discussed on the page regarding titles, we should make a claim that the data supports.

Remove the legend and directly label the data #

Though legends are a standard feature in plots, we should try our best to avoid them. I think the majority of the time, we can just use direct labeling. The advantage of doing so is explained on the page discussing proximity and separation. And when we are directly labeling data, we can color code our labels.

In addition to the advantages for proximity and separation, this also gives us a bit more room for the data.
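
One way to do this in Plotly is to hide the legend and add one annotation per line, placed at the last data point and colored to match. The sketch below builds such a figure specification as plain dictionaries. The series names, colors, and values are invented for illustration and are not the data behind the plots on this page.

```python
# Hypothetical series: name -> (color, x-values, y-values). All numbers are invented.
series = {
    "Housing": ("#1f77b4", [2000, 2010, 2020], [0, 20, 45]),
    "Income": ("#d62728", [2000, 2010, 2020], [0, 10, 15]),
}

traces = []
labels = []
for name, (color, xs, ys) in series.items():
    traces.append({"type": "scatter", "mode": "lines", "name": name,
                   "x": xs, "y": ys, "line": {"color": color}})
    # Put the label next to the final point of its line, in the line's color.
    labels.append({"x": xs[-1], "y": ys[-1], "text": name,
                   "showarrow": False, "xanchor": "left", "xshift": 6,
                   "font": {"color": color}})

figure_spec = {
    "data": traces,
    "layout": {"showlegend": False, "annotations": labels},
}
```

Passing `figure_spec` to `plotly.graph_objects.Figure` should give a chart with no legend and a colored label at the end of each line. You may need to widen the right margin so the labels are not clipped.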

Consider if you need grid lines and colored plot areas. #

Another common feature of line charts is the use of grid lines. But, as discussed on the page considering grid lines and tick marks, they are often not needed and just add visual clutter. Thus, I think a plain white background without these grid lines looks better.
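
In Plotly, this comes down to a few layout settings. The fragment below is a sketch of the relevant keys only, and the variable name `layout_no_grid` is arbitrary.

```python
# Layout fragment: plain white plot area with no grid lines on either axis.
layout_no_grid = {
    "plot_bgcolor": "white",
    "xaxis": {"showgrid": False},
    "yaxis": {"showgrid": False},
}
```

With an existing figure object `fig`, something like `fig.update_layout(layout_no_grid)` should apply these settings.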

Consider the thickness of the lines #

When you have a plot area that is differentiated only by the axis lines, I think it can be nice to make a clear distinction between the axis lines and the data lines. In this case, the data lines are colored and the axis lines are black. This helps. But we can also use differences in line width. By increasing the thickness of the data lines, we make them more noticeable.
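
As a rough sketch of this idea with Plotly-style dictionaries: the helper below (a hypothetical name, `emphasize_data_lines`) thickens every data line, while the axis lines stay thin and black. The widths of 3 and 1 are arbitrary starting points, and the example trace uses invented values.

```python
def emphasize_data_lines(traces, data_width=3):
    """Return copies of the traces with a thicker data line."""
    thick = []
    for trace in traces:
        updated = dict(trace)
        # Keep any existing line settings (such as color) and set the width.
        updated["line"] = {**trace.get("line", {}), "width": data_width}
        thick.append(updated)
    return thick


# Thin black axis lines, so the thicker colored data lines stand out.
axis_style = {"showline": True, "linecolor": "black", "linewidth": 1}
layout_axes = {"xaxis": dict(axis_style), "yaxis": dict(axis_style)}

# Hypothetical trace with invented values.
example = [{"type": "scatter", "mode": "lines", "x": [2000, 2010], "y": [0, 20],
            "line": {"color": "#1f77b4"}}]
thick_traces = emphasize_data_lines(example)
```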

Consider the axes titles #

Most of the time, you will need axes titles on line charts. However, in some cases, they may not be necessary. For instance, if we change the title slightly to indicate we are dealing with time, then most people will correctly interpret the numbers on the x-axis as “years”. Thus, we probably don’t need this title.

Of course, if you are concerned this may not be clear enough, you can always add it back in.

Consider tick mark labels #

If we consider the $y$-axis, the title helps us understand that this is representing percent change. However, it does require us to read the title. It also requires us to connect that the “percent” in the title is modifying the numbers on the $y$-axis tick marks.

Similar to how we used direct labeling of the data, we can attach the meaning directly to the tick labels by adding a “%” after the values.

You can find other examples of this when you are dealing with currency, weights, etc. There are many cases where simply adding the meaning to the numbers is a simple and direct solution.
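
Plotly axes have `ticksuffix` and `tickprefix` settings for this purpose. The fragments below sketch both cases. The dollar sign is just one example of a currency prefix.

```python
# Append "%" to every y-axis tick label (percent change).
layout_percent = {"yaxis": {"ticksuffix": "%"}}

# Prepend a currency symbol instead, for an axis that shows money.
layout_currency = {"yaxis": {"tickprefix": "$"}}
```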

Consider the aspect ratio #

It is not as common as it once was, but it is worth keeping an eye on the aspect ratio of your plot. It will rarely be the case that you want a square. So, consider other aspect ratios that will be more pleasing for your viewer.
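
If you are setting the figure size in pixels, as Plotly's `width` and `height` layout settings do, a small helper can keep the ratio explicit. The function name `figure_size` and the 16:9 choice are only illustrative. Note that these settings size the whole figure, including margins, so the plot area itself will be somewhat different.

```python
def figure_size(width_px, aspect_ratio):
    """Return layout entries for a figure of the given width and width:height ratio."""
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    return {"width": width_px, "height": round(width_px / aspect_ratio)}


# An 800-pixel-wide figure at a 16:9 ratio.
wide_layout = figure_size(800, 16 / 9)
```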

Consider the placement of axes #

If a particular position along an axis has a meaningful value, then you can consider placing the other axis at that value. For instance, in this case, the value of 0% on the $y$-axis has a special meaning, and so we can place the $x$-axis there.

Of course, making such a plot might result in overlap between data and axes labels. If you don’t like this, you can always separate out the idea of the zero-line and the axis scale. To do so, you can place a line at 0 on the $y$-axis, but still leave the tick marks and labels down below.
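
The second option, a zero line with the tick marks and labels left at the bottom, can be sketched with Plotly's `zeroline` settings. The color and width below are arbitrary choices, and the variable name is hypothetical.

```python
# Layout fragment: draw a line at y = 0 while the x-axis ticks and labels
# stay at the bottom of the plot area (their default position).
layout_zero_line = {
    "yaxis": {
        "zeroline": True,
        "zerolinecolor": "black",
        "zerolinewidth": 1,
        "showgrid": False,
    },
    "xaxis": {"showline": False, "ticks": "outside"},
}
```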

Conclusions #

And that is it! Certainly, there are many other small details you could consider, but I do think that these ideas will get you to the point where you are making line charts that are clearer and better designed than the majority.
