Friday, May 6, 2011

Figure 2a is a hand-drawn graph created by the French engineer Ibry in 1885. It represents a schedule of train trips in France.

Figure 2a

The times are listed along the top and bottom (x-axis), and the train stations are listed along the left side (y-axis). Each train trip is represented by a diagonal line. The left endpoint of the line marks the train’s departure: the departure station is read from the y-axis and the departure time from the x-axis. The right endpoint shows when and where the train arrives at its destination. Using this graph, it’s not difficult to find the schedule of all trains leaving a given station each day. For example, in Figure 2b, I’ve highlighted one train trip that leaves Paris shortly after noon and arrives in Tonnerre around 6 PM.

Figure 2b
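To make the encoding concrete, here is a minimal Python sketch of how a timetable could be turned into the line segments of such a graph. The station list, the trips, and the function names are all made up for illustration; none of the times are read from the figure, and stations are placed by their order on the y-axis rather than by distance.

```python
# Hypothetical data for illustration only; not read from Figure 2a.
STATIONS = ["Paris", "Tonnerre"]  # order of stations along the y-axis
TRIPS = [
    # (origin, departure, destination, arrival), times as 24-hour "HH:MM"
    ("Paris", "12:15", "Tonnerre", "18:00"),
    ("Tonnerre", "07:30", "Paris", "13:10"),
    ("Paris", "08:00", "Tonnerre", "13:45"),
]


def to_minutes(hhmm):
    """Convert an "HH:MM" string to minutes after midnight (the x value)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def trip_segment(stations, trip):
    """Return ((x0, y0), (x1, y1)): the left and right endpoints of one trip.

    x is the time in minutes after midnight; y is the station's position
    in the station list.
    """
    origin, depart, destination, arrive = trip
    left = (to_minutes(depart), stations.index(origin))
    right = (to_minutes(arrive), stations.index(destination))
    return left, right


def departures_from(trips, station):
    """List the departure times of every trip leaving a station, earliest first."""
    times = [depart for origin, depart, _, _ in trips if origin == station]
    return sorted(times, key=to_minutes)


for trip in TRIPS:
    print(trip[0], "->", trip[2], trip_segment(STATIONS, trip))
print("Departures from Paris:", departures_from(TRIPS, "Paris"))
```

Reading all departures from one station, which the graph lets the eye do along a single horizontal line, corresponds to the `departures_from` lookup here.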

Figure 2c is a chart created by the statistician William Playfair.

Figure 2c

The strength of this graph is that it displays three series of data on the same time scale, covering about two and a half centuries: the average wages in England, the average price of wheat in England each decade, and the reign of each monarch. Presenting multiple series like this allows the viewer to quickly spot relationships between the series.

A map can be an effective data presentation tool, as evidenced by Figure 2d, which shows economic data from the 1960 census.

Figure 2d

Each map shows every county in the United States. The top map shows the concentration of very poor families in each county and the bottom map shows the concentration of very rich families. High percentages are represented by very dark shading and low percentages by very light shading, with the shading growing steadily darker as the percentage increases. A map such as this aggregates millions of data points. Because it is so intuitive, the viewer can quickly form observations (lots of poor families in the southeastern US in 1960) and ask questions (why are there so many rich families and poor families in central Alaska?).
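The shading rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple linear scale from white to black, and the function names, the 0 to 100 default range, and the county values are hypothetical, not taken from the census maps.

```python
def gray_level(percent, lo=0.0, hi=100.0):
    """Map a percentage to an 8-bit gray value.

    Assumes a linear scale: `lo` maps to 255 (white) and `hi` maps to 0
    (black), so larger percentages always get darker shading.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clamped = min(max(percent, lo), hi)      # keep values inside the scale
    fraction = (clamped - lo) / (hi - lo)    # 0.0 at lo, 1.0 at hi
    return round(255 * (1 - fraction))


def gray_hex(percent, lo=0.0, hi=100.0):
    """Return the shade as a hex color string such as "#bfbfbf"."""
    level = gray_level(percent, lo, hi)
    return "#{0:02x}{0:02x}{0:02x}".format(level)


# Hypothetical county percentages, for illustration only.
counties = {"County A": 5.0, "County B": 25.0, "County C": 60.0}
for name, pct in counties.items():
    print(name, pct, gray_hex(pct))
```

A real map might narrow `lo` and `hi` to the range actually observed in the data so that the full span from light to dark is used.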

No discussion of historical graphical excellence would be complete without Minard’s diagram shown in Figure 2e.

Figure 2e

Tufte wrote that this drawing, which shows Napoleon’s advance to and retreat from Moscow in the winter of 1812-1813, “may well be the best statistical graphic ever drawn”. The tan line represents Napoleon’s march from the Polish-Russian border on the left to Moscow on the right, while the black line below it represents his retreat back into Poland. The width of each line represents the size of Napoleon’s army. From this information alone, we can see the disaster of this campaign: Napoleon entered Russia with about 400,000 troops but arrived in a deserted Moscow with only 100,000 men. By the time he left Russia months later, he had barely 10,000 men. The retreating line is tied to a graph below showing the time and temperature during the march. The extreme cold undoubtedly was a factor in the decimation of this army. With a minimal amount of ink, this chart shows army size, location, direction of movement, time, and temperature, which is a startling amount of information.

In this article, we looked at some historical charts, graphs and maps that present data in a way that is more meaningful to the viewer, and more quickly grasped, than the raw data they represent. In the next section, we will explore some common problems with visualizations.

This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.
