Friday, May 13, 2011

In the last section, we described the difference between Data-Ink and non-Data-Ink. You will recall that Data-Ink is the part of the graph that directly represents the underlying data. If we remove any of the Data-Ink, the graph will contain less information. Non-Data-Ink consists of everything else and, typically, does not add to the information in the graph.

Dr. Tufte describes the extraneous drawings on a graph as “Chart Junk”. Chart Junk consists of gridlines, vibrations, and “ducks”.

We described gridlines in the last section. Gridlines can typically be removed (or at least reduced in quantity and weight) without making the graph any less meaningful.

Vibrations are caused by excessive use of patterns in a graph. Most graphing software allows you to fill a shape with a pattern, such as diagonal parallel lines or wavy lines. These patterns can create a sense of motion in the graph (which is probably why artists like them), but that motion illusion can be very distracting (which is why they should be avoided).

Below are several examples of graphs that overuse patterns. If we look at these graphs for a minute, they appear to shimmer or vibrate, an effect known as the moiré effect. Replacing the patterns with solid shading or colors simplifies the graphs and makes them much cleaner.

Figure 6a

Figure 6b

Figure 6c

Figure 6c fills each bar with a different pattern to distinguish the bars from one another, then identifies the patterns with a legend to the right of the graph. This forces the viewer to look back and forth between the graph and the legend to figure out what each bar means. A much better graph would simply label the bars directly, eliminating the need for the legend (Figure 6d).

Figure 6d
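The direct-labeling idea does not depend on any particular charting tool. Here is a minimal sketch in plain Python that draws a text bar chart in which each bar carries its own name and value, so no legend is needed. The function name `labeled_bars` and the sample data are invented for this illustration; they are not the data behind Figure 6c or 6d.

```python
def labeled_bars(data, width=40, mark="#"):
    """Render a horizontal text bar chart with each bar labeled directly.

    `data` maps a category name to a non-negative number. The longest bar
    is scaled to `width` characters.
    """
    if not data:
        return ""
    largest = max(data.values())
    name_width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        # Scale the bar length; guard against an all-zero data set.
        length = round(width * value / largest) if largest > 0 else 0
        # Name on the left, value on the right: no legend lookup required.
        lines.append(f"{name.rjust(name_width)} | {mark * length} {value}")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
sales = {"North": 120, "South": 90, "East": 60, "West": 30}
print(labeled_bars(sales))
```

Every bar uses the same plain fill, because the label next to the bar already identifies it. A pattern or color per bar would only be encoding information the label has stated.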

The chart in Figure 6e is presented in 3 dimensions and with a rounded top to each bar. Neither of these properties is necessary to represent the underlying data, so both are Chart Junk. Eliminating them would reduce the total ink without reducing the Data-Ink, which increases the Data-Ink ratio.

Figure 6e
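The arithmetic behind the Data-Ink ratio is simple: it is the Data-Ink divided by the total ink used in the graph. The short sketch below shows how removing decoration raises the ratio while the Data-Ink stays the same. The ink amounts are made-up numbers in arbitrary units, not measurements of Figure 6e, and `data_ink_ratio` is a name chosen for this example.

```python
def data_ink_ratio(data_ink, non_data_ink):
    """Return data-ink divided by total ink (a value between 0 and 1)."""
    total = data_ink + non_data_ink
    if total <= 0:
        raise ValueError("a chart must use some ink")
    return data_ink / total


# Hypothetical ink amounts in arbitrary units, for illustration only.
data_ink = 40      # the bars themselves
decoration = 60    # 3-D sides, rounded tops, and similar extras

before = data_ink_ratio(data_ink, decoration)
after = data_ink_ratio(data_ink, 0)  # decoration removed, data untouched

print(f"before: {before:.2f}, after: {after:.2f}")
# prints "before: 0.40, after: 1.00"
```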

Edward Tufte had a name for chart elements that add no value whatsoever, other than to distract or entertain the viewer. He called them “Ducks”. The term “Duck” came from a building Tufte found in Flanders, NY (Figure 6f). The building was built to look like a duck, even though that shape did not enhance the building’s functionality in any way. The shape called attention to the building and served no other purpose.

Figure 6f

Figure 6g represents data about irrigation and its uses in California. Unfortunately, the large number of colors makes the graph difficult to grasp quickly: the viewer must look back and forth between the blocks of color and the color key to interpret the meaning of each square. Tufte considered these blocks of color to be Ducks.

Figure 6g

Similarly, Figure 6h uses a confusing color scheme to represent its data. The viewer must memorize these colors or constantly look up their meaning in a legend.

Figure 6h

If you find yourself adding a legend to a chart, it is a good time to ask yourself whether that legend is necessary and clear. Can the Data-Ink be made clearer (by shading or by labeling), eliminating the need for the legend?

Finally, we present the graph in Figure 6i.

Figure 6i

According to Tufte, this may be the worst graph ever presented. Only 5 pieces of data are represented in the graph, yet a 3-D graph was chosen for no apparent reason and the bright colors have no meaning whatsoever. It isn’t even obvious that the top and bottom numbers total 100%. This same data would be better presented in a simple table (Table 6a).

Year    % Students 25 and older
1972    28.0
1973    29.2
1974    32.8
1975    33.6
1976    33.0

Table 6a
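For a data set this small, a table takes only a few lines to produce. This minimal sketch prints the five values from Table 6a as aligned plain text. The function name `format_table` and the column width are arbitrary choices made for this example.

```python
# The five data points from Table 6a.
ROWS = [
    (1972, 28.0),
    (1973, 29.2),
    (1974, 32.8),
    (1975, 33.6),
    (1976, 33.0),
]


def format_table(rows, headers=("Year", "% Students 25 and older")):
    """Return the rows as an aligned plain-text table, one line per year."""
    year_header, pct_header = headers
    lines = [f"{year_header:<6}{pct_header}"]
    for year, pct in rows:
        # One decimal place keeps every value at the same precision.
        lines.append(f"{year:<6}{pct:.1f}")
    return "\n".join(lines)


print(format_table(ROWS))
```

There are no colors to decode and no perspective to correct for; each number is read directly.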

As much as possible, you should eliminate chart junk and ducks. These seldom enhance a graph and often distract the viewer from the actual data.

This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Friday, May 13, 2011 3:54:00 PM (GMT Daylight Time, UTC+01:00)
Thursday, April 19, 2012 1:24:31 AM (GMT Daylight Time, UTC+01:00)
Good tip, don't know why I always went the hard road and added them all, then deleted the ones I didn't want. D'oh! Also handy when you want to highlight specific points in a graph, for instance if you wanted to mark Jan-11 in your example to show that the Budget was higher than Actual (different from the rest of the data points). If there are multiples, and it's hard to tell which is which, I tend to format the label font the same color as the series to make it easy to tell which label goes with which data.