# Thursday, 08 May 2008

Edward Tufte has spent a lifetime turning data into pictures and studying the best way to do so.

In his first (self-published) book The Visual Display of Quantitative Information, he describes what makes an excellent graph or map. 

Not all data sets are good candidates for charts.  For small data sets with exact values, Tufte recommends using tables.  However to compare values or present many pieces of data simultaneously, a graph is far superior.  Graphs, Tufte asserts, are most useful when showing complex data and displaying trends or observations that are not immediately obvious when the data is displayed in tabular form.  An excellent graph is one that is clear, precise and efficient - that is it "gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space."

Tufte provides some advice to accomplish this graphical excellence.  He introduces the concept of "Data-Ink" ratio.  This is the amount of information conveyed by a chart, relative to the amount of ink required to print that chart.  Generally, a graph can be improved by increasing its Data-Ink Ratio.  This can be accomplished by erasing non-data ink, such as unnecessary gridlines and labels; by erasing redundant data; and by labeling data directly, rather than forcing users to look up information in a legend.

Related to the Data-Ink ratio is his push for high data density - graphics that have maximum data per page, maximum data per square inch, and maximum data per amount of ink used.  As long as a graphic does not appear confusing, cluttered or overwhelming, you should pack as much information as you can into it.

Tufte warns against "chartjunk", his term for irrelevant text, lines, pictures or other decorations that contain no actual information.  This is ink that can be erased from a chart without reducing the amount of information in the chart.  Many graphs contain pictures, 3D effects and colors that don’t relate to the data.  Rather than enhancing the user’s understanding of the data, this “junk” distracts the user’s attention from the data, making the graph harder to understand.  Erasing chartjunk increases the Data-Ink ratio, which should be the goal of every designer of data graphics.

I appreciate that the book provides numerous examples of both the right way and the wrong way to represent data visually and that most of these examples came from real-world publication.  Tufte pulls no punches in his criticism of those who do things the wrong way.  In describing one graph published in American Education magazine - a confusing 3D graph that shows only 5 pieces of data and uses 5 different colors that in no way relate to that data - he writes "This may well be the worst graphic ever to find its way into print."

This is an excellent book for anyone who needs to present data to an audience.  Business analysts, managers and software developers can all increase their effectiveness by implementing Tufte’s ideas.

This book on Amazon