Monday, May 9, 2011

Episode 155

Monday, May 9, 2011 3:45:00 PM (GMT Daylight Time, UTC+01:00)
Friday, May 6, 2011

Figure 2a is a hand-drawn graph created by the French engineer Ibry in 1885. It represents a schedule of train trips in France.

Figure 2a

The times are listed along the top and bottom (x-axis), and the train stations are listed along the left side (y-axis). Each train route is represented by a diagonal line. The left endpoint of the diagonal line represents the departure of that train, with the departure station on the y-axis and the departure time on the x-axis. The right endpoint of the diagonal line tells us when and where the train arrives at its destination. Using this graph, it’s not difficult to find the schedule of all trains leaving a given station each day. For example, in Figure 2b, I’ve highlighted one train trip that leaves Paris shortly after noon and arrives in Tonnerre around 6 PM.

Figure 2b

Figure 2c is a chart created by the statistician William Playfair.

Figure 2c

The strength of this graph is that it displays three series of data on the same time scale, covering roughly two and a half centuries (1565 to 1821): the average wages in England, the average price of wheat in England each decade, and the reign of each monarch. Presenting multiple series like this allows the viewer to quickly spot possible correlations between the series.

A map can be an effective data presentation tool, as evidenced by Figure 2d, which shows economic data from the 1960 census.

Figure 2d

Each map shows every county in the United States. The top map shows the concentration of very poor families in each county and the bottom map shows the concentration of very rich families. High percentages are represented by very dark shading and low percentages by very light shading, with the shading growing steadily darker as the percentage rises. A map such as this aggregates millions of data points. Because it is so intuitive, the viewer can quickly form observations (lots of poor families in the southeastern US in 1960) and ask questions (why are there so many rich families and poor families in central Alaska?).

No discussion of historical graphical excellence would be complete without Minard’s diagram shown in Figure 2e.

Figure 2e

Tufte described this drawing, which shows Napoleon’s advance to and retreat from Moscow in the winter of 1812-1813, as “the best statistical graph ever”. The tan line represents Napoleon’s march from the Polish-Russian border on the left to Moscow on the right, while the black line below it represents his retreat back into Poland. The width of each line represents the size of Napoleon’s army. From this information alone, we can see the disaster of this campaign: Napoleon entered Russia with 400,000 troops but arrived in a deserted Moscow with only 100,000 men. By the time he left Russia months later, he had barely 10,000 men. The retreating line is tied to a graph below showing the time and temperature during the march. The extreme cold undoubtedly was a factor in the decimation of this army. With a minimal amount of ink, this chart shows army size, location, direction of movement, time, and temperature: a startling amount of information.

In this article, we looked at some historical charts, graphs and maps that present data in a way that is more meaningful, and more quickly grasped by the viewer, than the raw data they represent. In the next section, we will explore some common problems with visualizations.

This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Friday, May 6, 2011 1:49:00 PM (GMT Daylight Time, UTC+01:00)
Thursday, May 5, 2011

Look at the four series of data below.

I II III IV
x y   x y   x y   x y
10 8.04   10 9.14   10 7.46   8 6.58
8 6.95   8 8.14   8 6.77   8 5.76
13 7.58   13 8.74   13 12.74   8 7.71
9 8.81   9 8.77   9 7.11   8 8.84
11 8.33   11 9.26   11 7.81   8 8.47
14 9.96   14 8.1   14 8.84   8 7.04
6 7.24   6 6.13   6 6.08   8 5.25
4 4.26   4 3.1   4 5.39   19 12.5
12 10.84   12 9.13   12 8.15   8 5.59
7 4.82   7 7.26   7 6.42   8 7.91
5 5.68   5 4.74   5 5.72   8 6.89

Is there a pattern to the data in each series? How do the series relate to one another? It’s difficult to answer these questions looking only at the raw data.

However, if we display the data as 4 scatter graphs on the same page (Figure 1a), we can quickly see the pattern in each series and we can use that pattern to predict the next value in the series. We can also see outliers in series III and IV and ask questions about why those outliers occur.

[Figure 1a]

Figure 1a is a good representation of the data because it allows us to understand the data quickly and easily and because it answers questions and sparks follow-up questions about the data.
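
One way to see why the raw table hides so much is to compute the usual summary statistics for each series. The sketch below is only an illustration, not part of Dr. Tufte's material: it uses plain Python with the values exactly as listed in the table above, and the names SERIES, summarize and SUMMARIES are my own. For all four series the mean of x is 9, the mean of y is about 7.5, the correlation is about 0.82, and the fitted line is close to y = 3 + 0.5x, even though the scatter graphs look nothing alike.

```python
from math import sqrt

# The four (x, y) series, copied from the table above.
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
SERIES = {
    "I": (X_COMMON,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.72]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.59, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, Pearson correlation and least-squares line for one series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in SERIES.items()}

for name, s in SUMMARIES.items():
    print(f"{name:>3}: mean x = {s['mean_x']:.2f}, mean y = {s['mean_y']:.2f}, "
          f"r = {s['r']:.3f}, line: y = {s['intercept']:.2f} + {s['slope']:.2f}x")
```

Numbers like these would suggest four interchangeable data sets; only the graphs reveal the curve in series II and the outliers in series III and IV.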

As a software developer, I spend a lot of time writing software to maintain data. There are many tools and plenty of training to help us store, update, and retrieve data. But few people talk about the best way to present data in a meaningful way.

Professor Edward Tufte of Yale University is one person who is doing research in this area and writing about it. Tufte studied graphical representations of data to find out what makes an excellent visualization and what problems occur in data visualization. He has written several books on the topic, describing guidelines to follow and common traps to avoid. In my opinion, his best book on this subject is The Visual Display of Quantitative Information (ISBN 0961392142).

This series will review Dr. Tufte’s research, ideas and conclusions on Data Visualization.

Over the next couple of weeks, I’ll explore excellent charts created throughout history and identify what makes them so excellent; graphs that lack integrity and serve to mislead the viewer; and some guidelines that Dr. Tufte suggests for improving data visualization.


Thursday, May 5, 2011 4:31:00 PM (GMT Daylight Time, UTC+01:00)
Wednesday, May 4, 2011

The Kalamazoo X conference isn’t like other conferences. Although it is targeted at technical people and the audience is filled with software developers, the content presented is typically not technical. Instead, sessions highlight soft skills, such as team building and education.

Another major difference between Kalamazoo X and other conferences is the session format: The length of each presentation is limited to 30 minutes – much shorter than the 60-90 minute presentations seen at most technical conferences. This serves to keep the audience focused. It’s rare to see any audience member get up out of his or her chair and walk out of a session, partly because they will miss a significant part of it and partly because the session is always close to the end.

The final major difference is that Kalamazoo X offers only one track. This gives all attendees the same shared experience, which they can discuss and compare afterwards. No one has to choose between sessions or feel that he or she is missing something.

This year’s conference took place last Saturday at Kalamazoo Valley Community College and featured something for everyone. Nine speakers delivered ten presentations and the day ended with a panel discussion on Interviewing. A fishbowl exercise during lunch got the crowd excited. Five chairs were placed in the middle of the room and a topic was thrown out. The ground rules of the fishbowl were: You must be seated in one of the chairs in order to ask a question; and one chair must always be empty. Attendees entered and exited the fishbowl area frequently and the conversation grew animated as ideas fired back and forth.

Kalamazoo X is the brainchild of Michael Eaton, who envisioned a conference that fills gaps he saw in the education of software developers. Technical information is readily available to technical people from a variety of venues, but soft skill training is much rarer, and this lack of training often shows up in the lack of soft skills displayed by the developer community.

Kalamazoo X is now in its third year. I have attended all three, including the one last Saturday, and I have spoken at two of them. Each time, the success was evident: the room was full, the content was excellent, and the atmosphere was electric. I’ve learned about leadership from Jim Holmes, about community from Mike Wood and Brian Prince, about self-promotion from Jeff Blankenburg, and about life from Leon Gersing.

Photos from 2011 Kalamazoo X

Photos from 2010 Kalamazoo X

Wednesday, May 4, 2011 3:20:00 PM (GMT Daylight Time, UTC+01:00)
Monday, May 2, 2011

Monday, May 2, 2011 3:48:00 PM (GMT Daylight Time, UTC+01:00)
Saturday, April 30, 2011

Below are slides from the Data Visualization talk I delivered at the Kalamazoo X conference today.

Saturday, April 30, 2011 3:34:07 PM (GMT Daylight Time, UTC+01:00)
Monday, April 25, 2011

Monday, April 25, 2011 3:45:00 PM (GMT Daylight Time, UTC+01:00)
Thursday, April 21, 2011

I do a lot of technical presentations and those presentations often contain code demos. As a general rule, I favor creating my code demos in advance over typing them in during my presentation. If a demo involves more than a few seconds typing, no one wants to sit and watch the presenter type (or, worse, debug code that he mistyped).

Often I'll have a number of related demos in the same project or the same class. Each demo will be a little more complex or show off a slightly different feature than the prior demo.

In the past, I've added code and commented it out, then commented / uncommented during the presentation. Here is a sample of this technique.

Unfortunately, this method is error-prone. It’s too easy to accidentally uncomment or comment the wrong line or too many lines, causing errors that you will need to debug quickly and with the pressure of an audience staring at you.

But I've found a different, simple technique that works very well for Console application demos. I prompt the user to enter a number, then capture the user's input and run the method corresponding to that number. The Console.ReadLine method allows me to capture the user's input and a switch statement allows me to easily translate a number entered into the appropriate method call. A few Console.WriteLine statements clarify what the numbers mean. Below is an example.
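
Here is a rough sketch of the same menu idea in Python rather than C#, so treat it as an analogue and not as the code from my talks. In it, input() plays the role of Console.ReadLine, print() plays the role of Console.WriteLine, and a dictionary lookup stands in for the switch statement. The demo functions and the names DEMOS and run_menu are hypothetical placeholders.

```python
def demo_hello():
    """Hypothetical demo 1: the simplest case."""
    return "Hello, world"


def demo_sum():
    """Hypothetical demo 2: a little more complex than the first."""
    return f"Sum of 1..10 = {sum(range(1, 11))}"


def demo_sorted():
    """Hypothetical demo 3: shows off a different feature (sorting)."""
    return f"Sorted: {sorted([3, 1, 2])}"


# Maps the number the presenter types to (menu description, demo function).
DEMOS = {
    "1": ("Hello world", demo_hello),
    "2": ("Sum of a range", demo_sum),
    "3": ("Sorting a list", demo_sorted),
}


def run_menu(read=input, write=print):
    """Show the menu, read a choice, run the matching demo; 'q' quits."""
    while True:
        # Print the menu so the audience can see what each number means.
        for key, (description, _) in DEMOS.items():
            write(f"{key}: {description}")
        choice = read("Enter a demo number (q to quit): ").strip()
        if choice.lower() == "q":
            break
        entry = DEMOS.get(choice)
        if entry is None:
            write(f"Unknown choice: {choice!r}")
            continue
        # Run the selected demo and show its result.
        write(entry[1]())
```

Calling run_menu() with no arguments runs it interactively at the console. The read and write parameters are there only so the loop can be exercised without a keyboard, for example when rehearsing.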

This technique allows you to prepare all of your demos in advance, so you don’t need to change anything during your presentation. I like the fact that the technique does not excessively complicate the code you are presenting. Simply focus on the code in one individual case statement at a time.

Thursday, April 21, 2011 1:20:00 PM (GMT Daylight Time, UTC+01:00)
Wednesday, April 20, 2011

One thing I really enjoy is speaking at conferences and user groups. I learn a lot and I get a chance to interact with other developers around the country, and I get a rush when I can pull off a really good presentation. Unfortunately, traveling can be expensive and I need to limit my talks to what can fit in my budget.

One thing Telerik enjoys is supporting the developer community. They have great products, and presentations at user groups and conferences are a good way to let people know about those products. Unfortunately, Telerik does not employ an army of professional presenters to cover all the events they’d like.

Telerik recently solved both those problems by forming the Telerik Insiders Program. The program consists of people in the community – like me – who enjoy speaking at developer events. The deal is that Telerik will sponsor our trip to a conference or user group and all we need to do in exchange is give away a bundle of their software. This is a great deal for me because I’ve been a fan of their products for a long time and because I love giving away stuff that someone else paid for.

Telerik has recruited a number of outstanding speakers to this program, including John Petersen, Lee Brandt, and Malcolm Sheridan.