# Thursday, 09 June 2011

I’m a big fan of the Deep Fried Bytes podcast. I’ve listened to every episode since Keith Elder and Chris “Woody” Woodruff began recording years ago. So I was thrilled when they asked me to be a guest on Deep Fried Bytes this month.

We talked about Data Visualization. I recently developed a presentation titled “Data Visualization – The Ideas of Edward Tufte” that I’ve delivered at the Kalamazoo X and at Codestock. I’m scheduled to deliver it again at Devlink.

Keith was packing for a fishing trip the night we recorded, so Woody and I spoke via Skype for the better part of an hour. I think it turned out very well. I had a blast and I hope I get invited back.

You can hear the interview and download it at http://deepfriedbytes.com/podcast/episode-71-talking-data-visualization-on-an-audio-podcast/.

Thursday, 09 June 2011 15:52:00 (GMT Daylight Time, UTC+01:00)
# Wednesday, 08 June 2011

This year's Codestock was my third and it did not disappoint. I was scheduled to deliver two presentations - Visual Studio 2010 Database Tools and An Introduction to Object-Oriented Programming. These were two talks I had not given for some time and I altered both considerably since I last delivered them. I stayed up most of Thursday night preparing to deliver them during the first two time slots Friday.

By 11AM, I was finished presenting and prepared to relax and enjoy the conference. After a leisurely lunch, I attended Seth Juarez's 2-hour presentation on Machine Learning. I heard about this talk last year and was determined not to miss it this year. Seth described algorithms that allow computers to predict results after observing a set of sample data. I was impressed enough with this talk to invite Seth onto Technology and Friends.

The keynote address was Friday evening at the nearby Bijou Theater. Charles Petzold - one of the world's most famous computer science authors - delivered an impressive narrative about scientists of the 19th century. He began with the work of William Thomson (who later became Lord Kelvin) and his analog computer designed to predict the height of tides. Petzold expanded the talk to cover Thomson's clashes with the geologists of his time and with naturalist Charles Darwin. Petzold was informative and entertaining and delivered one of the best keynotes I've ever heard. I was thrilled when he agreed to appear on my TV show the next day.

Saturday was supposed to be spent taking in sessions and open spaces. But a speaker canceled at the last minute and I was asked to fill in. I chose to do a talk on Data Visualization, which I originally delivered at the Kalamazoo X conference and which I am scheduled to deliver at Devlink in August. Originally, this talk was only 30 minutes but there were so many good questions that it lasted almost 60 minutes.

Later in the day, Mike Eaton asked me to help him deliver a presentation on user interfaces. I stood near the stage and made a few contributions, but he did not need my assistance. Mike showed off some impressive WPF applications he has built and described why he made the design decisions in these applications.

I brought some work with me - a problem with Microsoft Windows Identity Foundation with which I had been struggling - and Microsoft Evangelist Brian Prince was kind enough to sit with me and patiently answer my questions. This assistance alone was worth the trip.

I brought my video camera and recorded 5 episodes of Technology and Friends, which will air over the next few weeks. I also filmed some spots for a user group project I’m assembling. The final result will be published in October. Improbably, I did not take any photos at the conference.

As always, the best part of this conference was meeting and interacting with smart people, exchanging ideas and business cards. It’s funny how I can attend a conference, sit in only one session, attend no open spaces and still manage to learn a lot.

Wednesday, 08 June 2011 18:40:00 (GMT Daylight Time, UTC+01:00)
# Monday, 06 June 2011
Monday, 06 June 2011 15:17:00 (GMT Daylight Time, UTC+01:00)
# Tuesday, 31 May 2011
Tuesday, 31 May 2011 20:29:00 (GMT Daylight Time, UTC+01:00)
# Wednesday, 25 May 2011

I’ve spent nearly 20 years working in technology: from my university days studying Computer Engineering; through my years managing a Lan Manager® network and writing FoxPro applications; to my time consulting with companies to help them build scalable applications to solve their business problems. I work with a wide variety of software and hardware tools. I’ve become proficient with some and I’ve developed the ability to quickly get up to speed on most tools.

But am I a technologist? Is the focus of my job to use computers, software and languages? Am I paid because of my expertise in a specific technology? Do customers value my computer skills over my other skills?

I never describe my professional self as an “expert” in anything. Instead, I emphasize experience, my learning abilities, and my problem-solving skills. Occasionally, a salesperson will tout my deep, technical knowledge on a topic, but I caution them against this, because it is not my greatest strength. My greatest strengths are the abilities to understand problems, to learn almost anything, to apply knowledge appropriately to a problem, and to share with others what I have learned.

I would argue that I am not a technologist – at least not primarily. As a consultant, my primary purpose is to add value to the customer. I do this by solving business problems. Some of the tools I use to solve those problems are types of computer hardware and software. But those are not the most important tools. The most important tools I use are communication skills and reasoning ability. It may be that the solution to my customer’s problem involves very few technical changes or even none at all. If it does involve software (which is usually the case), my application of that software is far more important than the bits within it.

I’ve seen a number of consultants who are so focused on their technology of choice that they don’t seek a solution outside that area. If all you know is BizTalk or SharePoint or Lotus Notes, it’s very tempting to define business problems in terms that can be associated with your favorite tool. The popular expression to define this attitude is: “If all you have is a hammer, everything looks like a nail.”

For me, the solution is the important thing. Maybe it’s an advantage that I never immersed myself in a single technology. Maybe this keeps my mind more open to alternative solutions. If I need expertise with a particular tool, I can either learn it or find someone who knows it well.

Does this mean that there is no value in deep technical knowledge of a topic? Of course not! There is great value in learning technology. The more we know, the more we can apply that knowledge to business problems. But it is the application of the knowledge that adds the most value – not the knowledge itself.

This mind-set becomes even more important when you consider how international the software business has become. You may be a very good C# programmer. But, if you live in America, there is likely to be a very good C# programmer in India who is willing to do the same work for much less. And if you live in India, there is probably a very good C# programmer in China who is willing to work for much less. And if you live in China, keep your eyes open, because other parts of the world are developing these skills and they are anxious to penetrate this market and are able to charge even lower rates. It’s no longer possible to compete only on price (and still make a decent living) and it’s not enough to compete only on technical skill. The ability to solve complex business problems and apply the right technology can be the differentiator that allows you to compete in a global market.

Keep this in mind as you look for solutions to problems presented by your customer or employer. Focus on adding value to the business, rather than on applying a particular set of skills.

But in the end, I think I serve my customers better because I think of myself as a problem-solver rather than as a technologist.

Wednesday, 25 May 2011 00:52:00 (GMT Daylight Time, UTC+01:00)
# Monday, 23 May 2011
Monday, 23 May 2011 16:39:00 (GMT Daylight Time, UTC+01:00)
# Monday, 16 May 2011
Monday, 16 May 2011 15:26:00 (GMT Daylight Time, UTC+01:00)
# Sunday, 15 May 2011

It’s Sunday morning and I’ll be checking out of this hotel and heading home soon. I’m digesting what I learned and experienced yesterday at the Chicago Code Camp in Grayslake, IL.

The code camp offered 5 session slots. I sat in on 2 sessions and spoke at 1.

I began the day watching John Petersen describe Dependency Injection in ASP.Net MVC 3. With version 3 of MVC, Microsoft introduced a pattern for wiring up Dependency Injection Frameworks as the application starts up. John explained how to wire up a DI framework, such as StructureMap or Unity in your application. His examples helped to clarify this concept. I do a lot of work in ASP.Net MVC, but I have not had a chance to explore many of the new features included in version 3. John was kind enough to sit with me after his session and answer a few of my questions. A few months ago, I wrote and submitted a book chapter on MVC and I was recently asked to update this chapter to reflect the changes in version 3, so this topic helped me directly.

Session 2 was on Coded UI testing. Eric Boyd went out of his way to get the crowd involved. He brought gifts of books and t-shirts to give away to anyone who contributed a good anecdote or question. Using Visual Studio Ultimate, Eric walked through recording a UI test, showed the generated code, then added assertions and other modifications to the test. He also briefly demoed Microsoft Test Professional, a tool for testers to script and record the results of manual tests; and data-driven tests, which allow you to run a single test multiple times with a variety of inputs.

I was scheduled to speak in slot 4 and, as is my practice, I hid away during session 3 to prepare for my talk. This was a brand new talk for me titled “How I Learned To Stop Worrying and Love jQuery”. I tried to show how much easier a framework like jQuery can make JavaScript development. I prepared and showed a lot of code demos on using selectors and binding events. Unfortunately, I ran out of time and had to rush through my Ajax demo and did not get to my jQueryUI demos. Still, the room was full (30+ people) and the audience was engaged. I’m scheduled to give the talk again at MADExpo, so I will tighten it up before then.

During the final session of the day, I recorded two episodes of Technology and Friends. I was introduced to Micah Martin, Dave Hoover, and Ashish Dixit via Twitter and got to meet them after my session. They have a passion for mentoring and apprenticeship programs and we talked about this on camera for a half hour. Next, I recorded a show on the SOLID principles with Chander Dhal, who is a fellow Telerik insider and a world-class yoga practitioner.

The best thing about this conference was the new people I met. Most are not from my geographic area or from my area of knowledge, so I felt my boundaries expand.

It’s time now to pack up, pick up Mike, and drive back to Michigan. I need to prepare for my next trip.

Sunday, 15 May 2011 13:04:59 (GMT Daylight Time, UTC+01:00)
# Friday, 13 May 2011

In the last section, we described the difference between Data-Ink and non-Data-Ink. You will recall that Data-Ink is the part of the graph that directly represents the underlying data. If we remove any of the Data-Ink, the graph will contain less information. Non-Data-Ink consists of everything else and, typically, does not add to the information in the graph.

Dr. Tufte describes the extraneous drawings on a graph as “Chart Junk”. Chart Junk consists of gridlines, vibrations, and “ducks”.

We described gridlines in the last section. Gridlines can typically be removed (or at least reduced in quantity and weight) without making the graph any less meaningful.

Vibrations are caused by excessive use of patterns in a graph. Most graphing software allows you to fill a shape with a pattern, such as diagonal parallel lines or wavy lines. These patterns can create a sense of motion in the graph (which is probably why artists like them), but that motion illusion can be very distracting (which is why they should be avoided).

Below are several examples of graphs with excess shading. If we look at these graphs for a minute, they appear to shimmer or vibrate – an effect known as the Moiré Effect. Replacing the patterns with shading or colors simplifies them and makes them much cleaner.

Figure 6a

Figure 6b

Figure 6c

Figure 6c fills each bar with a pattern to distinguish one from another; then identifies the patterns with a Legend to the right of the graph. This forces the user to look between the graph and the legend to figure out what each means. A much better graph would simply label the bars directly, eliminating the need for the legend (Figure 6d).

Figure 6d

The chart in Figure 6e is presented in 3 dimensions and with a rounded top to each bar. Neither of these properties is necessary to represent the underlying data, so both are Chart Junk. Eliminating these properties would reduce the ink without reducing the Data-Ink, increasing the Data-Ink ratio.

Figure 6e

Edward Tufte had a name for chart elements that added no value whatsoever, other than to distract or entertain the viewer. He called them “Ducks”. The term “Duck” came from a building Tufte found in Flanders, NY (Figure 6f). The building was built to look like a duck, even though that shape did not enhance the building’s functionality in any way – the shape called attention to the building and served no other purpose.

Figure 6f

Figure 6g represents data about irrigation and its uses in California. Unfortunately, the large number of colors makes the graph difficult to quickly grasp – the viewer must look back and forth between the blocks of color and the color key to interpret the meaning of each square. Tufte considered these blocks of color to be Ducks.

Figure 6g

Similarly, Figure 6h uses a confusing color scheme to represent its data. The user must memorize these colors or constantly look up their meaning in a legend.

Figure 6h

If you find yourself adding a legend to a chart, it is a good time to ask yourself if that legend is necessary and clear. Can the data ink be made more clear (by shading or by labeling), eliminating the need for the legend?

Finally, we present the graph in Figure 6i.

Figure 6i

According to Tufte, this may be the worst graph ever presented. Only 5 pieces of data are represented in the graph, yet a 3-D graph was chosen for no apparent reason and the bright colors have no meaning whatsoever. It isn’t even obvious that the top and bottom numbers total 100%. This same data would be better presented in a simple table (Table 6a).

| Year | % Students 25 and older |
| --- | --- |
| 1972 | 28.0 |
| 1973 | 29.2 |
| 1974 | 32.8 |
| 1975 | 33.6 |
| 1976 | 33.0 |

Table 6a
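
To show how little is needed to present five values clearly, here is a minimal Python sketch that prints the same data as a plain-text table. The function name `render_table` and the column widths are my own choices for illustration, not part of Tufte's work.

```python
# The five data points from Table 6a: (year, % of students 25 and older).
rows = [(1972, 28.0), (1973, 29.2), (1974, 32.8), (1975, 33.6), (1976, 33.0)]


def render_table(rows, headers=("Year", "% Students 25 and older")):
    """Return the rows as an aligned plain-text table, one line per row."""
    year_width = max(len(headers[0]), 4)
    value_width = len(headers[1])
    lines = [f"{headers[0]:<{year_width}}  {headers[1]}"]
    for year, percent in rows:
        # Right-align the percentage under its header, one decimal place.
        lines.append(f"{year:<{year_width}}  {percent:>{value_width}.1f}")
    return "\n".join(lines)


print(render_table(rows))
```

No colors, no third dimension, and no legend are required; the numbers carry the whole message.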

As much as possible, you should eliminate chart junk and ducks. These seldom enhance a graph and often distract the viewer from the actual data.

This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Friday, 13 May 2011 15:54:00 (GMT Daylight Time, UTC+01:00)
# Thursday, 12 May 2011

All graphs, charts and other data visualization pictures consist of “ink”. At one time, “ink” referred to physical ink because all charts were printed on paper. Now, we can think of ink as anything drawn on either paper or the screen, even if that drawing is never printed to a sheet of paper.

Data-Ink is that part of the ink that represents the actual data. Another way to think of data ink is: the ink that, if we erased it, would reduce the amount of information in the graphic.

So, if only some of the ink represents data, what is the rest of the ink? The rest of the ink is taken up with metadata, redundant data, and decorations.

Generally, the more data-ink in a graphic, the more effective that graphic will be. Tufte defines the “Data-Ink Ratio” as [The Amount of Data-Ink in a graphic] divided by [The total Ink in the Graphic]. When creating charts and graphics, our goal should be to maximize the Data-Ink Ratio, within reason.

Consider the single data point represented by a bar chart in Figure 5a.

Figure 5a

The value of that point is represented by the following:
•    The height of the vertical line along the left side of the bar;
•    The height of the vertical line along the right side of the bar;
•    The height of the horizontal line along the top of the bar;
•    The height of the colored area within the bar;
•    The vertical position of the number atop the bar; and
•    The value of the number atop the bar.

Six different elements in this graph all represent the same value – a tremendous amount of redundant data. This graph has a very low Data-Ink Ratio.
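
As a rough sketch of the arithmetic, the Data-Ink Ratio can be computed once each element is assigned an amount of ink. The ink amounts below are invented for illustration (they are not measurements of Figure 5a), and counting only the number label as data-ink is my assumption for this example, since every other element repeats the same value.

```python
def data_ink_ratio(data_ink, total_ink):
    """Data-ink divided by total ink, as a fraction between 0 and 1."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


# Hypothetical ink amounts (arbitrary units) for the parts of a single bar.
# The number label covers two of the six encodings: its position and its value.
bar_ink = {
    "left edge": 10,
    "right edge": 10,
    "top edge": 5,
    "colored fill": 65,
    "number label": 10,
}

total = sum(bar_ink.values())
# Assumption: one encoding is enough, so only the label counts as data-ink.
ratio = data_ink_ratio(bar_ink["number label"], total)
print(f"Data-Ink Ratio: {ratio:.2f}")
```

With these made-up amounts, nine-tenths of the ink repeats what the label already says.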

The problem is even worse if we make the bar chart 3-dimensional as in Figure 5b.

Figure 5b

Let’s look at an example of a graph with a low Data-Ink Ratio and try to fix it. Figure 5c reports some linear data points on a surface that looks like graph paper.

Figure 5c

In this figure, the dark gridlines compete with data points for the viewer’s attention. We can eliminate some of these gridlines and lighten the others to increase the Data-Ink Ratio and make the data more obvious.

Figure 5d

Spreadsheet makers discovered this a long time ago when they decided to lighten the borders between cells in order to make these borders (metadata) less obvious than the numbers inside the cells (data). In the case of this graph, we probably don’t need gridlines at all. Eliminating them entirely (Figure 5e) increases the Data-Ink Ratio with no loss of information.

Figure 5e

If we look around the remaining parts of the graph, we can find more non-Data-Ink that is a candidate for elimination. The top and right borders certainly don’t provide any information. And the axes are just as readable if we eliminate half the numbers.

Figure 5f
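
The cleanup steps from Figure 5c through Figure 5f can be sketched numerically. This is only an illustration: the ink amounts are hypothetical, and the element names are my own labels for the parts of the graph described above. The point is that each removal of non-Data-Ink raises the ratio while the data-ink stays the same.

```python
def data_ink_ratio(chart):
    """chart maps an element name to (ink amount, is_data_ink)."""
    total = sum(amount for amount, _ in chart.values())
    data = sum(amount for amount, is_data in chart.values() if is_data)
    return data / total


# Hypothetical ink amounts (arbitrary units) for a graph-paper style chart.
chart = {
    "data points": (20, True),
    "gridlines": (60, False),
    "top and right borders": (10, False),
    "axis numbers": (10, False),
}

steps = [data_ink_ratio(chart)]           # starting point

chart["gridlines"] = (15, False)          # fewer, lighter gridlines
steps.append(data_ink_ratio(chart))

del chart["gridlines"]                    # no gridlines at all
steps.append(data_ink_ratio(chart))

del chart["top and right borders"]        # borders carry no information
steps.append(data_ink_ratio(chart))

chart["axis numbers"] = (5, False)        # keep half the axis numbers
steps.append(data_ink_ratio(chart))

print([round(step, 2) for step in steps])
```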

Figure 5g shows a graph by chemist Linus Pauling, mapping the Atomic Number and Atomic Volume of a number of different elements.

Figure 5g

Pauling has removed the gridlines, but he has left in the grid intersections – tiny crosses that distract from the data. We can safely eliminate these crosses to increase the Data-Ink Ratio and make the graph more readable (Figure 5h).

Figure 5h

One could argue that the dashed lines between the data points are metadata and that removing them would increase the Data-Ink Ratio. However, if we do so (Figure 5i), the graph becomes less clear, because the lines help group together elements in the same Periodic Table row.

Figure 5i

This is why our goal is to increase the Data-Ink Ratio, within reason. Sometimes it is necessary to add back some non-Data-Ink in order to enhance the graph.

Figure 5j shows another example where redundant data can enhance a graph’s readability.

Figure 5j

The top picture is the train schedule from part 2 of this series. Notice that some of the diagonal lines stop at the right edge and continue from the left edge of the chart. These are scheduled train rides that leave a station before 6AM, but don’t arrive at a destination until after 6AM. In the bottom picture, I have copied the first 12 hours of the chart and pasted it on the right, ensuring that every route line appears at least once without a break.

Figure 5k
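
The reasoning behind the wider chart can be checked with a small sketch. The trips below are hypothetical, times are measured in hours from the chart's left edge, and I am assuming that no single trip lasts longer than 12 hours. Under that assumption, any trip that wraps on a 24-hour chart fits unbroken on a 36-hour one.

```python
def drawn_unbroken(depart, duration, chart_hours):
    """True if a trip fits between its departure and the chart's right edge."""
    return depart + duration <= chart_hours


# Hypothetical trips: (hours after the chart's left edge, trip length in hours).
trips = [(1.0, 4.0), (22.5, 3.0)]

# On the 24-hour chart the late trip runs off the right edge and wraps around.
broken_24 = [trip for trip in trips if not drawn_unbroken(*trip, 24)]

# Pasting the first 12 hours on the right gives a 36-hour chart.
broken_36 = [trip for trip in trips if not drawn_unbroken(*trip, 36)]

print("Broken on 24-hour chart:", broken_24)
print("Broken on 36-hour chart:", broken_36)
```

The extra 12 hours are redundant data, but they spare the viewer from mentally joining two halves of a line.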

Now, if we could just get rid of those gridlines…

This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Thursday, 12 May 2011 13:30:00 (GMT Daylight Time, UTC+01:00)