# Wednesday, May 25, 2011

I’ve spent nearly 20 years working in technology: from my university days studying Computer Engineering, through my years managing a LAN Manager® network and writing FoxPro applications, to my time consulting with companies to help them build scalable applications that solve their business problems. I work with a wide variety of software and hardware tools. I’ve become proficient with some and I’ve developed the ability to quickly get up to speed on most tools.

But am I a technologist? Is the focus of my job to use computers, software and languages? Am I paid because of my expertise in a specific technology? Do customers value my computer skills over my other skills?

I never describe my professional self as an “expert” in anything. Instead, I emphasize experience, my learning abilities, and my problem-solving skills. Occasionally, a salesperson will tout my deep, technical knowledge on a topic, but I caution them against this, because it is not my greatest strength. My greatest strengths are the abilities to understand problems, to learn almost anything, to apply knowledge appropriately to a problem, and to share with others what I have learned.

I would argue that I am not a technologist – at least not primarily. As a consultant, my primary purpose is to add value to the customer. I do this by solving business problems. Some of the tools I use to solve those problems are types of computer hardware and software. But those are not the most important tools. The most important tools I use are communication skills and reasoning ability. It may be that the solution to my customer’s problem involves very few technical changes or even none at all. If it does involve software (which is usually the case), my application of that software is far more important than the bits within it.

I’ve seen a number of consultants who are so focused on their technology of choice that they don’t seek a solution outside that area. If all you know is BizTalk or SharePoint or Lotus Notes, it’s very tempting to define business problems in terms that can be associated with your favorite tool. The popular expression for this attitude is: “If all you have is a hammer, everything looks like a nail.”

For me, the solution is the important thing. Maybe it’s an advantage that I never immersed myself in a single technology. Maybe this keeps my mind more open to alternative solutions. If I need expertise with a particular tool, I can either learn it or find someone who knows it well.

Does this mean that there is no value in deep technical knowledge of a topic? Of course not! There is great value in learning technology. The more we know, the more we can apply that knowledge to business problems. But it is the application of the knowledge that adds the most value – not the knowledge itself.

This mind-set becomes even more important when you consider how international the software business has become. You may be a very good C# programmer. But, if you live in America, there is likely to be a very good C# programmer in India who is willing to do the same work for much less. And if you live in India, there is probably a very good C# programmer in China who is willing to work for much less. And if you live in China, keep your eyes open, because other parts of the world are developing these skills, are eager to enter this market, and are able to charge even lower rates. It’s no longer possible to compete only on price (and still make a decent living) and it’s not enough to compete only on technical skill. The ability to solve complex business problems and apply the right technology can be the differentiator that allows you to compete in a global market.

Keep this in mind as you look for solutions to problems presented by your customer or employer. Focus on adding value to the business, rather than on applying a particular set of skills.

But in the end, I think I serve my customers better because I think of myself as a problem-solver rather than as a technologist.

Wednesday, May 25, 2011 12:52:00 AM (GMT Daylight Time, UTC+01:00)
# Monday, May 23, 2011
Monday, May 23, 2011 4:39:00 PM (GMT Daylight Time, UTC+01:00)
# Monday, May 16, 2011
Monday, May 16, 2011 3:26:00 PM (GMT Daylight Time, UTC+01:00)
# Sunday, May 15, 2011

It’s Sunday morning and I’ll be checking out of this hotel and heading home soon. I’m digesting what I learned and experienced yesterday at the Chicago Code Camp in Grayslake, IL.

The code camp offered 5 session slots. I sat in on 2 sessions and spoke at 1.

I began the day watching John Petersen describe Dependency Injection in ASP.Net MVC 3. With version 3 of MVC, Microsoft introduced a pattern for wiring up Dependency Injection Frameworks as the application starts up. John explained how to wire up a DI framework, such as StructureMap or Unity in your application. His examples helped to clarify this concept. I do a lot of work in ASP.Net MVC, but I have not had a chance to explore many of the new features included in version 3. John was kind enough to sit with me after his session and answer a few of my questions. A few months ago, I wrote and submitted a book chapter on MVC and I was recently asked to update this chapter to reflect the changes in version 3, so this topic helped me directly.

Session 2 was on Coded UI testing. Eric Boyd went out of his way to get the crowd involved. He brought gifts of books and t-shirts to give away to anyone who contributed a good anecdote or question. Using Visual Studio Ultimate, Eric walked through recording a UI test, showed the generated code, then added assertions and other modifications to the test. He also briefly demoed Microsoft Test Professional, a tool for testers to script and record the results of manual tests; and data-driven tests, which allow you to run a single test multiple times with a variety of inputs.

I was scheduled to speak in slot 4 and, as is my practice, I hid away during session 3 to prepare for my talk. This was a brand new talk for me titled “How I Learned To Stop Worrying and Love jQuery”. I tried to show how much easier a framework like jQuery can make JavaScript development. I prepared and showed a lot of code demos on using selectors and binding events. Unfortunately, I ran out of time and had to rush through my Ajax demo and did not get to my jQueryUI demos. Still, the room was full (30+ people) and the audience was engaged. I’m scheduled to give the talk again at MADExpo, so I will tighten it up before then.

During the final session of the day, I recorded two episodes of Technology and Friends. I was introduced to Micah Martin, Dave Hoover, and Ashish Dixit via Twitter and got to meet them after my session. They have a passion for mentoring and apprenticeship programs and we talked about this on camera for a half hour. Next, I recorded a show on the SOLID principles with Chander Dhal, who is a fellow Telerik insider and a world-class yoga practitioner.

The best thing about this conference was the new people I met. Most are not from my geographic area or from my area of knowledge, so I felt my boundaries expand.

It’s time now to pack up, pick up Mike, and drive back to Michigan. I need to prepare for my next trip.

Sunday, May 15, 2011 1:04:59 PM (GMT Daylight Time, UTC+01:00)
# Saturday, May 14, 2011
# Friday, May 13, 2011

In the last section, we described the difference between Data-Ink and non-Data-Ink. You will recall that Data-Ink is the part of the graph that directly represents the underlying data. If we remove any of the Data-Ink, the graph will contain less information. Non-Data-Ink consists of everything else and, typically, does not add to the information in the graph.

Dr. Tufte describes the extraneous drawings on a graph as “Chart Junk”. Chart Junk consists of gridlines, vibrations, and “ducks”.

We described gridlines in the last section. Gridlines can typically be removed (or at least reduced in quantity and weight) without making the graph any less meaningful.

Vibrations are caused by excessive use of patterns in a graph. Most graphing software allows you to fill a shape with a pattern, such as diagonal parallel lines or wavy lines. These patterns can create a sense of motion in the graph (which is probably why artists like them), but that motion illusion can be very distracting (which is why they should be avoided).

Below are several examples of graphs with excess shading. If we look at these graphs for a minute, they appear to shimmer or vibrate – an effect known as the Moiré effect. Replacing the patterns with solid shading or colors simplifies the graphs and makes them much cleaner.


Figure 6a


Figure 6b


Figure 6c

Figure 6c fills each bar with a pattern to distinguish one from another; then identifies the patterns with a Legend to the right of the graph. This forces the user to look between the graph and the legend to figure out what each means. A much better graph would simply label the bars directly, eliminating the need for the legend (Figure 6d).


Figure 6d

The chart in Figure 6e is presented in 3 dimensions and with a rounded top to each bar. Neither of these properties is necessary to represent the underlying data, so both are Chart Junk. Eliminating these properties would reduce the total ink without reducing the Data-Ink, increasing the Data-Ink Ratio.


Figure 6e

Edward Tufte had a name for chart elements that added no value whatsoever, other than to distract or entertain the viewer. He called them “Ducks”. The term “Duck” came from a building Tufte found in Flanders, NY (Figure 6f). The building was built to look like a duck, even though that shape did not enhance the building’s functionality in any way. The shape called attention to the building and served no other purpose.


Figure 6f

Figure 6g represents data about irrigation and its uses in California. Unfortunately, the large number of colors makes the graph difficult to grasp quickly – the viewer must look back and forth between the blocks of color and the color key to interpret the meaning of each square. Tufte considered these blocks of color to be Ducks.


Figure 6g

Similarly, Figure 6h uses a confusing color scheme to represent its data. The user must memorize these colors or constantly look up their meaning in a legend.


Figure 6h

If you find yourself adding a legend to a chart, it is a good time to ask yourself if that legend is necessary and clear. Can the Data-Ink be made clearer (by shading or by labeling), eliminating the need for the legend?

Finally, we present the graph in Figure 6i.


Figure 6i

According to Tufte, this may be the worst graph ever presented. Only 5 pieces of data are represented in the graph, yet a 3-D graph was chosen for no apparent reason and the bright colors have no meaning whatsoever. It isn’t even obvious that the top and bottom numbers total 100%. This same data would be better presented in a simple table (Table 6a).

| Year | % Students 25 and older |
|------|-------------------------|
| 1972 | 28.0 |
| 1973 | 29.2 |
| 1974 | 32.8 |
| 1975 | 33.6 |
| 1976 | 33.0 |

Table 6a

As much as possible, you should eliminate chart junk and ducks. These seldom enhance a graph and often distract the viewer from the actual data.


This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Friday, May 13, 2011 3:54:00 PM (GMT Daylight Time, UTC+01:00)
# Thursday, May 12, 2011

All graphs, charts and other data visualization pictures consist of “ink”. At one time, “ink” referred to physical ink, because all charts were printed on paper. Now, we can think of ink as anything drawn on either paper or the screen, even if that drawing is never printed to a sheet of paper.

Data-Ink is that part of the ink that represents the actual data. Another way to think of data ink is: the ink that, if we erased it, would reduce the amount of information in the graphic.

So, if only some of the ink represents data, what is the rest of the ink? The rest of the ink is taken up with metadata, redundant data, and decorations.

Generally, the more data-ink in a graphic, the more effective that graphic will be. Tufte defines the “Data-Ink Ratio” as [The Amount of Data-Ink in a graphic] divided by [The total Ink in the Graphic]. When creating charts and graphics, our goal should be to maximize the Data-Ink Ratio, within reason.

Consider the single data point represented by a bar chart in Figure 5a.


Figure 5a

The value of that point is represented by the following:
•    The height of the vertical line along the left side of the bar;
•    The height of the vertical line along the right side of the bar;
•    The height of the horizontal line along the top of the bar;
•    The height of the colored area within the bar;
•    The height of the number atop the bar; and
•    The value of the number atop the bar.

Six different elements in this graph all represent the same value – a tremendous amount of redundant data. This graph has a very low Data-Ink Ratio.
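As a rough illustration of the ratio defined earlier, here is a minimal Python sketch. It assumes, purely for the sake of the arithmetic, that each of the six elements in Figure 5a uses one equal "unit" of ink and that only one of them is needed to carry the value; the function name `data_ink_ratio` and the unit counts are my own invention, not something Tufte measures this way.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Data-Ink divided by the total ink in the graphic."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


# Assumption: six elements, one unit of ink each, all encoding the same value.
# Only one unit is non-redundant, so only one unit counts as Data-Ink.
ratio_original = data_ink_ratio(data_ink=1, total_ink=6)

# Erasing four of the redundant elements removes ink but no information,
# so the ratio goes up.
ratio_trimmed = data_ink_ratio(data_ink=1, total_ink=2)

print(f"original: {ratio_original:.2f}, trimmed: {ratio_trimmed:.2f}")
```

The point of the sketch is only the direction of the change: erasing ink that carries no new information leaves the numerator alone and shrinks the denominator, so the ratio rises.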

The problem is even worse if we make the bar chart 3-dimensional as in Figure 5b.


Figure 5b

Let’s look at an example of a graph with a low Data-Ink Ratio and try to fix it. Figure 5c reports some linear data points on a surface that looks like graph paper.


Figure 5c

In this figure, the dark gridlines compete with data points for the viewer’s attention. We can eliminate some of these gridlines and lighten the others to increase the Data-Ink Ratio and make the data more obvious.


Figure 5d

Spreadsheet makers discovered this a long time ago when they decided to lighten the borders between cells in order to make these borders (metadata) less obvious than the numbers inside the cells (data). In the case of this graph, we probably don’t need gridlines at all. Eliminating them entirely (Figure 5e) increases the Data-Ink Ratio further, with no loss of information.


Figure 5e

If we look around the remaining parts of the graph, we can find more non-Data-Ink that is a candidate for elimination. The top and right borders certainly don’t provide any information. And the axes are just as readable if we eliminate half the numbers.


Figure 5f

Figure 5g shows a graph by chemist Linus Pauling, mapping the Atomic Number and Atomic Volume of a number of different elements.


Figure 5g

Pauling has removed the gridlines, but he has left in the grid intersections – tiny crosses that distract from the data. We can safely eliminate these crosses to increase the Data-Ink Ratio and make the graph more readable (Figure 5h).


Figure 5h

One could argue that the dashed lines between the data points are metadata and that removing them would increase the Data-Ink Ratio. However, if we do so (Figure 5i), the graph becomes less clear, because the lines help group together elements in the same Periodic Table row.


Figure 5i

This is why our goal is to increase the Data-Ink Ratio, within reason. Sometimes it is necessary to add back some non-Data-Ink in order to enhance the graph.

Figure 5j shows another example where redundant data can enhance a graph’s readability.


Figure 5j

The top picture is the train schedule from part 2 of this series. Notice that some of the diagonal lines stop at the right edge and continue from the left edge of the chart. These are scheduled train rides that leave a station before 6AM, but don’t arrive at a destination until after 6AM. In the bottom picture, I have copied the first 12 hours of the chart and pasted it on the right, ensuring that every route line appears at least once without a break.


Figure 5k

Now, if we could just get rid of those gridlines…


This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Thursday, May 12, 2011 1:30:00 PM (GMT Daylight Time, UTC+01:00)
# Wednesday, May 11, 2011

Unless a graph provides context, it can fail to give a complete picture of the data it represents. For example, Figure 4a shows the deaths due to traffic accidents in Connecticut in 1955 and 1956.


Figure 4a

These periods were chosen because the state of Connecticut decided to increase its enforcement of speed limits. From the graph, it appears that this increased enforcement saved about 40 lives. However, it’s not possible to draw this conclusion because we don’t know what happened prior to 1955 or after 1956. Were traffic deaths in Connecticut already on the increase before the increased enforcement? Did deaths go up again in the years following 1956? The graph during the rest of the decade could have looked like any of the following:


Figure 4b

In fact, the graph looked a lot like Figure 4c, which shows traffic deaths on the rise prior to 1955 and continuing to fall after 1956.


Figure 4c

Figure 4d shows even more context for the data.


Figure 4d

In this graph, we see the number of deaths per 100,000 for the entire decade for each state contiguous to Connecticut. While traffic deaths in New York, Massachusetts, and Rhode Island tended to increase or remain steady after 1956, Connecticut’s traffic death rate went down. This context provides strong evidence that Connecticut’s speeding enforcement was effective in its goal of saving lives.

To maximize the meaning supplied by a data graphic, always provide context for that data.


This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Wednesday, May 11, 2011 1:10:00 PM (GMT Daylight Time, UTC+01:00)
# Tuesday, May 10, 2011

The primary goal of creating a chart or graph should always be to accurately represent the data on which that chart is based. In his research on data visualization, Dr. Edward Tufte found numerous charts that misled viewers about the underlying data.

The first example is from the annual report of a mining company and it illustrates net income of that company during a 5-year stretch.


Figure 3a

The income (or loss) each year is represented by a tall, vertical bar. What is not obvious from this picture is that the company lost $11,014 in 1970. That loss is represented in the picture by a tall bar because the company chose an arbitrary baseline of about -$4 million. Showing $0 on the graph would have made it far more credible. It’s difficult for me to imagine any reason the company chose to represent this number in this way, other than to hide the loss and mislead potential investors. But if I notice something like this, I am inclined to doubt every graphic in the report.

Figure 3b shows a graphic from the New York Times that represents the average automobile fuel economy mandated by the US government each year between 1978 and 1985.


Figure 3b

The mandated miles per gallon increased each year as shown by the numbers along the right side of the drawing. The problem with this picture is that those numbers are represented by horizontal lines and those lines are not nearly proportional to the numbers. For example, the line representing 18 is 0.6 inches long, yet the line representing 27.5 is 5.3 inches long.

Tufte created a formula to quantify this kind of misleading graphic. He called it The Lie Factor. The Lie Factor is the size of the effect shown in the graphic, divided by the size of the effect in the data (Figure 3c).


Figure 3c

In the fuel economy example, the Data Increase is 53%, but the Graphical Increase is 783%, resulting in a Lie Factor of 14.8!
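The arithmetic behind these numbers can be sketched in a few lines of Python. The function names `percent_change` and `lie_factor` are my own; the inputs are the line lengths (0.6 and 5.3 inches) and the mandated standards (18 and 27.5 miles per gallon) quoted above.

```python
def percent_change(first: float, last: float) -> float:
    """Relative change from first to last, expressed as a percentage."""
    return (last - first) / first * 100.0


def lie_factor(graphic_first: float, graphic_last: float,
               data_first: float, data_last: float) -> float:
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    return (percent_change(graphic_first, graphic_last)
            / percent_change(data_first, data_last))


# Fuel economy example: standards in miles per gallon, line lengths in inches.
data_increase = percent_change(18.0, 27.5)
graphic_increase = percent_change(0.6, 5.3)
factor = lie_factor(0.6, 5.3, 18.0, 27.5)

print(f"data: {data_increase:.0f}%  graphic: {graphic_increase:.0f}%  lie factor: {factor:.1f}")
# prints "data: 53%  graphic: 783%  lie factor: 14.8"
```

A Lie Factor of 1 would mean the graphic changes exactly as much as the data does; the further the value is from 1, the more the drawing exaggerates or understates the data.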


Figure 3d

Figure 3e below shows a more accurate representation of the fuel economy standards, which increase each year, but at a much less dramatic rate than shown in the NY Times graphic.


Figure 3e

Another way that a chart can distort the underlying data is by attempting to represent 1-dimensional data points with 2-dimensional objects. Figure 3f shows an example of this.


Figure 3f

This figure shows that the percentage of doctors devoted to Family Practice dropped from 27% in 1984 to 12% in 1990. The number on the far left (27%) is a little over double the number on the far right (12%), so the picture of the doctor on the left is a little more than twice as tall as the doctor on the right. The problem is that these doctors have width in addition to height and that the size of each doctor is proportional to both its width and its height. So the size of the doctor on the left is far more than twice the size of the doctor on the right. Measured from the smaller value to the larger one, the data increases by 125%, but the picture increases by 406%, which is a Lie Factor of 406 / 125 ≈ 3.2!
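To see where the 125% and 406% come from, here is a small Python sketch. It assumes the two doctor pictures have the same proportions and that each picture's height is exactly proportional to its data value; both are simplifications, and the variable names are mine.

```python
def percent_increase(smaller: float, larger: float) -> float:
    """Increase from the smaller value to the larger, as a percentage."""
    return (larger - smaller) / smaller * 100.0


data_small, data_large = 12.0, 27.0      # percent of doctors in Family Practice

# Assumption: height is proportional to the data value...
height_ratio = data_large / data_small   # 2.25
# ...and width scales along with height, so area grows with the square.
area_ratio = height_ratio ** 2           # 5.0625
# A 3-D drawing scaled the same way would grow with the cube.
volume_ratio = height_ratio ** 3         # 11.390625

data_effect = percent_increase(data_small, data_large)   # 125.0
graphic_effect = percent_increase(1.0, area_ratio)       # 406.25
doctor_lie_factor = graphic_effect / data_effect         # 3.25

print(data_effect, graphic_effect, doctor_lie_factor)
```

Under these assumptions the 2-dimensional drawing shows a change of about 406% for a 125% change in the data, a Lie Factor of about 3.25.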

This problem is magnified when we try to represent 1-dimensional data with 3-dimensional drawings. In Figure 3g, each data point (the price of a barrel of oil in a given year) is represented by a picture of a barrel of oil.


Figure 3g

If we just looked at this as a 2D drawing, the Lie Factor would be about 9. But the metaphor presented by a 3D barrel causes the viewer to think about the volume capacity of each barrel. The capacity of the 1979 barrel is 27,000% more than the 1973 barrel, even though the price only increased by 554% during that time – a Lie Factor of 27,000 / 554 ≈ 48.7!

Figure 3g has one other problem. The dollars are presented in Nominal Dollars – that is, dollars that have not been adjusted for inflation. However, a dollar in 1979 was not nearly as valuable as a dollar in 1973. The data would be more realistic if it were presented in Real Dollars – dollars adjusted for inflation. Figure 3h is from the London Evening Times and shows similar data, but presents it with both Real Dollars and Nominal Dollars. You can see that the difference between the two lines is significant.


Figure 3h

In general, if you present monetary data across an extended period of time, you should adjust the monetary units for inflation during that time.
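One common way to make this adjustment is to rescale each nominal amount by a price index. The sketch below is a minimal Python version; the function name `to_real_dollars` is my own, and the index values (100 and 150) and the price are made-up numbers chosen only to keep the arithmetic easy to follow, not actual inflation data.

```python
def to_real_dollars(nominal: float, index_in_year: float, index_in_base_year: float) -> float:
    """Convert a nominal amount into base-year (real) dollars using a price index."""
    if index_in_year <= 0:
        raise ValueError("index_in_year must be positive")
    return nominal * index_in_base_year / index_in_year


# Hypothetical price index: 100 in the base year, 150 in a later year.
base_index = 100.0
later_index = 150.0

# A hypothetical price of $30.00 in the later year, in nominal dollars.
nominal_price = 30.0
real_price = to_real_dollars(nominal_price, later_index, base_index)

print(f"nominal: ${nominal_price:.2f}  real (base-year dollars): ${real_price:.2f}")
```

With these invented numbers, a nominal $30.00 is worth $20.00 in base-year dollars, so a chart of the nominal figures would overstate the increase by half.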

Figure 3i shows another way to mislead viewers.


Figure 3i

This bar chart shows commissions paid to travel agents by 4 different airlines during 3 consecutive periods. We can see that those commissions increased slightly from period 1 to period 2 and dropped significantly in period 3 for all 4 airlines. However, it is not at all obvious from this graph that period 3 is only 6 months long, while periods 1 and 2 are each 12 months long. It would be shocking if payments did not drop in the abbreviated period 3! This graph would be more accurate if the same units were used for all periods – either by annualizing Period 3 or by splitting the other periods into 6-month increments.
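Annualizing a partial period is simple arithmetic. Here is a short Python sketch; the function name `annualize` and the payment amounts are hypothetical and are not the figures from the airline chart.

```python
def annualize(amount: float, months_covered: int) -> float:
    """Scale an amount reported for part of a year up to a 12-month equivalent."""
    if months_covered <= 0:
        raise ValueError("months_covered must be positive")
    return amount * 12 / months_covered


# Hypothetical commissions (in arbitrary units) for one airline.
period_2 = 110.0                  # a full 12 months
period_3_as_reported = 60.0       # only 6 months

period_3_annualized = annualize(period_3_as_reported, months_covered=6)

print(f"reported: {period_3_as_reported}  annualized: {period_3_annualized}")
```

In this made-up case the reported number looks like a steep drop from 110 to 60, while the annualized number (120) is actually an increase. Annualizing assumes the second half of the year resembles the first, which is worth stating on the chart.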

Takeaways

The key takeaways of Graphical Integrity are:
•    Make sure that images are in proportion to the data they represent
•    #Dimensions in graph = #Dimensions in data
•    Use Real dollars, instead of Nominal dollars


This is an ongoing series discussing the research of Dr. Edward Tufte on Data Visualization.

Tuesday, May 10, 2011 3:12:00 PM (GMT Daylight Time, UTC+01:00)
# Monday, May 9, 2011
Monday, May 9, 2011 3:45:00 PM (GMT Daylight Time, UTC+01:00)