Tag Archives: Chart

Bar chart with a log axis, “NEVER”! says the Biz Intel Guru

I’m a big fan of the team at SAS that works on the SG (statistical graph) procedures. Their work enables others to tell richly detailed stories by leveraging SG procedures. The team is led by Sanjay Matange. Just 3 days ago I attended a session at the SAS Global Forum (SASGF12) where Sanjay spoke about the work he and his team have done for SAS version 9.3. It was obvious from the meeting that Sanjay and his team are incredibly user-focused and are really good at what they do.

So I was surprised today when I read Sanjay’s most recent blog update and saw this chart.

Bar chart with a log axis

There are a handful of ways you can ruin a bar chart. One way is to make it 3D. Why is 3D bad? Read this for details. Another way to wreck a bar chart is to start the numeric axis at a value that isn’t zero. Bar charts are only effective when we can use the length of each bar to make rapid comparisons. If one bar is twice as long as another, we expect its value to be twice as large. By starting a bar chart at something other than zero, you are telling a visual lie, because we can’t use the length of the bars to compare the magnitude of the differences. When Sanjay created a bar chart with a log axis, he violated the expectation of anyone who reads the chart, because we can’t use the length of the bars to directly compare values. A simple table would’ve worked much better. And sorting the table by horsepower would be an even better option, as you can see below.
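Here is a quick sketch in Python of why a log axis breaks the length comparison. The two horsepower values and the axis starting value of 10 are made up for illustration; they are not read from Sanjay’s chart.

```python
import math

def bar_length_log(value, axis_min):
    """Drawn length of a bar on a log10 axis that starts at axis_min."""
    return math.log10(value) - math.log10(axis_min)

# Hypothetical values: one car has exactly twice the horsepower of the other.
small, large = 100, 200
axis_min = 10  # assumed starting value of the log axis

value_ratio = large / small
length_ratio = bar_length_log(large, axis_min) / bar_length_log(small, axis_min)

print(f"value ratio:  {value_ratio:.2f}x")
print(f"length ratio: {length_ratio:.2f}x")
```

With these assumed numbers, the second bar stands for twice the value but is drawn only about 1.3 times as long. Pick a different starting value for the axis and the length ratio changes again, which is exactly why bar length stops meaning anything on a log scale.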

Table showing horsepower comparison

Horsepower comparison

What Sanjay did came from a good place. He says in his blog post that a few people mentioned to him that they wanted to create a bar chart with a log axis. But just because people want something doesn’t mean you should give it to them. Sanjay is an expert in his field. Rather than satisfying the customer’s request, he might have offered up a better alternative, like a dot plot.

Dotplot

Better, a dotplot alternative to log scale bar chart

The dot plot doesn’t have the same problem as the bar chart because we’re not comparing lengths of bars; we’re looking at the position of the dot along the X axis. Stephen Few has a great guest post by Info Viz superstar Dr. Naomi Robbins about dot plots and how, in the right circumstances, they can be a great alternative to bar charts. That paper can be found here.

In this instance SAS would’ve better served their customers by offering up the dot plot as an alternative to a log-scaled bar chart. As information visualizers, it’s our job to help people see things clearly. It’s not an easy thing to do, but there are consequences when we get it wrong. Those consequences range from wasting people’s time in meetings, to missing important opportunities, to the destruction of the Space Shuttle Challenger and the death of the 7 astronauts aboard (thanks Edward Tufte).

When it comes to creating clear and insightful graphs, the Customer isn’t always right.

So, what do you think? Are there exceptions to the bar chart rules laid out above? Was SAS right in giving the customer what they wanted?

Old Spice Guy’s popularity on Twitter charted

Old Spice recently released about 14 ads with The Old Spice Guy (OSG) personally responding to Tweets from 14 celebrities. Some of the celebs are Hollywood types, others are Web Celebs like Guy Kawasaki, Biz Stone, Kevin Rose. You can see OSG’s video replies here. They are great.

I put together a chart showing the number of Tweets that mention the words ‘old’ and ‘spice’. The chart shows just how quickly the Twitterverse filled up with Tweets about the OSG. Before 9am on July 13th, there was hardly any mention of the OSG, but then, within 6 hours, there was a spike of about 2,300 Tweets per hour about Old Spice. Alas, nothing lasts forever, and after peaking at 4,500 Tweets per hour, the Twitterverse quieted down and settled at around 400 Tweets per hour about the OSG.

BTW, the OSG says he’s hung up his towel.

Chart of the Old Spice Guy's popularity on Twitter

OSG Trend

Bar graphs with a non-zero baseline? “Never”! says Biz Intel Guru. Here’s why…

Trying to understand the economy is tough business. Publishing your predictions about the economy on the web is even more difficult. So I was surprised when I came across a paper on economy.com’s website titled, “The Economic Impact of the American Recovery and Reinvestment Act” and noticed this chart.

unemployment rate bar chart

The graph in question was taken from page 13 of the paper, written by Mark Zandi. It’s also featured on his homepage, here. Dr. Zandi is the chief economist and co-founder of economy.com, and he has a knack for verbally explaining complex things so clearly that non-economists can understand them. He is often heard on NPR and quoted in the WSJ and NYTimes weighing in on the economy. I’ve followed his career for over 15 years and respect his insights and success. It is out of that respect and admiration that I critique this graph.

The main problem with this bar chart is that it is telling two visual lies. The first one is quite serious, the second one, less so.

Bar charts must have a zero-based axis because we use the length of the bars to compare one bar to another. By breaking this rule, economy.com’s unemployment rate chart makes it look like the unemployment rate will increase sixfold from 2008Q3 to 2010Q4 without the stimulus, when in fact the estimated increase is from roughly 6% to 11%, less than 2x. The lack of a zero baseline also creates a false visual comparison between the ‘economic stimulus’ and ‘no economic stimulus’ bars. For that, let’s look at the two bars in 10Q4. The ‘no economic stimulus’ bar (blue) is about 11.2%, versus about 8.5% for the ‘economic stimulus’ bar (black). The actual ratio between the two percentages is 1.3x, but judging by the length of the bars, the difference appears to be about 2x.
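The arithmetic behind those ratios is easy to check. Here is a minimal Python sketch that uses the approximate percentages quoted above and assumes the Y axis starts at 5; the function name `apparent_ratio` is just something I made up for this illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at baseline."""
    return (a - baseline) / (b - baseline)

AXIS_START = 5.0  # assumed starting value of the chart's Y axis

# Approximate values eyeballed from the chart, not the underlying data.
rate_08q3 = 6.0           # unemployment rate in 2008Q3
rate_10q4_no_stim = 11.0  # rough 2010Q4 rate without the stimulus
no_stim_bar = 11.2        # blue bar in 10Q4
stim_bar = 8.5            # black bar in 10Q4

# Change over time without the stimulus: true ratio vs. ratio of bar lengths.
print(f"over time, true:  {apparent_ratio(rate_10q4_no_stim, rate_08q3):.2f}x")
print(f"over time, drawn: {apparent_ratio(rate_10q4_no_stim, rate_08q3, AXIS_START):.2f}x")

# Stimulus vs. no stimulus in 10Q4: true ratio vs. ratio of bar lengths.
print(f"10Q4, true:  {apparent_ratio(no_stim_bar, stim_bar):.2f}x")
print(f"10Q4, drawn: {apparent_ratio(no_stim_bar, stim_bar, AXIS_START):.2f}x")
```

With a zero baseline the drawn ratio and the true ratio are the same number. Move the baseline up to 5 and a less-than-2x change is drawn as 6x, while a 1.3x gap is drawn as nearly 1.8x, which the eye rounds up to 2x.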

I know Dr. Zandi had good intentions when he went with 5 as his starting value on the Y axis. His intent was to make the chart better show the trend over time, but in using a bar chart to display the data, he chose the wrong chart. What should he have used? Read on.

The second visual lie being told here is caused by the third dimension on the graph. Can we tell what the unemployment rate is expected to be in Q4 of 2010 with and without the stimulus? Looks to me like the no stimulus unemployment rate is expected to come in at 11.2% and the unemployment rate with stimulus is expected to be 8.5%. The angling of the Y axis makes it hard for the eye to track over to the value of the bar. To add insult to injury, the angle at the top of each bar makes it difficult to figure out where the ending value of the bar is. Should we reference the front side of the bar or the backside? Unfortunately, the corresponding data this graph is drawn from are not available from economy.com, so we can’t tell for sure where the points are. But we can try a little experiment.

3d bar chart is misleading

I whipped up the chart on the right using MS Excel 2007. The values for A, B, C, D are 10, 20, 30, 40 respectively. I’ve added the actual values to the top of each bar to make it a little easier to read. This 3D chart is actually insightful because it illustrates a serious problem with 3D charts: the bars misrepresent the data. Column D should line up with 40, but it doesn’t; it’s more like 38. If you’re telling a story as important as what’s going to happen to the economy after spending nearly $800 billion in taxpayer money, you should stay away from 3D bar charts because they tell lies about the data they represent.

And that brings us to the final flaw with this chart. Bar charts are generally best used for categorical or grouped data. For time-series data we usually want to go with a line chart, not a bar chart. The lines in the line chart help our eyes see trends in the data better than the individual bars in the bar chart. Line charts also allow us to start from a non-zero baseline, which lets the graph’s creator show the trend by setting the axis minimum slightly below the smallest value in the data and the axis maximum slightly above the largest.
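One simple way to pick those axis limits for a line chart is to pad the range of the data by a small fraction on each side. This is only a sketch: the function name, the 10% padding, and the yearly rates are all my own assumptions for illustration, not numbers from Dr. Zandi’s paper.

```python
def padded_limits(values, pad_fraction=0.1):
    """Return (axis_min, axis_max) padded slightly beyond the data range."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_fraction
    if pad == 0:
        # All values identical: fall back to an arbitrary pad of 1 unit.
        pad = 1.0
    return lo - pad, hi + pad

# Hypothetical yearly unemployment rates (percent), for illustration only.
rates = [6.0, 8.0, 10.0]

axis_min, axis_max = padded_limits(rates)
print(f"line chart Y axis: {axis_min:.1f} to {axis_max:.1f}")
```

Most charting tools will accept limits like these for the Y axis of a line chart. Just don’t reuse them for a bar chart, where the axis has to start at zero.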

Now let’s compare a non 3D bar chart to a line chart. Same data on each chart. I don’t have quarterly data in either graph, just yearly because the only hard data available in Dr. Zandi’s paper was yearly.

Bar charts must have a zero baseline

the BI Guru's improved line chart for time series data

I obeyed the cardinal rule of the zero baseline on the bar chart, and you can see that the magnitude of the difference between stimulus and non stimulus unemployment isn’t nearly as overstated as it was on the original chart. Even more important, the trend is much easier to grasp from the line chart than the bar chart. Notice how it just about leaps off the chart? With the bar chart, you need to go back and forth one or two times to discern the trend.

Lastly, I chose a soft, somewhat natural color palette to draw these charts. They’re much more pleasing to the eyes than black and blue.

–John

The Business Intelligence Guru


Bar chart with a non-zero baseline? “Never”! says Biz Intel Guru. Here’s why…

Trying to understand the economy is tough business. Publishing your predictions about the economy on the web is even more difficult. So I was surprised when I came across a paper on Economy.com’s website titled, “The Economic Impact of the American Recovery and Reinvestment Act” and noticed this bar chart.

unemployment rate bar chart

The bar chart in question was taken from page 13 of a paper written by Mark Zandi. It’s also featured on his homepage, here. Dr. Zandi is the chief economist and co-founder of Economy.com, and he has a knack for verbally explaining complex things so clearly that non-economists can understand them. He is often heard on NPR and quoted in the WSJ and NYTimes weighing in on the economy. I’ve followed his career for over 15 years and respect his insights and success. It is out of that respect and admiration that I critique this 3D bar chart.

The main problem with this bar chart is that it is telling two visual lies. The first one is quite serious, the second one, less so.

A bar chart must have a zero-based axis because we use the length of the bars to compare one bar to another. By breaking this rule, Economy.com’s unemployment rate chart makes it look like the unemployment rate will increase sixfold from 2008Q3 to 2010Q4 without the stimulus. In fact, the estimated increase is from roughly 6% to 11%, less than 2x. The lack of a zero baseline also creates a false visual comparison between the ‘economic stimulus’ and ‘no economic stimulus’ bars. For that, let’s look at the two bars in 10Q4. The ‘no economic stimulus’ bar (blue) is about 11.2%, versus about 8.5% for the ‘economic stimulus’ bar (black). The actual ratio between the two percentages is 1.3x, but judging by the length of the bars, the difference appears to be about 2x.

I know Dr. Zandi had good intentions when he went with 5 as his starting value on the Y axis. His intent was to make the bar chart better show the trend over time, but in using a bar chart to display the data, he chose the wrong chart. What should he have used? We’ll answer that question in a minute.

The second visual lie being told here is caused by the third dimension on the bar chart. Can we tell what the unemployment rate is expected to be in Q4 of 2010 with and without the stimulus? Looks to me like the no stimulus unemployment rate is expected to come in at 11.2% and the unemployment rate with stimulus is expected to be 8.5%. The angling of the Y axis makes it hard for the eye to track over to the value of the bar. To add insult to injury, the angle at the top of each bar makes it difficult to figure out where the ending value of the bar is. Should we reference the front side of the bar or the backside? Unfortunately, the corresponding data this graph is drawn from are not available from Economy.com, so we can’t tell for sure where the points are. But we can try a little experiment.

3d bar chart is misleading

I whipped up the chart on the right using Excel 2007. The values for A, B, C, D are 10, 20, 30, 40 respectively. I’ve added the actual values to the top of each bar to make it a little easier to read. This 3D chart is actually insightful because it illustrates a serious problem with a 3D bar chart: the bars misrepresent the data. Column D should line up with 40, but it doesn’t; it’s more like 38. If you’re telling a story as important as what’s going to happen to the economy after spending nearly $800 billion in taxpayer money, you should stay away from 3D bar charts because they tell lies about the data they represent.

And that brings us to the final flaw with this bar chart. Bar charts are generally best used for categorical or grouped data. For time-series data we usually want to go with a line chart, not a bar chart. The lines in the line chart help our eyes see trends in the data better than the individual bars in the bar chart. Line charts also allow us to start from a non-zero baseline, which lets the graph’s creator show the trend by setting the axis minimum slightly below the smallest value in the data and the axis maximum slightly above the largest.

Now let’s compare a non 3D bar chart to a line chart. Same data on each chart. I don’t have quarterly data in either graph, just yearly because the only hard data available in Dr. Zandi’s paper was yearly.

Bar charts must have a zero baseline

the BI Guru's improved line chart for time series data

I obeyed the cardinal rule of the zero baseline on the bar chart, and you can see that the magnitude of the difference between stimulus and non stimulus unemployment isn’t nearly as overstated as it was on the original chart. Even more important, the trend is much easier to grasp from the line chart than the bar chart. Notice how it just about leaps off the chart? With the bar chart, you need to go back and forth one or two times to discern the trend.

Lastly, I chose a soft, somewhat natural color palette to draw these charts. They’re much more pleasing to the eyes than black and blue.

–John

The Business Intelligence Guru
