Tag Archives: SAS graph

Bar chart with a log axis, “NEVER”! says the Biz Intel Guru

I’m a big fan of the team at SAS that works on the SG (statistical graph) procedures. Their work lets the rest of us tell richly detailed stories with data. The team is led by Sanjay Matange. Just 3 days ago I attended a session at the SAS Global Forum (SASGF12) where Sanjay spoke about the work he and his team have done for SAS version 9.3. It was obvious from the session that Sanjay and his team are incredibly user-focused and really good at what they do.

So I was surprised today when I read Sanjay’s most recent blog update and saw this chart.

Bar chart with a log axis

There are a handful of ways you can ruin a bar chart. One way is to make it 3D. (For why 3D is bad, read this for details.) Another way to wreck a bar chart is to start the numeric axis at a value that isn’t zero. Bar charts are only effective when we can use the length of each bar to make rapid comparisons. If one bar is twice as long as another, we expect its value to be twice as large. By starting a bar chart at something other than zero, you are telling a visual lie, because the lengths of the bars no longer reflect the relative magnitudes of the values. A log axis has the same flaw: it has no zero, and equal distances along it represent equal ratios rather than equal differences. When Sanjay created a bar chart with a log axis, he violated the expectation of anyone who reads the chart, because we can’t use the length of the bars to directly compare values. A simple table would’ve worked much better. And sorting the table by horsepower would be an even better option, as you can see below.

Table showing horsepower comparison

Horsepower comparison
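To put a rough number on the distortion described above, here is a minimal Python sketch. The two horsepower values, the truncated baseline of 50, and the log axis starting at 10 are all made up for illustration; they are not taken from Sanjay's chart. The sketch compares the ratio of the values with the ratio of the bar lengths a reader would actually see.

```python
import math


def bar_length_linear(value, baseline=0.0):
    """Length of a bar on a linear axis whose left edge is `baseline`."""
    return value - baseline


def bar_length_log(value, axis_min=1.0):
    """Length of a bar on a log10 axis whose left edge is `axis_min`."""
    return math.log10(value) - math.log10(axis_min)


# Hypothetical horsepower values, chosen only for illustration.
small, large = 100.0, 400.0

# What the data say: the larger value is 4x the smaller one.
true_ratio = large / small

# Zero-based linear axis: bar lengths carry the same 4x ratio.
zero_based = bar_length_linear(large) / bar_length_linear(small)

# Linear axis truncated at 50 (assumed): the long bar looks 7x as long.
truncated = bar_length_linear(large, 50.0) / bar_length_linear(small, 50.0)

# Log axis starting at 10 (assumed): the long bar looks only about 1.6x as long.
log_scaled = bar_length_log(large, 10.0) / bar_length_log(small, 10.0)

print(f"true ratio of values:       {true_ratio:.2f}")
print(f"zero-based bar lengths:     {zero_based:.2f}")
print(f"truncated-axis bar lengths: {truncated:.2f}")
print(f"log-axis bar lengths:       {log_scaled:.2f}")
```

Under these assumptions, only the zero-based linear axis gives bar lengths that match the data. The truncated axis exaggerates the difference and the log axis shrinks it, and the size of either distortion depends on where the axis happens to start.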

What Sanjay did came from a good place. He says in his blog post that a few people mentioned to him that they wanted to create a bar chart with a log axis. But just because people want something, doesn’t mean you should give it to them. Sanjay is an expert in his field. Rather than satisfying the customer’s request, he might have offered up a better alternative, like a dot plot.

Dotplot

Better, a dotplot alternative to log scale bar chart

The dot plot doesn’t have the same problem as the bar chart: we’re not comparing the lengths of bars, we’re looking at the position of each dot along the X axis. Stephen Few has a great guest post by Info Viz superstar Dr. Naomi Robbins about dot plots and how, in the right circumstances, they can be a great alternative to bar charts. That paper can be found here.

In this instance SAS would’ve better served its customers by offering up the dot plot as an alternative to a log-scaled bar chart. As information visualizers, it’s our job to help people see things clearly. It’s not an easy thing to do, and there are consequences when we get it wrong. Those consequences range from wasting people’s time in meetings, to missing important opportunities, to the destruction of the Space Shuttle Challenger and the deaths of the seven astronauts aboard (thanks, Edward Tufte).

When it comes to creating clear and insightful graphs, the Customer isn’t always right.

So, what do you think? Are there exceptions to the bar chart rules laid out above? Was SAS right in giving the customer what they wanted?

Do you know the simplest, yet most overlooked lesson of Business Intelligence?

Below is a data set with 4 groupings of data and 2 columns (X and Y) for each grouping. The summary statistics (mean, variance, correlation, sum of squares, r², and linear regression line) are essentially the same for all 4 groupings of X and Y values. If we stopped our analysis here, we could move forward confidently knowing that the 4 groups of data are the same. And we’d be dead wrong.

Anscombe’s quartet

visualize these data
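For readers who want to check the summary statistics themselves, here is a minimal Python sketch using only the standard library. The values are the published Anscombe's quartet figures; the helper name summarize and the dictionary layout are my own choices for illustration, not anything from SAS or JMP.

```python
import math

# Anscombe's quartet (Anscombe, 1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, Pearson r, and the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)   # sum of squares of x
    syy = sum((y - mean_y) ** 2 for y in ys)   # sum of squares of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four rows it prints agree closely: a mean of 9 for X and about 7.5 for Y, a correlation of roughly 0.82, and a fitted line of roughly Y = 3 + 0.5X. Nothing in those numbers hints at how different the four groups look once you plot them.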

In my 15 years in analytics I’ve seen good analysts, time and again, stop their analytical efforts when their data summaries don’t tell a compelling story. I’ve sat through hours of meetings, going through page after page of data related to critical financial forecasts, looking at historical trends going back years, without seeing a single graph to show a trend. For whatever reason, data exploration for many analysts starts and ends with a table of summary statistics describing the data. What a shame. In relying on summary statistics we give short shrift to one of our most powerful assets: our eyes.

To see what I mean, click here.

For years Edward Tufte and Stephen Few have been telling the BI community to “above all else, show the data.” Make your intelligence visible. Go beyond the summary look of your data and show it, warts and all. In fact, the Business Intelligence Guru recommends looking at graphic representations of your data before you even look at summary statistics. There are tools available today that make looking at graphic distributions of data easier than ever. I have years of experience using JMP (link will take you to a fully functional 30-day free trial), from SAS, which has a distribution engine that makes it a snap to look at distributions. Even SAS graph, with its new statistical graph (SG) procedures in version 9.2, makes it easy to view your data up close and personal.

Lastly, I didn’t invent the data that I’m using to make my point. I came across two references last week that made me think that I should write about it. I watched an info viz legend, Jeff Heer, tell a story making the case for info viz. I didn’t realize it then, but the story he told actually dates back to 1973 and also appears on the first page of Chapter 1 in Edward Tufte’s book, “The Visual Display of Quantitative Information” (second edition published in 2001). The story goes to the heart of why we need to show the data.

The credit for this eye-opening example goes to F.J. Anscombe, a statistician who created this data set in 1973 to make the case for graphing data before analyzing data. He was a man ahead of his time.