Slate.com confuses readers with an awful graph

In his June 12, 2012 post titled “The Rise of Chrome and the Fall of Internet Explorer,” Slate.com author Matthew Yglesias took some data and made the following chart.

Confusing chart showing browser market share over time

This chart is supposed to show browser market share over time. Instead, it confused his readers so badly that more than half of the 55 comments the post had received as of June 14, 2012 were complaints about the chart.

Mr. Yglesias’ mistake isn’t uncommon. Excel has made it easy for anyone to create a graph, but it hasn’t made it easy for anyone to create a good graph. The Business Intelligence Guru wants all of you graph makers out there to KEEP IT SIMPLE. The only exceptions to the KEEP IT SIMPLE charting rule are Charles Minard and Amanda Cox; they’ve got the chops to mix it up a little.

With the understanding that simpler is usually better, here’s a simple line chart reinterpreting Mr. Yglesias’ confusing graph. One more change, a subtraction really: there’s no need to go out two decimal places on the Y axis. Those extra digits add no value to the graph. In fact, they eat up valuable space and lower the data-ink ratio.

Tableau to the rescue! How to improve Sunlight Foundation’s scatterplot showing that Congress speaks like Juveniles

On June 4th, Stephen Colbert started off his show by discussing a report by the Sunlight Foundation.

The Colbert Report

The report showed that Congress is getting dumber. OK, that’s not exactly what the report showed. It showed that the grade level of Congressional speech has been declining since 2005. The Sunlight Foundation’s analysis of Congressional speech included this interesting scatterplot:

Ideology and grade level for Congress

This scatterplot does a few things well. First, it shows us the data. Every point is a current representative. Second, it uses color appropriately: red for Republicans and blue for Democrats. Third, the fitted lines over grade level of speech add value. They show no correlation for the Democrats and a negative correlation for Republicans; that is, the grade level of Republicans’ speech declines as their voting record becomes more conservative. The scatterplot was made in R. A writeup on how it was made is here.

But the scatterplot also leaves some things to be desired. First off, none of the points are labeled. At the very least the outliers should have labels associated with them. We want to know, for example, who is that red dot speaking 5 grade levels above the average (it’s Dan Lungren)? And who are those dots on the far left and far right of each party? Labeling specific points in R probably isn’t easy. Also, it might be interesting to see if there’s a relationship between grade level speech, ideology, and tenure, so the points should be sized by the number of years in Congress.

After seeing the scatterplot, I wondered what it would look like in Tableau. So I put together the interactive viz below.

While I’m a capable Tableau user, I needed help from Tableau experts to keep the trendlines separate between the two parties. So I reached out to the Tableau Community and got help from Tableau expert Jonathan Drummey, who came up with the idea of computing separate trendlines on each viz and then combining the vizs on a dashboard. Shawn Wallwork liked that idea and suggested adding confidence bands to the trendlines. Shawn also added quadrants to each graph, which I think was a brilliant move. I included those quadrants in my viz below. The horizontal sections of the quadrants show us the difference in grade level of speech, with the Democrats speaking at an 11.7 grade level and the Republicans speaking at an 11.2 grade level. Tableau legend Joe Mako also chimed in with an elegant solution that allowed me to plot both parties, each with its own trendline, on a single chart. I think Joe’s solution is great. Having all the data on one chart allows the user to select data across both groups. Had I used two separate charts and pieced them together via a dashboard in Tableau, the user wouldn’t be able to select points on both charts. Thank you Joe, Jonathan, and Shawn (DataViz Dude).

Also, with the Tableau viz the user can hover over a point and see which representative the point is associated with. In addition, the reader can also select a group of points and view the data in tabular format. That’s a really useful feature. Oh, also, Tableau Public (that’s what I’m using to show you the viz) is as inexpensive as R, as in, free.

Tableau is the better tool for this viz. It’s interactive, which gives the reader the ability to explore the data on their own. For example, go ahead and use the slider on top of the viz to exclude all representatives with fewer than 5 years of tenure.

Bar chart with a log axis, “NEVER”! says the Biz Intel Guru

I’m a big fan of the team at SAS that works on the SG (statistical graph) procedures. Their work enables others to tell richly detailed stories by leveraging SG procedures. The team is led by Sanjay Matange. Just 3 days ago I attended a session at the SAS Global Forum (SASGF12) where Sanjay spoke about the work he and his team have done for SAS version 9.3. It was obvious from the session that Sanjay and his team are incredibly user-focused and are really good at what they do.

So I was surprised today when I read Sanjay’s most recent blog update and saw this chart.

Bar chart with a log axis

There are a handful of ways you can ruin a bar chart. One way is to make it 3D. Why is 3D bad? Read this for details. Another way to wreck a bar chart is to start the numeric axis at a value that isn’t zero. Bar charts are only effective when we can use the length of each bar to make rapid comparisons. If one bar is twice as long as another, we expect its value to be twice as large. By starting a bar chart at something other than zero, you are telling a visual lie, because we can’t use the length of the bars to compare the magnitude of the differences. When Sanjay created a bar chart with a log axis, he violated the expectation of anyone who reads the chart, because we can’t use the length of the bars to directly compare values. A simple table would’ve worked much better. And sorting the table by horsepower would be an even better option, as you can see below.

Table showing horsepower comparison

Horsepower comparison
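The bar-length argument can be checked with a little arithmetic. The sketch below is only an illustration: the two horsepower values, the truncated baseline of 50, and the log-axis minimum of 10 are made-up numbers chosen to keep the math easy, not values from Sanjay’s chart.

```python
import math


def linear_bar_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar lengths (b over a) on a linear axis that starts at `baseline`."""
    return (b - baseline) / (a - baseline)


def log_bar_ratio(a: float, b: float, axis_min: float) -> float:
    """Ratio of drawn bar lengths (b over a) on a log10 axis that starts at `axis_min`."""
    return math.log10(b / axis_min) / math.log10(a / axis_min)


# Hypothetical horsepower values, chosen only to make the arithmetic easy.
small, large = 100.0, 1000.0

print("true ratio of values:     ", large / small)
print("zero-based linear axis:   ", linear_bar_ratio(small, large))
print("linear axis starting at 50:", linear_bar_ratio(small, large, 50.0))
print("log axis starting at 10:  ", log_bar_ratio(small, large, 10.0))
```

With these numbers, only the zero-based linear axis draws the second bar 10 times as long as the first. The truncated axis exaggerates the difference and the log axis compresses it, which is why neither lets the reader compare values by bar length.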

What Sanjay did came from a good place. He says in his blog post that a few people mentioned to him that they wanted to create a bar chart with a log axis. But just because people want something, doesn’t mean you should give it to them. Sanjay is an expert in his field. Rather than satisfying the customer’s request, he might have offered up a better alternative, like a dot plot.

Dotplot

Better, a dotplot alternative to log scale bar chart

The dot plot doesn’t have the same problem as the bar chart. We’re not comparing lengths of bars; we’re looking at the position of the dot along the X axis. Stephen Few has a great guest post by Info Viz superstar Dr. Naomi Robbins about dot plots and how, in the right circumstances, they can be a great alternative to bar charts. That paper can be found here.

In this instance SAS would’ve better served its customers by offering up the dot plot as an alternative to a log-scaled bar chart. As information visualizers, it’s our job to help people see things clearly. It’s not an easy thing to do, but there are consequences when we get it wrong. Those consequences range from wasting people’s time in meetings, to missing important opportunities, to the destruction of the Space Shuttle Challenger and the death of the 7 astronauts aboard (thanks Edward Tufte).

When it comes to creating clear and insightful graphs, the Customer isn’t always right.

So, what do you think? Are there exceptions to the bar chart rules laid out above? Was SAS right in giving the customer what they wanted?

SAS and Twitter–how to harness SAS to grab data from Twitter in 2 easy steps

I recently published a post titled, “4 Key Tweeting Attributes of Guy Kawasaki in one Infographic.” I made extensive use of SAS to gather and manipulate the data from Twitter. Turns out, SAS is pretty awesome for this type of work. In this post I’m going to document how to use SAS to gather data from Twitter’s API. My next post on SAS and Twitter will build off of this one and teach you how to gather data about your subject’s followers, find ReTweets, and listen in on conversations. Click here to get that post delivered to your inbox as soon as it’s published.

First off, you might wonder, why do this? Well, successful analyzers of the future will be adept at analyzing all sorts of data, including data from social networks like Twitter. Also, if you’re looking to market your analytical skills, what hiring manager wouldn’t be impressed with someone who gathered data from Twitter’s API with SAS, then mined, analyzed, and presented the data in a compelling way? Oh, almost forgot: because you’re analyzing a current event (it’s on Twitter, right?) and mentioning Twitter in your post, your analysis will be more search engine friendly, so you’ll likely get a wider and more targeted audience than if you analyzed something outside of the Twitterverse. Some smart analyzers have even been known to analyze Tweets about their target employer and use the analysis to help get themselves hired. On a larger scale, this is almost exactly what Seth Godin has done with Brands in Public.

Before we get started I have to tell you a little about Twitter’s rate limiting policy. Unfortunately, the search area of Twitter’s API doesn’t have a published hard rate limit. Rather, Twitter says it allows a rate quite a bit higher than its standard 150 hits/hour, but it declines to say how much. Full documentation can be found here, about halfway down the page. I have run afoul of the limit before and guess that it’s around 600 hits per hour or more than 30 per minute. When you exceed the unpublished rate, you have to wait between 1 and 3 hours for your IP address to be allowed to hit Twitter again. If you’re just searching for someone’s posts, like we’re doing with Guy Kawasaki, you needn’t worry about getting anywhere near Twitter’s rate limit.

Ok, so now let’s get started.

Step 1:
After you figure out what you want to search for (this site is a good start to find trends, and they graph them out for you), you’ll need to plug your search term into the URL string that your SAS program will use. If you’re searching for a person, like I did, your string will look like this:

http://search.twitter.com/search.atom?q=from%3Aguykawasaki&rpp=100

The ‘q=from’ tells Twitter that you’re searching for Tweets from a specific user. The ‘%3A’ is the URL encoding for a ‘:’. And the ‘&rpp’ tells Twitter to return the maximum (100) items per page. You can copy and paste that string into your browser right now and get back some nicely formatted XML representing Guy’s last 100 Tweets.
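If you’d rather not hand-encode the colon, the same string can be assembled programmatically. This is a minimal sketch in Python rather than SAS; the function name `build_search_url` is made up for illustration, and the endpoint is simply the one quoted above, so it may not respond the way it did when this was written.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL string shown in this post.
SEARCH_ENDPOINT = "http://search.twitter.com/search.atom"


def build_search_url(screen_name: str, results_per_page: int = 100) -> str:
    """Build the search URL for Tweets from one user.

    urlencode() percent-encodes the ':' in 'from:<name>' as '%3A'.
    """
    params = {"q": f"from:{screen_name}", "rpp": results_per_page}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_search_url("guykawasaki"))
```

Letting the library do the encoding means any other special characters in a search term get escaped the same way.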

Step 2:
OK, you know what you’re searching for and how to format the URL string to get your results. But Twitter returns a paltry 100 results at a time. You’re a SAS user, you don’t work with 100 record data sets! You want more, so you wrap your code in a macro, key off of Twitter’s page= parameter to get older results, and append the new results to your master dataset. Twitter will generally allow you to pull down 1 week’s worth of search results. The code to do this is located here.
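The paging logic is easier to see in a few lines of code. What follows is a Python sketch of the idea, not the SAS macro linked above: `page_url`, `collect_all_pages`, the `max_pages` cap of 15, and the stand-in `fake_fetch` function are all assumptions made for illustration. A real fetcher would download and parse the page at `page_url(...)`.

```python
from typing import Callable, List
from urllib.parse import urlencode


def page_url(query: str, page: int, rpp: int = 100) -> str:
    """URL for one page of search results, using the page= parameter."""
    params = {"q": query, "rpp": rpp, "page": page}
    return "http://search.twitter.com/search.atom?" + urlencode(params)


def collect_all_pages(fetch_page: Callable[[int], List[str]],
                      max_pages: int = 15) -> List[str]:
    """Fetch page 1, 2, 3, ... and append each batch to a master list.

    Stops at the first empty page or after `max_pages` (a hypothetical
    safety cap, not a documented Twitter limit).
    """
    master: List[str] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        master.extend(batch)
    return master


def fake_fetch(page: int) -> List[str]:
    """Stand-in for a real download: returns canned results for two pages."""
    canned = {1: ["tweet a", "tweet b"], 2: ["tweet c"]}
    return canned.get(page, [])


print(page_url("from:guykawasaki", 2))
print(collect_all_pages(fake_fetch))
```

Passing the fetcher in as a function keeps the loop testable without touching the network, and makes it easy to add a pause between requests to stay under the rate limit.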

That’s enough to get you started. You now have a SAS data set with lots of Twitter data, including text to mine, dates and times to trend out, and, hopefully, an interesting topic to help showcase your analytical prowess to your audience.

You can access the full code here.

Don’t forget to come back in about 2 weeks to read my post on how to wrangle and append other data from Twitter to your search dataset. Or, better yet, click here and get all of my posts in your inbox as soon as they’re published.

Old Spice Guy’s popularity on Twitter charted

Old Spice recently released about 14 ads with The Old Spice Guy (OSG) personally responding to Tweets from 14 celebrities. Some of the celebs are Hollywood types; others are Web Celebs like Guy Kawasaki, Biz Stone, and Kevin Rose. You can see OSG’s video replies here. They are great.

I put together a chart showing the number of Tweets that mention the words ‘old’ and ‘spice’. The chart shows just how quickly the Twitterverse filled up with Tweets about the OSG. Before 9am on July 13th, there was hardly any mention of the OSG, but then, within 6 hours, there’s a spike of about 2,300 tweets per hour about Old Spice. Alas, nothing lasts forever, and after peaking at 4,500 Tweets per hour, the Twitterverse quieted down and settled at around 400 Tweets per hour about the OSG.

BTW, the OSG says he’s hung up his towel.

Chart of the Old Spice Guy's popularity on Twitter

OSG Trend
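The hourly counts behind a chart like this come from bucketing each Tweet’s timestamp into its clock hour. Here is a minimal Python sketch of that step; the function name `tweets_per_hour` and the three sample timestamps are made up for illustration and are not the data behind the chart.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable


def tweets_per_hour(timestamps: Iterable[str]) -> Dict[datetime, int]:
    """Count Tweets in each clock hour.

    Each timestamp is an ISO-8601 string such as '2010-07-13T09:05:00'.
    """
    counts: Counter = Counter()
    for ts in timestamps:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return dict(sorted(counts.items()))


# Made-up timestamps, for illustration only.
sample = ["2010-07-13T08:59:10", "2010-07-13T09:05:00", "2010-07-13T09:40:30"]

for hour, n in tweets_per_hour(sample).items():
    print(hour.isoformat(), n)
```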

Watch as The Biz Intel Guru fixes a poorly designed WSJ graphic

A friend of mine pointed me to a story in today’s WSJ (no subscription needed) with a hard-to-understand graphic in it. I’ve pasted the graphic below.

The designer chose to use the entire background of the chart to represent the number of sudden cardiac deaths in a given year. They used squares of different sizes to represent the number of explained and unexplained deaths from cardiac arrest. In this case, I think the designer was trying to give the reader an easy way to compare the parts to the whole, but it doesn’t work. Also, there are over 100 words of annotation on this otherwise skimpy graphic, which makes me think they could have done away with the graphic and just used the words instead.

Here’s the WSJ graphic:

Poorly designed WSJ graphic

WSJ graphic

Here’s what I think the chart should look like:

What do you think? Is my graphic clearer than the WSJ’s? What would you do differently? I’d love to hear your comments.

How to build a Twitter Empire like Guy Kawasaki–4 simple steps–Infographic



Infographic is at the bottom of this post.

Photo of Guy

So, you want to be a Twitter legend like Guy Kawasaki? You want 250,000 followers. You want to make lots of money and Tweet all day long. Well, the insights in this dashboard won’t turn you into Guy Kawasaki, but they will help you understand the 4 most important things that make Guy such a success on Twitter.

Guy Tweets like a Firehose
Guy tweets about 3 times an hour, generating about 83 Tweets per day. Half of Guy’s Tweets are published between 9am and 6pm, Eastern time. Guy repeats his Tweets 3 times, 8 hours apart because he knows that his repeat Tweets will bring in about 75% of his total clicks. So do what Guy does and repeat your Tweets.

Guy Tweets to be ReTweeted
Just about all of Guy’s Tweets have a link to his website, Alltop.com. Guy publishes lots of interesting content, and his 250,000 followers ReTweet Guy’s stuff about 1,500 times per day. By getting others to ReTweet his Tweets, Guy’s audience spans well beyond his 250,000 followers.

Guy’s optimal time to Tweet for ReTweets is 5pm Eastern. If you’re looking for ReTweets, try Tweeting when Guy does, and also read this. While you’re doing that, make sure you pay attention to Guy’s next attribute.

Guy Tests and Tracks to refine his Twitter Strategy
Guy tested his Tweet repeat strategy before deciding on the 3 repeats, 8 hours apart. Why not go one step further and use Twitter data to predict how many ReTweets Guy’s post will get? I’ve constructed a model showing that we can predict, based on the first 15 minutes of ReTweets, how many total ReTweets Guy will get from his initial Tweet in the following 24 hours. Guy could use this early indicator to alter his Tweeting strategy for the day, or to shuffle around advertising, or to change his repeat Tweet strategy on the fly. You should do the same.
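To make the idea of an early indicator concrete, here is one way such a prediction could work. This is a sketch under assumptions: it uses a plain straight-line least-squares fit, which may not be the form of the model I built in JMP, and the training pairs below are made-up numbers, not Guy’s data.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up training pairs, for illustration only:
# ReTweets in the first 15 minutes vs. total ReTweets after 24 hours.
early = [2, 4, 6, 8]
total = [25, 45, 65, 85]

intercept, slope = fit_line(early, total)

# Predict the 24-hour total for a new Tweet with 5 early ReTweets.
predicted_total = intercept + slope * 5
print(intercept, slope, predicted_total)
```

In practice you would fit on many past Tweets and check how well the line predicts Tweets it hasn’t seen before trusting it to steer the day’s strategy.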

Guy Tweets Great Content
This is the most important thing of all. Tweet all you want, but if you don’t put out interesting stuff, who will want to follow or ReTweet you?

The data for this analysis were gathered using various APIs (YQL, BackTweet, Twitter Search, and longurlplease). SAS was used to gather and manipulate the data and JMP was used to build the predictive model. The data in this analysis span Guy’s Tweets from the first two weeks of June 2010. Weekend Tweets were excluded.

Infographic


Single click image for full screen version.
Download a high-resolution pdf of this infographic here.

Not all of Guy’s tweets were used in this analysis. @Replies were excluded, as were tweets which didn’t have a link to Alltop.com.

Customer Insight Dashboard for debt collectors

In today’s economy your collectors need the best customer insights they can get. That means giving them the right information at the right time in the right format. Forget working off of mainframe green screens or bolted-on front ends; those tools aren’t made to provide maximum insights to your collectors.

Your collectors need a Customer Insight Dashboard like the one below*. The dashboard shows, in detail, information that your collectors need to maximize their debt collection efforts. Across the top of the dashboard is the customer’s financial trend information and pertinent scores about their risk level and ability to pay you back. Along the left-hand side of the dashboard we provide your collectors with the ability to listen to prior interactions with the customer as well as access information they might use to locate a customer who is avoiding your calls. In addition, your staff could locate other customers near your target customer for aid in tracking them down.

Dashboard

Click image for high resolution version

On the bottom left-hand side of the dashboard your collectors have access to the customer’s most recent credit bureau data. This is a critical component to making sure you get paid first. We’ve parsed the information from the credit bureau to show your collectors which of your customer’s credit card lines they could use to balance transfer their bad debt off of your books and onto your competitors’ books.

If a picture’s worth a thousand words, an image of the customer’s house or business might be worth $10,000. We find this information very useful to collectors in helping them figure out what makes each customer tick. Your collectors can then use their skills of persuasion and apply the information to help them collect the debt that’s due to you.

Lastly, we show some recent transactions on the customer’s account. Seeing how they spent the money they owe you can also help your collectors be more persuasive in their collection efforts.

A dashboard like the one above could be implemented in your system in a few weeks. The dashboard itself is all done in Excel 2003 (2007 works too) with a $250 add-in.

*The data presented in this dashboard are not real. They are provided for illustrative purposes only.

200+ things you need to know about unemployment in the US, all presented on one insightful dashboard

There are 208 charts on the dashboard below. Each one is loaded with information from the Bureau of Labor Statistics. Check it out; you’re bound to learn something you didn’t know before you came here.

The unemployment insight dashboard is now updated with May’s unemployment figures from the BLS. The unemployment rate dropped from 9.9% to 9.7%, in part due to the fact that approximately 200,000 people stopped looking for work and stopped being counted by the BLS as unemployed.
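It can seem odd that the rate falls when people give up looking for work, so here is the arithmetic. The unemployment rate is the number of unemployed divided by the labor force (employed plus unemployed), and people who stop looking drop out of both numbers. The figures in this Python sketch are hypothetical round numbers, not BLS data; only the direction of the change matters.

```python
def unemployment_rate(unemployed: int, employed: int) -> float:
    """Unemployment rate in percent: unemployed / (employed + unemployed)."""
    labor_force = employed + unemployed
    return 100.0 * unemployed / labor_force


# Hypothetical round numbers, not BLS figures.
employed = 139_000_000
unemployed = 15_000_000
stopped_looking = 200_000

rate_before = unemployment_rate(unemployed, employed)
# People who stop looking are no longer counted as unemployed,
# so they leave the numerator and the labor force at the same time.
rate_after = unemployment_rate(unemployed - stopped_looking, employed)

print(f"{rate_before:.2f}% -> {rate_after:.2f}%")
```

The rate drops even though nobody in this example found a job.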

The long-term unemployed population, those out of work for 6 months or more, grew by an additional 47,000 people and now accounts for 46% of all unemployed. That’s the equivalent of all the people (men, women, and children) in the entire state of Washington.

Note: click the picture below to bring up a large version. Then click again to get a crystal clear look at the dashboard.

Dashboard of Joblessness in the U.S.-May 2010

What everybody ought to know about unemployment in the U.S.

The unemployment insight dashboard is now updated with April’s unemployment figures from the BLS. While the unemployment rate is essentially unchanged, the nasty trend in the long-term unemployment continues.

The numbers for April show the long-term unemployed group grew by another 200,000. Now more than 6.7MM Americans, the equivalent of the entire state of Washington (men, women, and children), have been jobless for more than 6 months. This population now accounts for 46% of all unemployed.

Also, if you’re wondering why the unemployment rate increased despite the fact that the number of people who found new jobs increased, a good explainer can be found here, at the WSJ blog.

In my update last month I said I’d try to get more insights about the long-term unemployed. It turns out there’s a fair amount of information for this group, but the data are updated annually, not monthly. Nonetheless, in the coming weeks I will generate some supplemental posts analyzing the long-term unemployed from the new found data. Until then, here’s a link to a story about the long-term unemployed in the Huffington Post.

I welcome your comments, both positive and negative. I especially want to hear your thoughts on improving this dashboard. In particular, I’m considering getting rid of and/or dramatically altering the bar chart on the left side of the dash showing the number of un/underemployed Americans. I think the scaling of the chart makes differences in the blue bars hard to pick up. I also don’t like the lack of context in the chart. Perhaps indexing it to 1 year ago might be better.

If you’d like to print out or save a copy of a beautiful, high-res, 11 x 17 pdf version of this dashboard, just click here.
Dashboard of Joblessness in the U.S.-April 2010