Tag Archives: Twitter

SAS and Twitter–how to harness SAS to grab data from Twitter in 2 easy steps

I recently published a post titled, “4 Key Tweeting Attributes of Guy Kawasaki in one Infographic.” I made extensive use of SAS to gather and manipulate the data from Twitter. Turns out, SAS is pretty awesome for this type of work. In this post I’m going to document how to use SAS to gather data from Twitter’s API. My next post on SAS and Twitter will build off of this one and teach you how to gather data about your subject’s followers, find ReTweets, and listen in on conversations. Click here to get that post delivered to your inbox as soon as it’s published.

First off, you might wonder, why do this? Well, the successful analysts of the future will be adept at analyzing all sorts of data, including data from social networks like Twitter. Also, if you’re looking to market your analytical skills, what hiring manager wouldn’t be impressed with someone who gathered data from Twitter’s API with SAS, then mined, analyzed, and presented it in a compelling way? Oh, and I almost forgot: because you’re analyzing a current event (it’s on Twitter, right?) and mentioning Twitter in your post, your analysis will be more search engine friendly, so you’ll likely get a wider and more targeted audience than if you analyzed something outside of the Twitterverse. Some smart analysts have even been known to analyze Tweets about their target employer and use the analysis to help get themselves hired. On a larger scale, this is almost exactly what Seth Godin has done with Brands in Public.

Before we get started, I have to tell you a little about Twitter’s rate limiting policy. Unfortunately, the search area of Twitter’s API doesn’t have a published hard rate limit. Rather, Twitter says the search limit is quite a bit higher than the standard 150 hits/hour, but declines to say how much. Full documentation can be found here, about halfway down the page. I have run afoul of the limit before, and my guess is that it’s around 600 hits per hour or more than 30 hits per minute. When you exceed the unpublished rate, you have to wait 1 to 3 hours before your IP address is allowed to hit Twitter again. If you’re just searching for someone’s posts, like we’re doing with Guy Kawasaki, you needn’t worry about getting anywhere near Twitter’s rate limit.
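If you do plan a bigger pull, one way to stay clear of the unpublished limit is to space your requests out. Here is a minimal sketch of that idea, written in Python rather than SAS. The `Throttle` class and its parameter names are hypothetical, and 8 requests per minute is an arbitrary value chosen only because it sits below the roughly 600 hits per hour guessed above.

```python
import time


class Throttle:
    """Keep successive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 60.0 / per_minute  # minimum seconds between two calls
        self.clock = clock                # injectable so the class can be tested
        self.sleep = sleep
        self.last = None                  # time of the previous call, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last is not None:
            remaining = self.min_gap - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


# Assumed budget: 8 requests per minute (480 per hour).
throttle = Throttle(per_minute=8)
```

Call `throttle.wait()` right before each request. The first call returns immediately, and later calls pause only as long as needed.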

Ok, so now let’s get started.

Step 1:
After you figure out what you want to search for (this site is a good place to start looking for trends, and they graph them out for you), you’ll need to plug your search term into the URL string that your SAS program will use. If you’re searching for a person, like I did, your string will look like this:

http://search.twitter.com/search.atom?q=from%3Aguykawasaki&rpp=100

The ‘q=from’ tells Twitter that you’re searching for Tweets from a specific user. The ‘%3A’ is the URL encoding for a ‘:’. And the ‘&rpp’ tells Twitter how many results to return per page; 100 is the maximum. You can copy and paste that string into your browser right now and get back some nicely formatted XML representing Guy’s last 100 Tweets.
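If you would rather not encode the string by hand, most languages will do it for you. As a small illustration (in Python, not SAS), the standard library's `urlencode` produces the same query string. The function name `build_search_url` is made up for this sketch; the base URL and the `q` and `rpp` parameters are the ones shown above.

```python
from urllib.parse import urlencode

# Base URL as quoted in this post.
SEARCH_URL = "http://search.twitter.com/search.atom"


def build_search_url(screen_name, rpp=100):
    """Return the search URL for Tweets from one user."""
    params = {"q": f"from:{screen_name}", "rpp": rpp}
    # urlencode percent-encodes the ':' in 'from:guykawasaki' as %3A.
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url("guykawasaki"))
# prints http://search.twitter.com/search.atom?q=from%3Aguykawasaki&rpp=100
```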

Step 2:
Ok, you know what you’re searching for and how to format the URL string to get your results. But Twitter returns a paltry 100 results at a time. You’re a SAS user; you don’t work with 100-record data sets! You want more, so you wrap your code in a macro, key off of Twitter’s page= parameter to get older results, and append the new results to your master data set. Twitter will generally allow you to pull down 1 week’s worth of search results. The code to do this is located here.
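For readers who want to see the shape of that loop without opening the SAS program, here is a rough Python sketch of the same idea: request page 1, 2, 3 and so on, and append each page to a master list. It is not the linked SAS macro. The helper names, the cap of 10 pages, and the choice of Atom fields (`id`, `published`, `title`) are assumptions for illustration, and the sketch does not check whether the URL still responds.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "http://search.twitter.com/search.atom"  # URL from this post
ATOM = "{http://www.w3.org/2005/Atom}"                # standard Atom namespace


def fetch_page(query, page, rpp=100):
    """Download one page of results and return the raw Atom XML."""
    url = f"{SEARCH_URL}?{urlencode({'q': query, 'rpp': rpp, 'page': page})}"
    with urlopen(url) as response:
        return response.read()


def parse_entries(atom_xml):
    """Turn one Atom feed into a list of dicts, one per entry."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.iter(f"{ATOM}entry"):
        rows.append({
            "id": entry.findtext(f"{ATOM}id"),
            "published": entry.findtext(f"{ATOM}published"),
            "text": entry.findtext(f"{ATOM}title"),
        })
    return rows


def collect(query, max_pages=10, fetch=fetch_page):
    """Walk the pages in order and append each one to a master list."""
    master = []
    for page in range(1, max_pages + 1):
        rows = parse_entries(fetch(query, page))
        if not rows:  # an empty page is taken to mean no older results
            break
        master.extend(rows)
    return master
```

Passing `fetch` as a parameter keeps the paging logic separate from the download, so you can try the loop against saved XML files before pointing it at a live service.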

That’s enough to get you started. You now have a SAS data set with lots of Twitter data, including text to mine, dates and times to trend out, and, hopefully, an interesting topic to help showcase your analytical prowess to your audience.

You can access the full code here.

Don’t forget to come back in about 2 weeks to read my post on how to wrangle and append other data from Twitter to your search dataset. Or, better yet, click here and get all of my posts in your inbox as soon as they’re published.

Old Spice Guy’s popularity on Twitter charted

Old Spice recently released about 14 ads with The Old Spice Guy (OSG) personally responding to Tweets from 14 celebrities. Some of the celebs are Hollywood types; others are Web Celebs like Guy Kawasaki, Biz Stone, and Kevin Rose. You can see OSG’s video replies here. They are great.

I put together a chart showing the number of Tweets that mention the words ‘old’ and ‘spice’. The chart shows just how quickly the Twitterverse filled up with Tweets about the OSG. Before 9am on July 13th, there was hardly any mention of the OSG, but then, within 6 hours, there was a spike of about 2,300 Tweets per hour about Old Spice. Alas, nothing lasts forever, and after peaking at 4,500 Tweets per hour, the Twitterverse quieted down and settled at around 400 Tweets per hour about the OSG.
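A chart like this comes down to counting Tweets in hourly buckets. Here is a minimal sketch of that counting step in Python. It is not the code behind the chart. The function name is made up, and it assumes each Tweet carries a UTC timestamp such as `2010-07-13T09:15:00Z`; it does no time zone conversion.

```python
from collections import Counter
from datetime import datetime


def tweets_per_hour(timestamps):
    """Count Tweets in each clock hour, returned in time order."""
    counts = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        # Truncate to the start of the hour so Tweets share a bucket.
        counts[when.replace(minute=0, second=0)] += 1
    return sorted(counts.items())
```

Each item in the result is an (hour, count) pair, which is the series you would hand to a charting tool.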

BTW, the OSG says he’s hung up his towel.

Chart: the Old Spice Guy's popularity on Twitter (OSG Trend)

How to build a Twitter Empire like Guy Kawasaki–4 simple steps–Infographic



Infographic is at the bottom of this post.

So, you want to be a Twitter legend like Guy Kawasaki? You want 250,000 followers. You want to make lots of money and Tweet all day long. Well, the insights in this dashboard won’t turn you into Guy Kawasaki, but they will help you understand the 4 most important things that make Guy such a success on Twitter.

Guy Tweets like a Firehose
Guy Tweets about 3 times an hour, generating about 83 Tweets per day. Half of Guy’s Tweets are published between 9am and 6pm, Eastern time. Guy repeats his Tweets 3 times, 8 hours apart, because he knows that his repeat Tweets will bring in about 75% of his total clicks. So do what Guy does and repeat your Tweets.

Guy Tweets to be ReTweeted
Just about all of Guy’s Tweets have a link to his website, Alltop.com. Guy publishes lots of interesting content, and his 250,000 followers ReTweet Guy’s stuff about 1,500 times per day. By getting others to ReTweet his Tweets, Guy’s audience spans well beyond his 250,000 followers.

Guy’s optimal time to Tweet for ReTweets is 5pm Eastern. If you’re looking for ReTweets, try Tweeting when Guy does, and also read this. While you’re doing that, make sure you pay attention to Guy’s next attribute.

Guy Tests and Tracks to refine his Twitter Strategy
Guy tested his Tweet repeat strategy before deciding on the 3 repeats, 8 hours apart. Why not go one step further and use Twitter data to predict how many ReTweets Guy’s post will get? I’ve constructed a model showing that we can predict, based on the first 15 minutes of ReTweets, how many total ReTweets Guy will get from his initial Tweet in the following 24 hours. Guy could use this early indicator to alter his Tweeting strategy for the day, or to shuffle around advertising, or to change his repeat Tweet strategy on the fly. You should do the same.
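If you want to try something similar on your own Tweets, one simple starting point is a straight-line fit of the 24-hour ReTweet total against the 15-minute count. This post does not describe the form of the JMP model, so the Python sketch below is only an assumed stand-in, and the function names are made up for illustration.

```python
def fit_line(early, totals):
    """Ordinary least squares for totals = intercept + slope * early."""
    n = len(early)
    mean_x = sum(early) / n
    mean_y = sum(totals) / n
    sxx = sum((x - mean_x) ** 2 for x in early)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(early, totals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_total(early_count, intercept, slope):
    """Predicted 24-hour ReTweets from the first 15 minutes' count."""
    return intercept + slope * early_count
```

Feed `fit_line` one pair of numbers per past Tweet (ReTweets in the first 15 minutes, ReTweets after 24 hours), then pass a new Tweet's early count to `predict_total`.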

Guy Tweets Great Content
This is the most important thing of all. Tweet all you want, but if you don’t put out interesting stuff, who will want to follow or ReTweet you?

The data for this analysis were gathered using various APIs (YQL, BackTweet, Twitter Search, and longurlplease). SAS was used to gather and manipulate the data and JMP was used to build the predictive model. The data in this analysis span Guy’s Tweets from the first two weeks of June 2010. Weekend Tweets were excluded.

Infographic


Download a high-resolution PDF of this infographic here.

Not all of Guy’s tweets were used in this analysis. @Replies were excluded, as were tweets which didn’t have a link to Alltop.com.