Monday, February 22, 2010

How I built FeederTweeter in less than a day (appengine/oauth/tweepy/feedparser/bitly)

In this post, I'll look at some basics of what it took to stitch together a simple appengine web app which checks an atom feed and posts new entries to a twitter account. This is mostly an exercise, though I do intend to use it personally. I realize there are existing sites (and blogging services) that provide this type of functionality out of the box.

I'm not planning on making the service itself publicly available, but I am open sourcing it, so you can easily set up and deploy your own instance on appengine. It's not ready for multiple users or a public site, but I do think it's ready for personal use on your own appengine account.

Let's get started.

Part 1: Parsing the atom feed

Universal Feed Parser is an excellent python library for parsing atom/rss feeds, and ever since Google added transparent urllib support to appengine (in v1.1.9), it's been even simpler to use this 3rd-party lib. Here's an example of using feedparser:
import feedparser
atomxml = feedparser.parse('http://netsmith.blogspot.com/feeds/posts/default')
entries = atomxml['entries']
You can then pull out titles, links, etc., with expressions like these:
print(entries[0]['title'])  # title of the first entry
print(entries[1].link)  # link from the second entry
print([entry.title for entry in entries])  # titles for all entries
Universal Feed Parser provides plenty of ways to get at the structured results of an Atom/RSS feed. See the Universal Feed Parser documentation for some excellent, concise examples.
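feedparser hides the XML details, which is exactly why it's worth using. For the curious, here's a rough stdlib-only sketch (Python 3, a swapped-in technique, not how feedparser works internally) that pulls the same id/title/link fields out of an Atom feed with `xml.etree.ElementTree`. The function name and the sample feed are hypothetical, and unlike feedparser it does no RSS handling, date parsing, or repair of malformed feeds.

```python
import xml.etree.ElementTree as ET

ATOM_NS = '{http://www.w3.org/2005/Atom}'  # ElementTree's {namespace}tag syntax

def parse_atom_entries(xml_text):
    """Return a list of dicts with 'id', 'title' and 'link' for each Atom <entry>."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(ATOM_NS + 'entry'):
        # An entry can carry several <link>s (replies, edit, self...); the post's
        # own page is rel="alternate", which is also the default when rel is absent
        link = None
        for el in entry.findall(ATOM_NS + 'link'):
            if el.get('rel', 'alternate') == 'alternate':
                link = el.get('href')
                break
        entries.append({
            'id': entry.findtext(ATOM_NS + 'id'),
            'title': entry.findtext(ATOM_NS + 'title'),
            'link': link,
        })
    return entries

# Hypothetical sample feed, trimmed to the elements used above
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.com,2010:post-1</id>
    <title>First post</title>
    <link rel="replies" href="http://example.com/1#comments"/>
    <link rel="alternate" href="http://example.com/1"/>
  </entry>
</feed>"""

print(parse_atom_entries(SAMPLE))
```

In practice feedparser is the better choice; the sketch just shows what "entries", "id", "title" and "link" correspond to in the raw feed.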

To check for new entries, we could either write a PubSubHubbub integration or poll the RSS/Atom feed periodically. Since my blog is still on blogger.com (crazy, I know!), we'll use a polling strategy (booo!) by creating a cron.yaml file that looks like:
cron:
- description: Check for new posts and tweet them
  schedule: every 5 minutes
  url: /tasks/poller
and then writing the handler for /tasks/poller, which retrieves the feed and checks for new posts like this:
...
url = 'http://netsmith.blogspot.com/feeds/posts/default'
d = feedparser.parse(url)
idsThatHaveAlreadyBeenTweeted = getAlreadyTweetedEntries(url) # retrieve from datastore

for entry in d['entries']:
    if entry.id not in idsThatHaveAlreadyBeenTweeted:
        tweet(entry)
        updateTweetedEntries(url, entry)  # updates the datastore
...
As far as fetching the feed and checking for new entries, that's pretty much it!
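The loop above leaves the datastore helpers to the imagination. Here's a self-contained sketch of the same check-and-record logic, assuming a plain Python set stands in for the datastore, entries are plain dicts, and a `post` callable stands in for `tweet()`. `poll_feed` and its parameters are hypothetical, not FeederTweeter's actual code.

```python
def poll_feed(entries, already_tweeted, post):
    """Post every entry whose id hasn't been seen; return the ids posted this run.

    entries: dicts with at least an 'id' key (e.g. from a parsed feed)
    already_tweeted: a set of ids, standing in for the datastore lookup
    post: callable that actually tweets an entry
    """
    posted = []
    # Blogger-style feeds usually list newest first; walk them in reverse so
    # older posts go out before newer ones
    for entry in reversed(entries):
        if entry['id'] not in already_tweeted:
            post(entry)
            # Record each id right after posting so a failure part-way through
            # doesn't re-tweet the entries already sent on the next run
            already_tweeted.add(entry['id'])
            posted.append(entry['id'])
    return posted

seen = {'post-1'}  # ids recorded on an earlier run
feed = [{'id': 'post-2', 'title': 'Newer'}, {'id': 'post-1', 'title': 'Older'}]
print(poll_feed(feed, seen, lambda e: print('tweeting', e['title'])))
```

One thing worth deciding up front: on the very first poll every entry is "new", so unless the store is seeded with the feed's current ids (for example by recording them without posting), the first run will tweet the whole feed.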

Part 2: Posting to twitter

Posting to twitter is a relatively straightforward process, and I was able to hack something together very quickly by standing on the shoulders of some excellent libraries and examples (specifically tweepy and tweepy-examples/app-engine). Coincidentally, just as I had it all figured out, I saw Nick Johnson (Google) post this article, in which he outlines how to authenticate a user with twitter using appengine-oauth on appengine.

Rather than rehash what he and others have already explained fairly well (OAuth), I'll just recommend that you either check out the tweepy app engine example and start tweaking/experimenting from there, or follow Nick's article for a bit more explanation. Either way, you need to write some code around OAuth to really get your head around it, and either place is an excellent start.
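tweepy signs requests for you, but if you want to see what is actually happening under the hood, the core of OAuth 1.0a's HMAC-SHA1 signing is short. Below is a minimal learning sketch (Python 3) following the OAuth 1.0a spec (RFC 5849); the function names are hypothetical, it assumes one value per parameter and an already-normalized base URL, and it is not meant to replace tweepy.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def pct(value):
    """RFC 3986 percent-encoding as OAuth requires: only A-Z a-z 0-9 - . _ ~ stay unescaped."""
    return quote(str(value), safe='~')

def signature_base_string(method, base_url, params):
    """METHOD & encoded URL & encoded, sorted parameter string."""
    # Encode every key and value first, then sort by encoded key (then value)
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = '&'.join('%s=%s' % (k, v) for k, v in pairs)
    return '&'.join([method.upper(), pct(base_url), pct(param_string)])

def hmac_sha1_signature(base_string, consumer_secret, token_secret=''):
    """Sign with key 'consumer_secret&token_secret' and base64-encode the digest."""
    key = '%s&%s' % (pct(consumer_secret), pct(token_secret))
    digest = hmac.new(key.encode('utf-8'), base_string.encode('utf-8'), hashlib.sha1).digest()
    return base64.b64encode(digest).decode('ascii')

# Dummy values for illustration only
base = signature_base_string(
    'post', 'https://api.twitter.com/1/statuses/update.json',
    {'status': 'Hello world!', 'oauth_nonce': 'abc'})
print(base)
print(hmac_sha1_signature(base, 'consumer-secret', 'token-secret'))
```

A real request also carries oauth_consumer_key, oauth_token, oauth_timestamp, oauth_signature_method and oauth_version in the signed parameters; the point here is only the encode, sort, join, sign sequence.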

I do want to point out an interesting wrinkle though when it comes to oauth on appengine.

Since OAuth lets you specify a callback URL, you'll want to use code like the following to set the appropriate callback depending on whether you're testing your app locally or running it in the cloud on appengine:

import os
if os.environ.get('SERVER_SOFTWARE', '').startswith('Devel'):  # running on local dev server
    TWITTER_CALLBACK = 'http://127.0.0.1:8080/oauth/callback'
else:
    TWITTER_CALLBACK = 'http://yourapp.appspot.com/oauth/callback'
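If you'd rather not read os.environ at import time, the same check can be wrapped in a small function that takes the environment as a parameter, which also makes it easy to test. `twitter_callback_url` and its `app_id` argument are hypothetical; 'yourapp' remains a placeholder for your own app id.

```python
import os

def twitter_callback_url(environ=None, app_id='yourapp'):
    """Pick the OAuth callback URL based on where the app is running."""
    environ = os.environ if environ is None else environ
    # The local dev server reports SERVER_SOFTWARE starting with 'Development',
    # which is what the 'Devel' prefix check relies on
    if environ.get('SERVER_SOFTWARE', '').startswith('Devel'):
        return 'http://127.0.0.1:8080/oauth/callback'
    return 'http://%s.appspot.com/oauth/callback' % app_id

print(twitter_callback_url({'SERVER_SOFTWARE': 'Development/1.0'}))
```

Passing a plain dict in tests avoids having to patch os.environ.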
If you run into any 'unauthorized' messages while testing twitter OAuth, try:
  1. resetting your twitter API key
  2. ensuring that the twitter method you're calling is one you're authorized for, i.e., make sure your API key is set to read/write access if you're attempting to post a tweet.
Once you've successfully authorized through OAuth, post the tweet with tweepy:

# after a successful oauth authentication w/twitter
import tweepy

tweet = 'Hello twitter world'
auth = tweepy.OAuthHandler(TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET)
auth.set_access_token(thisUser.access_token_key, thisUser.access_token_secret)

api = tweepy.API(auth_handler=auth, secure=True, retry_count=3)
api.update_status(tweet) # post the tweet

Once you've figured out OAuth (which should take a couple of hours at most), integrating with the twitter API on appengine is easy with tweepy.

Part 3: Shortening URLs

Once you can parse atom feeds, authenticate users, and post to twitter, the only piece really missing is shortening those long (thanks again, blogger!) blog URLs. For this we'll use bit.ly. Here's a simple class for shortening a URL with bit.ly:

import urllib
from django.utils import simplejson
from google.appengine.api import urlfetch

class BitLy(object):
    def __init__(self, login, apikey):
        self.login = login
        self.apikey = apikey

    def shorten(self, param):
        # param is a URL without its scheme, e.g. 'www.yahoo.com'
        url = "http://" + param
        # urlencode escapes each value, so a '&' or '?' inside the long URL
        # can't break the query string
        query = urllib.urlencode({
            'version': '2.0.1',
            'longUrl': url,
            'login': self.login,
            'apiKey': self.apikey,
        })
        result = urlfetch.fetch("http://api.bit.ly/shorten?" + query)
        return simplejson.loads(result.content)  # decoded JSON response

(btw, I had this clipped from somewhere but can't find the source, so if you know where it came from, please let me know and I'll add credit)

To use it, write something like this:
bitly = BitLy(BITLY_LOGIN, BITLY_API_KEY)
shortUrl = bitly.shorten('www.yahoo.com')['results']['http://www.yahoo.com']['shortUrl']
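The v2 response is nested under the long URL itself, which is why the lookup above repeats 'http://www.yahoo.com'. Here's a hedged Python 3 sketch that builds the same request with `urllib.parse.urlencode` and digs out `shortUrl` defensively instead of raising a KeyError. The function names are hypothetical, the response shape is assumed from the lookup above, and the v2 endpoint used in this post may no longer be served by bit.ly.

```python
from urllib.parse import urlencode

# The bit.ly v2 endpoint used in this post; it may no longer be served
BITLY_SHORTEN_ENDPOINT = 'http://api.bit.ly/shorten'

def bitly_shorten_request(long_url, login, api_key):
    """Build the shorten request URL with every parameter percent-encoded."""
    query = urlencode({
        'version': '2.0.1',
        'longUrl': long_url,
        'login': login,
        'apiKey': api_key,
    })
    return BITLY_SHORTEN_ENDPOINT + '?' + query

def short_url_from_response(response, long_url):
    """Return results[long_url]['shortUrl'] from a decoded response, or None if absent."""
    return response.get('results', {}).get(long_url, {}).get('shortUrl')

# 'mylogin' and 'MYKEY' are placeholders for real credentials
print(bitly_shorten_request('http://example.com/a?b=1&c=2', 'mylogin', 'MYKEY'))
```

Returning None lets the caller fall back to tweeting the long URL if bit.ly returns an error payload.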
If you don't want to use bit.ly, you can always quickly build and host your own URL shortener on appengine (like this).
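A common approach for a self-hosted shortener (not necessarily how the linked example does it) is to store each long URL under an auto-assigned integer id and use a base-62 encoding of that id as the short path. The names below are hypothetical, and storage plus the redirect handler are left out.

```python
ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'  # 62 symbols

def encode_id(n):
    """Turn a non-negative integer id (e.g. a datastore key id) into a short base-62 slug."""
    if n < 0:
        raise ValueError('id must be non-negative')
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Remainders come out least-significant first
    return ''.join(reversed(digits))

def decode_slug(slug):
    """Inverse of encode_id: map the slug back to the integer id."""
    n = 0
    for ch in slug:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode_id(62), decode_slug(encode_id(123456)))
```

Two base-62 characters already cover 3,844 ids, so slugs stay very short for a personal blog.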

Part 4: Putting it all together

So far we've covered how to grab the feed, post to twitter, and shorten a URL before posting. That, plus OAuth, is the bulk of what you need to understand to build this simple web app.
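One piece the post doesn't show is how `tweet(entry)` turns an entry into status text. A minimal sketch, assuming the title gets trimmed (never the URL) so the combined text fits Twitter's length limit; `format_tweet` is a hypothetical helper, not FeederTweeter's actual code.

```python
TWEET_LIMIT = 140  # Twitter's limit when this was written; it was later raised to 280

def format_tweet(title, short_url, limit=TWEET_LIMIT):
    """Combine a post title and short URL, trimming the title so the URL always fits."""
    suffix = ' ' + short_url
    room = limit - len(suffix)
    if len(title) <= room:
        return title + suffix
    if room < 4:
        # Not enough space for any title text plus '...'; post the URL alone
        return short_url[:limit]
    # Cut the title and mark the cut with '...'
    return title[:room - 3].rstrip() + '...' + suffix

print(format_tweet('How I built FeederTweeter in less than a day', 'http://bit.ly/abc'))
```

Keeping the URL intact matters more than keeping the whole title, since the link is what readers click.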

Putting it all together into a working appengine app is relatively simple once you've built an appengine project or two. If you haven't (or if you're impatient like me), it helps to have a good, simple webapp project template for appengine (for reference, mine is here) to get up and running quickly.

Wrapping up, I want to mention that most of this project was developed test-first, using continuous testing to keep myself honest and to verify the behavior of external APIs. I'm using Ale in conjunction with my project starter template for this.
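As one illustration of that style (not the project's actual tests), keeping the twitter client injectable lets a test swap in a fake object that records `update_status` calls instead of posting. `FakeTwitterApi` and `tweet_new_entries` are hypothetical names; the fake only mirrors the `update_status` method used with tweepy earlier in the post.

```python
import unittest

class FakeTwitterApi(object):
    """Stands in for tweepy.API in tests: records statuses instead of posting them."""
    def __init__(self):
        self.statuses = []

    def update_status(self, status):
        self.statuses.append(status)

def tweet_new_entries(api, entries, seen_ids):
    """Hypothetical poller core: post 'title link' for each entry not yet seen."""
    for entry in entries:
        if entry['id'] not in seen_ids:
            api.update_status('%s %s' % (entry['title'], entry['link']))
            seen_ids.add(entry['id'])

class TweetNewEntriesTest(unittest.TestCase):
    def test_only_unseen_entries_are_tweeted(self):
        api = FakeTwitterApi()
        entries = [{'id': '1', 'title': 'Old', 'link': 'http://example.com/1'},
                   {'id': '2', 'title': 'New', 'link': 'http://example.com/2'}]
        tweet_new_entries(api, entries, {'1'})
        self.assertEqual(api.statuses, ['New http://example.com/2'])

    def test_second_poll_tweets_nothing(self):
        api = FakeTwitterApi()
        entries = [{'id': '1', 'title': 'Post', 'link': 'http://example.com/1'}]
        seen = set()
        tweet_new_entries(api, entries, seen)
        tweet_new_entries(api, entries, seen)
        self.assertEqual(len(api.statuses), 1)
```

Saved as a module, this runs with `python -m unittest`, and a continuous-testing tool can rerun it on every save without ever touching the real Twitter API.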

Here's the source for FeederTweeter as a working appengine web app. It includes a simple (and horrible-looking!) web UI that is accessible only to the admin of the appengine account. There may be slight differences from the code shown here, usually for the sake of brevity in this post.

13 comments:

Matt said...

Whoops! Just realized I misread the pubsubhub docs and pubsubhub _is_ available for blogger blogs. Just an FYI.

Unknown said...

You might also want to know that you can use PubSubHubbub on feeds that don't support it - the hub will poll for you! (Plug: http://blog.notdot.net/2010/02/Consuming-RSS-feeds-with-PubSubHubbub)
