In this post, I'll look at some basics of what it took to stitch together a simple appengine web app which checks an atom feed and posts new entries to a twitter account. This is mostly an exercise, though I do intend to use it personally. I realize there are existing sites (and blogging services) that provide this type of functionality out of the box.
I'm not planning on making the service itself publicly available, but I am open sourcing it. You can easily set up and deploy your own instance on appengine for your own purposes. It's not ready for multiple users or a public site, but I do think it is ready for you to use personally on your own appengine account.
Let's get started.
Part 1: Parsing the atom feed
import feedparser
atomxml = feedparser.parse('http://netsmith.blogspot.com/feeds/posts/default')
entries = atomxml['entries']
You can then pull out titles, links, etc with expressions like these:
print entries[0]['title'] # title of the first entry
print entries[1].link # link from the second entry
print [entry.title for entry in entries] # titles for all entries
Universal Feed Parser provides plenty of ways to get at the structured results of the Atom/RSS feed. See Universal Feed Parser for some excellent concise examples.
To check for new entries, we could write a PubSubHubbub integration or poll the RSS/Atom feed every so often. Since my blog is still on blogger.com (crazy, I know!), we'll use a polling strategy (booo!) by creating a cron.yaml file that looks like:
cron:
- description: Check for new posts and tweet them
  schedule: every 5 minutes
  url: /tasks/poller
and then creating a recurring task that retrieves the feed and checks to see if there are any new posts like this:
...
url = 'http://netsmith.blogspot.com/feeds/posts/default'
d = feedparser.parse(url)
idsThatHaveAlreadyBeenTweeted = getAlreadyTweetedEntries(url) # retrieve from datastore
for entry in d['entries']:
    if entry.id not in idsThatHaveAlreadyBeenTweeted:
        tweet(entry)
        updateTweetedEntries(url, entry) # updates the datastore
...
As far as fetching the feed and checking for new entries, that's pretty much it!
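The "is this entry new?" decision in the loop above can be pulled out into a small function so it can be tested without the datastore or Twitter. This is only a sketch: `select_new_entries` is a hypothetical helper (it is not in the FeederTweeter source), entries are assumed to be dict-like with an 'id' key, and a plain set stands in for the datastore lookup.

```python
def select_new_entries(entries, already_tweeted_ids):
    """Return the entries whose id has not been tweeted yet, in feed order.

    `entries` is any iterable of dict-like objects with an 'id' key.
    `already_tweeted_ids` is a set standing in for the datastore lookup.
    """
    seen = set(already_tweeted_ids)
    new_entries = []
    for entry in entries:
        entry_id = entry['id']
        if entry_id not in seen:
            new_entries.append(entry)
            seen.add(entry_id)  # also skips duplicate ids within one feed
    return new_entries
```

In the poller, each entry in the returned list would then be handed to tweet() and updateTweetedEntries() in turn.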
Part 2: Posting to twitter
Posting to twitter is a relatively straightforward process, and I was able to hack something together very quickly by building on the shoulders of some excellent libraries and examples (specifically tweepy and tweepy-examples/app-engine). Coincidentally, right as I reached the point where I got it all figured out, I saw Nick Johnson (google) post this article in which he outlines how to authenticate a user with twitter using appengine-oauth on appengine.
Rather than rehash what he and others have explained fairly well (OAuth), I just want to recommend that you either check out the tweepy app engine example and start tweaking/experimenting from there, or follow through Nick's article for a little more explanation. Either way, you need to write some code around oauth to really get your head around it, and either place is an excellent start.
I do want to point out an interesting wrinkle though when it comes to oauth on appengine.
Since you can specify a callback url for oauth, you'll want to use code like the following to set the appropriate oauth callback url depending on whether you're testing your app locally or running it in the cloud on appengine:
import os
if os.environ.get('SERVER_SOFTWARE', '').startswith('Devel'): # running on local server
    TWITTER_CALLBACK = 'http://127.0.0.1:8080/oauth/callback'
else:
    TWITTER_CALLBACK = 'http://yourapp.appspot.com/oauth/callback'
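If you want to test both branches without starting the dev server, the same check can be wrapped in a function that takes the environment as an argument. A rough sketch: `oauth_callback_url` and its `app_host` parameter are names I'm making up for illustration, and the URLs are the same placeholders used above.

```python
import os


def oauth_callback_url(environ=None, app_host='yourapp.appspot.com'):
    """Pick the OAuth callback URL for the dev server or the deployed app."""
    if environ is None:
        environ = os.environ
    # The local dev server reports a SERVER_SOFTWARE value starting with 'Devel'
    if environ.get('SERVER_SOFTWARE', '').startswith('Devel'):
        return 'http://127.0.0.1:8080/oauth/callback'
    return 'http://%s/oauth/callback' % app_host
```

Passing a plain dict such as {'SERVER_SOFTWARE': 'Development/1.0'} exercises the local branch; any other value falls through to the appspot URL.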
If you run into any 'unauthorized' messages while testing twitter oauth, try:
- resetting your twitter api key
- ensuring that the twitter method you're using is one you're authorized for (i.e. make sure you're set to read/write access at the twitter api key level if you're attempting to post a tweet).
Once you've successfully authorized through OAuth, post the tweet with tweepy:
# after a successful oauth authentication w/twitter
import tweepy
tweet = 'Hello twitter world'
auth = tweepy.OAuthHandler(TWITTER_CONSUMER_KEY, TWITTER_CONSUMER_SECRET)
auth.set_access_token(thisUser.access_token_key, thisUser.access_token_secret)
api = tweepy.API(auth_handler=auth, secure=True, retry_count=3)
api.update_status(tweet) # post the tweet
When it comes to integrating with the twitter api on appengine, it's really easy and straightforward with tweepy, once you've figured out OAuth (which should only take a couple of hours at most).
Part 3: Shortening URLs
Once you can parse atom feeds, authenticate users, and post to twitter, the only piece that's really missing is shortening those long (thanks again blogger!) blog urls. For this we'll use bit.ly. Here's a simple method for shortening a url with bit.ly:
from django.utils import simplejson
from google.appengine.api import urlfetch
class BitLy():
    def __init__(self, login, apikey):
        self.login = login
        self.apikey = apikey

    def shorten(self, param):
        url = "http://" + param
        request = "http://api.bit.ly/shorten?version=2.0.1&longUrl="
        request += url
        request += "&login=" + self.login + "&apiKey=" + self.apikey
        result = urlfetch.fetch(request)
        json = simplejson.loads(result.content)
        return json
(btw, I had this clipped from somewhere but can't find the source, so if you know where it came from, please let me know and I'll add credit)
To use it, write something like this:
bitly = BitLy(BITLY_LOGIN, BITLY_API_KEY)
shortUrl = bitly.shorten('www.yahoo.com')['results']['http://www.yahoo.com']['shortUrl']
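Two things about the snippet above are a little fragile: the long url is concatenated into the query string without percent-encoding (so a blog url containing '&' or '?' would break the request), and the chained lookups raise a KeyError if bit.ly returns an error. Here is a hedged sketch of both pieces using only the standard library. `build_shorten_request` and `extract_short_url` are hypothetical helpers, and the response shape is only inferred from the lookup shown above, not checked against bit.ly's documentation.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2


def build_shorten_request(long_url, login, apikey):
    """Build the shorten request URL with every parameter percent-encoded."""
    params = [
        ('version', '2.0.1'),
        ('longUrl', long_url),
        ('login', login),
        ('apiKey', apikey),
    ]
    return 'http://api.bit.ly/shorten?' + urlencode(params)


def extract_short_url(response, long_url):
    """Return the short URL from a decoded JSON response, or None if absent."""
    result = response.get('results', {}).get(long_url)
    if not result:
        return None
    return result.get('shortUrl')
```

The string returned by build_shorten_request could be passed to urlfetch.fetch in place of the hand-built request, and extract_short_url would take the dict that shorten() returns.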
If you don't want to use bit.ly, you can always quickly build and host your own url shortener on appengine (like this).
Part 4: Putting it all together
So far we've covered how to grab the feed, post to twitter, and shorten a url prior to posting. That, plus oauth, is the bulk of what you need to understand to build this simple web app.
Putting it all together into a working appengine app is relatively simple once you have built an appengine project or two. If you haven't (or if you're impatient like me), it can help to have a good, simple, webapp appengine project template (for reference, mine is here) to get up and running quickly.
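One small piece of glue between those steps is building the tweet text from an entry's title and its shortened link. A minimal sketch, assuming a 140-character limit (Twitter's limit when this was written); `compose_tweet` and `TWEET_LIMIT` are illustrative names and may not match what the FeederTweeter source does.

```python
TWEET_LIMIT = 140  # assumed limit; adjust if Twitter's rules differ


def compose_tweet(title, short_url, limit=TWEET_LIMIT):
    """Join title and short URL, trimming the title so the result fits `limit`."""
    if not title:
        return short_url[:limit]
    room_for_title = limit - len(short_url) - 1  # 1 for the separating space
    if room_for_title <= 0:
        return short_url[:limit]
    if len(title) > room_for_title:
        if room_for_title > 3:
            # Leave space for a trailing '...' to signal the cut
            title = title[:room_for_title - 3].rstrip() + '...'
        else:
            title = title[:room_for_title]
    return title + ' ' + short_url
```

The link is kept whole and only the title is trimmed, since a truncated url would be useless to the reader.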
Wrapping up, I want to mention that most of this project was developed in a test-driven manner, using continuous testing to keep myself honest and to verify the behavior of external APIs. I'm using Ale in conjunction with my project starter template for this.
Here's the source for FeederTweeter in a working appengine web app. It includes a simple (and horrible looking!) web ui that is set to be only accessible by the admin of the appengine account. There may be some slight differences from what is shown here -- usually for the sake of brevity in this blog post.