New Project - Streams

A ton of services are popping up that take all of your RSS feeds (Twitter, Google Reader, Netflix, your blog, etc.) and make a “stream” of your online activity. My favorite is FriendFeed. Last night I decided to see how fast I could whip one up in Django, and the result is http://stream.benjamingolub.com. So far I’ve got it crawling my blog, Flickr, Google Reader, and Twitter (the services I use the most). It only took me a few hours to write, and most of that time was spent on templates. Some things I learned about RSS feeds:

  1. Universal Feed Parser saved me a lot of time.
  2. Not all RSS feeds are created equal. Come on, Google Reader, why don’t you tell me the time that I shared an item? All I get is the time the item was published. So I have to make sure that my crawl cycle is fast enough that I can use the current time as the time I shared it. At most I’ll only be off by a minute or two, but Google could easily include this in the feed.
  3. Tagging was dead simple to get working. I just pull in the tags (using Universal Feed Parser), create each one if it doesn’t already exist, and add them to the event. The more events that are pulled into my stream, the more tags I get. FriendFeed doesn’t have tags visible at the moment, but I’m sure they are collecting this data. Here is all it takes in Django:

    from django.template.defaultfilters import slugify

    if 'tags' in entry:
        for tag in entry.tags:
            # Normalize the term once and reuse it for both title and slug
            term = tag.term.strip()
            tag_obj, created = Tag.objects.get_or_create(title=term, slug=slugify(term))
            event.tags.add(tag_obj)

  4. This one’s obvious, but don’t rely on receiving valid data. You can bet someone will have malformed HTML that will mess with your site. So before I display any data I strip the tags, urlize the URLs (if the feed contains http://www.google.com, it will turn that into a link), and then truncate to 100 words. In Django that looks like this:

    {{ event.content|striptags|urlize|truncatewords:100 }}
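The Google Reader workaround from item 2 can be sketched roughly like this. This is only an illustration, not the code running on the site: the function name `event_time` and the `use_crawl_time` flag are made up for the example, and it assumes entries look like Universal Feed Parser entries, where `published_parsed` (or `updated_parsed`) holds a UTC time tuple.

```python
import calendar
from datetime import datetime, timezone


def event_time(entry, use_crawl_time, now=None):
    """Pick the timestamp to store for a newly seen feed entry.

    use_crawl_time is a hypothetical per-source flag: set it for feeds
    (like Google Reader shared items) whose published time is not the
    time of the activity we care about.
    """
    now = now or datetime.now(timezone.utc)
    parsed = entry.get("published_parsed") or entry.get("updated_parsed")
    if use_crawl_time or parsed is None:
        # Stamp the entry with the crawl time; the error is bounded
        # by how often the crawler runs.
        return now
    # The parsed value is a UTC time tuple, so convert with timegm.
    return datetime.fromtimestamp(calendar.timegm(parsed), tz=timezone.utc)
```

This only gives a sensible answer for entries the crawler has not seen before, so the crawl would still need to skip entries that are already stored.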

I’ve got more ideas for crawlers in the works. The great thing about doing it on my own server is that I can store usernames and passwords I use for various accounts. So FriendFeed can’t get my Facebook feed (because they’d need my credentials) but I’ll be able to.

Update: I used to be able to correctly space out my Python code but can’t anymore, oh well.

Viewing 2 Comments

    Ben,
    I'm just getting started with Django (literally got my first project started today). Anyway, this is perfect for what I'm trying to do (parse all my feeds for a lifestream). None of the WordPress plugins are to my liking. The hardest part so far is finding a good way to parse all my RSS feeds to create the source of my lifestream. It seems you've already figured that out with a technology I want to learn. Any chance you can share your code (cleaned up, of course)?

    Thanks

    I can tell you the gist of it. Basically, I use Universal Feed Parser (http://www.feedparser.org/) to write a "crawler" for each type of source. There are 3 big objects in my database. A source (Netflix, blog, Google Reader, etc.) defines how a stream is crawled, what color it gets, and how it is displayed. A stream is one instance of a source; the only reason to have streams is so you can have more than one of each source (and so this can be multi-user friendly: my setup only has 1 user, but it is built from the ground up to allow for more). Each stream carries things like the username, link, and feed information, so you know where to get the data from. An entry is what comes out of each stream (actual blog posts, shared stories, etc.). This is where you store the data from Universal Feed Parser (the title, content, date, tags, etc.).
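
As a rough illustration of those three objects, here is a sketch using plain Python dataclasses rather than the actual Django models. The field names (`color`, `crawler`, `template`, `owner`, `feed_url`) are guesses made up for the example; in a Django project each class would be a model and the references would be foreign keys.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Source:
    """A kind of service (blog, Netflix, Google Reader, ...)."""
    name: str
    color: str     # display color for everything from this source
    crawler: str   # hypothetical key naming which crawler to run
    template: str  # hypothetical key naming how entries are displayed


@dataclass
class Stream:
    """One concrete feed of a given source, owned by one user."""
    source: Source
    owner: str
    feed_url: str


@dataclass
class Entry:
    """One item pulled out of a stream by its crawler."""
    stream: Stream
    title: str
    content: str
    date: datetime
    tags: List[str] = field(default_factory=list)


# Two streams can share one source, which is the point of the split.
reader = Source("google reader", "#3366cc", "reader", "shared_item")
mine = Stream(reader, owner="ben", feed_url="http://example.com/ben.atom")
yours = Stream(reader, owner="someone", feed_url="http://example.com/other.atom")
item = Entry(mine, "A shared story", "Some content", datetime(2008, 3, 1))
```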

    Makes sense? If you need more help just leave another comment or email me at bgolub@benjamingolub.com
