Sunday Afternoon Thoughts on the Design of RSS Aggregators

William Grosso

Dec. 14, 2003 01:35 PM



The other day, at the Emerging Technology SIG, Doug Cutting gave a talk on Lucene and Nutch. Before the talk, Doug casually mentioned that he used a server-based RSS aggregator. Similarly, in the responses to my blog entry on RSS Aggregators, someone mentioned that they use Bloglines.

This is interesting to me. Probably guided by the intuition that "a web browser is a client," I had always thought of RSS Aggregators as naturally client-side. That is, my first inclination was that RSS Aggregators run on the end-user's machine rather than on a centralized server farm. There are counterexamples, though. For example, Bloglines is an RSS Aggregator that runs out there somewhere and returns your results as a web page (and, by the way, Scott Rosenberg likes Bloglines).

Which led me to spend some time pondering: what's the boundary line between a "standalone" application and a "server-based" application? That is, when should an application live entirely on an end-user's machine, and when should it live on a server and be accessed through a client program? (This distinction gets hazier in the case of RSS Aggregators, which are, in a loose sense, web clients anyway.)

The classic reasons for making an application a server-based application are:

  • Application Load. The application has some memory or CPU requirement that makes end-user machines unsuitable. For example, an application that briefly requires 1 GB of memory to efficiently process an intermediate data structure.
  • Resource Sharing. The application enables users to effectively amortize the cost of some computational resource. For example, Google amortizes the cost of spidering and indexing.
  • Data Sharing. Many users, or applications, use the same data set. Besides search engines (which share the index), this is the classic database-driven application. Things like an authentication server ("single sign-on") also live here.
  • Connectivity requirements. The app has to be available 24 x 7, or some approximation thereof. E-mail servers shouldn't go offline (as end-user machines often do).
  • Manageability. It's often easier to manage a data center whose configuration you control than to repeatedly deploy complex functionality on thousands of desktops.
  • Accessibility. It's easier to access your information if it's stored in a central repository. It's easier to access an application if it's running on a server.
  • Security. If some information needs to have access restricted, it's easier to manage that control centrally.

The classic reasons for making an application stand-alone are:

  • Responsiveness. A local application has the potential for a better user experience. Any time you insert round-trips to a server, you add the potential for the user to wonder "What's it doing?"
  • Application load. While the individual client might not need a lot of resources, the overhead of serving many clients can overwhelm a server-based design.
  • Sheer performance. Some applications (read: games and complicated spreadsheets) are simply infeasible in a server-based model. This is actually a combination of the first two, but I think it deserves its own bullet point.
  • Personal information. It's difficult to store large amounts of personal context on a server. If an application truly benefits from a large amount of personal context, then it's probably a standalone application.
  • Security. The user might have qualms about storing personal data somewhere remote. In addition, a single security hole in a server can compromise many people in one exploit.
  • Standalone aspects. What if the machine isn't connected to the server? If someone is going to be intermittently connected, or in low-bandwidth situations, standalone might be the way to go.

Of course, I'm blurring the lines and ignoring fat clients that do more than provide a better GUI (e.g., those that move some "server" functionality onto the client). It's a simple list. And there's nothing here about P2P applications, or about the ways in which the faster release cycles enabled by web-based applications can be a significant competitive advantage. But I still think it captures a lot of the considerations, so I'd like to ask:

Did I miss anything in these lists?

Now the interesting thing is to think about RSS Aggregators. Why is Bloglines an Internet service, and why is FeedDemon a standalone application?

Obvious things

Let's start by making the easy comparisons. From the end-user's perspective, the standalone approach has the following advantages:
  • A richer user interface (although note that Tim Bray doesn't think this is obvious).
  • Better performance on small feed sets. There's a caveat here: I've only played around with small feed sets (approximately 100 feeds) on current applications, where the feeds get updated frequently. If you have a lot of feeds that are infrequently updated, the bandwidth spent re-fetching unchanged feeds might be significant (unless people start using HTTP's Last-Modified header again, which would be nice).
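The bandwidth concern in that last bullet is what HTTP conditional requests address. Here is a minimal sketch using only Python's standard library, assuming the aggregator remembers the Last-Modified and ETag values from its previous fetch of a feed: it sends them back as If-Modified-Since and If-None-Match, and a server that honors them answers 304 Not Modified with no body. The CachedFeed type and the fetch_feed and conditional_headers functions are hypothetical names for illustration.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedFeed:
    """What the aggregator remembers from the last successful fetch of a feed."""
    body: bytes
    last_modified: Optional[str] = None  # value of the Last-Modified response header
    etag: Optional[str] = None           # value of the ETag response header


def conditional_headers(cached: Optional[CachedFeed]) -> Dict[str, str]:
    """Build validator headers so an unchanged feed costs a 304, not a full download."""
    headers: Dict[str, str] = {}
    if cached is not None:
        if cached.last_modified:
            headers["If-Modified-Since"] = cached.last_modified
        if cached.etag:
            headers["If-None-Match"] = cached.etag
    return headers


def fetch_feed(url: str, cached: Optional[CachedFeed] = None,
               timeout: float = 10.0) -> Tuple[CachedFeed, bool]:
    """Return (feed, changed); changed is False when the server replied 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            fresh = CachedFeed(
                body=response.read(),
                last_modified=response.headers.get("Last-Modified"),
                etag=response.headers.get("ETag"),
            )
            return fresh, True
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; reuse the cached copy.
        if err.code == 304 and cached is not None:
            return cached, False
        raise
```

If a server ignores these headers, the code still works: it simply gets a 200 and the full feed every time, which is exactly the cost described above.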

From the developer's perspective, the standalone approach has the following advantages:

  • No need to worry about scalability concerns.
  • No need to create and administer a server farm.
  • Better support from IDEs and other development tools.

From the end-user's perspective, the server-based approach has the following advantages:

  • Location and OS transparency. You can use it from anywhere (or, at least, from any PC. There's not a lot of "use it from the cellphone" going on yet).
  • Ability to use a customized browser (for example, one with advanced pop-up blocking, tabbed browsing, or searching functionality). Similarly, integration into the user's standard browser (ability to bookmark an article for later) seems like an advantage.

From the developer's perspective, the server-based approach has the following advantages:

  • No need to worry about deployment of complex applications to uncontrolled environments.
  • Ability to use large, server-side libraries and pieces of functionality.
  • Fast release cycles. The ability to quickly modify and update code.

Applying the Server-Based / Standalone Bullet Points

With that out of the way, let's talk horse-racing. Given that you can build an RSS aggregator that's server-based or standalone, how do they compete with each other? How will they evolve?

Server-based designs

How do you, as the designer of Bloglines, make your application compelling? Well, you want to build something that is a classic server-based application (you're server-based, so it makes sense to leverage that). You want to add features that require resource sharing, data sharing, or connectivity (you've already got accessibility nailed).

What do those look like? You might think connectivity's a nice one. If you can stay up 24 x 7 and cache RSS feeds, then people can find out about blogs that are currently offline but have changed. The problem is that this only helps when the feed indicated a change and then the site went offline. And if a user is interested enough to wonder whether a feed changed, they probably want to fetch the article too. So this isn't that big an advantage (the feed, or the site, being down is pretty much a bummer, unless your aggregator is going to cache a lot of data for people).
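To make the connectivity idea concrete, here is a minimal sketch, assuming a server-side cache keyed by feed URL and a caller-supplied fetch function (a hypothetical hook that raises OSError when the origin is unreachable). When the site is down, it serves the last copy of the feed it saw and flags it as stale. Note what it does not do: it caches the feed, not the articles the feed links to, which is the limitation described above.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher: returns the raw feed, raises OSError
# (e.g. ConnectionError or urllib.error.URLError) if the site is unreachable.
Fetcher = Callable[[str], bytes]


class LastKnownGoodCache:
    """Serve the most recent successful fetch of a feed when its site is down."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}  # feed URL -> last good body

    def get(self, url: str) -> Tuple[Optional[bytes], bool]:
        """Return (body, stale); stale is True when we had to fall back."""
        try:
            body = self._fetch(url)
        except OSError:
            # Origin unreachable: hand back whatever we saw last (possibly nothing).
            return self._store.get(url), True
        self._store[url] = body
        return body, False
```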

Data sharing? Well, there's potential here, in that RSS feeds are fetched much less often: once by the server rather than once per reader. This is a very good thing for authors with low-capacity servers and interesting weblogs. But it's not so compelling for the end-user, unless we run into a scenario where a significant percentage of weblogs are up but responding slowly. Or, in a scenario that is perhaps more likely, a significant percentage of weblogs decide to give higher priority to requests from server-based aggregators, on the theory that doing so will decrease their overall load.
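The saving is easy to quantify. A back-of-the-envelope sketch, assuming every reader's standalone aggregator polls each of its feeds once per interval, while a server-based aggregator polls each distinct feed once per interval no matter how many users subscribe: origin load drops from the number of subscriptions to the number of distinct feeds. The subscription data below is invented for illustration.

```python
from typing import Dict, Set


def origin_fetches_per_interval(subscriptions: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count requests hitting weblog servers in one polling interval.

    subscriptions maps each user to the set of feed URLs that user reads.
    standalone:   every user's aggregator fetches every feed it subscribes to.
    server_based: one shared aggregator fetches each distinct feed once.
    """
    standalone = sum(len(feeds) for feeds in subscriptions.values())
    distinct = set().union(*subscriptions.values())
    return {"standalone": standalone, "server_based": len(distinct)}


# Hypothetical data: 1,000 readers of one popular weblog,
# one of whom also follows a small niche blog.
subs = {f"user{i}": {"http://popular.example/rss"} for i in range(1000)}
subs["user0"] = {"http://popular.example/rss", "http://niche.example/rss"}
print(origin_fetches_per_interval(subs))
# {'standalone': 1001, 'server_based': 2}
```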

Resource sharing? Here's where the server-based designs have a chance to shine. Bloglines has features like Top Blogs, Blog Recommendations, and the ability to subscribe to a search, which are hard to imagine incorporating into a standalone design.
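Both of the first two features depend on data only the server has: everyone's subscriptions. A minimal sketch, assuming "top blogs" means most-subscribed feeds and recommendations come from simple co-subscription counts ("readers of your feeds also read..."). This is a swapped-in, illustrative technique; nothing here reflects how Bloglines actually computes either list.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def top_blogs(subscriptions: Dict[str, Set[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Feeds ranked by number of subscribers across all users."""
    counts = Counter(feed for feeds in subscriptions.values() for feed in feeds)
    return counts.most_common(n)


def recommend(subscriptions: Dict[str, Set[str]], user: str, n: int = 5) -> List[str]:
    """Suggest feeds that co-occur with the user's feeds in other users' lists."""
    mine = subscriptions.get(user, set())
    scores: Counter = Counter()
    for other, feeds in subscriptions.items():
        if other == user:
            continue
        overlap = len(mine & feeds)  # how similar this reader is to `user`
        if overlap == 0:
            continue
        for feed in feeds - mine:
            scores[feed] += overlap  # weight suggestions by similarity
    return [feed for feed, _ in scores.most_common(n)]
```

A standalone aggregator sees only one row of this data (its own user's subscriptions), which is why these features are hard to build client-side.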

I think these resource-sharing functions are the compelling advantage Bloglines has. The interesting thing, of course, is that other applications that aren't RSS Aggregators (like Feedster) also offer some of them.

Standalone designs

How about the other side? How do you, as the designer of FeedDemon, make your application compelling? Well, you want to build something that is a classic standalone application (because you're standalone). You want to add features that involve significant per-user application load or personal information, or that let you run even when you're not connected to the net (you've already got performance nailed).

The last of these is the easiest: it probably means building a local database and having a "fetch my web" feature for offline RSS browsing. Given that even the FeedDemon help is online right now (the help system sends you to online help pages), this doesn't appear to be a priority (in spite of the "work offline" button, which seems simply to prevent FeedDemon from attempting to talk to the world).
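A minimal sketch of the local-database half of "fetch my web", using Python's built-in sqlite3 module: items are stored as they're fetched and can be read later with no network at all. The schema and the open_store, save_item, and offline_items names are assumptions for illustration, not how FeedDemon stores anything.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical schema: one row per article, keyed by its URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    link    TEXT PRIMARY KEY,
    feed    TEXT NOT NULL,
    title   TEXT NOT NULL,
    body    TEXT,  -- full article text, if it was prefetched
    fetched TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local item store; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_item(conn: sqlite3.Connection, feed: str, link: str, title: str,
              body: Optional[str] = None) -> None:
    """Insert an item, or replace it if the same link was saved before."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO items (link, feed, title, body) VALUES (?, ?, ?, ?)",
            (link, feed, title, body),
        )


def offline_items(conn: sqlite3.Connection, feed: str) -> List[Tuple[str, str, Optional[str]]]:
    """Everything stored for a feed, readable with the network unplugged."""
    return conn.execute(
        "SELECT title, link, body FROM items WHERE feed = ? ORDER BY fetched DESC, title",
        (feed,),
    ).fetchall()
```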

"Fetch my web" seems nice even when you're online, too. Wouldn't it be great to improve the performance of the web by having a predictive cache? Of course it would. And by subscribing to feeds, I'm telling the web browser exactly how to build the cache. The software gets simpler, and better.
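The predictive cache needs no guessing, because the subscription list already names the pages worth prefetching. A minimal sketch, assuming RSS 2.0 structure (channel, item, link) and a caller-supplied fetch function (a hypothetical hook, so the example runs offline); a real aggregator would also want to respect cache headers and rate limits.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List


def item_links(rss_xml: str) -> List[str]:
    """Pull article URLs out of an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)  # root element is <rss>
    return [
        link.text.strip()
        for link in root.iterfind("./channel/item/link")
        if link.text and link.text.strip()
    ]


def prefetch(rss_xml: str, fetch: Callable[[str], str],
             cache: Dict[str, str]) -> int:
    """Fetch every linked article not already cached; return how many were fetched."""
    fetched = 0
    for url in item_links(rss_xml):
        if url not in cache:
            cache[url] = fetch(url)
            fetched += 1
    return fetched
```

RSS 1.0 and Atom use different element names and namespaces, so this parser would need extending to handle them.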

In slogan form: UI is Better than AI.

How about significant load or significant personal information? What could you add to an RSS Aggregator that would make it more useful along these lines? Well, the obvious thing is memory. Suppose the RSS Aggregator not only knew about your feeds, but also knew which articles you fetched over time, and was somehow taking advantage of that big database of information. Suppose you could search that database for old blog articles? (In a shameless personal plug, I'll point out that you can do this for Bloglines by adding the toolbar I helped build to your web browser.)
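A minimal sketch of that memory, assuming the aggregator records the title and text of each article you open: a tiny inverted index that maps each word to the articles containing it and answers queries where every word must match. The ReadingHistory class is hypothetical; a real implementation would want stemming, ranking, and persistent storage.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(TOKEN.findall(text.lower()))


class ReadingHistory:
    """Remembers every article the user opened and searches it by keyword."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)  # word -> article URLs
        self._titles: Dict[str, str] = {}                     # URL -> title

    def record(self, url: str, title: str, text: str) -> None:
        self._titles[url] = title
        for word in tokenize(title + " " + text):
            self._index[word].add(url)

    def search(self, query: str) -> List[str]:
        """Sorted titles of articles containing every query word."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(self._titles[url] for url in hits)
```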

Platform Thoughts

Another point, which isn't specific to client- or server-based designs, is that applications are platforms. By building a server-based application, and relying on a web browser for your client, you are doing two things: you are limiting the extensions that third parties can make to your application to browser-based plug-ins, AND you are enabling the existing browser-based plug-ins to augment your application.

On the other hand, if you built a robust plug-in architecture into your standalone aggregator, you could possibly gain an intermediate- to long-term competitive advantage. As RSS grows in importance (and we all believe it will), people will want to customize their RSS experience. (Then again, you have to support a developer community. Uuugh.)
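What a plug-in architecture might mean at its simplest: the aggregator exposes a hook that third-party code registers against, and every incoming item passes through the registered plug-ins in order, each able to modify or drop it. A minimal sketch; the plugin decorator, the dictionary item shape, and both example plug-ins are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Item = Dict[str, str]
# A plug-in takes an item and returns it (possibly modified), or None to drop it.
Plugin = Callable[[Item], Optional[Item]]

_PLUGINS: List[Plugin] = []


def plugin(func: Plugin) -> Plugin:
    """Decorator third parties use to register an item filter or transformer."""
    _PLUGINS.append(func)
    return func


def run_plugins(items: List[Item]) -> List[Item]:
    """Pass each item through every plug-in in registration order."""
    result: List[Item] = []
    for item in items:
        current: Optional[Item] = dict(item)  # copy so the caller's data is untouched
        for hook in _PLUGINS:
            current = hook(current)
            if current is None:  # a plug-in dropped this item
                break
        if current is not None:
            result.append(current)
    return result


@plugin
def hide_off_topic(item: Item) -> Optional[Item]:
    # Example plug-in: drop posts whose title is marked off-topic.
    return None if item.get("title", "").startswith("[OT]") else item


@plugin
def tag_java_posts(item: Item) -> Optional[Item]:
    # Example plug-in: flag posts that mention Java in the title.
    if "java" in item.get("title", "").lower():
        item["tag"] = "java"
    return item
```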

William Grosso is a coauthor of Java Enterprise Best Practices.