I made a patch for calibre which adds the readability algorithm for extracting the main article content from an HTML page.
It's useful for things like a Hacker News recipe (included).
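
For anyone unfamiliar with the idea, here is a rough, stdlib-only sketch of the kind of scoring a readability-style extractor does. It is not the python-readability code and not what the patch ships; it is a toy illustration that assumes reasonably well-nested HTML. The names (`BlockCollector`, `score`, `extract_main_text`) and the weights are made up for the example. Each candidate block is scored by how much text it holds, with a bonus for commas and a penalty for text that sits inside links.

```python
from html.parser import HTMLParser

# Elements treated as candidate "content blocks" in this toy version.
BLOCK_TAGS = {"div", "article", "section", "td"}
# Elements whose text is never article content.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collects the text found directly inside each candidate block."""

    def __init__(self):
        super().__init__()
        self.open_blocks = []  # stack of blocks that are currently open
        self.blocks = []       # every block seen so far, in document order
        self.link_depth = 0    # > 0 while inside <a>...</a>
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            block = {"chunks": [], "link_chars": 0}
            self.open_blocks.append(block)
            self.blocks.append(block)
        elif tag == "a":
            self.link_depth += 1
        elif tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.open_blocks:
            self.open_blocks.pop()
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth or not self.open_blocks:
            return
        # Text is credited to the innermost open block only, so a wrapper
        # div does not win just because it contains everything.
        block = self.open_blocks[-1]
        block["chunks"].append(text)
        if self.link_depth:
            block["link_chars"] += len(text)


def score(block):
    """Hypothetical scoring: text length plus a comma bonus, scaled by link density."""
    total = sum(len(chunk) for chunk in block["chunks"])
    if total == 0:
        return 0.0
    commas = sum(chunk.count(",") for chunk in block["chunks"])
    link_density = block["link_chars"] / total
    return (total + 20 * commas) * (1.0 - link_density)


def extract_main_text(html):
    """Return the text of the highest-scoring block, or "" if there is none."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    if not collector.blocks:
        return ""
    best = max(collector.blocks, key=score)
    return " ".join(best["chunks"])
```

Calling `extract_main_text(page_html)` on a page with a link-heavy navigation div and a text-heavy story div should return the story text, since the navigation block's link density pushes its score to zero. The real port does considerably more than this (class/id hints, sibling merging, cleanup), so treat the sketch only as a picture of the general approach.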
The Python port of readability comes from https://github.com/buriy/python-readability; I copied it into src/readability.
You can grab the branch at lp:~thomas-scholl/calibre/readability and give hackernews.recipe a try.
What would I need to change to get it merged into calibre?