Old 02-11-2009, 01:40 PM   #192
XanthanGum
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
Print Versions

Quote:
Originally Posted by kovidgoyal
Yes, tags to remove are deduced from the source HTML.

The simplest way to get the full text of the articles is if the website has a "Print version". If it does, you need to figure out how to map the URLs in the RSS feeds to the corresponding print version. Then encode that logic into the print_version method, which takes a URL and should return the URL of the print version.

Kovid,

I understand how that works. I remember seeing the BBC example in the FAQ or tutorial. It made sense.
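
For readers who have not seen that example, here is a minimal sketch of the print_version idea from the quote. The site, the feed URL, and the /print/ URL scheme are all hypothetical; a real site needs its own mapping, worked out by comparing an article URL with its print-page URL. The try/except import is only there so the sketch can also run outside calibre.

```python
try:
    # Available when the recipe runs inside calibre.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be read and run outside calibre.
    BasicNewsRecipe = object

# Hypothetical URL scheme: the print page lives under /print/ on the same host.
ARTICLE_PREFIX = 'http://example.com/'
PRINT_PREFIX = 'http://example.com/print/'


def to_print_url(url):
    """Map an article URL from the feed to its (assumed) print-version URL."""
    if url.startswith(ARTICLE_PREFIX) and not url.startswith(PRINT_PREFIX):
        return PRINT_PREFIX + url[len(ARTICLE_PREFIX):]
    # Unknown pattern, or already a print URL: leave it unchanged.
    return url


class ExamplePrintRecipe(BasicNewsRecipe):
    title = 'Example Site (print versions)'           # hypothetical title
    feeds = [('News', 'http://example.com/rss.xml')]  # hypothetical feed

    def print_version(self, url):
        # Receives an article URL from the feed, returns the URL to download.
        return to_print_url(url)
```

Keeping the mapping in a plain function makes it easy to try against a few real URLs before loading the recipe into calibre.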

But many sites, like Ars Technica, don't offer that print option; you're forced to advance to the next page to read the rest of the article (when reading with a browser).

I tried kipklop74's suggestion by inserting the line:

use_embedded_content = False

in the recipe, but it doesn't fetch the remaining pages of the Ars Technica articles.
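
For context, a minimal sketch of where that line sits in a recipe. The title and feed URL are placeholders, not the real Ars Technica feed. As the calibre documentation describes it, use_embedded_content only decides whether calibre uses the article text embedded in the feed (True), always downloads the linked page instead (False), or guesses from the content length (None). On its own it probably would not follow "next page" links, which would match the result described here.

```python
try:
    # Available when the recipe runs inside calibre.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be read and run outside calibre.
    BasicNewsRecipe = object


class MultiPageSiteSketch(BasicNewsRecipe):
    title = 'Example multi-page site'                  # placeholder title
    feeds = [('Articles', 'http://example.com/feed')]  # placeholder feed URL

    # Ignore any article text embedded in the feed and download the
    # page each feed entry links to instead.
    use_embedded_content = False
```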

Any suggestions? (Kovid, Darko)

Xanthan Gum