Quote:
Originally Posted by kovidgoyal
Yes, tags to remove are deduced from the source HTML.
The simplest way to get the full text of the articles is if the website has a "Print version". If it does, you need to figure out how to map the URLs in the RSS feeds to the corresponding print version. Then encode that logic in the print_version method, which takes a URL and should return the URL of the print version.
Kovid,
I understand how that works. I remember seeing the BBC example in the FAQ or tutorial. It made sense.
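For reference, the URL mapping described in the quote could be sketched roughly as below. This is only an illustration: the site and its "/print" path layout are made up, and every real site needs its own mapping worked out by comparing an article URL with its print URL.

```python
# Minimal sketch of the URL-mapping logic that would go in a recipe's
# print_version method. The "/print" URL layout is hypothetical and is
# used only for illustration; it is not any real site's scheme.
from urllib.parse import urlsplit, urlunsplit


def print_version(url):
    """Map an article URL from the feed to its (assumed) print-friendly URL."""
    parts = urlsplit(url)
    # Assumption: the site serves print pages under a "/print" path prefix.
    print_path = '/print' + parts.path
    # Drop any query string or fragment (e.g. feed tracking parameters).
    return urlunsplit((parts.scheme, parts.netloc, print_path, '', ''))
```

In an actual recipe this would be a method on the recipe class, so it would take self as its first argument and url as its second.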
But many sites, like Ars Technica, don't offer that print option; you're forced to advance to the next page to read the rest of the article (when reading with a browser).
I tried kipklop74's suggestion by inserting the line:
use_embedded_content = False
in the recipe. But it still doesn't fetch the rest of the Ars Technica articles.
Any suggestions? (Kovid, Darko)
Xanthan Gum