#1 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2011
Location: Chicago, Illinois, USA
Device: Nook Simple Touch
|
Chicago Tribune Recipe not selecting full article
I've been fiddling with the built-in Chicago Tribune recipe to add a few more RSS feeds. That's working fine. However, I've noticed that for longer articles the recipe sometimes misses substantial portions. The Chicago Tribune uses Feedburner to publish its RSS feeds, and the recipe appears to download the article linked by Feedburner. Longer articles, though, link to multiple pages and also provide a Single Page link. Unfortunately, the Single Page link is not consistently present, nor can its presence be predicted. You must download the Feedburner page, check it for the Single Page link, then download that alternate page instead. Implementing this is beyond my meager understanding of the API. Any help would be greatly appreciated.
Of course, I'd love it if the author, Kovid Goyal, could figure out a way to make this enhancement.
#2 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: Sep 2011
Device: Sony PRS-350, Kindle Touch
|
Quote:
Each recipe provides the variable match_regexps. Each URL that matches one of these regular expressions is followed when the variable recursions is set to a value of 1 or greater. It is important that the links to be followed aren't removed by any of the remove_tags* settings. An updated version of the recipe that will follow links is here: Spoiler:
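The spoilered recipe is not reproduced in this thread, so what follows is only a minimal sketch of the two settings described, not the recipe that was posted. The class name is hypothetical, the pattern is the one the thread starter reports trying in the next post, and the feeds and remove_tags* settings are left out. The fallback import exists only so the fragment can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this fragment can be loaded outside calibre.
    class BasicNewsRecipe(object):
        pass


class ChicagoTribuneFollowLinks(BasicNewsRecipe):  # hypothetical class name
    # Follow links found in each downloaded article, one level deep.
    recursions = 1

    # Only links whose URL matches one of these patterns are followed.
    # Assumption: this pattern comes from the thread starter's later post,
    # not from the spoilered recipe.
    match_regexps = [r'full\.column']

    # feeds and remove_tags* are omitted here; whatever remove_tags* rules
    # the real recipe uses must not strip the links that should be followed.
```

With recursions left at its default of 0, match_regexps has no effect, because no links are followed at all.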
|
|
#3 |
Member
Posts: 12
Karma: 10
Join Date: Sep 2011
Location: Chicago, Illinois, USA
Device: Nook Simple Touch
|
Thanks much for the quick response. Works like a charm. For kicks, I used this bit of code instead and it seemed to yield virtually identical results:
match_regexps = [r'full\.column']
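To see which links a pattern like this would pick out, here is a small stand-alone check. It assumes the patterns are applied with search semantics (a match anywhere in the URL). The helper name would_follow and the sample URLs are made up for illustration and are not real Tribune links.

```python
import re

match_regexps = [r'full\.column']


def would_follow(url, patterns=match_regexps):
    """Return True if any pattern is found anywhere in the URL."""
    return any(re.search(p, url) for p in patterns)


# Hypothetical URLs, made up for illustration only.
samples = [
    'http://example.com/news/story,0,123,full.column',
    'http://example.com/news/story,0,123.column?page=2',
    'http://example.com/news/fullxcolumn',
]
for url in samples:
    print(url, would_follow(url))
```

Only the first sample matches. The third does not, because the escaped dot in the pattern matches a literal "." rather than any character.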
#4 | |
Enthusiast
Posts: 28
Karma: 10
Join Date: Sep 2011
Device: Sony PRS-350, Kindle Touch
|
Quote:
An example from today's issue is the article here. If you want to prevent an article from being broken into several chapters, you will have to implement the get_article_url method. You will have to read the page into a Soup, check whether it has a "single page" link (e.g. with your regex), and return the link to the complete page.
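A rough sketch of that approach, under a few assumptions: the single-page link is recognised by the full\.column pattern from earlier in the thread, and the helper find_single_page_url and the class name are hypothetical. The standard-library HTMLParser stands in for the Soup so the helper can be tried outside calibre; inside a recipe, self.index_to_soup(url) would be the more usual way to parse the page.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be loaded outside calibre.
    class BasicNewsRecipe(object):
        def get_article_url(self, article):
            return article.get('link', None)

# Assumption: single-page links contain this text, as in the regex above.
SINGLE_PAGE = re.compile(r'full\.column')


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)


def find_single_page_url(html, base_url):
    """Return the absolute URL of the first single-page link, or None."""
    collector = _LinkCollector()
    collector.feed(html)
    for href in collector.hrefs:
        if SINGLE_PAGE.search(href):
            return urljoin(base_url, href)
    return None


class ChicagoTribuneSinglePage(BasicNewsRecipe):  # hypothetical class name
    def get_article_url(self, article):
        # Start from the URL the base class would have used.
        url = BasicNewsRecipe.get_article_url(self, article)
        if not url:
            return url
        try:
            response = self.browser.open(url)
            html = response.read().decode('utf-8', 'replace')
            final_url = response.geturl()  # URL after any redirects
        except Exception:
            return url  # fall back to the multi-page article
        return find_single_page_url(html, final_url) or url
```

The helper can be tried on its own with made-up markup:

```python
page = '<a href="?page=2">2</a> <a href="/news/story,full.column">Single Page</a>'
print(find_single_page_url(page, 'http://example.com/news/story.story'))
```

One trade-off of this design is that every article page is fetched twice, once to look for the link and once for the actual download.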
|