#1 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
The Independent : Updated recipe for 2011 site redesign
As you probably know, The Independent has recently updated its website, and this has broken the old recipe.
Here is an initial basic recipe for the new site. I thought it would be good to make a thread for people to post improvements. |
#2 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
for those who like images
Updated the recipe to pull in the images. Do you reckon it's best to limit these to one per article?
Regarding the categories, is it possible to merge them into a parent category if the number of articles is below a certain threshold? |
#3 |
creator of calibre
Posts: 45,185
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you want to do some dynamic modifications to the categories, you will need to override parse_feeds in your recipe, like this:
Code:
def parse_feeds(self):
    feeds = BasicNewsRecipe.parse_feeds(self)
    # do something to the feeds
    return feeds
|
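For the merging question above, here is a minimal sketch of what the "do something to the feeds" step might look like. It assumes each feed object exposes a `title` and a list-like `articles` attribute; the helper name `merge_small_feeds`, the threshold of 5 and the 'Other' title are all made up for illustration, and the `SimpleNamespace` objects only stand in for real feeds so the snippet runs outside calibre.

```python
from types import SimpleNamespace


def merge_small_feeds(feeds, threshold=5, parent_title='Other'):
    """Fold every feed with fewer than `threshold` articles into one feed.

    Assumes each feed exposes `title` and a list-like `articles`.
    """
    kept = []
    parent = None  # first small feed, reused as the merged category
    for feed in feeds:
        if len(feed.articles) >= threshold:
            kept.append(feed)
        elif parent is None:
            parent = feed
            parent.title = parent_title
        else:
            parent.articles.extend(feed.articles)
    if parent is not None:
        kept.append(parent)  # the merged category goes last
    return kept


# Stand-in feeds for a quick check outside calibre.
demo = [
    SimpleNamespace(title='UK', articles=list(range(10))),
    SimpleNamespace(title='Arts', articles=['a1', 'a2']),
    SimpleNamespace(title='Books', articles=['b1']),
]
merged = merge_small_feeds(demo)
print([(f.title, len(f.articles)) for f in merged])
```

Inside a recipe, the idea would be to return `merge_small_feeds(feeds)` from `parse_feeds` in place of the comment. Note that a lone small category is simply renamed by this sketch, which may or may not be what you want.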
#4 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
another update
Thanks Kovid, I really appreciate all the work you do on calibre.
I've made a few changes to the recipe:
I noticed that their web server frequently times out when processing a request (probably teething problems with the new site). This resulted in a lot of captions being added without an image. Hopefully the post-process parse should take care of this issue.
Last edited by NotTaken; 11-06-2011 at 05:37 PM. |
#5 |
creator of calibre
Posts: 45,185
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
For the timeouts, try adding delay=1 to your recipe. That will greatly slow down the download but it might prevent the timeouts.
|
#6 | |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
Quote:
Slight update to the recipe to add star images to the reviews. |
|
#7 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
It is great to finally get a working recipe for The Independent (my fav read). What part of the recipe can I eliminate to remove the pictures and picture captions?
A way to get the print version would be even better. The difference is at the end of the URL: html versus html?printService=print
Last edited by mufc; 11-08-2011 at 11:04 PM. |
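A rough sketch of the URL rewrite described above, for anyone who wants to experiment. calibre recipes have a `print_version(self, url)` hook for this kind of mapping; the version below is written as a plain function so it runs on its own, the example URL is hypothetical, and whether the site actually serves these pages to a downloader is a separate question.

```python
def print_version(url):
    """Map an article URL to its print-friendly counterpart.

    Assumption: the print page is the article URL with
    '?printService=print' appended, as described in the post above.
    """
    if url.endswith('.html'):
        return url + '?printService=print'
    return url  # leave anything that does not look like an article alone


# Hypothetical article URL, for illustration only.
print(print_version('http://www.independent.co.uk/news/uk/example-123.html'))
```

In a recipe this would be a method, taking `self` as its first argument.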
#8 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
Had some success but
Spoiler:
This works on the articles I get. The problem is that I am not getting many articles. For example, before trying to get the print version I was getting 100 articles for UK News; with the recipe change I get 5, and some categories have none. Before anyone states the obvious: yes, "oldest article 7" days is a bit much.
Last edited by mufc; 11-08-2011 at 11:35 PM. |
#9 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
Looks like they try and prevent direct linking to the print pages. To remove the images you can always change remove to True in the following piece of code:
Code:
#images
pattern = re.compile('slideshow')
if (pattern.search(item['class'])) is not None:
    remove = False
|
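If you would rather have a switch than edit the code each time, the same check could be wrapped up roughly like this. This is only a sketch: `keep_item` and `keep_images` are made-up names, and a plain dict stands in for the tag's attributes so the snippet runs by itself. Using `.get` instead of `item['class']` also avoids an error on tags that have no class attribute at all.

```python
import re

SLIDESHOW = re.compile('slideshow')


def keep_item(attrs, keep_images=True):
    """Return True if a tag with these attributes should stay in the article.

    `attrs` is a plain dict standing in for a parsed tag's attributes.
    """
    css_class = attrs.get('class', '')  # no KeyError when the tag has no class
    if SLIDESHOW.search(css_class):
        return keep_images  # slideshow blocks hold the images and captions
    return True


print(keep_item({'class': 'slideshow widget'}))
print(keep_item({'class': 'slideshow widget'}, keep_images=False))
print(keep_item({}))
```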
#10 |
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
|
That works great. Thanks a million! I thought doing that would leave the photo caption, but that is gone too.
#11 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
A few more changes
Some changes:
I was thinking about removing the advertorial articles (see here) but could not see a clean way of doing this. As far as I am aware, they are only identifiable by the text 'Advertorial Feature ' in <div class=" ... strapLine">, so I was thinking of returning None in preprocess_soup if the text was found (this causes an AttributeError exception to be raised). Can anyone think of a nicer solution?
Last edited by NotTaken; 11-11-2011 at 02:09 PM. |
#12 |
creator of calibre
Posts: 45,185
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Returning None in preprocess is fine. If you wish to be more explicit about it you could raise an Exception, though I don't recall if the download system discards exceptions in that method.
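For what it's worth, the advertorial check itself might look something like the sketch below. It uses Python's standard `html.parser` rather than the soup object a recipe actually receives, purely so it can be run on its own; `StraplineFinder` and `is_advertorial` are made-up names, and the HTML in the example is invented. The recipe's preprocess hook would then return None whenever the check is true.

```python
from html.parser import HTMLParser


class StraplineFinder(HTMLParser):
    """Collect the text inside every <div> whose classes include 'strapLine'."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a strapLine div
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.depth or 'strapLine' in classes:
            self.depth += 1  # track nested divs so the end is found correctly

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts.append(data)


def is_advertorial(html):
    """True if any strapLine div contains the text 'Advertorial Feature'."""
    finder = StraplineFinder()
    finder.feed(html)
    return 'Advertorial Feature' in ''.join(finder.texts)


# Invented markup, for illustration only.
page = '<div class="x strapLine"><p>Advertorial Feature </p></div><div>Body</div>'
print(is_advertorial(page))
```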
|
#13 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
update
Thanks. A few updates:
|
#14 |
Connoisseur
Posts: 65
Karma: 4640
Join Date: Aug 2011
Device: kindle
|
Fixed an issue whereby a KeyError was raised on pages with embedded flash videos. These pages also had some other crud, which I have also removed.
|
#15 |
Connoisseur
Posts: 50
Karma: 10
Join Date: Dec 2008
Location: Scotland
Device: Kindle DX, Kindle. iPad 3
|
I'm afraid neither the built-in recipe nor this one is working for me for The Independent any more. It downloads very few links and typically gives error messages in the log such as:
Could not fetch link http://www.independent.co.uk/news/uk...s-6277966.html
Traceback (most recent call last):
  File "site-packages\calibre\web\fetch\simple.py", line 432, in process_links
  File "site-packages\calibre\web\fetch\simple.py", line 193, in get_soup
  File "c:\docume~1\dave\locals~1\temp\calibre_0.8.31_tmp _uf67cn\2zkxsh_recipes\recipe0.py", line 202, in preprocess_html
  File "c:\docume~1\dave\locals~1\temp\calibre_0.8.31_tmp _uf67cn\2zkxsh_recipes\recipe0.py", line 275, in _insertRatingStars
IndexError: list index out of range
I wish I could help but modifying the recipes is a little beyond me. |
Thread | Thread Starter | Forum | Replies | Last Post |
[Updated recipe] Ming Pao (明報) - Hong Kong (2011/10/21) | tylau0 | Recipes | 0 | 10-21-2011 11:38 AM |
[Updated recipe] Ming Pao (明報) - Hong Kong (2011/09/21) | tylau0 | Recipes | 0 | 09-21-2011 07:13 AM |
[Updated recipe] Ming Pao (明報) - Hong Kong (2011/09/20) | tylau0 | Recipes | 1 | 09-20-2011 06:56 PM |
[Updated recipe] Ming Pao (明報) - Hong Kong (2011/06/26) | tylau0 | Recipes | 3 | 06-28-2011 12:17 PM |
Updated Recipe: Ming Pao - Hong Kong (2011/03/08) | tylau0 | Recipes | 0 | 03-08-2011 07:25 PM |