#256 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2009
Location: Spain
Device: Sony PRS-505
|
Thanks Kovid,
I downloaded the new .py version of the feed ('cos I don't do well with cut'n'paste) and it all works well. Many thanks for this and have a great weekend, Emmet |
#257 |
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#258 |
Connoisseur
Posts: 68
Karma: 20
Join Date: Jan 2009
Location: Athens, Greece
Device: Cybook Gen3
|
Could you add a recipe for the English RSS feed of Al Jazeera?
The address is http://english.aljazeera.net/Service...31105943979989 and it is the only RSS feed I have been unable to create a recipe for on my own. When I add it, it downloads and creates only the first page, with no articles. Thank you in advance!
#259 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
I'm afraid they have some protection system that detects scraping: after one or two downloads that work OK, the server starts to reject requests.
You could try the recipe from some other IP address, with this placed in your code:
Code:
simultaneous_downloads = 1
delay = 4
#260 | |
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
|
Harper Magazine
Quote:
I would like to see such a recipe. Thanks. XG |
|
#261 |
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
|
Cancel My Request for Harper's
|
#262 | |
Connoisseur
Posts: 51
Karma: 10
Join Date: Dec 2008
Location: Germany
Device: SONY PRS-500
|
Quote:
I have a similar problem with Al Jazeera English. I, too, would like to have a recipe for this service. They provide good world news coverage. Thanks if possible... XG
Last edited by XanthanGum; 02-21-2009 at 11:00 AM. Reason: Confirm that kiklop74 is right
|
#263 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
This behaviour is by design. When you specify --test, it means "download only two articles from the feed". To download everything, do not use the --test option. Science News and Spiegel work correctly.
|
#264 | |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Quote:
New Left Review - articles are in PDF. Making this recipe is too time-consuming for me.
Hidden City quarterly - due to the complicated layout of the site, this recipe is also complicated (though this one I could actually do).
Radical Philosophy - this one is doable; it will be done in the next 10-15 days when I find the time.
The Ghazal Page - this one is also doable; it will be done in the next 10-15 days when I find the time.
|
#265 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for Serbian news portal E-novine:
|
#266 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Al Jazeera in English
Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2009, Darko Miletic <darko.miletic at gmail.com>'
'''
aljazeera.net
'''

from calibre.web.feeds.news import BasicNewsRecipe


class AlJazeera(BasicNewsRecipe):
    title = 'Al Jazeera in English'
    __author__ = 'Darko Miletic'
    description = 'News from Middle East'
    publisher = 'Al Jazeera'
    category = 'news, politics, middle east'
    # Fetch one page at a time and pause between requests, because the
    # server starts rejecting requests once it detects scraping.
    simultaneous_downloads = 1
    delay = 4
    oldest_article = 1
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'iso-8859-1'
    remove_javascript = True
    use_embedded_content = False

    html2lrf_options = [
        '--comment', description,
        '--category', category,
        '--publisher', publisher,
        '--ignore-tables',
    ]

    html2epub_options = ('publisher="' + publisher
                         + '"\ncomments="' + description
                         + '"\ntags="' + category
                         + '"\nlinearize_table=True')

    # Keep only the main article container.
    keep_only_tags = [dict(name='div', attrs={'id': 'ctl00_divContent'})]

    remove_tags = [
        dict(name=['object', 'link']),
        dict(name='td', attrs={'class': ['MostActiveDescHeader', 'MostActiveDescBody']}),
    ]

    feeds = [(u'AL JAZEERA ENGLISH (AJE)',
              u'http://english.aljazeera.net/Services/Rss/?PostingId=2007731105943979989')]

    def preprocess_html(self, soup):
        # Drop inline style and font-face attributes from every tag.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
#267 | |
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
|
Quote:
Anyway, here's the recipe for Paul Thurrott's SuperSite for Windows
|
#268 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2009
Device: Sony Reader
|
'The Register' recipe
Only just got a Sony Reader and started using the Calibre software. The idea of being able to convert RSS feeds to an ebook is really appealing. I've attempted to create a custom news source for 'The Register' (http://www.theregister.co.uk/headlines.atom). The feed downloads OK and a book is produced but it only contains the feeds and not any content from the associated web page. My initial thought was that Calibre does not handle Atom feeds but the website does mention support for Atom. Any suggestions?
The code is as follows:
class AdvancedUserRecipe1235238489(BasicNewsRecipe):
    title = u'The Register'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False
    feeds = [(u'The Register', u'http://www.theregister.co.uk/headlines.atom')]

Note: I added use_embedded_content = False and the file size did increase, so I assume some extra content was included, but the first few pages that I checked still contained only the feed information.
#269 |
hopeless n00b
Posts: 5,110
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Try adding:
Code:
def print_version(self, url):
    return url + 'print.html'
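As a rough illustration of what this hook does: calibre passes each article URL to `print_version` and downloads whatever URL comes back. The standalone function below mirrors the suggestion so it can be tried outside calibre. The sample article URL is made up, and the idea that The Register serves a printer-friendly page at `print.html` under each article URL is taken from the suggestion above rather than verified here. Unlike the one-liner, this sketch also tolerates a URL without a trailing slash.

```python
def print_version(url):
    """Map an article URL to its presumed printer-friendly page."""
    # Normalise the trailing slash so exactly one separates the
    # article URL from 'print.html'.
    return url.rstrip('/') + '/print.html'


# Hypothetical article URL, for illustration only.
sample = 'http://www.theregister.co.uk/2009/02/21/example_story/'
print(print_version(sample))
```

Inside a recipe the same body would go in a method that takes `self` as its first argument.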
#270 |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2009
Device: Sony Reader
|
Remove <a> tags in body of article but keep element text
Thanks for that. I've now got it working reasonably well. The next issue is that the articles contain hyperlinks. The default processing seems to replace each link with its element text and then include the URL in brackets afterwards. Is there a way to stop the URL from appearing? My initial thought was to try the pre/post-processing functions, but these appear to run way too early.
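One possible approach, offered as a sketch that has not been tested inside calibre: strip the anchor tags with a regular expression before the page is parsed, so that only the link text remains. Recipes accept a `preprocess_regexps` list of `(compiled pattern, replacement function)` pairs. The names `ANCHOR_TAG` and `strip_anchors` are invented for this example, and whether this removes the bracketed URLs in the converted output is an assumption.

```python
import re

# Matches opening and closing <a> tags, but not the text between them.
# The optional group requires whitespace after 'a', so tags such as
# <abbr> are left alone.
ANCHOR_TAG = re.compile(r'</?a(?:\s[^>]*)?>', re.IGNORECASE)


def strip_anchors(html):
    """Remove <a ...> and </a> tags while keeping the enclosed text."""
    return ANCHOR_TAG.sub('', html)


# In a recipe class the equivalent setting would be:
#     preprocess_regexps = [(ANCHOR_TAG, lambda match: '')]

sample = '<p>Read <a href="http://example.com/">the report</a> today.</p>'
print(strip_anchors(sample))
```

Because the links are removed from the HTML itself, this also drops them from formats where working hyperlinks might be wanted.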
|