#856 | |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Quote:
|
|
#857 |
Evangelist
Posts: 475
Karma: 590
Join Date: Aug 2009
Location: Bangkok, Thailand
Device: Kindle Paperwhite
|
May I request a recipe for:
Bangkok Post: http://www.bangkokpost.com/rss/
Thai Post: http://www.thaipost.net/sitemap
Thanks
#858 |
Connoisseur
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
|
Need some help with custom recipe
Hi,
My Python is, after 8 years, a little rusty. But I like Calibre and its concept of plug-in recipes, so I gave it a try and produced the following recipe: Code:

    from calibre.web.feeds.news import BasicNewsRecipe

    class FokkeEnSukkeRecipe(BasicNewsRecipe) :
        title = u'Fokke en Sukke'
        no_stylesheets = True
        INDEX = 'http://foksuk.nl'
        keep_only_tags = [dict(name='div', attrs={'class' : 'cartoon'})]
        remove_tags = [dict(name = 'div', attrs = {'class' : 'selectcartoon'})]

        def parse_index(self) :
            dayNames = ['maandag', 'dinsdag', 'woensdag', 'donderdag', 'vrijdag', 'zaterdag & zondag']
            soup = self.index_to_soup(self.INDEX)
            index = soup.find('div', attrs={'class' : 'selectcartoon'})
            links = index.findAll('a')
            maxIndex = len(links) - 1
            articles = []
            for i in range(len(links)) :
                if i == 0 :
                    continue
                if links[i].renderContents() in dayNames :
                    article = {'title' : links[i].renderContents(), 'date' : u'', 'url' : self.INDEX + links[i]['href'], 'description' : ''}
                    articles.append(article)
            week = index.find('span', attrs={'class' : 'week'}).renderContents()
            return [[week, articles]]

        def preprocess_html(self, soup) :
            cartoon = soup.find('div', attrs={'class' : 'cartoon'})
            if cartoon :
                return cartoon
            else :
                return soup

Now what I'm doing here is maybe a little weird. For an index I parse a webpage. The returned list of articles has URLs that point to pages similar to the index, the only difference being that the div with a CSS class of 'cartoon' contains a different image for every article. My theory is that Calibre, after receiving my custom index, tries to parse all the URLs and bombs out because that causes a lot of recursion. Implementing preprocess_html() somehow stops that. But as I said, my Python is rusty. So if anyone could give me some pointers I would greatly appreciate it.

Edwin
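To make the index structure in the recipe above easier to see, here is a minimal plain-Python sketch of what its parse_index() hands back: a list of (section title, list of article dicts) pairs. It does not import calibre. The function name `build_index`, the week title, and the sample links are made up for illustration, so treat it as a sketch under those assumptions rather than as the recipe itself.

```python
# Plain-Python sketch of the index structure returned by parse_index() above.
# build_index and all sample data are hypothetical; calibre is not needed.

DAY_NAMES = ['maandag', 'dinsdag', 'woensdag', 'donderdag', 'vrijdag',
             'zaterdag & zondag']


def build_index(base_url, week_title, links):
    """Turn (link text, href) pairs into a one-section index."""
    articles = []
    # Skip the first link and keep only day-name links, as the recipe does.
    for text, href in links[1:]:
        if text in DAY_NAMES:
            articles.append({
                'title': text,
                'date': '',
                'url': base_url + href,
                'description': '',
            })
    return [(week_title, articles)]


sample_links = [
    ('vorige week', '/nl?cm=78'),        # first link, always skipped
    ('maandag', '/nl?cm=79&ctime=1'),
    ('dinsdag', '/nl?cm=79&ctime=2'),
    ('archief', '/archief'),             # not a day name, filtered out
]

index = build_index('http://foksuk.nl', 'week 46', sample_links)
for section, articles in index:
    print(section, [a['url'] for a in articles])
```

Each article dict carries the same four keys the recipe uses (title, date, url, description), and every url is fetched as a separate article page afterwards.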
#859 | |
Member
Posts: 14
Karma: 10
Join Date: Nov 2009
Device: Kindle 2 (intl.)
|
Quote:
|
|
#860 |
creator of calibre
Posts: 45,394
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
What preprocess_html is doing is extracting the div containing the cartoon and returning that. Probably the rest of the HTML on the page has something that causes an error. The download log should tell you what the error is.
|
#861 | |
Connoisseur
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
|
Re: Need some help with custom recipe
Quote:
When is this property applied? Before or after calling preprocess_html()? I know I could look it up in the source, but as I said, my Python is a little rusty.

Yes, it should. But I can make neither head nor tail of it. Except for this oddity: without preprocess_html implemented, it starts with Code:

    Download nieuws van Fokke en Sukke - debug
    InputFormatPlugin: Recipe Input running
    Downloading
    Downloading
    FetchingFetching http://foksuk.nl/nl?cm=79&ctime=1257807600&session=52dd92d33ef2789f432ec37762afe338http://foksuk.nl/nl?cm=79&ctime=1257721200&session=52dd92d33ef2789f432ec37762afe338
    Processing images...

With preprocess_html implemented, it starts with Code:

    Download nieuws van Fokke en Sukke - debug
    InputFormatPlugin: Recipe Input running
    DownloadingDownloading
    Fetching http://foksuk.nl/nl?cm=79&ctime=1257721200&session=89d846cf5dd9f1b85e89b24d566680ec
    Fetching http://foksuk.nl/nl?cm=79&ctime=1257807600&session=89d846cf5dd9f1b85e89b24d566680ec
    Processing images...

But whatever the problem is, it can be worked around. I'll post the finished recipe soon.
|
#862 |
creator of calibre
Posts: 45,394
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
preprocess_html is the very first thing called.
|
#863 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2009
Device: Kindle Paperwhite 2
|
The Straits Times for subscribers
Hi all,
Does anyone have a recipe for the subscriber version of The Straits Times (Digital Straits Times)? http://www.straitstimes.com/The+Prin...t+Edition.html I tried to modify the existing recipe for The Straits Times and managed to download the headlines, but when I click on a downloaded headline, the ebook viewer shows that I need to subscribe. Can anyone enlighten me as to what is missing? I have no programming knowledge :P Code:

    #!/usr/bin/env python
    __license__ = 'GPL v3'
    __copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
    '''
    www.straitstimes.com
    '''

    from calibre.web.feeds.news import BasicNewsRecipe

    class DigitalStraitsTimes(BasicNewsRecipe):
        title = 'Digital Straits Times'
        __author__ = 'Darko Miletic'
        description = 'Singapore newspaper'
        oldest_article = 2
        max_articles_per_feed = 100
        no_stylesheets = True
        use_embedded_content = False
        encoding = 'cp1252'
        publisher = 'Singapore Press Holdings Ltd.'
        category = 'news, politics, singapore, asia'
        language = 'en'
        extra_css = ' .top_headline{font-size: x-large; font-weight: bold} '

        conversion_options = {
            'comments' : description
            ,'tags' : category
            ,'language' : language
            ,'publisher' : publisher
        }

        needs_subscription = True
        simultaneous_downloads= 5
        delay = 0

        LOGIN = 'http://sphreg.asiaone.com/RegAuth2/stpLogin.html?goto=http://www.straitstimes.com/vgn-ext-templating/sti/common/STIRedirect.jsp'

        def get_browser(self):
            br = BasicNewsRecipe.get_browser()
            if self.username is not None and self.password is not None:
                br.open(self.LOGIN)
                br.select_form(name='loginForm')
                br['j_username'] = self.username
                br['j_password'] = self.password
                br.submit()
            return br

        remove_tags = [dict(name=['object','link','map'])]
        keep_only_tags = [dict(name='div', attrs={'class':['top_headline','story_text']})]

        feeds = [
            (u'Most Read Stories' , u'http://www.straitstimes.com/STI/STIFILES/rss/mostreadstories.xml' )
            ,(u'Top Stories' , u'http://www.straitstimes.com/STI/STIFILES/rss/prime.xml' )
            ,(u'Singapore' , u'http://www.straitstimes.com/STI/STIFILES/rss/singapore.xml' )
            ,(u'Asia', u'http://www.straitstimes.com/STI/STIFILES/rss/asia.xml')
        ]

        def preprocess_html(self, soup):
            for item in soup.findAll(style=True):
                del item['style']
            return soup
#864 | |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Quote:
See http://calibre.kovidgoyal.net/ticket/2238 and also https://www.mobileread.com/forums/sho...250#post421250 Last edited by kiklop74; 11-14-2009 at 06:51 AM. |
|
#865 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
|
#866 | |
Member
Posts: 14
Karma: 10
Join Date: Nov 2009
Device: Kindle 2 (intl.)
|
Quote:
I tried your script, but got these errors: Code:

    ERROR: Conversion Error: <b>Failed</b>: Fetch news from Haaretz
    Fetch news from Haaretz
    InputFormatPlugin: Recipe Input running
    Python function terminated unexpectedly
     (Error Code: 1)
    Traceback (most recent call last):
      File "site.py", line 103, in main
      File "site.py", line 85, in run_entry_point
      File "site-packages\calibre\utils\ipc\worker.py", line 90, in main
      File "site-packages\calibre\gui2\convert\gui_conversion.py", line 19, in gui_convert
      File "site-packages\calibre\ebooks\conversion\plumber.py", line 721, in run
      File "site-packages\calibre\customize\conversion.py", line 208, in __call__
      File "site-packages\calibre\web\feeds\input.py", line 61, in convert
      File "site-packages\calibre\web\feeds\news.py", line 589, in download
      File "site-packages\calibre\web\feeds\news.py", line 710, in build_index
      File "site-packages\calibre\web\feeds\news.py", line 1017, in parse_feeds
      File "site-packages\calibre\web\feeds\news.py", line 291, in get_feeds
    NotImplementedError

Last edited by --abc--; 11-15-2009 at 02:05 AM.
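A note on reading the traceback above: it ends in get_feeds raising NotImplementedError. One plausible cause, and this is only an assumption since the recipe itself is not shown in this post, is that the recipe class ended up with neither a `feeds` list nor its own parse_index, for example because indentation was lost when the script was copied. The sketch below is a simplified stand-in that imitates that failure. `MiniRecipe`, `BrokenRecipe`, `FixedRecipe`, `try_feeds`, and the example.com URL are hypothetical names, not calibre code.

```python
# Simplified stand-in for the feed lookup the traceback points at.
# MiniRecipe is a hypothetical class, NOT calibre's BasicNewsRecipe.


class MiniRecipe:
    feeds = None  # a real recipe would list (title, url) pairs here

    def get_feeds(self):
        # Imitates the failure in the traceback: no feeds means nothing to fetch.
        if not self.feeds:
            raise NotImplementedError
        return self.feeds


class BrokenRecipe(MiniRecipe):
    title = 'Haaretz'
    # 'feeds' is missing here, e.g. because its indentation was lost on paste


class FixedRecipe(MiniRecipe):
    title = 'Haaretz'
    feeds = [('Example section', 'http://example.com/rss.xml')]  # placeholder URL


def try_feeds(recipe):
    """Return the recipe's feeds, or the error name if the lookup fails."""
    try:
        return recipe.get_feeds()
    except NotImplementedError:
        return 'NotImplementedError'


print(try_feeds(BrokenRecipe()))
print(try_feeds(FixedRecipe()))
```

If that is what happened, checking that `feeds = [...]` (or a parse_index method) is indented inside the recipe class would be the first thing to try.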
|
#867 |
Junior Member
Posts: 7
Karma: 10
Join Date: Nov 2009
Device: Kindle Paperwhite 2
|
|
#868 |
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2009
Device: nook
|
Hi all... I am new here, and a new user of Calibre. Enjoying it so far, with a lot of gratitude to the creator and contributors...
Could someone make a recipe for the online magazine Shalom (http://shalomtimes.com/)? It does not have RSS feeds, and I do not know all the intricacies of web pages either. I would be very grateful if someone could help me out. Thank you in advance.
#869 |
Junior Member
Posts: 6
Karma: 10
Join Date: Oct 2009
Device: PRS-505
|
Hi,
I have no experience with recipes. I have been reading the manual and the help, but I cannot even get started. I am trying to make a recipe for "Faro de Vigo". The address is www.farodevigo.es. The feeds index is "http://www.farodevigo.es/servicios/rss/rss.jsp?pServicio=rss". There are links like "http://www.farodevigo.es//elementosInt/rss/2" that I can open in Firefox and read as RSS. The same section can also be opened as an HTML web page at "http://www.farodevigo.es/gran-vigo/".

To the point: I can open and see the RSS with Firefox, but there is no way to do it with calibre; it says the feed failed and nothing else. And if I try with the HTML page I get all the DIV, SPAN and similar stuff, which I have tried to filter with code from another recipe, without success. I think there is some kind of Java or something on the server that doesn't want to send the feed to calibre. I am losing sleep over this. Any suggestion is very much appreciated. Thanks in advance.
#870 |
Connoisseur
Posts: 78
Karma: 192
Join Date: Nov 2009
Device: Sony PRS-600
|
New recipe Fokke en Sukke
|
|