Custom recipes (archive, read-only) - Page 58

kiklop74 · 11-06-2009, 08:19 AM

Quote:

Originally Posted by niaiserie

First, I just want to say thanks to whoever created the Harper's subscription recipe. It works beautifully. But I was wondering, since Harper's gives subscribers access to all past issues, if there was a way to download a specific issue via some input in the Fetch News dialog?

I'd also like to request a recipe for the subscription version of The Nation.

Thanks.

It is possible to create such a recipe, but the calibre GUI does not allow custom input parameters other than username and password.

bthoven · 11-06-2009, 12:37 PM

May I request a recipe for :
Bangkok Post: http://www.bangkokpost.com/rss/
Thai Post: http://www.thaipost.net/sitemap

Thanks

evanmaastrigt · 11-09-2009, 07:40 PM

Hi,

My Python is, after 8 years, a little rusty. But I like Calibre and its concept of plug-in recipes, so I gave it a try and produced the following recipe:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class FokkeEnSukkeRecipe(BasicNewsRecipe):
    title          = u'Fokke en Sukke'
    no_stylesheets = True
    INDEX = 'http://foksuk.nl'

    # Keep the 'cartoon' div and remove the 'selectcartoon' div.
    keep_only_tags = [dict(name='div', attrs={'class': 'cartoon'})]
    remove_tags = [dict(name='div', attrs={'class': 'selectcartoon'})]

    def parse_index(self):
        dayNames = ['maandag', 'dinsdag', 'woensdag', 'donderdag', 'vrijdag', 'zaterdag & zondag']
        soup = self.index_to_soup(self.INDEX)

        # The 'selectcartoon' div holds the links to the individual cartoons.
        index = soup.find('div', attrs={'class': 'selectcartoon'})
        links = index.findAll('a')
        articles = []
        # Skip the first link; keep only links labelled with a day name.
        for link in links[1:]:
            title = link.renderContents()
            if title in dayNames:
                article = {'title': title, 'date': u'', 'url': self.INDEX + link['href'], 'description': ''}
                articles.append(article)

        week = index.find('span', attrs={'class': 'week'}).renderContents()

        # One feed, titled with the week, holding one article per cartoon.
        return [[week, articles]]

    def preprocess_html(self, soup):
        # Return just the cartoon div when it is present, else the whole page.
        cartoon = soup.find('div', attrs={'class': 'cartoon'})
        if cartoon:
            return cartoon
        else:
            return soup

Now this actually seems to work, which is nice. But it is not completely finished yet. Before I continue, I'd like to know why this works. If I comment out the preprocess_html() override, it can no longer find the cartoons I'm after, which I don't really understand.

What I'm doing here is maybe a little weird. For the index I parse a web page. The returned articles have URLs that point to pages similar to the index page; the only difference is that the div with the CSS class 'cartoon' contains a different image for each article.

My theory is that Calibre, after receiving my custom index, tries to parse all the URLs and bombs out because that causes a lot of recursion. Implementing preprocess_html() somehow stops that.

But as I said, my Python is rusty. So if anyone could give me some pointers I would greatly appreciate it.

Edwin
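
For readers following along, the index-building step of the recipe above can be tried out without calibre. This is only a minimal sketch: build_index, its parameters and the sample links are made-up names and data for illustration, and the only thing taken from the recipe is the shape of the result, a list holding one (feed title, list of article dicts) entry.

```python
# Stand-alone sketch of the index-building step; no calibre import needed.
# build_index and its parameters are illustrative names, not calibre API.

DAY_NAMES = ['maandag', 'dinsdag', 'woensdag', 'donderdag', 'vrijdag',
             'zaterdag & zondag']


def build_index(links, base_url, week_title, day_names=DAY_NAMES):
    """Turn (link_text, href) pairs into a parse_index()-style result.

    The first link is skipped, as in the recipe above, and only links
    whose text is a known day name become articles.
    """
    articles = []
    for text, href in links[1:]:
        if text in day_names:
            articles.append({
                'title': text,
                'date': '',
                'url': base_url + href,
                'description': '',
            })
    # One feed (the week) holding one article per cartoon.
    return [(week_title, articles)]


# Hypothetical link data, standing in for the <a> tags found on the page.
sample_links = [
    ('vorige week', '/nl?cm=78'),      # first link, skipped
    ('maandag', '/nl?cm=79&ctime=1'),
    ('archief', '/nl/archief'),        # not a day name, ignored
    ('dinsdag', '/nl?cm=79&ctime=2'),
]
index = build_index(sample_links, 'http://foksuk.nl', 'week 46')
```

Keeping this step as a plain function makes it easy to check the filtering on sample data before wiring it into parse_index().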

--abc-- · 11-10-2009, 07:10 AM

Quote:

Originally Posted by mccande

Could I ask one of the Python geniuses to provide recipes for Haaretz
http://www.haaretz.com/feed/enewsRss.xml

As I could not find a recipe for Haaretz, I would like to ask if it is possible to create one? That would be a really great addition!

kovidgoyal · 11-10-2009, 12:06 PM

What preprocess_html is doing is extracting the div containing the cartoon and returning that. Probably the rest of the HTML on the page has something that causes an error. The download log should tell you what the error is
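
The extraction described here can be illustrated outside calibre. The sketch below is an assumption-laden stand-in: it uses Python's standard xml.etree.ElementTree on a small, well-formed sample page instead of the soup object a recipe receives, and extract_cartoon and SAMPLE are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET


def extract_cartoon(page):
    """Return the first <div class="cartoon"> element, or the whole page."""
    for div in page.iter('div'):
        if div.get('class') == 'cartoon':
            return div
    return page


# Hypothetical, well-formed sample page (real pages are rarely this tidy).
SAMPLE = """<html><body>
<div class="selectcartoon"><a href="/nl?cm=79">maandag</a></div>
<div class="cartoon"><img src="cartoon.gif" /></div>
<div class="footer">other markup</div>
</body></html>"""

page = ET.fromstring(SAMPLE)
kept = extract_cartoon(page)
print(ET.tostring(kept, encoding='unicode').strip())
```

Everything outside the returned div is gone from that point on, so any markup elsewhere on the page can no longer affect later processing.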

evanmaastrigt · 11-10-2009, 02:48 PM

Quote:

Originally Posted by kovidgoyal

What preprocess_html is doing is extracting the div containing the cartoon and returning that. Probably the rest of the HTML on the page has something that causes an error. The download log should tell you what the error is

Thanks for your reply. But what I don't understand is that I already constrain the input to that very same div by setting the keep_only_tags property.

When is this property applied? Before or after calling preprocess_html()? I know I could look it up in the source, but as I said, my Python is a little rusty.

Quote:

Originally Posted by kovidgoyal

The download log should tell you what the error is

Yes, it should. But I can make neither head nor tail of it. Except for this oddity: without preprocess_html implemented, it starts with

Code:

Download nieuws van Fokke en Sukke - debug
InputFormatPlugin: Recipe Input running Downloading
Downloading
FetchingFetching  http://foksuk.nl/nl?cm=79&ctime=1257807600&session=52dd92d33ef2789f432ec37762afe338http://foksuk.nl/nl?cm=79&ctime=1257721200&session=52dd92d33ef2789f432ec37762afe338

Processing images...

and with

Code:

Download nieuws van Fokke en Sukke - debug
InputFormatPlugin: Recipe Input running DownloadingDownloading

Fetching http://foksuk.nl/nl?cm=79&ctime=1257721200&session=89d846cf5dd9f1b85e89b24d566680ec
Fetching http://foksuk.nl/nl?cm=79&ctime=1257807600&session=89d846cf5dd9f1b85e89b24d566680ec
Processing images...

But I have no idea what this might mean, if anything.

But whatever the problem is, it can be worked around. I'll post the finished recipe soon.

kovidgoyal · 11-10-2009, 03:26 PM

preprocess_html is the very first thing called.

dongdong · 11-14-2009, 02:11 AM

Hi all

Does anyone have a recipe for the subscriber version of The Straits Times (Digital Straits Times)?

http://www.straitstimes.com/The+Prin...t+Edition.html

I tried to modify the existing recipe for The Straits Times and managed to download the headlines, but when I click on the downloaded headlines, the ebook viewer shows that I need to subscribe.

Can anyone enlighten me as to what is missing? I have no programming knowledge :P

Code:

#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
www.straitstimes.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class DigitalStraitsTimes(BasicNewsRecipe):
    title                  = 'Digital Straits Times'
    __author__             = 'Darko Miletic'
    description            = 'Singapore newspaper'
    oldest_article         = 2
    max_articles_per_feed  = 100
    no_stylesheets         = True
    use_embedded_content   = False
    encoding               = 'cp1252'
    publisher              = 'Singapore Press Holdings Ltd.'
    category               = 'news, politics, singapore, asia'
    language               = 'en'
    extra_css              = ' .top_headline{font-size: x-large; font-weight: bold} '

    conversion_options = {
                             'comments'  : description
                            ,'tags'      : category
                            ,'language'  : language
                            ,'publisher' : publisher
                         }
  
    needs_subscription    = True
    simultaneous_downloads= 5
    delay                 = 0
    LOGIN = 'http://sphreg.asiaone.com/RegAuth2/stpLogin.html?goto=http://www.straitstimes.com/vgn-ext-templating/sti/common/STIRedirect.jsp'
    
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(name='loginForm')
            br['j_username'] = self.username
            br['j_password'] = self.password
            br.submit()
        return br

    remove_tags = [dict(name=['object','link','map'])]

    keep_only_tags = [dict(name='div', attrs={'class':['top_headline','story_text']})]

        
    feeds = [ 
               (u'Most Read Stories'         , u'http://www.straitstimes.com/STI/STIFILES/rss/mostreadstories.xml'        ) 
              ,(u'Top Stories'         , u'http://www.straitstimes.com/STI/STIFILES/rss/prime.xml'        ) 
              ,(u'Singapore'       , u'http://www.straitstimes.com/STI/STIFILES/rss/singapore.xml'      ) 
              ,(u'Asia', u'http://www.straitstimes.com/STI/STIFILES/rss/asia.xml') 
            ]

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        return soup

kiklop74 · 11-14-2009, 07:42 AM

Quote:

Originally Posted by --abc--

As I could not find a recipe for Haaretz, I would like to ask if it is possible to create one? That would be a really great addition!

I already tried that last year, but there is an issue with RTL languages in EPUB format. I was not able to make it display properly. There is a ticket in the calibre trac about that.

See http://calibre.kovidgoyal.net/ticket/2238

and also
https://www.mobileread.com/forums/sho...250#post421250

kiklop74 · 11-14-2009, 07:47 AM

Quote:

Originally Posted by dongdong

Hi all

Does anyone have a recipe for the subscriber version of The Straits Times (Digital Straits Times)?

I once tried to make that recipe and was unable to get it working because of the rather decent protection they have on the login.

--abc-- · 11-15-2009, 03:02 AM

Quote:

Originally Posted by kiklop74

I already tried that last year, but there is an issue with RTL languages in EPUB format. I was not able to make it display properly. There is a ticket in the calibre trac about that.

See http://calibre.kovidgoyal.net/ticket/2238

and also
https://www.mobileread.com/forums/sho...250#post421250

Thank you very much for your answer. Did you also try to make a recipe for the English version of Haaretz? I don't understand Hebrew, so I am a lot more interested in the English news from Haaretz.

I tried your script, but got these errors:

Code:

ERROR: Conversion Error: <b>Failed</b>: Fetch news from Haaretz

Fetch news from Haaretz
InputFormatPlugin: Recipe Input running Python function terminated unexpectedly
   (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 103, in main
  File "site.py", line 85, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 90, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 19, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 721, in run
  File "site-packages\calibre\customize\conversion.py", line 208, in __call__
  File "site-packages\calibre\web\feeds\input.py", line 61, in convert
  File "site-packages\calibre\web\feeds\news.py", line 589, in download
  File "site-packages\calibre\web\feeds\news.py", line 710, in build_index
  File "site-packages\calibre\web\feeds\news.py", line 1017, in parse_feeds
  File "site-packages\calibre\web\feeds\news.py", line 291, in get_feeds
NotImplementedError
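
One possible reading of this traceback, offered as a guess and not checked against the calibre source at the lines shown: the last frame is get_feeds, and a base class would plausibly raise NotImplementedError there when the recipe defines no feeds list and does not build its own index. The sketch below only mimics that assumed behaviour with a stand-in class; ToyRecipe, EmptyRecipe, FeedRecipe and try_feeds are hypothetical names and are not calibre's BasicNewsRecipe.

```python
class ToyRecipe:
    """Stand-in for a recipe base class; NOT calibre's BasicNewsRecipe."""
    feeds = None  # list of (title, url) pairs, if the recipe is feed-based

    def get_feeds(self):
        # Assumed behaviour: with no feeds defined there is nothing to fetch.
        if not self.feeds:
            raise NotImplementedError()
        return self.feeds


class EmptyRecipe(ToyRecipe):
    pass  # defines neither feeds nor its own index parsing


class FeedRecipe(ToyRecipe):
    # Feed URL taken from the request earlier in this thread.
    feeds = [('Haaretz', 'http://www.haaretz.com/feed/enewsRss.xml')]


def try_feeds(recipe):
    """Return the feed list, or the name of the error that was raised."""
    try:
        return recipe.get_feeds()
    except NotImplementedError:
        return 'NotImplementedError'
```

If that reading is right, the thing to check in the recipe that was run is whether its feeds list is actually filled in.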

dongdong · 11-15-2009, 08:11 AM

Quote:

Originally Posted by kiklop74

I once tried to make that recipe and was unable to get it working because of the rather decent protection they have on the login.

Hi, thanks. What a pity. I managed to log on with my subscription and thought I had it done when I saw the headlines downloaded. Too bad, then.

jerome2018 · 11-15-2009, 03:47 PM

Hi all... I am new here, and a new user of Calibre. Enjoying it so far, with a lot of gratitude to the creator and contributors...

Could someone make a recipe for the online magazine Shalom (http://shalomtimes.com/)? It does not have RSS feeds, and I do not know all the intricacies of the web pages either...

I will be very grateful if someone can help me out.

Thank you in advance

fortunados · 11-16-2009, 07:53 AM

Hi
I have no idea about recipes. I have read the manual and the help, but I cannot even get started.

I am trying to make a recipe for "Faro de vigo"

The address is www.farodevigo.es

The feeds index is this one "http://www.farodevigo.es/servicios/rss/rss.jsp?pServicio=rss"

There are links like this one "http://www.farodevigo.es//elementosInt/rss/2" that I can open in firefox and read them as RSS.

The same section is also available as an HTML web page at this link:
"http://www.farodevigo.es/gran-vigo/"

To the point...
I can open and read the RSS in Firefox, but there is no way to do it with calibre: it just says the feed failed and nothing else.

And if I try with the HTML, I get all the DIV, SPAN and similar stuff, which I have tried to filter using code from another recipe, but I cannot.

I think there is some kind of Java or something on the server that doesn't want to send the feed to calibre.

I am losing sleep over this.

Any suggestions are much appreciated.

Thanks in advance
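
One way to test the theory that the server answers a browser differently is to request the feed twice from plain Python, once with the default identification and once with a browser-like User-Agent header. This is only a diagnostic sketch using the standard urllib module, not anything calibre does internally; make_request, fetch_status and the BROWSER_UA string are assumptions made up for the example.

```python
import urllib.request

FEED_URL = 'http://www.farodevigo.es//elementosInt/rss/2'
# Assumed browser-like identification string; any realistic value would do.
BROWSER_UA = ('Mozilla/5.0 (X11; Linux x86_64; rv:109.0) '
              'Gecko/20100101 Firefox/115.0')


def make_request(url, user_agent=None):
    """Build a request, optionally pretending to be a browser."""
    headers = {'User-Agent': user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(url, user_agent=None, timeout=20):
    """Return (HTTP status, first 200 bytes), or ('error', message)."""
    try:
        request = make_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read(200)
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        return 'error', str(exc)
```

Calling fetch_status(FEED_URL) and then fetch_status(FEED_URL, BROWSER_UA) and comparing the two results would show whether the header makes a difference. If both return the same XML, the cause is more likely somewhere else, for example in the feed content itself.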

evanmaastrigt · 11-16-2009, 02:54 PM

Here is a new recipe for the popular Dutch daily cartoon 'Fokke en Sukke'

Enjoy!

fokkeensukke.zip
