#451 |
Member
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
|
Zurich cinema
Hi,
I'm trying to build a recipe for the following page, which lists the current films showing in the Zurich cinemas: http://www.kulturinfo.ch/kino/db_front/showact.php
Each link goes to a description of the film. I would like to end up with an ebook where the films are the "Chapters". So far I have the following code:
Code:
    #!/usr/bin/env python
    # vim:et:sts:sw=4:sts=4
    # Last modified: 2009 Apr 18
    """ zhkimo """
    import string, re
    from calibre import strftime
    from calibre.web.feeds.recipes import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    class zhkino(BasicNewsRecipe):
        title = "Zurich Cinema"
        __author__ = "Alexis Iglauer"
        description = "Weekly Cinema listing for Zurich"
        index = 'http://www.kulturinfo.ch/kino/db_front/showact.php'
        #remove_tags_before = dict(name='div', id='storytop')
        #remove_tags = [dict(name='div', id=['seealso', 'storybottom', 'footer', 'ad_banner_top', 'sidebar'])]
        no_stylesheets = True
        #feeds = [ ('News Front Page', 'http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml')]

        def parse_index(self):
            return [('axtst',{'title':'T', 'date':'D', 'url',"U", 'description':"D"})]
Code:
    ax@shiny:/pub/Books/tmp/t$ feeds2disk --debug --test zhkino.py
    Traceback (most recent call last):
      File "/Applications/Tools/calibre.app/Contents/Resources/loaders/feeds2disk.py", line 9, in <module>
        main()
      File "/Applications/Tools/calibre.app/Contents/Resources/lib/python2.6/site-packages.zip/calibre/web/feeds/main.py", line 164, in main
      File "/Applications/Tools/calibre.app/Contents/Resources/lib/python2.6/site-packages.zip/calibre/web/feeds/main.py", line 135, in run_recipe
      File "calibre/web/feeds/recipes/__init__.pyo", line 106, in compile_recipe
      File "/var/folders/QO/QONZvFNdEi0MpoGeGOCCBk+++TI/-Tmp-/calibre_0.5.7__c2R1d_recipes/recipe1.py", line 5, in <module>
        zhkino.py
    NameError: name 'zhkino' is not defined
Kind regards
Alexis |
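Independently of the NameError above, the value returned by parse_index in the posted code would not work as written: the dict literal has `'url',"U"` where a colon is needed, and calibre's documentation describes the return value as a list of `(feed title, list of article dicts)` tuples, so the article dict has to sit inside a list. A minimal sketch of that shape in plain Python, runnable without calibre; `build_index` and `check_index` are hypothetical helper names used only for illustration, not calibre functions:

```python
def build_index():
    # One feed named 'axtst' holding a list with a single article dict.
    article = {'title': 'T', 'date': 'D', 'url': 'U', 'description': 'D'}
    return [('axtst', [article])]


def check_index(index):
    """Return True if index looks like [(feed_title, [article_dict, ...]), ...]."""
    for feed_title, articles in index:
        # The second element of each tuple must be a list, not a bare dict.
        if not isinstance(articles, list):
            return False
        for article in articles:
            if not isinstance(article, dict):
                return False
            if 'title' not in article or 'url' not in article:
                return False
    return True


index = build_index()
```

In a recipe, the body of `build_index` would be what `parse_index(self)` returns.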
#452 |
Zealot
Posts: 105
Karma: 94000
Join Date: Oct 2007
Location: Vienna
Device: Cybook Gen3
|
|
#453 |
Member
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
|
|
#454 |
creator of calibre
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@redp
You can't do TXT output at the moment. The 0.6.0 release of calibre will support this, so you have to wait until then. |
#455 | |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2009
Device: none
|
news feed to text recipe
Quote:
Redp |
|
#456 |
creator of calibre
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The output of recipes is saved as HTML and then processed by the rest of the conversion system. You can certainly write an arbitrarily complex recipe that does whatever you want and then use it with the feeds2disk command to produce output, but you're on your own doing that.
#457 |
Member
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
|
Cinema, take 2
Right, I'm making progress with my Zurich Cinemas script, but am running into a conceptual issue -- apologies if this is clear in the manual somewhere but so far I haven't found it.
The page http://www.kulturinfo.ch/kino/db_front/showact.php contains a list of films. I would like this list to be the 'table of contents' of my eBook, with each link going to a page giving the film details (as happens when you click the link on the web page). I'm overriding parse_index to return a list of feeds, but I seem to be stuck choosing between the following two options:
a) Return a list of films, which makes each film heading a feed with one article. This seems to lead to an intermediate page between the 'table of contents' and the actual film description, with this intermediate page having just the one film on it.
b) Return a one-item list, with all films attached as a list of articles to this one feed. This produces a table of contents with a single entry in it. The example I've been cribbing from (The Atlantic) does this too.
Is there any way to avoid both the 'interstitials' in a) and the single-entry ToC in b)? If not, I'd probably choose b) as the lesser evil.
Thanks
ax42 |
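The two options can be made concrete as the two shapes parse_index would return. This is only a sketch in plain Python under stated assumptions: the film titles and example.com URLs are placeholders, and `one_feed_per_film` and `single_feed` are hypothetical helper names, not calibre functions.

```python
# Placeholder article dicts standing in for films scraped from the index page.
films = [
    {'title': 'Film A', 'date': '', 'url': 'http://example.com/a', 'description': ''},
    {'title': 'Film B', 'date': '', 'url': 'http://example.com/b', 'description': ''},
]


def one_feed_per_film(films):
    # Option a): every film becomes its own feed containing exactly one article.
    return [(film['title'], [film]) for film in films]


def single_feed(films, feed_title='Films'):
    # Option b): a single feed whose article list holds every film.
    return [(feed_title, list(films))]


option_a = one_feed_per_film(films)
option_b = single_feed(films)
```

With option a) the top level has one entry per film, each wrapping a single article; with option b) the top level has one entry and the films sit one level below it.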
#458 | |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2009
Device: Sony 700BC
|
Guys.. What am I doing wrong?
Quote:
Fu*ker goes ahead and fetches "http://go.vg.no/cgi-bin/go.cgi/vg-rss-12/http://www.vg.no/nyheter/utenriks/artikkel.php?artid=562364" instead of "http://www.vg.no/pub/skrivervennlig.hbs?artid=562364" |
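The recipe being quoted is not shown, so the cause cannot be pinned down here. One common approach is to map each feed URL to the printer-friendly URL in the recipe's print_version method. A minimal sketch of that mapping as a standalone function, assuming the article id is always present as an `artid=` query parameter (true of the two URLs quoted above, an assumption for the rest of the feed):

```python
import re

# Printer-friendly URL pattern taken from the post above.
PRINT_URL = 'http://www.vg.no/pub/skrivervennlig.hbs?artid=%s'


def print_version(url):
    # Pull the numeric article id out of the feed URL.
    match = re.search(r'artid=(\d+)', url)
    if match is None:
        # No id found: fall back to the original URL unchanged.
        return url
    return PRINT_URL % match.group(1)


feed_url = ('http://go.vg.no/cgi-bin/go.cgi/vg-rss-12/'
            'http://www.vg.no/nyheter/utenriks/artikkel.php?artid=562364')
printable = print_version(feed_url)
```

Inside a recipe class the same body would go in `def print_version(self, url):`.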
|
#459 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
|
#460 |
creator of calibre
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
@ax42 You can achieve whatever effect you want by overriding create_opf in your recipe.
|
#461 | ||
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Quote:
Quote:
A good example of what you want to accomplish can be found in several recipes I wrote. For example, the Vreme recipe does exactly what you want to do. There is one page that lists all the articles we want to put into the feed, so I just pick them out with a condition specific to that page and put the data I find into a single feed. Code:
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        for item in soup.findAll(['h3','h4']):
            description = ''
            title_prefix = ''
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href') and feed_link['href'].startswith('/cms/view.php'):
                url = self.INDEX + feed_link['href']
                title = title_prefix + self.tag_to_string(feed_link)
                date = strftime(self.timefmt)
                articles.append({
                                  'title'      :title
                                 ,'date'       :date
                                 ,'url'        :url
                                 ,'description':description
                                })
        return [(soup.head.title.string, articles)]
In your case it would look something like this:
Code:
    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://www.kulturinfo.ch/kino/db_front/showact.php')
        for item in soup.findAll('td',attrs={'class':'title'}):
            description = ''
            title_prefix = ''
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                unneeded, sep, purl = feed_link['href'].partition('..')
                url = 'http://www.kulturinfo.ch/kino' + purl
                title = self.tag_to_string(feed_link)
                date = strftime(self.timefmt)
                articles.append({
                                  'title'      :title
                                 ,'date'       :date
                                 ,'url'        :url
                                 ,'description':description
                                })
        return [('Articles', articles)] |
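The URL-building step in the second snippet relies on the hrefs on the index page starting with `..`. A small sketch of what `partition('..')` does there, next to the standard library's `urljoin` as a possible alternative; the href below is a made-up example, since the real link format on the page is not shown in this thread:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

INDEX = 'http://www.kulturinfo.ch/kino/db_front/showact.php'

# Hypothetical relative link of the kind the snippet above expects.
href = '../db_front/showfilm.php?id=1'


def url_via_partition(href):
    # Everything after the first '..' is appended to the site prefix.
    unneeded, sep, purl = href.partition('..')
    return 'http://www.kulturinfo.ch/kino' + purl


def url_via_urljoin(href):
    # Resolve the link relative to the index page's own URL.
    return urljoin(INDEX, href)


from_partition = url_via_partition(href)
from_urljoin = url_via_urljoin(href)
```

For links that start with `..` the two give the same result. They differ when a link contains no `..` at all: `partition` then leaves `purl` empty and the result is just the site prefix, while `urljoin` still resolves the link against the index page.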
||
#462 | |
Junior Member
Posts: 2
Karma: 10
Join Date: Apr 2009
Device: Sony 700BC
|
Quote:
Anyway, thanks a lot, man |
|
#463 |
Member
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
|
@kiklop - thanks! I suspected I was busy trying to reinvent the wheel. I'll clean up my script in the next few days and post it.
@kovidgoyal - sounds interesting, is create_opf in the reference docs somewhere?
BTW, where can I report a bug or fix an error in the online docs? The docs for BasicNewsRecipe.get_feeds() say "Return a list of :term:RSS feeds", which looks like a bug. See http://calibre.kovidgoyal.net/user_m...cipe.get_feeds
Thanks again
ax42 |
#464 |
Member
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
|
@kiklop - Unfortunately I can't run the Vreme recipe (it requires a login). Does it result in a page with only one link on it called "Articles"? My code (coincidentally) already seems to be quite close to what you suggested (unless I'm missing something). The recipe for the Atlantic also results in a single page with a "Current Issue" link, which comes from the way parse_index passes back the list of feeds.
Code:
    def parse_index(self):
        films = []
        soup = self.index_to_soup(self.Index)
        for item in soup.findAll('td', attrs={'class':'title'}):
            if self.DEBUG:
                print 'i:', item, 's:', item.string
            description = ''
            a = item.find('a')
            if a == None:
                self.title = item.string.replace('AKTUELLE FILMLISTE', 'ZH Cinema')
                if self.DEBUG:
                    print 'title:', self.title
            else:
                if a.has_key('href'):
                    url = a['href'].replace('..', 'http://www.kulturinfo.ch/kino')
                    if self.DEBUG:
                        print 'url:', url
                    title = self.tag_to_string(a)
                    films.append({
                        'title':title,
                        'date':'',
                        'url':url,
                        'description':description
                    })
                    if self.DEBUG:
                        print 'ls:', films[-1]
        if self.DEBUG:
            print 'ret:', ['x', films]
        return [('Filme', films)]
ax42 |
#465 |
Member
Posts: 17
Karma: 10
Join Date: May 2008
Device: CASIO pocket viewer S1600, Sony PRS-505 and Cybook Gen 3
|
Any chance of good Chinese recipes for zaobao.com?
http://realtime.zaobao.com/news.xml
http://www.zaobao.com/zg/zg.xml
http://www.zaobao.com/gj/gj.xml
http://www.zaobao.com/wencui/wencui.xml
Thanks a lot! |