Old 04-18-2009, 09:05 AM   #451
ax42
Member
ax42 began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
Zurich cinema

Hi,

I'm trying to build a recipe for the following page which lists the current films showing in the Zurich cinemas:

http://www.kulturinfo.ch/kino/db_front/showact.php

Each link goes to a description of the film. I would like to end up with an ebook where the films are the "Chapters".

So far I have the following code:

Code:
#!/usr/bin/env  python
# vim:et:sts:sw=4:sts=4
# Last modified: 2009 Apr 18
"""
zhkimo
"""
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class zhkino(BasicNewsRecipe):

    title = "Zurich Cinema"
    __author__ = "Alexis Iglauer"
    description = "Weekly Cinema listing for Zurich"
    index = 'http://www.kulturinfo.ch/kino/db_front/showact.php'
    #remove_tags_before = dict(name='div', id='storytop')
    #remove_tags        = [dict(name='div', id=['seealso', 'storybottom', 'footer', 'ad_banner_top', 'sidebar'])]
    no_stylesheets     = True
    #feeds          = [ ('News Front Page', 'http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml')] 


    def parse_index(self):
        return [('axtst',{'title':'T', 'date':'D', 'url',"U", 'description':"D"})]
As you can see, I have commented out a few things to narrow down the error I am getting:
Code:
ax@shiny:/pub/Books/tmp/t$ feeds2disk --debug --test zhkino.py 
Traceback (most recent call last):
  File "/Applications/Tools/calibre.app/Contents/Resources/loaders/feeds2disk.py", line 9, in <module>
    main()
  File "/Applications/Tools/calibre.app/Contents/Resources/lib/python2.6/site-packages.zip/calibre/web/feeds/main.py", line 164, in main
  File "/Applications/Tools/calibre.app/Contents/Resources/lib/python2.6/site-packages.zip/calibre/web/feeds/main.py", line 135, in run_recipe
  File "calibre/web/feeds/recipes/__init__.pyo", line 106, in compile_recipe
  File "/var/folders/QO/QONZvFNdEi0MpoGeGOCCBk+++TI/-Tmp-/calibre_0.5.7__c2R1d_recipes/recipe1.py", line 5, in <module>
    zhkino.py
NameError: name 'zhkino' is not defined
BTW this is on OS X with calibre 0.5.7. Any pointers would be much appreciated.

Kind regards
Alexis
ax42 is offline  
Old 04-18-2009, 11:47 AM   #452
laborg
Zealot
laborg has top level security clearance to Area 51.
 
Posts: 105
Karma: 94000
Join Date: Oct 2007
Location: Vienna
Device: Cybook Gen3
Quote:
Originally Posted by ax42 View Post
Hi,

Code:
    def parse_index(self):
        return [('axtst',{'title':'T', 'date':'D', 'url',"U", 'description':"D"})]
Alexis
You forgot a ":" between url and "U" ...
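For reference, a minimal sketch of the corrected return value, using only the placeholder strings from the quoted code. It is written as a standalone function (the name parse_index_fixed is made up for illustration) so it runs without calibre; in the recipe this would be the body of parse_index. The article dict is also wrapped in a list, on the assumption that a feed holds a list of articles, as the later parse_index examples in this thread do.

```python
def parse_index_fixed():
    # Same placeholder values as in the quoted code. The fix is
    # 'url': "U" (colon) in place of 'url',"U" (comma), which is a
    # syntax error inside a dict literal.
    article = {'title': 'T', 'date': 'D', 'url': "U", 'description': "D"}
    # One feed called 'axtst' holding a list of article dicts.
    return [('axtst', [article])]


feeds = parse_index_fixed()
print(feeds)
```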
laborg is offline  
Old 04-18-2009, 11:55 AM   #453
ax42
Member
ax42 began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
Quote:
Originally Posted by laborg View Post
You forgot a ":" between url and "U" ...
Well spotted, thanks!
ax42 is offline  
Old 04-18-2009, 12:34 PM   #454
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@redp

You can't do TXT output at the moment. The 0.6.0 release of calibre will support this, so you have to wait until then.
kovidgoyal is online now  
Old 04-18-2009, 02:55 PM   #455
redp
Junior Member
redp began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Apr 2009
Device: none
News feed to text recipe

Quote:
Originally Posted by kovidgoyal View Post
@redp

You can't do TXT output at the moment. The 0.6.0 release of calibre will support this, so you have to wait until then.
With all due respect, kovidgoyal, I think you underestimate the flexibility of calibre. I think it is fairly easy to extract the text from the tags that carry the news and then use regular expressions to strip the tags from the body. If I understood you correctly, a future release will do this with one command, but I bet the same can be done with three or four Python commands in the current version. There is a good chance I am missing something, so I beg your pardon in advance.

Redp
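The idea of stripping tags to get plain text might be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python 3's standard-library HTMLParser in place of the regular expressions mentioned above, it does not touch calibre at all, and the names TextExtractor and html_to_text and the sample markup are made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)

    def text(self):
        # Join the pieces and collapse the whitespace left behind by tags.
        return ' '.join(' '.join(self.chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample markup, standing in for a downloaded article body.
sample = '<div class="story"><h1>Headline</h1><p>First <b>bold</b> paragraph.</p></div>'
print(html_to_text(sample))
```

Joining the pieces with a space keeps headings and paragraphs apart, but it would also split a word that is only partly wrapped in a tag, and any script or style content would be kept as text, so a real recipe would need more care.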
redp is offline  
Old 04-18-2009, 03:30 PM   #456
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The output of recipes is saved as HTML and then processed by the rest of the conversion system. You can certainly write an arbitrarily complex recipe that does whatever you want and then use it with the feeds2disk command to produce the output, but you're on your own doing that.
kovidgoyal is online now  
Old 04-18-2009, 07:02 PM   #457
ax42
Member
ax42 began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
Cinema, take 2

Right, I'm making progress with my Zurich Cinemas script, but am running into a conceptual issue -- apologies if this is clear in the manual somewhere but so far I haven't found it.

The page http://www.kulturinfo.ch/kino/db_front/showact.php contains a list of films. I would like this list to be the 'table of contents' of my eBook and each link to go to a page giving the film details (as happens when you click on the webpage link). I'm busy overriding parse_index to get a list of feeds but seem to be stuck between choosing one of the following two options:

a) Return a list of films, which makes each film heading a feed with one article. This seems to lead to an intermediate page between the 'table of contents' and the actual film description, with this intermediate page having just the one film on it

b) Return a one-item list, with all films attached as a list of articles to this one feed. This causes a table of contents with a single entry in it. The example I've been cribbing off (The Atlantic) does this too.

Is there any way to not have either 'interstitials' like in a) or a single-entry ToC as in b)? If not, I'd probably choose b) as the lesser evil.

Thanks
ax42
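The two options can be written down as plain data, which may make the difference easier to see. This is only a sketch: the film titles, the example.com URLs and the feed name 'Films' are made up, and the shape (a list of feed-title/article-list pairs) is assumed from the parse_index examples elsewhere in this thread.

```python
# Made-up articles standing in for the films scraped from the index page.
films = [
    {'title': 'Film A', 'url': 'http://example.com/a', 'date': '', 'description': ''},
    {'title': 'Film B', 'url': 'http://example.com/b', 'date': '', 'description': ''},
]

# Option a): one feed per film, each feed holding a single article.
option_a = [(film['title'], [film]) for film in films]

# Option b): a single feed holding every film as an article.
option_b = [('Films', films)]

print(len(option_a), 'feeds in a);', len(option_b), 'feed in b)')
```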
ax42 is offline  
Old 04-18-2009, 07:04 PM   #458
OlaNordmann
Junior Member
OlaNordmann began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Apr 2009
Device: Sony 700BC
Guys.. What am I doing wrong?

Quote:
feeds = [(u'Nyheter utenriks', u'http://www1.vg.no/rss/create.php?categories=12&keywords=&limit=10')]

def print_version(self, url):
    return url.replace('http://go.vg.no/cgi-bin/go.cgi/vg-rss-12/http://www.vg.no/nyheter/utenriks/artikkel.php', 'http://www.vg.no/pub/skrivervennlig.hbs')

It goes ahead and fetches "http://go.vg.no/cgi-bin/go.cgi/vg-rss-12/http://www.vg.no/nyheter/utenriks/artikkel.php?artid=562364" instead of "http://www.vg.no/pub/skrivervennlig.hbs?artid=562364".
OlaNordmann is offline  
Old 04-18-2009, 08:17 PM   #459
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by OlaNordmann View Post
Guys.. What am I doing wrong?
Complicating things.

Always keep it simple.

Code:
    def print_version(self, url):
        # Keep only the article id that follows the last 'artid='
        unneeded, sep, article_id = url.rpartition('artid=')
        return 'http://www.vg.no/pub/skrivervennlig.hbs?artid=' + article_id
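To see what the rpartition call does, here is the same logic as a standalone function (no self, so it runs without calibre), applied to the URL quoted in the question. It is a sketch rather than a tested recipe.

```python
def print_version(url):
    # rpartition splits on the LAST occurrence of 'artid=' and returns
    # (text before, separator, text after); only the last part is needed.
    unneeded, sep, article_id = url.rpartition('artid=')
    return 'http://www.vg.no/pub/skrivervennlig.hbs?artid=' + article_id


feed_url = ('http://go.vg.no/cgi-bin/go.cgi/vg-rss-12/'
            'http://www.vg.no/nyheter/utenriks/artikkel.php?artid=562364')
print(print_version(feed_url))
```

One caveat: if a URL contains no 'artid=' at all, rpartition puts the whole URL into the last element, so the result would be malformed. A check on sep would catch that case.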
kiklop74 is offline  
Old 04-18-2009, 08:34 PM   #460
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,395
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@ax42 You can achieve whatever effect you want by overriding create_opf in your recipe.
kovidgoyal is online now  
Old 04-18-2009, 08:35 PM   #461
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Quote:
Originally Posted by ax42 View Post
The page http://www.kulturinfo.ch/kino/db_front/showact.php contains a list of films. I would like this list to be the 'table of contents' of my eBook and each link to go to a page giving the film details (as happens when you click on the webpage link). I'm busy overriding parse_index to get a list of feeds but seem to be stuck between choosing one of the following two options:

a) Return a list of films, which makes each film heading a feed with one article. This seems to lead to an intermediate page between the 'table of contents' and the actual film description, with this intermediate page having just the one film on it
Why?? This is quite pointless.

Quote:
Originally Posted by ax42 View Post
b) Return a one-item list, with all films attached as a list of articles to this one feed. This causes a table of contents with a single entry in it. The example I've been cribbing off (The Atlantic) does this too.
This is the way to go, since the TOC will be shown with the list of articles on the reader.

A good example of what you want to accomplish can be found in several recipes I wrote.

For example, the Vreme recipe does exactly what you want to do. It has one page that lists all the articles to put into the feed, so I just pick them out with a condition specific to that page and put the data found into a single feed.

Code:
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)        
        for item in soup.findAll(['h3','h4']):
            description = ''
            title_prefix = ''
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href') and feed_link['href'].startswith('/cms/view.php'):
                url   = self.INDEX + feed_link['href']
                title = title_prefix + self.tag_to_string(feed_link)
                date  = strftime(self.timefmt)                
                articles.append({
                                  'title'      :title
                                 ,'date'       :date
                                 ,'url'        :url
                                 ,'description':description
                                })
        return [(soup.head.title.string, articles)]

In your case it would look something like this:

Code:
    def parse_index(self):
        articles = []
        soup = self.index_to_soup('http://www.kulturinfo.ch/kino/db_front/showact.php')

        # Each film title sits in a <td class="title"> cell
        for item in soup.findAll('td',attrs={'class':'title'}):
            description = ''
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                # The hrefs appear to be relative ('../...'); keep what follows the '..'
                unneeded, sep, purl = feed_link['href'].partition('..')
                url   = 'http://www.kulturinfo.ch/kino' + purl
                title = self.tag_to_string(feed_link)
                date  = strftime(self.timefmt)
                articles.append({
                                  'title'      :title
                                 ,'date'       :date
                                 ,'url'        :url
                                 ,'description':description
                                })
        # A single feed holding every film as an article
        return [('Articles', articles)]
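The URL handling in that loop can be tried on its own. The sketch below compares the partition approach with the standard library's urljoin; the two href values are hypothetical (the real links on the page may look different), and the helper names are made up for illustration.

```python
from urllib.parse import urljoin

INDEX = 'http://www.kulturinfo.ch/kino/db_front/showact.php'
BASE = 'http://www.kulturinfo.ch/kino'


def absolute_by_partition(href):
    # Same approach as the recipe: keep whatever follows the first '..'
    unneeded, sep, purl = href.partition('..')
    return BASE + purl


def absolute_by_urljoin(href):
    # Standard-library alternative: resolve href relative to the index page
    return urljoin(INDEX, href)


relative = '../db_front/film.php?id=1'   # hypothetical href
print(absolute_by_partition(relative))
print(absolute_by_urljoin(relative))

no_dots = 'film.php?id=1'                # hypothetical href without '..'
print(absolute_by_partition(no_dots))    # purl is empty here, so only BASE comes back
print(absolute_by_urljoin(no_dots))
```

For hrefs that start with '..' both give the same result. They differ only when the '..' is missing, where urljoin still resolves the link against the index page.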
kiklop74 is offline  
Old 04-18-2009, 08:44 PM   #462
OlaNordmann
Junior Member
OlaNordmann began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Apr 2009
Device: Sony 700BC
Quote:
Originally Posted by kiklop74 View Post
Complicating things.

Always keep it simple.

Code:
    def print_version(self, url):
        unneeded, sep, article_id = url.rpartition('artid=')
        return 'http://www.vg.no/pub/skrivervennlig.hbs?artid=' + article_id
Worked like a charm. I really don't have a clue what I'm doing...
Anyway, thanks a lot, man, I'm so grateful.
OlaNordmann is offline  
Old 04-19-2009, 05:29 AM   #463
ax42
Member
ax42 began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
@kiklop - thanks! I suspected I was busy trying to reinvent the wheel. I'll clean up my script in the next few days and post it.

@kovidgoyal - sounds interesting, is create_opf in the reference docs somewhere?

BTW, where can I report a bug or fix an error in the online docs? The docs for BasicNewsRecipe.get_feeds() say "Return a list of :term:RSS feeds", which looks like a markup bug. See http://calibre.kovidgoyal.net/user_m...cipe.get_feeds

Thanks again
ax42
ax42 is offline  
Old 04-19-2009, 06:02 AM   #464
ax42
Member
ax42 began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Apr 2009
Location: Switzerland
Device: PRS505
@kiklop - I unfortunately can't run the Vreme recipe (it requires a login). Does it result in a page with only one link on it called "Articles"? My code (coincidentally) seems to be quite close to what you suggested already (unless I'm missing something). The recipe for the Atlantic also results in a single page with a "Current Issue" link, which comes from the way parse_index passes back the list of feeds.

Code:
    def parse_index(self):
        films = []
        soup = self.index_to_soup(self.Index)
        for item in soup.findAll('td', attrs={'class':'title'}):
            if self.DEBUG: print 'i:', item, 's:', item.string
            description = ''

            a = item.find('a')
            if a is None:
                # A cell without a link holds the page heading; use it as the book title
                self.title = item.string.replace('AKTUELLE FILMLISTE', 'ZH Cinema')
                if self.DEBUG: print 'title:', self.title

            elif a.has_key('href'):
                # Only add a film when the link has a target, so url is always defined
                url = a['href'].replace('..', 'http://www.kulturinfo.ch/kino')
                if self.DEBUG: print 'url:', url
                title = self.tag_to_string(a)
                films.append({
                                 'title':title,
                                 'date':'',
                                 'url':url,
                                 'description':description
                                })
                if self.DEBUG: print 'ls:', films[-1]
        if self.DEBUG: print 'ret:', [('Filme', films)]
        return [('Filme', films)]
Any ideas?

ax42
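For experimenting with the scraping logic away from calibre, the loop can be approximated with Python 3's standard library. Everything here is an assumption made for illustration: the sample HTML, the film.php links, and the names FilmListParser and build_index are made up, and HTMLParser stands in for calibre's index_to_soup. The sketch collects links inside title cells, skips anchors without an href, and returns the single-feed structure used above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

INDEX = 'http://www.kulturinfo.ch/kino/db_front/showact.php'


class FilmListParser(HTMLParser):
    """Collects (title, href) for every <a href=...> inside <td class="title">."""

    def __init__(self):
        super().__init__()
        self.in_title_cell = False
        self.current_href = None
        self.current_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td' and attrs.get('class') == 'title':
            self.in_title_cell = True
        elif tag == 'a' and self.in_title_cell and 'href' in attrs:
            self.current_href = attrs['href']
            self.current_text = []

    def handle_data(self, data):
        # Only text inside a collected link counts as a film title.
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current_href is not None:
            title = ''.join(self.current_text).strip()
            self.links.append((title, self.current_href))
            self.current_href = None
        elif tag == 'td':
            self.in_title_cell = False


def build_index(html, feed_title='Filme'):
    parser = FilmListParser()
    parser.feed(html)
    parser.close()
    films = [
        {'title': title, 'date': '', 'url': urljoin(INDEX, href), 'description': ''}
        for title, href in parser.links
    ]
    # One feed containing all films, as in option b).
    return [(feed_title, films)]


# Made-up markup imitating the structure the recipe code expects.
SAMPLE = '''
<table>
  <tr><td class="title">AKTUELLE FILMLISTE</td></tr>
  <tr><td class="title"><a href="../db_front/film.php?id=1">Film A</a></td></tr>
  <tr><td class="title"><a>No link target</a></td></tr>
  <tr><td class="other"><a href="../db_front/film.php?id=9">Ignored</a></td></tr>
</table>
'''

index = build_index(SAMPLE)
for film in index[0][1]:
    print(film['title'], film['url'])
```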
ax42 is offline  
Old 04-19-2009, 07:32 AM   #465
pubolab
Member
pubolab began at the beginning.
 
Posts: 17
Karma: 10
Join Date: May 2008
Device: CASIO pocket viewer S1600, Sony PRS-505 and Cybook Gen 3
Any chance of a good Chinese recipe for zaobao.com?

http://realtime.zaobao.com/news.xml
http://www.zaobao.com/zg/zg.xml
http://www.zaobao.com/gj/gj.xml
http://www.zaobao.com/wencui/wencui.xml

Thanks a lot!
pubolab is offline  
All times are GMT -4.