Old 10-27-2008, 09:30 PM   #1
kiklop74
Guru
Posts: 780
Karma: 194642
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
Problems writing recipe

I'm trying to write a recipe for an online weekly magazine. The front page has its article links embedded in span tags with a specific class. The code works, sort of.

Even though the page has 10-13 links, the loop I created retrieves 2 and then stops. Can anybody help me with this, please?

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    
    title       = 'Vreme'
    __author__  = 'Darko Miletic'
    description = 'Politicki Nedeljnik Srbije'
    timefmt = ' [%a, %d %b, %Y]'
    no_stylesheets = True
    simultaneous_downloads = 1
    delay = 1
    INDEX = 'http://www.vreme.com'

    
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        
        for item in soup.findAll('span', attrs={'class':'toc2'}):
            #print item
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url = self.INDEX+feed_link['href']+'&print=yes'
                title = self.tag_to_string(feed_link)
                date = strftime('%a, %d %b')
                description = ''
                articles.append({
                                 'title':title,
                                 'date':date,
                                 'url':url,
                                 'description':description
                                })
        return [('Latest edition', articles)]
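
One way to check whether the loop itself is dropping links is to run the same selection logic outside calibre. The following is a minimal sketch using only the Python 3 standard library, not the recipe's own BeautifulSoup calls. The `SAMPLE` markup, the `Toc2LinkParser` class, and the `toc2_urls` helper are hypothetical names made up for illustration; only the `toc2` class, the base URL, and the `&print=yes` suffix come from the recipe above.

```python
from html.parser import HTMLParser


class Toc2LinkParser(HTMLParser):
    """Collect the first <a href> found inside each <span class="toc2">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # span nesting depth inside a toc2 span (0 = outside)
        self.taken = False  # True once the current toc2 span has yielded a link
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span':
            if self.depth:
                self.depth += 1  # nested span inside a toc2 span
            elif 'toc2' in (attrs.get('class') or '').split():
                self.depth = 1
                self.taken = False
        elif tag == 'a' and self.depth and not self.taken and attrs.get('href'):
            self.links.append(attrs['href'])
            self.taken = True

    def handle_endtag(self, tag):
        if tag == 'span' and self.depth:
            self.depth -= 1


# Hypothetical front-page fragment; the real page markup is not shown in this thread.
SAMPLE = '''
<span class="toc2"><a href="/cms/view.php?id=1">First</a></span>
<span class="toc2"><a href="/cms/view.php?id=2">Second</a></span>
<span class="other"><a href="/skip">Skipped</a></span>
<span class="toc2"><a href="/cms/view.php?id=3">Third</a></span>
'''


def toc2_urls(html, base='http://www.vreme.com'):
    """Build printable article URLs the same way the recipe does."""
    parser = Toc2LinkParser()
    parser.feed(html)
    return [base + href + '&print=yes' for href in parser.links]


for url in toc2_urls(SAMPLE):
    print(url)
```

If a check like this finds every link on the real page, the selection loop is probably fine and the cut-off is happening later, in the download step.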
Old 10-27-2008, 10:49 PM   #2
kovidgoyal
creator of calibre
Posts: 25,779
Karma: 4998511
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It works for me if I add the line
# -*- coding: utf-8 -*-

to the top.
Old 10-28-2008, 06:02 AM   #3
kiklop74
When I added the line you suggested, the script crashed while retrieving the second link:

Code:
INFO: Downloading

DEBUG: Fetching http://www.vreme.com/cms/view.php?id=726606&print=yes

Traceback (most recent call last):
  File "main.py", line 151, in <module>
  File "main.py", line 146, in main
  File "main.py", line 134, in run_recipe
  File "calibre\web\feeds\news.pyo", line 472, in download
  File "calibre\web\feeds\news.pyo", line 639, in build_index
  File "calibre\threadpool.pyo", line 219, in poll
  File "calibre\web\feeds\news.pyo", line 743, in article_downloaded
TypeError: not all arguments converted during string formatting
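
For context, this `TypeError` is what Python raises when a `%`-format string receives more arguments than it has placeholders. The actual line in `news.pyo` is not shown in this thread, so the sketch below only reproduces the error message with made-up values; `format_status` and its templates are hypothetical.

```python
def format_status(template, args):
    """Apply %-formatting and report a mismatch instead of raising."""
    try:
        return template % args
    except TypeError as err:
        return 'TypeError: %s' % err


# Two placeholders, two arguments: formats normally.
ok = format_status('Downloaded article %s from %s', ('title', 'url'))

# One placeholder, two arguments: the same error as in the traceback.
bad = format_status('Downloaded article %s', ('title', 'url'))

print(ok)
print(bad)
```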
Old 10-28-2008, 06:09 AM   #4
kiklop74
I noticed that I had an older version of calibre, so I downloaded and installed the latest version. As a result it no longer crashes with your line, but it still downloads only two pages and ignores the other links. Is there anything else I can override or trace to see what is going on?
Old 10-28-2008, 09:32 AM   #5
kovidgoyal
Are you running it with the --test option? That will cause only the first two articles to download. Run with the --debug option to see exactly what it is doing.
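
As a toy model of that behavior (not calibre's actual implementation, which is not shown in this thread), the index can contain every link while test mode trims the list before anything is downloaded. The `select_articles` function and its `test_limit` parameter are hypothetical.

```python
def select_articles(articles, test=False, test_limit=2):
    """Return the articles to download; in test mode keep only the first few."""
    return articles[:test_limit] if test else list(articles)


# Twelve parsed links, as parse_index might return for one issue.
parsed = ['article-%d' % i for i in range(1, 13)]

print(len(select_articles(parsed)))             # all parsed articles
print(len(select_articles(parsed, test=True)))  # only the first two
```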
Old 10-28-2008, 11:05 AM   #6
kiklop74
Oh boy, do I feel silly.

Yes, that was it. I was not aware that --test downloads only two pages.

Thanks, man. Calibre is a really great tool!
Old 10-28-2008, 01:31 PM   #7
kiklop74
There is another problem I'm experiencing now.

I added support for logging on to the protected part of the site, which gives access to all articles in the magazine.

This is how the script looks now:

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    
    title       = 'Vreme'
    __author__  = 'Darko Miletic'
    description = 'Politicki Nedeljnik Srbije'
    timefmt = ' [%a, %d %b, %Y]'
    no_stylesheets = True
    simultaneous_downloads = 1
    delay = 1
    needs_subscription = True
    INDEX = 'http://www.vreme.com'
    LOGIN = 'http://www.vreme.com/account/index.php'

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(name='f')
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br
    
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        
        for item in soup.findAll('span', attrs={'class':'toc2'}):
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url = self.INDEX+feed_link['href']+'&print=yes'
                title = self.tag_to_string(feed_link)
                date = strftime('%a, %d %b')
                description = ''
                articles.append({
                                 'title':title,
                                 'date':date,
                                 'url':url,
                                 'description':description
                                })
        return [('Latest edition', articles)]
If I execute this from the command line it downloads everything fine, but if I execute the same thing from the calibre GUI the job just hangs, doing nothing.

This is the command line I used:

Code:
feeds2lrf.exe --username=<user> --password=<pass> vreme.py
Any ideas?
Old 10-28-2008, 03:02 PM   #8
kovidgoyal
Click on the hourglass, then double-click on the job to see the log and find out what it's doing.
Old 10-28-2008, 06:47 PM   #9
kiklop74
Weird.

After restarting the calibre GUI it started working, but alas, problems persist.

Apparently, for some reason the logon information is not retained in the browser instance, so when I try to access the links in the protected part I get the logon page instead.

Using Firebug I correctly identified the form elements that should be submitted. However, just to be sure, can I post the elements manually, knowing the names of the POST items?

Is there an example of this?
Old 10-28-2008, 06:58 PM   #10
kovidgoyal
As far as I know login information is automatically retained by the browser instance. See for example the nytimes and wsj recipes.
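
On the question of posting the fields manually, here is a rough sketch using only the Python 3 standard library rather than calibre's browser object. It assumes the login form needs nothing beyond the `username` and `password` fields used in the recipe above; any hidden fields the real form may require are unknown here. `make_opener` and `build_login_request` are hypothetical helper names. The cookie jar attached to the opener is what carries the session from the login request to later requests.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN = 'http://www.vreme.com/account/index.php'  # login URL from the recipe


def make_opener():
    """Create an opener whose cookie jar persists across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password, url=LOGIN):
    """Encode the form fields by name; supplying data makes this a POST."""
    fields = {'username': username, 'password': password}
    data = urllib.parse.urlencode(fields).encode('ascii')
    return urllib.request.Request(url, data=data)


# Usage (performs network I/O, so it is left commented out):
# opener, jar = make_opener()
# opener.open(build_login_request('user', 'secret'))
# page = opener.open(
#     'http://www.vreme.com/cms/view.php?id=726606&print=yes').read()
```

Every request made through the same opener sends back whatever cookies the login response set, so if the protected pages still return the logon page, it is worth comparing the posted fields against what the browser actually sends.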
All times are GMT -4.