Problems writing recipe

kiklop74 · 10-27-2008, 10:30 PM

I'm trying to write a recipe for an online weekly magazine. The front page has its links embedded in span tags with a specific class. The code works, sort of.

Even though the page has 10-13 links, the loop I created retrieves 2 and then stops. Can anybody help me with this, please?

Code:

#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    
    title       = 'Vreme'
    __author__  = 'Darko Miletic'
    description = 'Politicki Nedeljnik Srbije'
    timefmt = ' [%a, %d %b, %Y]'
    no_stylesheets = True
    simultaneous_downloads = 1
    delay = 1
    INDEX = 'http://www.vreme.com'

    
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        
        for item in soup.findAll('span', attrs={'class':'toc2'}):
            #print item
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url = self.INDEX+feed_link['href']+'&print=yes'
                title = self.tag_to_string(feed_link)
                date = strftime('%a, %d %b')
                description = ''
                articles.append({
                                 'title':title,
                                 'date':date,
                                 'url':url,
                                 'description':description
                                })
        return [('Latest edition', articles)]

kovidgoyal · 10-27-2008, 11:49 PM

Works for me if I add the line
# -*- coding: utf-8 -*-

to the top

kiklop74 · 10-28-2008, 07:02 AM

When I added the line you suggested, the script crashed while retrieving the second link:

Code:

INFO: Downloading

DEBUG: Fetching http://www.vreme.com/cms/view.php?id=726606&print=yes

Traceback (most recent call last):
  File "main.py", line 151, in <module>
  File "main.py", line 146, in main
  File "main.py", line 134, in run_recipe
  File "calibre\web\feeds\news.pyo", line 472, in download
  File "calibre\web\feeds\news.pyo", line 639, in build_index
  File "calibre\threadpool.pyo", line 219, in poll
  File "calibre\web\feeds\news.pyo", line 743, in article_downloaded
TypeError: not all arguments converted during string formatting

kiklop74 · 10-28-2008, 07:09 AM

I noticed that I had an older version of calibre, so I downloaded and installed the latest version. As a result it no longer crashes with your line, but it still downloads only two pages and ignores the other links. Is there anything else I can override or trace to see what is going on?

kovidgoyal · 10-28-2008, 10:32 AM

Are you running it with the --test option? That will cause only the first two articles to download. Run with the --debug option to see exactly what it is doing.
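
One way to see whether the parsing loop itself is at fault is to run the same span/link extraction outside calibre. A minimal sketch, assuming Python 3 and only the standard library: it uses html.parser rather than the recipe's BeautifulSoup calls, it collects every link inside a toc2 span rather than just the first one, and the names Toc2LinkCollector and toc2_links are made up for illustration.

```python
from html.parser import HTMLParser


class Toc2LinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside <span class="toc2">."""

    def __init__(self):
        super().__init__()
        self.span_stack = []  # one bool per open <span>: is it a toc2 span?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span':
            classes = (attrs.get('class') or '').split()
            self.span_stack.append('toc2' in classes)
        elif tag == 'a' and any(self.span_stack) and attrs.get('href'):
            # We are somewhere inside at least one open toc2 span.
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'span' and self.span_stack:
            self.span_stack.pop()


def toc2_links(html):
    """Return the hrefs of all links found inside toc2 spans, in page order."""
    parser = Toc2LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Feeding it a saved copy of the front page and comparing len(toc2_links(html)) with the 10-13 links expected shows whether articles are lost in the loop or later, during the download step.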

kiklop74 · 10-28-2008, 12:05 PM

Oh boy, do I feel silly.

Yes, that was it. I was not aware that --test downloads only two pages.

Thanks man. Calibre is a really great tool!

kiklop74 · 10-28-2008, 02:31 PM

There is another problem I'm experiencing now.

I added support for logging on to the protected part of the site, which gives access to all articles in the magazine.

This is how the script looks now:

Code:

#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    
    title       = 'Vreme'
    __author__  = 'Darko Miletic'
    description = 'Politicki Nedeljnik Srbije'
    timefmt = ' [%a, %d %b, %Y]'
    no_stylesheets = True
    simultaneous_downloads = 1
    delay = 1
    needs_subscription = True
    INDEX = 'http://www.vreme.com'
    LOGIN = 'http://www.vreme.com/account/index.php'

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(name='f')
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br
    
    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        
        for item in soup.findAll('span', attrs={'class':'toc2'}):
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url = self.INDEX+feed_link['href']+'&print=yes'
                title = self.tag_to_string(feed_link)
                date = strftime('%a, %d %b')
                description = ''
                articles.append({
                                 'title':title,
                                 'date':date,
                                 'url':url,
                                 'description':description
                                })
        return [('Latest edition', articles)]

If I execute this from the command line it downloads everything fine, but if I execute the same thing from the calibre GUI the job just hangs, doing nothing.

This is the command line I used:

Code:

feeds2lrf.exe --username=<user> --password=<pass> vreme.py

Any ideas?

kovidgoyal · 10-28-2008, 04:02 PM

Click on the hourglass, then double-click on the job to see the log and find out what it's doing.

kiklop74 · 10-28-2008, 07:47 PM

After restarting the calibre GUI it started working, but alas, problems persist...

Apparently, for some reason the logon information is not retained in the browser instance, so when I try to access the links in the protected part I get the logon page instead.

Using Firebug I correctly identified the form elements that should be submitted. However, just to be sure, can I post the elements manually, knowing the names of the POST items?

Is there some example for this?

kovidgoyal · 10-28-2008, 07:58 PM

As far as I know login information is automatically retained by the browser instance. See for example the nytimes and wsj recipes.
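
On posting the form fields by hand: a rough sketch, assuming Python 3 and the standard library rather than the browser object calibre hands to the recipe. The login URL and the field names username and password are taken from the recipe above; build_login_request and make_session are hypothetical helpers, and the real form may need extra hidden fields.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN = 'http://www.vreme.com/account/index.php'  # login URL from the recipe


def build_login_request(username, password, url=LOGIN):
    """Build a POST request carrying the login form fields (hypothetical helper)."""
    # Field names come from the recipe; any hidden fields the site expects
    # would have to be added here (an assumption to verify with Firebug).
    body = urllib.parse.urlencode({'username': username, 'password': password})
    # Supplying data makes urllib send the request as a POST.
    return urllib.request.Request(url, data=body.encode('ascii'))


def make_session():
    """Return an opener that keeps cookies between requests, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Calling opener.open(build_login_request(user, password)) and then looking at what ended up in the jar is a quick way to tell whether the login was accepted. That the site tracks the session with cookies is an assumption here.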
