10-27-2008, 09:30 PM | #1 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Problems writing recipe
I'm trying to write a recipe for an online weekly magazine. The front page holds the article links, each embedded in a span tag with a specific class. The code works, sort of.
Even though the page has 10-13 links, the loop I wrote retrieves 2 and then stops. Can anybody help me with this, please?
Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    title                 = 'Vreme'
    __author__            = 'Darko Miletic'
    description           = 'Politicki Nedeljnik Srbije'
    timefmt               = ' [%a, %d %b, %Y]'
    no_stylesheets        = True
    simultaneous_downloads = 1
    delay                 = 1
    INDEX                 = 'http://www.vreme.com'

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        for item in soup.findAll('span', attrs={'class':'toc2'}):
            #print item
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url         = self.INDEX+feed_link['href']+'&print=yes'
                title       = self.tag_to_string(feed_link)
                date        = strftime('%a, %d %b')
                description = ''
                articles.append({
                                  'title':title,
                                  'date':date,
                                  'url':url,
                                  'description':description
                                })
        return [('Latest edition', articles)]
|
10-27-2008, 10:49 PM | #2 |
creator of calibre
Posts: 43,744
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Works for me if I add the line
# -*- coding: utf-8 -*-
to the top. |
10-28-2008, 06:02 AM | #3 |
Guru
|
When I added the line you suggested, the script crashed while retrieving the second link:
Code:
INFO: Downloading
DEBUG: Fetching http://www.vreme.com/cms/view.php?id=726606&print=yes
Traceback (most recent call last):
  File "main.py", line 151, in <module>
  File "main.py", line 146, in main
  File "main.py", line 134, in run_recipe
  File "calibre\web\feeds\news.pyo", line 472, in download
  File "calibre\web\feeds\news.pyo", line 639, in build_index
  File "calibre\threadpool.pyo", line 219, in poll
  File "calibre\web\feeds\news.pyo", line 743, in article_downloaded
TypeError: not all arguments converted during string formatting
|
10-28-2008, 06:09 AM | #4 |
Guru
|
I noticed that I had an older version of calibre, so I downloaded and installed the latest one. With your line added it no longer crashes, but it still downloads only two pages and ignores the other links. Is there anything else I can override or trace to see what is going on?
|
10-28-2008, 09:32 AM | #5 |
creator of calibre
Posts: 43,744
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Are you running it with the --test option? That will cause only the first two articles to download. Run with the --debug option to see exactly what it is doing.
|
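One way to confirm that the link-collecting loop itself is not the culprit is to run the same span/anchor logic on a saved copy of the front page, outside calibre. The following is only a minimal sketch under stated assumptions: it uses the standard library's `html.parser` instead of calibre's `index_to_soup`, the class name `Toc2LinkCollector`, the function `collect_article_urls` and the `SAMPLE` markup are all made up for illustration, and the real page may be structured differently.

```python
from html.parser import HTMLParser


class Toc2LinkCollector(HTMLParser):
    """Record the href of the first <a> inside each <span class="toc2">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._span_depth = 0     # > 0 while inside a toc2 span (counts nested spans)
        self._seen_anchor = False  # True once the first <a> of the current span was seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span':
            if self._span_depth:
                self._span_depth += 1
            elif attrs.get('class') == 'toc2':
                self._span_depth = 1
                self._seen_anchor = False
        elif tag == 'a' and self._span_depth and not self._seen_anchor:
            # Mirror the recipe: only the first anchor counts, and only if it has an href.
            self._seen_anchor = True
            href = attrs.get('href')
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == 'span' and self._span_depth:
            self._span_depth -= 1


def collect_article_urls(html, index='http://www.vreme.com'):
    """Build print-view URLs the same way the recipe's parse_index does."""
    parser = Toc2LinkCollector()
    parser.feed(html)
    parser.close()
    return [index + href + '&print=yes' for href in parser.links]


# Made-up markup standing in for a saved copy of the front page.
SAMPLE = '''
<span class="toc2"><a href="/cms/view.php?id=1">One</a></span>
<span class="toc2"><a href="/cms/view.php?id=2">Two</a></span>
<span class="other"><a href="/cms/view.php?id=9">Skip</a></span>
<span class="toc2"><a href="/cms/view.php?id=3">Three</a></span>
'''

if __name__ == '__main__':
    urls = collect_article_urls(SAMPLE)
    print(len(urls), 'links found')
    for url in urls:
        print(url)
```

If a check like this reports every expected link while the download still stops at two, the limit is coming from how the recipe is run rather than from the loop.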
10-28-2008, 11:05 AM | #6 |
Guru
|
Oh boy, do I feel silly.
Yes, that was it. I was not aware that --test downloads only two pages. Thanks, man. Calibre is a really great tool! |
10-28-2008, 01:31 PM | #7 |
Guru
|
There is another problem I'm experiencing now.
I added support for logging on to the protected part of the site, which gives access to all articles in the magazine. This is how the script looks now:
Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    title                 = 'Vreme'
    __author__            = 'Darko Miletic'
    description           = 'Politicki Nedeljnik Srbije'
    timefmt               = ' [%a, %d %b, %Y]'
    no_stylesheets        = True
    simultaneous_downloads = 1
    delay                 = 1
    needs_subscription    = True
    INDEX                 = 'http://www.vreme.com'
    LOGIN                 = 'http://www.vreme.com/account/index.php'

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(name='f')
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        for item in soup.findAll('span', attrs={'class':'toc2'}):
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url         = self.INDEX+feed_link['href']+'&print=yes'
                title       = self.tag_to_string(feed_link)
                date        = strftime('%a, %d %b')
                description = ''
                articles.append({
                                  'title':title,
                                  'date':date,
                                  'url':url,
                                  'description':description
                                })
        return [('Latest edition', articles)]

This is the command line I used:
Code:
feeds2lrf.exe --username=<user> --password=<pass> vreme.py |
10-28-2008, 03:02 PM | #8 |
creator of calibre
|
Click on the hourglass, then double-click on the job to see the log and find out what it's doing.
|
10-28-2008, 06:47 PM | #9 |
Guru
|
Weird.
After I restarted the calibre GUI it started working, but alas, problems persist...
Apparently, for some reason the logon information is not retained in the browser instance, so when I try to access the links in the protected part I get the logon page instead. Using Firebug I identified the form elements that should be submitted, but just to be sure: can I post the elements manually, knowing the names of the POST items? Is there an example of this? |
10-28-2008, 06:58 PM | #10 |
creator of calibre
|
As far as I know login information is automatically retained by the browser instance. See for example the nytimes and wsj recipes.
|
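As an illustration of posting named form fields by hand while keeping the session cookie, here is a minimal sketch that uses only the Python standard library, not calibre's browser object. The login URL and the field names `username` and `password` are taken from the recipe above; the helper names `make_opener` and `build_login_request` are made up for this example, and whether the site expects additional hidden fields is not known from this thread.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Login URL as given in the recipe above.
LOGIN = 'http://www.vreme.com/account/index.php'


def make_opener():
    """Return an opener that stores cookies in a jar and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password, url=LOGIN):
    """Build a POST request carrying the two form fields, URL-encoded by name."""
    fields = {'username': username, 'password': password}
    body = urllib.parse.urlencode(fields).encode('ascii')
    # Supplying a data payload makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body)


# Intended use (requires network access and valid credentials, so left commented out):
# opener, jar = make_opener()
# opener.open(build_login_request('<user>', '<pass>'))
# page = opener.open('http://www.vreme.com/cms/view.php?id=726606&print=yes').read()
```

The key point is that the same opener is used for the login and for every later fetch, so any cookie set by the login response travels with the article requests. Inspecting the jar after the login call shows whether the site issued a session cookie at all.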