#1
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

Problems writing recipe
I'm trying to write a recipe for an online weekly magazine. Its front page has the article links embedded in span tags with a specific class. The code works, sort of.
Even though the page has 10-13 links, the loop I wrote retrieves only 2 and then stops. Can anybody help me with this, please?
Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    title = 'Vreme'
    __author__ = 'Darko Miletic'
    description = 'Politicki Nedeljnik Srbije'
    timefmt = ' [%a, %d %b, %Y]'
    no_stylesheets = True
    simultaneous_downloads = 1
    delay = 1
    INDEX = 'http://www.vreme.com'

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)

        for item in soup.findAll('span', attrs={'class':'toc2'}):
            #print item
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url = self.INDEX+feed_link['href']+'&print=yes'
                title = self.tag_to_string(feed_link)
                date = strftime('%a, %d %b')
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })
        return [('Latest edition', articles)]
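To count how many links the span/link extraction really finds, independent of calibre and of any download limits, the loop in parse_index can be approximated offline against a saved copy of the front page. The sketch below is a rough stdlib-only equivalent that uses html.parser instead of BeautifulSoup; it is not calibre code. It mirrors findAll('span', attrs={'class':'toc2'}) plus item.find('a') and builds the same '&print=yes' URLs. The class and function names are hypothetical, the date is left empty instead of calling strftime, and the sample HTML is made up.

```python
from html.parser import HTMLParser

INDEX = 'http://www.vreme.com'


class Toc2LinkParser(HTMLParser):
    """Collect the first <a> of each <span class="toc2">, roughly like
    soup.findAll('span', attrs={'class': 'toc2'}) followed by item.find('a')."""

    def __init__(self):
        super().__init__()
        self.depth = 0         # nesting depth inside the current toc2 span
        self.got_link = False  # first <a> of this span already seen?
        self.href = None       # href of the <a> currently being read
        self.text = []         # text pieces inside that <a>
        self.links = []        # collected (title, href) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'span':
            if self.depth:                        # span nested inside toc2
                self.depth += 1
            elif 'toc2' in (attrs.get('class') or '').split():
                self.depth, self.got_link = 1, False
        elif tag == 'a' and self.depth and not self.got_link:
            self.got_link = True                  # like item.find('a'): first <a> only
            if attrs.get('href') is not None:     # like feed_link.has_key('href')
                self.href, self.text = attrs['href'], []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.href is not None:
            self.links.append((''.join(self.text).strip(), self.href))
            self.href = None
        elif tag == 'span' and self.depth:
            self.depth -= 1


def build_articles(html, date=''):
    """Return article dicts shaped like the ones the recipe appends."""
    parser = Toc2LinkParser()
    parser.feed(html)
    parser.close()
    return [{'title': title, 'date': date,
             'url': INDEX + href + '&print=yes', 'description': ''}
            for title, href in parser.links]


sample = ('<span class="toc2"><a href="/cms/view.php?id=726606">First</a></span>'
          '<span class="other"><a href="/skip">Skipped</a></span>')
for art in build_articles(sample):
    print(art['title'], art['url'])
# First http://www.vreme.com/cms/view.php?id=726606&print=yes
```

If this finds all 10-13 links on the saved page while calibre fetches fewer, the limit is coming from how the recipe is run rather than from the parsing.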

#2
creator of calibre
Posts: 45,626
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Works for me if I add the line
# -*- coding: utf-8 -*-
to the top.

#3
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

When I added the line you suggested, the script crashes when retrieving the second link:
Code:
INFO: Downloading
DEBUG: Fetching http://www.vreme.com/cms/view.php?id=726606&print=yes
Traceback (most recent call last):
  File "main.py", line 151, in <module>
  File "main.py", line 146, in main
  File "main.py", line 134, in run_recipe
  File "calibre\web\feeds\news.pyo", line 472, in download
  File "calibre\web\feeds\news.pyo", line 639, in build_index
  File "calibre\threadpool.pyo", line 219, in poll
  File "calibre\web\feeds\news.pyo", line 743, in article_downloaded
TypeError: not all arguments converted during string formatting

#4
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

I noticed that I had an older version of calibre, so I downloaded and installed the latest version. As a result it no longer crashes with your line, but it still downloads only two pages and ignores the other links. Is there anything else I can override or trace to see what is going on?

#5
creator of calibre
Posts: 45,626
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Are you running it with the --test option? That will cause only the first two articles to download. Run with the --debug option to see exactly what it is doing.
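As a rough illustration of why the recipe seemed to stop after two links: per the reply above, --test downloads only the first two articles, so the index that parse_index builds is effectively cut down before fetching. The function below is a hypothetical sketch of that truncation, not calibre's implementation; it only assumes the [(feed_title, [article, ...])] shape that parse_index returns.

```python
def apply_test_limit(feeds, max_articles=2):
    """Hypothetical sketch (not calibre's code) of the --test behaviour
    described above: keep only the first `max_articles` articles per feed.

    `feeds` has the shape parse_index() returns:
    [(feed_title, [article_dict, ...]), ...]
    """
    return [(title, articles[:max_articles]) for title, articles in feeds]


# Simulate the recipe's single 'Latest edition' feed with 13 links on the page.
index = [('Latest edition', [{'title': 'Article %d' % n} for n in range(1, 14)])]
limited = apply_test_limit(index)
print([a['title'] for a in limited[0][1]])  # ['Article 1', 'Article 2']
```

Running without --test (and with --debug to see each fetch) should show all of the parsed links being downloaded.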

#6
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

Oh boy, do I feel silly.
Yes, that was it. I was not aware that --test downloads only two pages. Thanks, man. Calibre is a really great tool!

#7
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

There is another problem I'm experiencing now.
I added support for logging in to the protected part of the site, which gives access to all articles in the magazine. This is how the script looks now:
Code:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2008, Darko Miletic <darko.miletic at gmail.com>'
'''
vreme.com
'''

import string
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe

class Vreme(BasicNewsRecipe):
    title = 'Vreme'
    __author__ = 'Darko Miletic'
    description = 'Politicki Nedeljnik Srbije'
    timefmt = ' [%a, %d %b, %Y]'
    no_stylesheets = True
    simultaneous_downloads = 1
    delay = 1
    needs_subscription = True
    INDEX = 'http://www.vreme.com'
    LOGIN = 'http://www.vreme.com/account/index.php'

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(name='f')
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)

        for item in soup.findAll('span', attrs={'class':'toc2'}):
            feed_link = item.find('a')
            if feed_link and feed_link.has_key('href'):
                url = self.INDEX+feed_link['href']+'&print=yes'
                title = self.tag_to_string(feed_link)
                date = strftime('%a, %d %b')
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })
        return [('Latest edition', articles)]
This is the command line I used:
Code:
feeds2lrf.exe --username=<user> --password=<pass> vreme.py

#8
creator of calibre
Posts: 45,626
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

Click on the hourglass, then double-click on the job to see the log and find out what it's doing.

#9
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage

Weird.
After restarting the calibre GUI it started working, but alas, problems persist...
Apparently, for some reason, the login information is not retained in the browser instance, so when I try to access the links in the protected part I get the login page instead. Using Firebug I correctly identified the form elements that should be submitted. However, just to be sure, can I post the elements manually, knowing the names of the POST items? Is there an example of this?
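A minimal sketch of posting the login fields manually, using only the Python standard library (urllib.request plus http.cookiejar) rather than calibre's browser. It assumes the form posts back to the LOGIN URL from the recipe and that 'username' and 'password' (the names the recipe fills via br['username'] and br['password']) are the only required fields. Both are assumptions to verify in Firebug: the real form may use a different action URL or carry hidden inputs, which is what the hypothetical extra_fields parameter is for.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# LOGIN comes from the recipe; that the form's action posts back to this URL
# is an assumption, so check the form's action attribute.
LOGIN = 'http://www.vreme.com/account/index.php'


def build_login_data(username, password, extra_fields=None):
    """URL-encode the login form body.

    'username' and 'password' are the field names used in the recipe;
    extra_fields is a hypothetical hook for hidden inputs the form may need.
    """
    fields = {'username': username, 'password': password}
    if extra_fields:
        fields.update(extra_fields)
    return urllib.parse.urlencode(fields).encode('ascii')


def make_logged_in_opener(username, password, login_url=LOGIN):
    """POST the login form once and return (opener, cookie_jar).

    The CookieJar stores any session cookie set by the login response, and
    later opener.open(...) calls send it back automatically.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Supplying data= makes urllib send a form-encoded POST instead of a GET.
    opener.open(login_url, data=build_login_data(username, password)).read()
    return opener, jar
```

After make_logged_in_opener(...), opener.open(url).read() fetches protected pages with the stored cookie, and inspecting jar shows whether the site actually set one. mechanize's Browser.open(url, data) also accepts an encoded body like this, so the same idea could be tried inside get_browser() in place of select_form()/submit().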

#10
creator of calibre
Posts: 45,626
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

As far as I know, login information is automatically retained by the browser instance. See, for example, the nytimes and wsj recipes.
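"Retained by the browser instance" comes down to cookies: the login response sets a session cookie, and the same browser object sends it back on every later request. The self-contained sketch below demonstrates that with the Python standard library against a throwaway local server. The server, paths, and cookie name are hypothetical stand-ins for a real site, and none of this is calibre or mechanize code. Without a cookie jar, the second request gets the "login page", which is the same symptom described in post #9.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeSubscriptionSite(BaseHTTPRequestHandler):
    """Hypothetical toy site: any POST 'logs in' by setting a session cookie;
    any GET returns the article only if that cookie is sent back."""

    def _reply(self, body, cookie=None):
        self.send_response(200)
        if cookie:
            self.send_header('Set-Cookie', cookie)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Read and discard the form body (username=...&password=...).
        self.rfile.read(int(self.headers.get('Content-Length', 0)))
        self._reply(b'ok', cookie='session=abc123; Path=/')

    def do_GET(self):
        logged_in = 'session=abc123' in self.headers.get('Cookie', '')
        self._reply(b'article text' if logged_in else b'login page')

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_protected_page(keep_cookies):
    """Log in, then fetch a protected page with the same opener."""
    server = HTTPServer(('127.0.0.1', 0), FakeSubscriptionSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = 'http://127.0.0.1:%d' % server.server_port
    handlers = [urllib.request.ProxyHandler({})]  # never route localhost via a proxy
    if keep_cookies:
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        opener.open(base + '/login', data=b'username=u&password=p').read()
        return opener.open(base + '/article').read().decode()
    finally:
        server.shutdown()
        server.server_close()


print(fetch_protected_page(keep_cookies=True))   # article text
print(fetch_protected_page(keep_cookies=False))  # login page
```

When the log still shows the login page after logging in, the usual suspects are a fresh browser object being used for the article fetches or the site not setting a session cookie for the submitted form.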