03-21-2008, 11:28 PM | #241 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Kovid,
Thanks for the fixed recipe for USA Today. It looks much better to these tired eyes. Also thanks for the tip about cron; I didn't realize such a utility was available on the Mac. Maybe it's time to take a look under the hood.

Searching the web, I found a GUI for cron called CronniX 3.0.2, which gives you the ability to create a custom crontab file. When I run the following command from a bash terminal:

feeds2lrf --output=/users/billc/desktop/news.lrf desktop/books/nwa2.py

it produces an output file called news.lrf on my desktop. I then deleted the file, put the same command into CronniX, and used the 'Run Now' command (under the 'Task' drop-down menu). All I got was:

Running command feeds2lrf --output=/users/billc/desktop/news.lrf desktop/books/nwa2.py
The output will appear below when the command has finished executing
Fetching feeds...

Then the program goes off into la-la land and produces no output. Clearly something is wrong! Is there one of those cryptic commands like sh that should precede the main command? Or what? |
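For anyone hitting the same wall: cron (and GUI front-ends like CronniX) runs commands with a minimal environment and a working directory that is usually not your home folder, so a relative path like desktop/books/nwa2.py won't resolve. A crontab entry along these lines should behave better; the feeds2lrf install location shown here is an assumption and may differ on your system:

```
# cron supplies a minimal environment: set PATH explicitly and use
# absolute paths for the command, the output file, and the recipe.
PATH=/usr/local/bin:/usr/bin:/bin
# minute hour day month weekday  command
0 6 * * * /usr/local/bin/feeds2lrf --output=/Users/billc/Desktop/news.lrf /Users/billc/Desktop/books/nwa2.py
```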
03-22-2008, 01:07 AM | #242 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
|
03-22-2008, 01:08 AM | #243 | |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
03-22-2008, 01:55 AM | #244 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
|
03-22-2008, 05:19 AM | #245 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Move the import statements to just above where the imported modules are used. A proper fix will be in the next release. Why aren't you using the built-in Wall Street Journal?
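The workaround described above, sketched with a standard-library import standing in for the BeautifulSoup one (in the real recipe, `from libprs500.ebooks.BeautifulSoup import BeautifulSoup` would be moved the same way):

```python
# Before (broken under the custom-recipe loader of the time):
#
#     from xml.sax.saxutils import escape   # module-level import
#
#     class Recipe:
#         def parse_index(self):
#             return escape('<x>')          # NameError at call time
#
# After -- the import moved to just above where it is used:
class Recipe:
    def parse_index(self):
        from xml.sax.saxutils import escape  # bound inside the method itself
        return escape('<x>')

print(Recipe().parse_index())  # -> &lt;x&gt;
```

Because the import executes inside the method, the name is looked up in the method's own scope rather than the module namespace the loader mishandles.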
|
|
03-22-2008, 12:03 PM | #246 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Thanks Kovid.
It helped; now it runs. But it didn't fetch any articles (it jumped straight from "0% Starting download" to "100% Feeds downloaded"). I'll try to fix it myself. The built-in WSJ recipe is good, but it doesn't include many articles from the paper edition; this one was getting all the articles from the paper. David |
03-22-2008, 12:11 PM | #247 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can still run it using web2lrf instead of feeds2lrf.
|
03-22-2008, 02:26 PM | #248 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Boy, talk about being invincibly ignorant. I knew enough to use the absolute path for the saved file, but it never occurred to me that you should use the absolute path to the recipe file as well. All of which is to say: it works! Thanks.
Do you have any idea what the publication date is for the current edition of the Atlantic Monthly? I would like to set up a crontab command to capture it each month. |
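The general lesson from the path mix-up above: relative paths are resolved against whatever directory the process starts in, which for cron is almost never what you expect. A quick illustration (the recipe path is just the one from the earlier post):

```python
import os

rel = 'desktop/books/nwa2.py'

# abspath() resolves against the *current working directory*, so the
# result changes depending on where the command is launched from.
print(os.path.abspath(rel))

# Building the path from the home directory gives the same answer
# no matter who launches the command or from where.
fixed = os.path.join(os.path.expanduser('~'), 'desktop', 'books', 'nwa2.py')
print(fixed)
assert os.path.isabs(fixed)
```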
03-22-2008, 04:01 PM | #249 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Kovid,
I downloaded the Atlantic Monthly recipe from your website with the intention of modifying it to capture their daily feed. I modified the recipe as follows: Code:
#!/usr/bin/env python
## Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along
## with this program; if not, write to the Free Software Foundation, Inc.,
## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
'''
thecurrent.theatlantic.com
'''
from libprs500.web.feeds.news import BasicNewsRecipe
from libprs500.ebooks.BeautifulSoup import BeautifulSoup

class TheAtlantic(BasicNewsRecipe):

    title = 'TheCurrent.The Atlantic'
    INDEX = 'http://thecurrent.theatlantic.com/'

    remove_tags_before = dict(name='div', id='storytop')
    remove_tags = [dict(name='div', id='seealso')]
    extra_css = '#bodytext {line-height: 1}'

    def parse_index(self):
        articles = []

        src = self.browser.open(self.INDEX).read()
        soup = BeautifulSoup(src, convertEntities=BeautifulSoup.HTML_ENTITIES)

        issue = soup.find('span', attrs={'class':'issue'})
        if issue:
            self.timefmt = ' [%s]'%self.tag_to_string(issue).rpartition('|')[-1].strip().replace('/', '-')

        for item in soup.findAll('div', attrs={'class':'item'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = a['href']
                url = 'http://www.theatlantic.com/'+url.replace('/doc', 'doc/print')
                title = self.tag_to_string(a)
                byline = item.find(attrs={'class':'byline'})
                date = self.tag_to_string(byline) if byline else ''
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })

        return {'Daily Issue' : articles}

Macintosh-3:books billc$ feeds2lrf atlantic-1.py
Fetching feeds...
 0% [----------------------------------------------------------------------]
Fetching feeds...
Traceback (most recent call last):
  File "/Users/billc/Downloads/libprs500-1.app/Contents/Resources/feeds2lrf.py", line 9, in <module>
    main()
  File "libprs500/ebooks/lrf/feeds/convert_from.pyo", line 52, in main
  File "libprs500/web/feeds/main.pyo", line 141, in run_recipe
  File "libprs500/web/feeds/news.pyo", line 411, in download
  File "libprs500/web/feeds/news.pyo", line 514, in build_index
  File "<string>", line 37, in parse_index
NameError: global name 'BeautifulSoup' is not defined
Macintosh-3:books billc$

But it seems to me that 'BeautifulSoup' is defined on line 22, i.e.: Code:
from libprs500.ebooks.BeautifulSoup import BeautifulSoup
I went back and ran the unmodified recipe in terminal mode and got the same result.

Last edited by Deputy-Dawg; 03-22-2008 at 04:56 PM. Reason: added info |
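What can make a module-level import "disappear" like this: if the recipe source is executed into a namespace and the module-level binding is later lost (or never retained by the loader), functions defined in that source fail at call time even though the import statement ran. A minimal reproduction with a stand-in stdlib import:

```python
# A recipe-like source: module-level import plus a function that uses it.
src = """
from xml.sax.saxutils import escape

def parse_index():
    return escape('<x>')
"""

ns = {}
exec(src, ns)        # the import runs and binds 'escape' in ns
del ns['escape']     # simulate the loader dropping the binding

try:
    ns['parse_index']()   # 'escape' is looked up in ns at call time...
except NameError as e:
    print('reproduced:', e)   # ...and fails, just like the recipe did
```

This is only a simulation of the symptom; the actual calibre bug was fixed by moving the import inside the method, as described in the replies.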
03-22-2008, 05:42 PM | #250 | |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
03-22-2008, 05:43 PM | #251 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There's a bug that causes problems with custom recipes. Just copy the import statement to the line just above where it is used and you should be fine.
|
03-22-2008, 06:58 PM | #252 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Kovid,
I modified the code as follows: Code:
#!/usr/bin/env python
## Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along
## with this program; if not, write to the Free Software Foundation, Inc.,
## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
'''
theatlantic.com
'''
import re
from libprs500.web.feeds.news import BasicNewsRecipe

class TheAtlantic(BasicNewsRecipe):

    title = 'The Atlantic'
    INDEX = 'http://www.theatlantic.com/doc/current'

    remove_tags_before = dict(name='div', id='storytop')
    remove_tags = [dict(name='div', id='seealso')]
    extra_css = '#bodytext {line-height: 1}'

    def parse_index(self):
        articles = []

        src = self.browser.open(self.INDEX).read()
        from libprs500.ebooks.BeautifulSoup import BeautifulSoup
        soup = BeautifulSoup(src, convertEntities=BeautifulSoup.HTML_ENTITIES)

        issue = soup.find('span', attrs={'class':'issue'})
        if issue:
            self.timefmt = ' [%s]'%self.tag_to_string(issue).rpartition('|')[-1].strip().replace('/', '-')

        for item in soup.findAll('div', attrs={'class':'item'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = a['href']
                url = 'http://www.theatlantic.com/'+url.replace('/doc', 'doc/print')
                title = self.tag_to_string(a)
                byline = item.find(attrs={'class':'byline'})
                date = self.tag_to_string(byline) if byline else ''
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })

        return {'Current Issue' : articles}

Macintosh-3:books billc$ feeds2lrf atlantic-2.py
Fetching feeds...
 0% [----------------------------------------------------------------------]
Fetching feeds...
Traceback (most recent call last):
  File "/Users/billc/Downloads/libprs500.app/Contents/Resources/feeds2lrf.py", line 9, in <module>
    main()
  File "libprs500/ebooks/lrf/feeds/convert_from.pyo", line 52, in main
  File "libprs500/web/feeds/main.pyo", line 141, in run_recipe
  File "libprs500/web/feeds/news.pyo", line 411, in download
  File "libprs500/web/feeds/news.pyo", line 515, in build_index
  File "libprs500/web/feeds/__init__.pyo", line 193, in feeds_from_index
ValueError: too many values to unpack
Macintosh-3:books billc$ |
03-22-2008, 07:17 PM | #253 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The return statement should be
Code:
return [('Current Issue', articles)]

Last edited by kovidgoyal; 03-22-2008 at 07:24 PM. |
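Why the dict return blew up with "too many values to unpack": iterating a dict yields its keys, and feeds_from_index evidently unpacks each element into a (title, articles) pair, so the 13-character key string gets unpacked character by character. A self-contained illustration:

```python
articles = [{'title': 'An Article', 'url': 'http://example.com'}]

# Iterating a dict yields its keys; unpacking the key string into two
# names fails because the string has more than two characters.
try:
    for title, arts in {'Current Issue': articles}:
        pass
except ValueError as e:
    print(e)  # too many values to unpack

# A list of (title, articles) tuples unpacks cleanly:
for title, arts in [('Current Issue', articles)]:
    print(title)  # Current Issue
```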
03-25-2008, 06:34 PM | #254 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
feeds2disk
Kovid, I tried to use "feeds2disk" for Newsweek (the built-in profile gets very few articles from the latest issue) and got an error message:
C:\Misc\News\Newsweek>feeds2disk --feeds="['http://feeds.newsweek.com/newsweek/NationalNews','http://feeds.newsweek.com/headlines/business','http://feeds.newsweek.com/newsweek/WorldNews']"
Fetching feeds...
Traceback (most recent call last):
  File "main.py", line 158, in <module>
  File "main.py", line 153, in main
  File "main.py", line 134, in run_recipe
UnboundLocalError: local variable 'is_profile' referenced before assignment

feeds2disk works fine with built-in profiles, but I always get this error when specifying feed addresses.

David |
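Separate from the is_profile bug, note that the --feeds value is a Python-style list embedded in a shell argument. For anyone writing a wrapper script around these tools, such a string can be parsed safely with the standard library (a sketch; this is not necessarily how feeds2disk itself parses it):

```python
import ast

# The same list-shaped string that --feeds receives on the command line.
arg = ("['http://feeds.newsweek.com/newsweek/NationalNews',"
       "'http://feeds.newsweek.com/headlines/business',"
       "'http://feeds.newsweek.com/newsweek/WorldNews']")

# literal_eval accepts only Python literals -- no arbitrary code runs.
feeds = ast.literal_eval(arg)
print(len(feeds))   # 3
print(feeds[0])
```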
03-25-2008, 06:47 PM | #255 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Will be fixed in the next release.
|
Tags |
libprs500, web2lrf |
|