03-21-2008, 11:28 PM | #241 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Kovid,
Thanks for the fixed recipe for USA Today. It looks much better to these tired eyes. Also thanks for the tip about cron; I didn't realize such a utility was available on the Mac. Maybe it's time to take a look under the hood.

Searching the web, I found a GUI for cron called CronniX 3.0.2, which gives you the ability to create a custom crontab file. When I run the following command from a bash terminal:

feeds2lrf --output=/users/billc/desktop/news.lrf desktop/books/nwa2.py

it produces an output file called news.lrf on my desktop. I then deleted the file, put the same command into CronniX, and used the 'Run Now' command (under the 'Task' drop-down menu). All I got was:

Running command feeds2lrf --output=/users/billc/desktop/news.lrf desktop/books/nwa2.py
The output will appear below when the command has finished executing
Fetching feeds...

Then the program goes off into la-la land and produces no output. Clearly something is wrong! Is there one of those cryptic commands like sh that should precede the main command? Or what? |
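For anyone hitting the same wall: cron (and GUI front-ends like CronniX) runs commands with a minimal environment and a working directory that is usually not your home folder, so a relative path like desktop/books/nwa2.py won't resolve. A crontab entry along these lines should behave better; the feeds2lrf install location shown here is an assumption and may differ on your system:

```
# cron supplies a minimal environment: set PATH explicitly and use
# absolute paths for the command, the output file, and the recipe.
PATH=/usr/local/bin:/usr/bin:/bin
# minute hour day month weekday  command
0 6 * * * /usr/local/bin/feeds2lrf --output=/Users/billc/Desktop/news.lrf /Users/billc/Desktop/books/nwa2.py
```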
03-22-2008, 01:07 AM | #242 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
|
03-22-2008, 01:08 AM | #243 | |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
03-22-2008, 01:55 AM | #244 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
|
03-22-2008, 05:19 AM | #245 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Move the import statements to just above where the imported modules are used. A proper fix will be in the next release. Why aren't you using the built-in Wall Street Journal?
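The workaround described above, sketched with a standard-library import standing in for the BeautifulSoup one (in the real recipe, `from libprs500.ebooks.BeautifulSoup import BeautifulSoup` would be moved the same way):

```python
# Before (broken under the custom-recipe loader of the time):
#
#     from xml.sax.saxutils import escape   # module-level import
#
#     class Recipe:
#         def parse_index(self):
#             return escape('<x>')          # NameError at call time
#
# After -- the import moved to just above where it is used:
class Recipe:
    def parse_index(self):
        from xml.sax.saxutils import escape  # bound inside the method itself
        return escape('<x>')

print(Recipe().parse_index())  # -> &lt;x&gt;
```

Because the import executes inside the method, the name is looked up in the method's own scope rather than the module namespace the loader mishandles.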
|
|
03-22-2008, 12:03 PM | #246 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Thanks Kovid.
It helped; now it runs. But it didn't fetch any articles (it jumped straight from "0% Starting download" to "100% Feeds downloaded"). I'll try to fix it myself. The built-in WSJ recipe is good, but it doesn't include many articles from the paper edition; this one was getting all the articles from the paper. David |
03-22-2008, 12:11 PM | #247 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can still run it using web2lrf instead of feeds2lrf.
|
03-22-2008, 02:26 PM | #248 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Boy, talk about being invincibly ignorant. I knew enough to use the absolute path for the saved file, but it never occurred to me that you should use the absolute path to the recipe file as well. All of which is to say: it works! Thanks.
Do you have any idea what the publication date is for the current edition of the Atlantic Monthly? I would like to set up a crontab command to capture it each month. |
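The general lesson from the path mix-up above: relative paths are resolved against whatever directory the process starts in, which for cron is almost never what you expect. A quick illustration (the recipe path is just the one from the earlier post):

```python
import os

rel = 'desktop/books/nwa2.py'

# abspath() resolves against the *current working directory*, so the
# result changes depending on where the command is launched from.
print(os.path.abspath(rel))

# Building the path from the home directory gives the same answer
# no matter who launches the command or from where.
fixed = os.path.join(os.path.expanduser('~'), 'desktop', 'books', 'nwa2.py')
print(fixed)
assert os.path.isabs(fixed)
```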
03-22-2008, 04:01 PM | #249 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Kovid,
I downloaded the Atlantic Monthly recipe from your website with the intention of modifying it to capture their daily feed. I modified the recipe as follows: Code:
#!/usr/bin/env python
## Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along
## with this program; if not, write to the Free Software Foundation, Inc.,
## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
'''
thecurrent.theatlantic.com
'''
from libprs500.web.feeds.news import BasicNewsRecipe
from libprs500.ebooks.BeautifulSoup import BeautifulSoup

class TheAtlantic(BasicNewsRecipe):

    title = 'TheCurrent.The Atlantic'
    INDEX = 'http://thecurrent.theatlantic.com/'

    remove_tags_before = dict(name='div', id='storytop')
    remove_tags = [dict(name='div', id='seealso')]
    extra_css = '#bodytext {line-height: 1}'

    def parse_index(self):
        articles = []

        src = self.browser.open(self.INDEX).read()
        soup = BeautifulSoup(src, convertEntities=BeautifulSoup.HTML_ENTITIES)

        issue = soup.find('span', attrs={'class':'issue'})
        if issue:
            self.timefmt = ' [%s]'%self.tag_to_string(issue).rpartition('|')[-1].strip().replace('/', '-')

        for item in soup.findAll('div', attrs={'class':'item'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = a['href']
                url = 'http://www.theatlantic.com/'+url.replace('/doc', 'doc/print')
                title = self.tag_to_string(a)
                byline = item.find(attrs={'class':'byline'})
                date = self.tag_to_string(byline) if byline else ''
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })

        return {'Daily Issue' : articles}

Macintosh-3:books billc$ feeds2lrf atlantic-1.py
Fetching feeds...
 0% [----------------------------------------------------------------------]
Fetching feeds...
Traceback (most recent call last):
  File "/Users/billc/Downloads/libprs500-1.app/Contents/Resources/feeds2lrf.py", line 9, in <module>
    main()
  File "libprs500/ebooks/lrf/feeds/convert_from.pyo", line 52, in main
  File "libprs500/web/feeds/main.pyo", line 141, in run_recipe
  File "libprs500/web/feeds/news.pyo", line 411, in download
  File "libprs500/web/feeds/news.pyo", line 514, in build_index
  File "<string>", line 37, in parse_index
NameError: global name 'BeautifulSoup' is not defined
Macintosh-3:books billc$

But it seems to me that 'BeautifulSoup' is defined on line 22, i.e.: Code:
from libprs500.ebooks.BeautifulSoup import BeautifulSoup
I went back and ran the unmodified recipe in terminal mode and got the same result.

Last edited by Deputy-Dawg; 03-22-2008 at 04:56 PM. Reason: added info |
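What can make a module-level import "disappear" like this: if the recipe source is executed into a namespace and the module-level binding is later lost (or never retained by the loader), functions defined in that source fail at call time even though the import statement ran. A minimal reproduction with a stand-in stdlib import:

```python
# A recipe-like source: module-level import plus a function that uses it.
src = """
from xml.sax.saxutils import escape

def parse_index():
    return escape('<x>')
"""

ns = {}
exec(src, ns)        # the import runs and binds 'escape' in ns
del ns['escape']     # simulate the loader dropping the binding

try:
    ns['parse_index']()   # 'escape' is looked up in ns at call time...
except NameError as e:
    print('reproduced:', e)   # ...and fails, just like the recipe did
```

This is only a simulation of the symptom; the actual calibre bug was fixed by moving the import inside the method, as described in the replies.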
03-22-2008, 05:42 PM | #250 | |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
03-22-2008, 05:43 PM | #251 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There's a bug that causes problems with custom recipes. Just copy the import statement to the line just above where it is used and you should be fine.
|
03-22-2008, 06:58 PM | #252 |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Kovid,
I modified the code as follows: Code:
#!/usr/bin/env python
## Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License along
## with this program; if not, write to the Free Software Foundation, Inc.,
## 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
'''
theatlantic.com
'''
import re
from libprs500.web.feeds.news import BasicNewsRecipe

class TheAtlantic(BasicNewsRecipe):

    title = 'The Atlantic'
    INDEX = 'http://www.theatlantic.com/doc/current'

    remove_tags_before = dict(name='div', id='storytop')
    remove_tags = [dict(name='div', id='seealso')]
    extra_css = '#bodytext {line-height: 1}'

    def parse_index(self):
        articles = []

        src = self.browser.open(self.INDEX).read()
        from libprs500.ebooks.BeautifulSoup import BeautifulSoup
        soup = BeautifulSoup(src, convertEntities=BeautifulSoup.HTML_ENTITIES)

        issue = soup.find('span', attrs={'class':'issue'})
        if issue:
            self.timefmt = ' [%s]'%self.tag_to_string(issue).rpartition('|')[-1].strip().replace('/', '-')

        for item in soup.findAll('div', attrs={'class':'item'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = a['href']
                url = 'http://www.theatlantic.com/'+url.replace('/doc', 'doc/print')
                title = self.tag_to_string(a)
                byline = item.find(attrs={'class':'byline'})
                date = self.tag_to_string(byline) if byline else ''
                description = ''
                articles.append({
                    'title':title,
                    'date':date,
                    'url':url,
                    'description':description
                })

        return {'Current Issue' : articles}

Macintosh-3:books billc$ feeds2lrf atlantic-2.py
Fetching feeds...
 0% [----------------------------------------------------------------------]
Fetching feeds...
Traceback (most recent call last):
  File "/Users/billc/Downloads/libprs500.app/Contents/Resources/feeds2lrf.py", line 9, in <module>
    main()
  File "libprs500/ebooks/lrf/feeds/convert_from.pyo", line 52, in main
  File "libprs500/web/feeds/main.pyo", line 141, in run_recipe
  File "libprs500/web/feeds/news.pyo", line 411, in download
  File "libprs500/web/feeds/news.pyo", line 515, in build_index
  File "libprs500/web/feeds/__init__.pyo", line 193, in feeds_from_index
ValueError: too many values to unpack
Macintosh-3:books billc$ |
03-22-2008, 07:17 PM | #253 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The return statement should be
Code:
return [('Current Issue', articles)]

Last edited by kovidgoyal; 03-22-2008 at 07:24 PM. |
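Why the dict return blew up with "too many values to unpack": iterating a dict yields its keys, and feeds_from_index evidently unpacks each element into a (title, articles) pair, so the 13-character key string gets unpacked character by character. A self-contained illustration:

```python
articles = [{'title': 'An Article', 'url': 'http://example.com'}]

# Iterating a dict yields its keys; unpacking the key string into two
# names fails because the string has more than two characters.
try:
    for title, arts in {'Current Issue': articles}:
        pass
except ValueError as e:
    print(e)  # too many values to unpack

# A list of (title, articles) tuples unpacks cleanly:
for title, arts in [('Current Issue', articles)]:
    print(title)  # Current Issue
```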
03-25-2008, 06:34 PM | #254 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
feeds2disk
Kovid, I tried to use "feeds2disk" for Newsweek (the built-in profile gets very few articles from the latest issue) and got an error message:
C:\Misc\News\Newsweek>feeds2disk --feeds="['http://feeds.newsweek.com/newsweek/NationalNews','http://feeds.newsweek.com/headlines/business','http://feeds.newsweek.com/newsweek/WorldNews']"
Fetching feeds...
Traceback (most recent call last):
  File "main.py", line 158, in <module>
  File "main.py", line 153, in main
  File "main.py", line 134, in run_recipe
UnboundLocalError: local variable 'is_profile' referenced before assignment

feeds2disk works fine with built-in profiles, but I always get this error when specifying feed addresses.

David |
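Separate from the is_profile bug, note that the --feeds value is a Python-style list embedded in a shell argument. For anyone writing a wrapper script around these tools, such a string can be parsed safely with the standard library (a sketch; this is not necessarily how feeds2disk itself parses it):

```python
import ast

# The same list-shaped string that --feeds receives on the command line.
arg = ("['http://feeds.newsweek.com/newsweek/NationalNews',"
       "'http://feeds.newsweek.com/headlines/business',"
       "'http://feeds.newsweek.com/newsweek/WorldNews']")

# literal_eval accepts only Python literals -- no arbitrary code runs.
feeds = ast.literal_eval(arg)
print(len(feeds))   # 3
print(feeds[0])
```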
03-25-2008, 06:47 PM | #255 |
creator of calibre
Posts: 43,869
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Will be fixed in the next release.
|
Tags |
libprs500, web2lrf |
|