Thread: web2lrf
Old 03-22-2008, 04:01 PM   #249
Deputy-Dawg
Kovid,
I downloaded the Atlantic Monthly recipe from your website, intending to modify it to capture their daily feed. I modified the recipe as follows:

Code:
#!/usr/bin/env  python

##    Copyright (C) 2008 Kovid Goyal kovid@kovidgoyal.net
##    This program is free software; you can redistribute it and/or modify
##    it under the terms of the GNU General Public License as published by
##    the Free Software Foundation; either version 2 of the License, or
##    (at your option) any later version.
##
##    This program is distributed in the hope that it will be useful,
##    but WITHOUT ANY WARRANTY; without even the implied warranty of
##    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
##    GNU General Public License for more details.
##
##    You should have received a copy of the GNU General Public License along
##    with this program; if not, write to the Free Software Foundation, Inc.,
##    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
'''
thecurrent.theatlantic.com
'''

from libprs500.web.feeds.news import BasicNewsRecipe
from libprs500.ebooks.BeautifulSoup import BeautifulSoup

class TheAtlantic(BasicNewsRecipe):
    
    title = 'TheCurrent.The Atlantic'
    INDEX = 'http://thecurrent.theatlantic.com/'
    
    remove_tags_before = dict(name='div', id='storytop')
    remove_tags        = [dict(name='div', id='seealso')]
    extra_css          = '#bodytext {line-height: 1}'
    
    def parse_index(self):
        articles = []
        
        src = self.browser.open(self.INDEX).read()
        soup = BeautifulSoup(src, convertEntities=BeautifulSoup.HTML_ENTITIES)
        
        issue = soup.find('span', attrs={'class':'issue'})
        if issue:
            self.timefmt = ' [%s]'%self.tag_to_string(issue).rpartition('|')[-1].strip().replace('/', '-')
        
        for item in soup.findAll('div', attrs={'class':'item'}):
            a = item.find('a')
            if a and a.has_key('href'):
                url = a['href']
                url = 'http://www.theatlantic.com/'+url.replace('/doc', 'doc/print')
                title = self.tag_to_string(a)
                byline = item.find(attrs={'class':'byline'})
                date = self.tag_to_string(byline) if byline else ''
                description = ''
                articles.append({
                                 'title':title,
                                 'date':date,
                                 'url':url,
                                 'description':description
                                })
                
        
        return {'Daily Issue' : articles }
When I run it I get:

Macintosh-3:books billc$ feeds2lrf atlantic-1.py
Fetching feeds...
0% [----------------------------------------------------------------------]
Fetching feeds... Traceback (most recent call last):
File "/Users/billc/Downloads/libprs500-1.app/Contents/Resources/feeds2lrf.py", line 9, in <module>
main()
File "libprs500/ebooks/lrf/feeds/convert_from.pyo", line 52, in main
File "libprs500/web/feeds/main.pyo", line 141, in run_recipe
File "libprs500/web/feeds/news.pyo", line 411, in download
File "libprs500/web/feeds/news.pyo", line 514, in build_index
File "<string>", line 37, in parse_index
NameError: global name 'BeautifulSoup' is not defined
Macintosh-3:books billc$


But it seems to me that 'BeautifulSoup' is defined at line 22 of the recipe:

Code:
from libprs500.ebooks.BeautifulSoup import BeautifulSoup
What have I done wrong?
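Could the problem be in how the recipe file is loaded? The File "&lt;string&gt;" frame in the traceback makes me suspect the recipe source is compiled and exec'd in a fresh namespace, in which case module-level imports would not be visible inside parse_index. Here is my guess at a stand-alone reproduction (no libprs500 involved; the class and method names just mirror my recipe):

```python
# Hypothetical reproduction: exec a recipe-like class in a bare namespace.
# Names imported at module level in the original file are not carried over,
# so looking one up inside a method raises NameError, like my traceback does.
recipe_source = '''
class TheAtlantic:
    def parse_index(self):
        return BeautifulSoup  # looked up in the exec namespace, not defined there
'''

namespace = {}
exec(recipe_source, namespace)

try:
    namespace['TheAtlantic']().parse_index()
    err_msg = ''
except NameError as e:
    err_msg = str(e)

print(err_msg)  # complains that 'BeautifulSoup' is not defined
```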

I went back and ran the unmodified recipe from the terminal and got the same result.
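One workaround I may try: moving the import inside parse_index, so the name is bound when the method runs rather than at module load. Sketched here with a standard-library module standing in for BeautifulSoup, since I can't easily exercise libprs500 outside the app:

```python
# Workaround sketch: a local import binds the name at call time, so the
# method works even when its class was exec'd in a namespace that lacks
# the module-level imports from the original recipe file.
recipe_source = '''
class Recipe:
    def parse_index(self):
        from html.parser import HTMLParser  # stand-in for BeautifulSoup
        return HTMLParser
'''

namespace = {}
exec(recipe_source, namespace)
result = namespace['Recipe']().parse_index()
```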

Last edited by Deputy-Dawg; 03-22-2008 at 04:56 PM. Reason: added info