|  11-05-2010, 01:29 AM | #1 | 
| Member  Posts: 12 Karma: 10 Join Date: Oct 2010 Location: UK Device: Kindle 3 WiFi, Kindle Paperwhite 2013 |
Recipe works when mocked up as Python file, fails when converted to Recipe
Code:
import urllib2
from BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe
class Counterpunch(BasicNewsRecipe):
    '''
    Parses counterpunch.com for articles
    '''  
    def parse_index(self):
        feeds = []
        title, url = 'Counterpunch', 'http://www.counterpunch.com'
        articles = self.parse_page(url)
        if articles:
            feeds.append((title, articles))
        return feeds
			
			
    def parse_page(self, url):
        fd = urllib2.urlopen(url)
        soup = BeautifulSoup(fd, fromEncoding='iso-8859-1') 
        articles = []
        current_date = ''
        #Gets all dates and entries in the correctly dispersed way e.g. date, list of articles for date, next date, next list of articles
        #first expression gets entries, second gets dates
        dates_and_articles = soup.findAll(lambda tag: (tag.name == 'p' and
                                          tag.attrs == [(u'class', u'style2')] and
                                          len(tag) == 4 and
                                          'Website of the' not in tag.decode('utf-8')) or
                                          (tag.name == 'font' and
                                          tag.attrs == [(u'color', u'#990000'), (u'size', u'-1')]))
        for tag in dates_and_articles:
            #if 'Today\'s\n Stories' in tag.contents:
            if tag.name == 'p':
                #logic to deal with different ways names are printed (color difference I believe)
                if tag.find('span', {'class': 'style1'}):
                    author = tag.contents[0].contents[0] + ': '
                    url = 'http://www.counterpunch.com/' + tag.contents[3].attrs[0][1]
                else:
                    author = tag.contents[0] + ': '
                    url = 'http://www.counterpunch.com/' + tag.contents[3].attrs[0][1]
                title = author + str(tag.contents[3].contents[0])
                articles.append({'title': title, 'url': url, 'description':'', 'date': current_date})
            #if new date, update current_date
            elif tag.name == 'font':
                current_date = tag.contents[0]
                #print('the date is {0}').format(current_date)
        #keep just one day's articles for clearer, quicker debugging
        articles = [a for a in articles if a['date'] == 'October 11, 2010']
        return articles
            
#for debugging on the cmd             
#c = Counterpunch()
#print c.parse_index()

This is the first recipe I have written. It is for a site that has no RSS feed. The articles are listed in a table at the side of the page, separated by date headings. I mocked it up as a .py file first and got it to a workable state where it prints a list of feeds on the command line. I then made the few small changes needed to turn it into a recipe and tested it with 'ebook-convert counterpunch.recipe test --test -vv', but I get the traceback below:
Code:
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
Traceback (most recent call last):
  File "/tmp/init.py", line 48, in <module>
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/cli.py", line 254, in main
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/plumber.py", line 836, in run
  File "/home/kovid/build/calibre/src/calibre/customize/conversion.py", line 216, in __call__
  File "/home/kovid/build/calibre/src/calibre/web/feeds/input.py", line 105, in convert
  File "/home/kovid/build/calibre/src/calibre/web/feeds/news.py", line 712, in download
  File "/home/kovid/build/calibre/src/calibre/web/feeds/news.py", line 837, in build_index
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 15, in parse_index
    articles = self.parse_page(url)
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 28, in parse_page
    dates_and_articles = soup.findAll(lambda tag: (tag.name == 'p' and
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 768, in findAll
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 332, in _findAll
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 890, in search
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 849, in searchTag
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 907, in _matches
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 31, in <lambda>
    'Website of the' not in tag.decode('utf-8')) or
TypeError: 'NoneType' object is not callable

Can anyone get it to run to grab the feeds for calibre? Thanks | 
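A note on the traceback above. One plausible reading, not confirmed anywhere in this thread, is that the BeautifulSoup copy used when the recipe runs inside calibre has no `decode` method on its tags and instead treats an unknown attribute name as a search for a child tag of that name. If so, `tag.decode` would evaluate to `None` (no `<decode>` child exists) and calling it would raise exactly this error. The sketch below reproduces that mechanism with a hypothetical stand-in class, `FakeTag`; it is not the real BeautifulSoup code.

```python
class FakeTag(object):
    """Hypothetical stand-in for a parser tag (NOT the real BeautifulSoup
    class). Unknown attribute names are treated as child-tag lookups."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def find(self, name):
        # Return the first direct child with the given tag name, else None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self.find(attr)


def call_decode(tag):
    """Try the call the recipe makes and report what happens."""
    try:
        return tag.decode('utf-8')
    except TypeError as err:
        return 'TypeError: %s' % err


tag = FakeTag('p', [FakeTag('a')])
message = call_decode(tag)
print(message)
```

If this is what is happening, replacing the `tag.decode('utf-8')` call with a string conversion that the bundled parser is known to support would be one thing to try.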
|   |   | 
|  11-05-2010, 09:55 PM | #2 | 
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
			
			I tested briefly on another machine, and your feed parsed correctly. The article content wasn't downloading, and I didn't debug why, but you were parsing the articles and building the feed from your source page just fine. The recipe didn't finish, and I'm not sure whether all your articles were parsed correctly, but most were. I started to play with it: I added a postprocess_html for debugging, cleaned up some comments, and added some print statements, and the recipe finished (with empty articles), but that's as far as I went. I know it's not much, but I thought you might want to know you weren't ignored. | 
|   |   | 
|  12-21-2010, 05:15 PM | #3 | 
| Junior Member  Posts: 4 Karma: 10 Join Date: Dec 2010 Device: Kindle 3 | 
			
			Counterpunch is a good web publication and as a calibre user I would appreciate it if its recipe gets debugged and put into the software distribution.
		 | 
|   |   | 
|  07-28-2011, 06:40 PM | #4 | 
| Member  Posts: 19 Karma: 10 Join Date: Jul 2010 Device: Calibre | 
			
			It's been almost nine months since the original post. Does anyone know about any developments? I really would like to get hold of a working recipe for CounterPunch. Thanks.
		 | 
|   |   | 
|  07-29-2011, 03:21 PM | #5 | 
| Member  Posts: 12 Karma: 10 Join Date: Oct 2010 Location: UK Device: Kindle 3 WiFi, Kindle Paperwhite 2013 |   
			
			I rewrote it and got it working. I have contributed it to Calibre. It will be included from the version released today (0.8.12). If you don't want to update you can use the file attached to this post. Enjoy! | 
|   |   | 
|  07-31-2011, 07:01 PM | #6 | 
| Member  Posts: 19 Karma: 10 Join Date: Jul 2010 Device: Calibre | 
			
			Thank you so much. So far so good! I love it!
		 | 
|   |   | 
|  08-05-2011, 10:00 AM | #7 | 
| Member  Posts: 19 Karma: 10 Join Date: Jul 2010 Device: Calibre | 
			
			There seems to be a limit of 10 entries per day. Actually, some days have fewer than 10 entries and some have more. So how does that work? Is there a way to make sure that no entries are repeated and that all entries eventually get downloaded? I'm new to this, so I am not sure how it works. Thanks.
		 | 
|   |   | 
|  09-04-2011, 04:57 AM | #8 | 
| Member  Posts: 12 Karma: 10 Join Date: Oct 2010 Location: UK Device: Kindle 3 WiFi, Kindle Paperwhite 2013 | 
			
			Counterpunch have redesigned their site and now have an RSS feed, which makes things easier for the recipe. I have rewritten the recipe and submitted it to Calibre. It will be in the next version, which should be released next Friday (9 Sept). In the meantime, you can use the version attached to this post.

			@aritza The new recipe has a limit of 7 days/100 posts, but since it now works from RSS it is really limited by the number of posts in the feed (25 at this time). | 
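To illustrate how the limits mentioned above interact, here is a minimal sketch. The function `select_articles` and the sample feed are hypothetical and are not calibre's actual implementation; in a calibre recipe, limits like these are normally set through the `oldest_article` and `max_articles_per_feed` attributes of `BasicNewsRecipe`. The sketch assumes the feed carries 25 entries, as stated in the post.

```python
from datetime import datetime, timedelta


def select_articles(entries, now, oldest_article_days=7, max_articles=100):
    """Keep entries newer than the age cutoff, then cap the count.

    Hypothetical helper for illustration only.
    """
    cutoff = now - timedelta(days=oldest_article_days)
    recent = [e for e in entries if e['published'] >= cutoff]
    return recent[:max_articles]


# Assumed sample feed: 25 entries, one every 6 hours, newest first.
now = datetime(2011, 9, 4)
feed_entries = [
    {'title': 'Post %d' % i, 'published': now - timedelta(hours=6 * i)}
    for i in range(25)
]

selected = select_articles(feed_entries, now)
capped = select_articles(feed_entries, now, max_articles=10)
print(len(selected), len(capped))
```

With these assumed dates every entry is under 7 days old, so the result can never exceed what the feed itself supplies, whatever the 100-post cap says.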
|   |   | 