Old 11-05-2010, 01:29 AM   #1
ode
Member
ode began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Oct 2010
Location: UK
Device: Kindle 3 WiFi, Kindle Paperwhite 2013
Recipe works when mocked up as Python file, fails when converted to Recipe

Code:
import urllib2
from BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe

class Counterpunch(BasicNewsRecipe):
    '''
    Parses counterpunch.com for articles
    '''  
    def parse_index(self):
        feeds = []
        title, url = 'Counterpunch', 'http://www.counterpunch.com'
        articles = self.parse_page(url)
        if articles:
            feeds.append((title, articles))
        return feeds
			
			
    def parse_page(self, url):
        fd = urllib2.urlopen(url)
        soup = BeautifulSoup(fd, fromEncoding='iso-8859-1') 
        articles = []
        current_date = ''
        # Collect date headings and article entries in page order:
        # date, that date's articles, next date, next list of articles, ...
        # The first clause of the lambda matches entries, the second matches dates.
        dates_and_articles = soup.findAll(lambda tag: (tag.name == 'p' and
                                          tag.attrs == [(u'class', u'style2')] and
                                          len(tag) == 4 and
                                          'Website of the' not in tag.decode('utf-8')) or
                                          (tag.name == 'font' and
                                          tag.attrs == [(u'color', u'#990000'), (u'size', u'-1')]))
        for tag in dates_and_articles:
            #if 'Today\'s\n Stories' in tag.contents:
            if tag.name == 'p':
                # Author names are marked up in two different ways (a color difference, I believe)
                if tag.find('span', {'class': 'style1'}):
                    author = tag.contents[0].contents[0] + ': '
                    url = 'http://www.counterpunch.com/' + tag.contents[3].attrs[0][1]
                else:
                    author = tag.contents[0] + ': '
                    url = 'http://www.counterpunch.com/' + tag.contents[3].attrs[0][1]
                title = author + str(tag.contents[3].contents[0])
                articles.append({'title': title, 'url': url, 'description':'', 'date': current_date})
            #if new date, update current_date
            elif tag.name == 'font':
                current_date = tag.contents[0]
                #print('the date is {0}').format(current_date)
        # Keep just one day's articles for clearer, quicker debugging
        articles = [a for a in articles if a['date'] == 'October 11, 2010']
        return articles
            
#for debugging on the cmd             
#c = Counterpunch()
#print c.parse_index()

This is the first recipe I have written.
It is for a site that has no RSS feed. The articles are listed in a table at the side of the page, separated by date headings.
I mocked it up as a .py file first and got it to a workable state where it spits out a list of feeds on the command line.
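
For reference, the list of feeds that `parse_index` is expected to return is a list of `(feed title, list of article dicts)` pairs. A minimal sketch of building that shape from a flat run of date headings and entries, as on the page described above. The `group_by_date` helper, the sample rows, and the example URLs are hypothetical and only stand in for the parsed tags:

```python
def group_by_date(rows):
    """Turn a flat sequence of ('date', text) and ('article', title, url) rows
    into article dicts, stamping each article with the latest date heading seen."""
    articles = []
    current_date = ''
    for row in rows:
        if row[0] == 'date':
            # A new date heading applies to every article that follows it.
            current_date = row[1]
        else:
            _, title, url = row
            articles.append({'title': title, 'url': url,
                             'description': '', 'date': current_date})
    return articles


# Hypothetical sample data (titles and URLs are made up for illustration).
rows = [
    ('date', 'October 11, 2010'),
    ('article', 'Author A: First piece', 'http://www.counterpunch.com/example1.html'),
    ('article', 'Author B: Second piece', 'http://www.counterpunch.com/example2.html'),
    ('date', 'October 8, 2010'),
    ('article', 'Author C: Third piece', 'http://www.counterpunch.com/example3.html'),
]

# Same structure the recipe above builds: one feed holding all articles.
feeds = [('Counterpunch', group_by_date(rows))]
```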
I then made the few small changes needed to turn it into a recipe and tested it with 'ebook-convert counterpunch.recipe test --test -vv', but I get the traceback below:


Code:
1% Converting input to HTML...
InputFormatPlugin: Recipe Input running
1% Fetching feeds...
Traceback (most recent call last):
  File "/tmp/init.py", line 48, in <module>
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/cli.py", line 254, in main
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/plumber.py", line 836, in run
  File "/home/kovid/build/calibre/src/calibre/customize/conversion.py", line 216, in __call__
  File "/home/kovid/build/calibre/src/calibre/web/feeds/input.py", line 105, in convert
  File "/home/kovid/build/calibre/src/calibre/web/feeds/news.py", line 712, in download
  File "/home/kovid/build/calibre/src/calibre/web/feeds/news.py", line 837, in build_index
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 15, in parse_index
    articles = self.parse_page(url)
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 28, in parse_page
    dates_and_articles = soup.findAll(lambda tag: (tag.name == 'p' and
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 768, in findAll
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 332, in _findAll
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 890, in search
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 849, in searchTag
  File "/usr/lib/python2.6/site-packages/BeautifulSoup.py", line 907, in _matches
  File "/tmp/calibre_0.7.26_tmp_Ep1Dpi/calibre_0.7.26_IUpdj4_recipes/recipe0.py", line 31, in <lambda>
    'Website of the' not in tag.decode('utf-8')) or
TypeError: 'NoneType' object is not callable
I assumed it has something to do with the decode method. I have played with this for hours and have sometimes changed the code enough to get a different traceback, but I still get no feeds from the recipe, whereas the same code called directly on the command line gives me the feeds I need with no problem.
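
One possible cause, offered as an assumption rather than a confirmed diagnosis: in BeautifulSoup 3, accessing an attribute that a `Tag` does not define is treated as a search for a child tag of that name, and yields `None` when no such child exists. If the BeautifulSoup copy used when the recipe runs is an older release whose `Tag` has no `decode` method, `tag.decode` would evaluate to `None`, and calling it would raise exactly this `TypeError`. The `FakeTag` class below is a hypothetical stand-in that only imitates that lookup behavior; it is not BeautifulSoup itself:

```python
class FakeTag(object):
    """Hypothetical stand-in imitating BeautifulSoup 3's attribute fallback."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first child with the given tag name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails:
        # unknown attributes are treated as child-tag lookups.
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self.find(attr)


tag = FakeTag('p', [FakeTag('a')])

try:
    tag.decode('utf-8')   # no 'decode' method and no <decode> child: tag.decode is None
    error = None
except TypeError as exc:
    error = str(exc)
```

If this is what is happening, a test that does not rely on `decode`, for example checking the string against `unicode(tag)` under Python 2, might sidestep the problem.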

Can anyone get it to run to grab the feeds for calibre?

Thanks
Old 11-05-2010, 09:55 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by ode View Post
Can anyone get it to run to grab the feeds for calibre?
I tested briefly on another machine, and got your feed parsed correctly. The articles weren't pulling, and I didn't debug why, but you were parsing the articles and building the feed from your source page just fine.

The recipe didn't finish, and I'm not sure whether all your articles were parsed correctly, but most were. I started to play with it: I added a postprocess_html for debugging, cleaned up some comments, and added some print statements. The recipe then finished (with empty articles), but that's as far as I went.

I know it's not much, but I thought you might want to know you weren't ignored.
Old 12-21-2010, 05:15 PM   #3
bcaulf
Junior Member
bcaulf began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Dec 2010
Device: Kindle 3
Counterpunch is a good web publication and as a calibre user I would appreciate it if its recipe gets debugged and put into the software distribution.
Old 07-28-2011, 06:40 PM   #4
aritza
Member
aritza began at the beginning.
 
Posts: 19
Karma: 10
Join Date: Jul 2010
Device: Calibre
It's been a year and a half since the original post. Does anyone know about any developments? I really would like to get a hold of a working recipe for CounterPunch. Thanks.
Old 07-29-2011, 03:21 PM   #5
ode

I rewrote it and got it working.

I have contributed it to Calibre. It will be included from the version released today (0.8.12).

If you don't want to update you can use the file attached to this post.

Enjoy!
Attached Files
File Type: zip Counterpunch.recipe.zip (805 Bytes, 302 views)
Old 07-31-2011, 07:01 PM   #6
aritza
Thank you so much. So far so good! I love it!
Old 08-05-2011, 10:00 AM   #7
aritza
There seems to be a limit of 10 entries per day, yet some days have fewer than 10 entries and some have more. So how does that work? Is there a way to make sure that no entries are repeated and that all entries eventually get pulled? I'm new to this, so I am not sure how it works. Thanks.
Old 09-04-2011, 04:57 AM   #8
ode
Counterpunch have redesigned their site and now have an RSS feed, making things easier for the recipe.
I have rewritten the recipe and submitted it to Calibre. It will be in the next version, which should be released next Friday (9 Sept).
In the meantime, you can use the version attached to this post.

@aritza The new recipe has a limit of 7 days/100 posts, but since it now works by RSS it is really limited by the number of posts in the feed (25 at this time).
Attached Files
File Type: zip counterpunch.recipe.zip (295 Bytes, 345 views)
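
To illustrate how the two limits mentioned above interact, here is a rough sketch in plain Python. The names `oldest_article` and `max_articles_per_feed` mirror the calibre recipe settings of the same name; the `select_articles` helper and the sample entries are hypothetical, and calibre's own filtering may differ in detail:

```python
from datetime import datetime, timedelta

oldest_article = 7            # maximum age in days
max_articles_per_feed = 100   # cap on entries taken from one feed


def select_articles(entries, now):
    """Keep entries no older than oldest_article days, capped at max_articles_per_feed."""
    cutoff = now - timedelta(days=oldest_article)
    recent = [e for e in entries if e['published'] >= cutoff]
    return recent[:max_articles_per_feed]


now = datetime(2011, 9, 4)

# Hypothetical feed of 25 entries, one every 12 hours going back from `now`.
entries = [{'title': 'Post %d' % i, 'published': now - timedelta(hours=12 * i)}
           for i in range(25)]

selected = select_articles(entries, now)
```

With only 25 entries in the feed, the cap of 100 never comes into play; what gets downloaded is decided by the feed length and the age cutoff.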