07-08-2011, 09:19 AM   #4
Rasmus
I will need a bit more help, it seems.

This is a clean Python example:

Code:
>>> import urllib2
>>> from BeautifulSoup import BeautifulSoup
>>> u = 'http://www.spiegel.de/international/europe/0,1518,773071,00.html'
>>> v = urllib2.urlopen(u).read()
>>> soup = BeautifulSoup(v)  # this should be identical to Calibre's Soup
>>> img = soup.find('div', {'class': 'spGalleryBigPic'}).find('img')
>>> type(img)
<class 'BeautifulSoup.Tag'>
What I want is to insert this image into the article. Eventually it should go below the heading, but for now I just want to get it into the EPUB article at all.

So I wrote the following preprocess_html function:
Code:
    def preprocess_html(self, soup):
        soup = soup.find('div', {'class' : 'spGalleryBigPic'}).find('img')
        return soup
This returns what I called img above.
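
Because only the img tag is returned, the rest of the page is dropped. A minimal sketch of a variant that keeps the whole document and moves the image to the top of the body might look like the following. It assumes the spGalleryBigPic div is actually present in the soup that gets passed in, which may not hold for the print version. add_gallery_image is a made-up helper name, not part of calibre.

```python
def add_gallery_image(soup):
    """Move the gallery image to the top of <body> and return the whole soup.

    Hypothetical helper; relies only on find() and insert() as provided by
    BeautifulSoup tags.
    """
    div = soup.find('div', {'class': 'spGalleryBigPic'})
    # Guard against pages that have no gallery div at all.
    img = div.find('img') if div is not None else None
    body = soup.find('body')
    if img is not None and body is not None:
        # Put the image first in the body; the heading position can come later.
        body.insert(0, img)
    # Always hand back the full document, never just the image tag.
    return soup
```

Inside the recipe, preprocess_html would then simply return add_gallery_image(soup).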

I fetch the print version of the article using:
Code:
    def print_version(self, url): 
        'from Spiegelde.recipe'
        rmt = url.rpartition('#')[0]
        main, sep, rest = rmt.rpartition(',')
        rmain, rsep, rrest = main.rpartition(',')
        purl = rmain + ',druck-' + rrest + ',' + rest
        return purl
But currently the recipe only works when I do not define preprocess_html.
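
For reference, the URL rewriting in print_version can be checked on its own. This is a minimal sketch that copies the same string logic into a standalone function (to_print_url is a made-up name) and assumes the feed URL ends in a fragment such as #ref=rss. That fragment is an assumption for illustration only.

```python
def to_print_url(url):
    """Standalone copy of the print_version string logic (hypothetical helper)."""
    # Drop the '#...' fragment. rpartition returns '' as the head when there
    # is no '#' at all, so a fragment is assumed to be present.
    rmt = url.rpartition('#')[0]
    # Split off the last two comma-separated fields.
    main, sep, rest = rmt.rpartition(',')
    rmain, rsep, rrest = main.rpartition(',')
    # Re-join with 'druck-' in front of the second-to-last field.
    return rmain + ',druck-' + rrest + ',' + rest


# Assumed feed-style URL; the '#ref=rss' fragment is an illustration only.
url = 'http://www.spiegel.de/international/europe/0,1518,773071,00.html#ref=rss'
print(to_print_url(url))
# http://www.spiegel.de/international/europe/0,1518,druck-773071,00.html
```

Without a '#' in the URL, rpartition('#')[0] is an empty string and the result is just ',druck-,'. Using url.partition('#')[0] instead would cover both cases.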