
Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

07-07-2011, 09:12 PM   #1
Rasmus
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2011
Device: Kindle3
Grabbing and including image from another url

Hi,
I am trying to improve the Spiegel Int'l recipe.

So far I got sections going and removed a lot of noise by using the print version of articles.

However, the print versions do not include images. I like the images and would prefer to include them.

I have written the following simple script, which grabs an image from the non-print version of an article:

Code:
    def get_img(self, url):
        # Parse the regular (non-print) article page
        soup = BeautifulSoup(urllib2.urlopen(url))
        # Return the src of the main gallery image
        return soup.find('div', {'class' : 'spGalleryBigPic'}).find('img')['src']
I assume that ebook-convert somehow knows url, as it does in print_version.

My questions are:
  1. How do I include this picture in the actual ebook?
  2. Am I 'allowed' to use urllib2 in recipes, or is some other method preferred? (Most likely it is.)
  3. Should I add some kind of robustness to img, or will Calibre handle it? (Probably not; I guess I could just wrap the img lookup in a try block to avoid exceptions.)

These questions are probably obvious, but I could not find all of the documentation I wanted on recipes...

Cheers,
Rasmus

PS: I wrote a recipe for the Economist's Daily Chart. Should I share it, or what is the custom for these things? Should I share my improved Spiegel recipe?
07-07-2011, 10:49 PM   #2
kovidgoyal
creator of calibre
Posts: 24,765
Karma: 4369667
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Implement preprocess_html in your recipe and insert the img elements there; calibre will do the downloading for you. Sharing your recipes is always a good thing.

Last edited by kovidgoyal; 07-07-2011 at 10:58 PM.
07-08-2011, 05:54 AM   #3
Rasmus
Thanks for the quick reply. I wasn't specific enough.

I do
Code:
import urllib2
in order to download the html page, which is then turned into soup. How could I do this with Calibre's own tools instead?

I'll get the new Spiegel Int'l recipe to you once I have the images integrated. Thanks for the tip on preprocess_html.

And thanks for Calibre! I run the cli tools from a server every morning and enjoy all of my favorite news when I get up. It is great!

--Rasmus
07-08-2011, 09:19 AM   #4
Rasmus
I will need a bit more help, it seems.

This is a clean Python example:

Code:
>>> import urllib2
>>> from BeautifulSoup import BeautifulSoup
>>> u = 'http://www.spiegel.de/international/europe/0,1518,773071,00.html'
>>> v = urllib2.urlopen(u).read()
>>> soup = BeautifulSoup(v) # this should be identical to Calibre's Soup
>>> img = soup.find('div', {'class' : 'spGalleryBigPic'}).find('img')
>>> type(img)
<class 'BeautifulSoup.Tag'>
So basically, I want to insert this into the article (below the heading, but for now I just want to get it into the epub article).

So I wrote the following preprocess function:
Code:
    def preprocess_html(self, soup):
        soup = soup.find('div', {'class' : 'spGalleryBigPic'}).find('img')
        return soup
which returns what I called img above.

I get the article using:
Code:
    def print_version(self, url): 
        'from Spigelde.receipt'
        rmt = url.rpartition('#')[0]
        main, sep, rest = rmt.rpartition(',')
        rmain, rsep, rrest = main.rpartition(',')
        purl = rmain + ',druck-' + rrest + ',' + rest
        return purl
But currently it only works when I do not use preprocess_html.
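
For reference, the URL rewrite done by print_version can be traced outside Calibre. The following is a minimal standalone sketch, not part of the recipe: the name to_print_url and the '#ref=rss' fragment are assumptions made for illustration. One thing it makes visible is that rpartition('#')[0] is an empty string when the URL has no fragment, so the sketch uses partition('#')[0], which returns the whole URL in that case.

```python
def to_print_url(url):
    """Rewrite a Spiegel article URL into its print ('druck') form.

    Hypothetical standalone version of print_version above; the only
    change is partition() instead of rpartition() for the fragment.
    """
    # Drop any '#fragment'; partition() keeps the whole URL if there is none.
    base = url.partition('#')[0]
    # Split off the last two comma-separated fields, e.g. '773071' and '00.html'.
    main, _, rest = base.rpartition(',')
    rmain, _, article_id = main.rpartition(',')
    return rmain + ',druck-' + article_id + ',' + rest


url = 'http://www.spiegel.de/international/europe/0,1518,773071,00.html'
print(to_print_url(url + '#ref=rss'))  # the fragment is an assumed example
print(to_print_url(url))               # same result without a fragment
```

Both calls print the same address, with ',druck-773071,' in place of ',773071,'.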
07-08-2011, 12:06 PM   #5
kovidgoyal
Use the index_to_soup() method instead of urllib2. What you want to do in preprocess_html is run index_to_soup, find the img element you want in the soup it returns, and insert it into the soup passed into preprocess_html. See the BeautifulSoup docs for how to do that.
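
The steps above might look roughly like the following. This is a sketch under stated assumptions, not tested against Calibre: insert_lead_image is a hypothetical helper, the spGalleryBigPic class is taken from the first post, and how the recipe learns the address of the non-print page inside preprocess_html is left open. It relies only on the find() and insert() methods of BeautifulSoup tags, it returns the whole article soup rather than the img tag alone, and it does nothing when no image is found.

```python
def insert_lead_image(article_soup, full_page_soup):
    """Copy the gallery image from the full page into the print-version soup.

    Hypothetical helper: both arguments are BeautifulSoup objects, e.g. the
    soup passed to preprocess_html and the result of self.index_to_soup(url).
    """
    gallery = full_page_soup.find('div', {'class': 'spGalleryBigPic'})
    img = gallery.find('img') if gallery is not None else None
    body = article_soup.find('body')
    if img is None or body is None:
        # Not every article has a gallery image; leave the soup untouched.
        return article_soup
    # Put the image first inside <body> so it ends up in the article.
    body.insert(0, img)
    return article_soup
```

Inside the recipe, preprocess_html could then end with something like return insert_lead_image(soup, self.index_to_soup(article_url)), where article_url is an assumed name for the address of the non-print page.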

All times are GMT -4.