recipe to ignore image size constraint on web page

rutmang · 07-27-2014, 08:09 PM

Hello All,

Apologies if this is present elsewhere. I have searched as well as I can and have found some interesting things, but nothing that is getting me where I'd like.

What I would like to do is bring down a WordPress RSS feed that contains mainly posts of single images. The source of the image is also its hyperlink when the image is clicked on the webpage. Since the image presented on the webpage has been size-constrained, it essentially acts like a very simple "click to enlarge" link, opening the full-sized image as a new page in the browser.

Source looks like this, for example:

<a href="http://path.to.image.com/02-01.jpg"><img src="http://path.to.image.com/02-01.jpg?w=233&h=300"

What I would like calibre to do when fetching is simply download and save the image from the "a href=" and present it according to the fetching output profile rather than with the web page's "?w=233&h=300" constraint.
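To make the relationship between the two URLs concrete, here is a minimal sketch using only the Python standard library. It assumes the full-size image URL is simply the thumbnail URL without its query string, which holds for the example above but may not hold for every site. The function name `strip_size_query` is hypothetical and not part of calibre.

```python
from urllib.parse import urlsplit


def strip_size_query(src):
    """Return src without its query string (hypothetical helper)."""
    parts = urlsplit(src)
    # Dropping the query removes the "?w=233&h=300" size constraint.
    return parts._replace(query='').geturl()


thumbnail = "http://path.to.image.com/02-01.jpg?w=233&h=300"
full_size = strip_size_query(thumbnail)
```

In the example markup, the result matches the "a href=" target, which is the URL calibre should fetch instead of the constrained one.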

I've looked at a lot of code today and thought I had found something close here: https://www.mobileread.com/forums/sho...0&postcount=21

But it didn't seem to perform what I was asking or I didn't place it correctly in the Advanced Mode (which otherwise is all default values).

Trying to learn and appreciate any help! Thanks!

kovidgoyal · 07-27-2014, 10:07 PM

Code:

def preprocess_html(self, soup):
   for img in soup.findAll('img'):
       a = img.findParet('a')
       if a is not None:
           img['src'] = a['href']
   return soup

rutmang · 07-30-2014, 07:34 PM

Thank you, Kovid, for such a quick response. I have tried it and corrected a typo ("img.findParet" to "img.findParent") and tried putting it at the top and at the bottom of my recipe, to no avail. Calibre is still downloading the image at the size it appears on the web page, not the original size of the file it is reading from.

It appears I didn't give the full text of the image statement, but I don't believe it would have any bearing as your code appears to simply tell it to get the original image:

<a href="http://path.to.image.com/02-01.jpg"><img src="http://path.to.image.com/02-01.jpg?w=233&h=300" alt="filedescription" width="240" height="300" /></a>

Maybe there is something else happening I can't describe, but I certainly do appreciate your time! I will keep playing around with it!

Thank you.

kovidgoyal · 07-30-2014, 10:15 PM

Remember that the function has to be part of the recipe class, which means it has to be at the end of the class and properly indented.
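A minimal sketch of that placement, with the method indented one level inside the class. The class name `FullSizeImages`, the title, and the feed URL are placeholders rather than anything from this thread, and the `ImportError` fallback is only a stand-in so the sketch can be read and exercised outside calibre. The extra `a.get('href')` check is an added precaution for links without a target.

```python
try:
    # Available when the recipe runs inside calibre.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch also loads outside calibre.
    class BasicNewsRecipe(object):
        pass


class FullSizeImages(BasicNewsRecipe):  # hypothetical class name
    title = 'Full-size image feed'  # placeholder title
    # Placeholder feed; substitute the real WordPress RSS URL.
    feeds = [('Posts', 'http://example.com/feed/')]

    # Indented one level so that the method belongs to the class.
    def preprocess_html(self, soup):
        for img in soup.findAll('img'):
            a = img.findParent('a')
            # Only rewrite images wrapped in a link that has a target.
            if a is not None and a.get('href'):
                img['src'] = a['href']
        # Return once, after every image has been visited.
        return soup
```

Note that `return soup` sits at the same level as the `for` statement, so every image on the page is processed before the method returns.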

rutmang · 07-31-2014, 04:52 PM

Fantastic! I added four spaces to the beginning of each of your lines and it worked just as I had hoped!

Thank you so much for your wonderful product and work!
