Thanks to your tips, I've solved this problem. Here's my solution:
Code:
import os
import string

import PIL.Image
import calibre.utils.magick

# Everything below lives inside the recipe class (a BasicNewsRecipe subclass).
valid_filename_chars = "-_.%s%s" % (string.ascii_letters, string.digits)

def image_url_processor(self, baseurl, url):
    self.log("===================\nbaseurl: ", baseurl, "\nurl: ", url)
    # This is a hack because some of the URLs just have a leading
    # // instead of http://
    if url.startswith("//"):
        url = "http:" + url
    url = self.get_image(url)
    self.log("url out: ", url, "\n===================")
    return url

def get_image(self, url):
    # Another hack - sometimes the URLs just have a leading /,
    # in which case I stick on "http://" and the correct domain
    # (make_url is a helper defined elsewhere in the recipe)
    if url.startswith("/"):
        url = self.make_url(url)
    # Get the image bytes. Calling get_browser() through self works whether
    # calibre defines it as a classmethod or as an instance method.
    br = self.get_browser()
    response = br.open(url)
    data = response.get_data()
    # Write it to a local file whose name is based on the URL
    filename = ''.join(c for c in url if c in self.valid_filename_chars)
    self.log("filename=%s" % filename)
    with open(filename, "wb") as f:
        f.write(data)
    # Try to read it with PIL, which is what the containing app will do
    try:
        im = PIL.Image.open(filename)
    except Exception:
        # If it failed, read it with ImageMagick and write it to a new file,
        # changing the URL to point to the new file
        self.log("Could not open ", filename, " from ", url)
        self.log("Trying to open and re-save with ImageMagick")
        image = calibre.utils.magick.Image()
        image.read(filename)
        image.save("new_" + filename)
        url = os.getcwd() + "/new_" + filename
        url = "file:///" + url.replace("\\", "/")
        self.log("Succeeded. Using local file")
    return url
Luckily, ImageMagick manages to load the file successfully AND heal it when saving it back out, so I didn't have to look into what exactly was wrong with these files. For curiosity's sake, I did compare the old and new files, and they differ, but since I don't know squat about the PNG format (and don't have the time or energy to learn), I don't know exactly what the differences mean.
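For anyone curious about what actually differs between the old and new files, here is a minimal sketch that walks the chunks of a PNG and checks each chunk's CRC. It relies only on the basic PNG layout (an 8-byte signature, then chunks of length, type, data, and CRC). The function name list_png_chunks is made up for illustration, and there is no evidence that a bad checksum was the real problem with these particular files; it is just one easy thing to rule out.

```python
import struct
import zlib

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_chunks(data):
    """Return a list of (chunk_type, length, crc_ok) for raw PNG bytes.

    Hypothetical helper for illustration. Stops at the first truncated
    chunk (reported with crc_ok=False) instead of raising.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file (bad signature)")
    chunks = []
    pos = 8
    while pos + 8 <= len(data):
        # Chunk header: 4-byte big-endian length, then 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        name = ctype.decode("latin-1")
        if len(body) < length or len(crc_bytes) < 4:
            # File ends in the middle of this chunk.
            chunks.append((name, length, False))
            break
        stored_crc = struct.unpack(">I", crc_bytes)[0]
        # The CRC covers the chunk type and data, not the length field.
        actual_crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
        chunks.append((name, length, stored_crc == actual_crc))
        pos += 12 + length
    return chunks
```

To compare two files, read each one in binary mode, pass the bytes to list_png_chunks, and print the two lists side by side. Chunks that are missing, reordered, or flagged with a bad CRC in the original but not in the re-saved copy would be the first place to look.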
If I had the time, I'd grab the calibre source, find where it's doing the image load, put something like this in, and submit it as a bug fix, but unfortunately, I don't. If anyone else out there wants to do it, go ahead - I happily release this code (particularly the try...except bit that solves the problem) into the public domain.
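One possible simplification for anyone picking this up: the two URL hacks in the code above (a leading // and a leading /) could probably be replaced by the standard library's urljoin, which resolves both forms against a base URL. This is only a sketch. It assumes Python 3 (on Python 2 the same function lives in the urlparse module) and assumes baseurl is the absolute URL of the page containing the image. The function name absolute_image_url is made up for illustration.

```python
from urllib.parse import urljoin


def absolute_image_url(baseurl, url):
    """Resolve an image URL against the page it came from.

    Hypothetical helper for illustration. Handles scheme-relative URLs
    (//host/path), root-relative URLs (/path), and leaves URLs that are
    already absolute unchanged.
    """
    return urljoin(baseurl, url)


if __name__ == "__main__":
    page = "http://example.com/articles/story.html"
    for raw in ("//cdn.example.com/pic.png", "/images/pic.png"):
        print(raw, "->", absolute_image_url(page, raw))
```

Note one difference from the original hack: urljoin takes the scheme from baseurl, so a // URL found on an https page becomes https rather than always http.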