02-09-2010, 02:48 PM   #1404
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kovidgoyal
There isn't (recursion following happens in a whole different module). The only workaround is to use preprocess_html and index_to_soup to do it manually.
Thank you. Your comment still saves me a lot of effort.
Another question (I know, they are endless; I will not be offended if you do not answer, as I'm sure your time is better spent writing code than dragging me up the learning slope).

I keep getting this error (below 'Processing images...') when trying to get food recipe images:

Spoiler:
Processing links...
http://feeds.epicurious.com/~r/healt...Paprika-241851 saved to appdata\local\temp\calibre_0.6.38_q3s9rh_plumber\feed_0\article_1\index.xhtml
Downloading
Fetching http://feeds.epicurious.com/~r/newre...MVm7DSk/357252
Downloaded article: Mussels with Sherry, Saffron and Paprika from http://feeds.epicurious.com/~r/healt...Paprika-241851
17% Article downloaded: u'Mussels with Sherry, Saffron and Paprika'
Processing images...
Fetching http://www.epicurious.com/images/rec...2010_february/
357252_116.jpg
Could not fetch image http://www.epicurious.com/images/rec...2010_february/
357252_116.jpg
Traceback (most recent call last):
File "C:\Util\Calibre2\src\src\calibre\web\fetch\simple.py", line 315, in process_images
File "C:\Util\Calibre2\src\src\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found


The image it reports as "Not Found," however, is easily retrieved in Firefox. I've looked at the headers in a Firefox session and considered whether it is a robots.txt, cookie, or user-agent issue, but I can't figure it out. The image loads fine in Firefox when I block cookies, and AFAICT the fetch process uses a Firefox user agent and ignores robots.txt. I've even tried adding a delay. Do I need to use mechanize and fetch the image in a browser session, or am I missing something simpler?

Edit:
I think I've figured it out. There is an ASCII 0x0A (line feed) character in the middle of the link in the page source, right where the URL breaks: after 'http://www.epicurious.com/images/recipesmenus/2010/2010_february/' and before '357252_116.jpg'.
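
For anyone who wants to confirm the same thing, a quick check along these lines makes the hidden character visible. It is only a sketch in plain Python: the helper name find_control_chars is made up for illustration, and the URL is rebuilt by hand from the two halves quoted above rather than read from the real page.

```python
def find_control_chars(url):
    """List (index, hex code) for every ASCII control character in url."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(url)
            if ord(ch) < 0x20 or ord(ch) == 0x7F]


# Assumption: the link as it appears in the page source, with the
# line feed sitting between the directory and the file name.
prefix = 'http://www.epicurious.com/images/recipesmenus/2010/2010_february/'
url = prefix + '\n' + '357252_116.jpg'

print(repr(url))                # repr() shows the line feed as \n
print(find_control_chars(url))  # position and hex code of each control char
```

Printing the repr() of a suspect link is usually enough; the helper just saves counting characters by eye.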

I see another error in the output where it says it can't find 'http://www.epicurious.com/images/recipesmenus/2010/2010_february/%20357252_116.jpg' (note the %20, an encoded space, before the file name).

The problem seems to be in the page source, but I'm not sure why it works in Firefox. Perhaps Firefox cleans up the URL somehow. Do I need to use preprocess_html to fix this?
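
If the fix does belong in preprocess_html, a minimal sketch of the cleanup might look like the following. It is plain Python working on an HTML string with a regular expression, so it can be run on its own; the helper name strip_whitespace_in_img_src is hypothetical and not part of calibre. It only removes raw whitespace (which, as far as I can tell, is roughly what browsers do with line breaks inside a URL), so a literal %20 already present in the source would need separate handling.

```python
import re

# Matches the src attribute of an <img> tag, capturing the quote character
# so the same one closes the value. DOTALL lets the value span a line break.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def strip_whitespace_in_img_src(html):
    """Return html with all whitespace removed from every <img> src value."""
    def fix(match):
        prefix, quote, url = match.groups()
        return prefix + quote + re.sub(r'\s+', '', url) + quote
    return IMG_SRC.sub(fix, html)


# Assumption: a hand-made tag imitating the broken link in the page source.
broken = ('<img src="http://www.epicurious.com/images/recipesmenus/2010/'
          '2010_february/\n357252_116.jpg" alt="recipe">')
fixed = strip_whitespace_in_img_src(broken)
print(fixed)
```

Inside a recipe, the same idea would presumably be applied to each img tag's src in the soup that preprocess_html receives, instead of to a raw string; I have not tested that part.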

Last edited by Starson17; 02-09-2010 at 04:20 PM.