Quote:
Originally Posted by kovidgoyal
What preprocess_html is doing is extracting the div containing the cartoon and returning that. Probably the rest of the HTML on the page has something that causes an error. The download log should tell you what the error is.
Thanks for your reply. But what I don't understand is that I already constrain the input to that very same div by setting the keep_only_tags property.
When is this property applied? Before or after calling preprocess_html()? I know I could look it up in the source, but as I said, my Python is a little rusty.
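For readers unfamiliar with the idea, the "extract the cartoon div and return only that" step can be illustrated outside calibre. The sketch below is an assumption-laden stand-in: it uses the standard library's `html.parser` rather than the soup object calibre hands to `preprocess_html`, and the class name `cartoon` plus the names `DivExtractor` and `extract_div` are hypothetical, not taken from the actual recipe or the foksuk.nl markup.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Hypothetical helper: captures the raw markup of the first <div>
    whose class list contains target_class, discarding everything else."""

    def __init__(self, target_class):
        # Keep entity references raw so they can be re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.target_class = target_class
        self.depth = 0      # <div> nesting depth inside the target; 0 = outside
        self.parts = []     # markup fragments belonging to the target div
        self.done = False   # True once the target div has been closed

    def _inside(self):
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get('class') or '').split()
            if tag == 'div' and self.target_class in classes:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
        else:
            if tag == 'div':
                self.depth += 1
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: copy them verbatim, no end tag.
        if self._inside():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._inside():
            return
        self.parts.append(f'</{tag}>')
        if tag == 'div':
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self._inside():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._inside():
            self.parts.append(f'&{name};')

    def handle_charref(self, name):
        if self._inside():
            self.parts.append(f'&#{name};')


def extract_div(html, target_class):
    """Return the markup of the first matching div, or None if absent."""
    parser = DivExtractor(target_class)
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts) or None


if __name__ == '__main__':
    page = ('<html><body><div id="nav">menu</div>'
            '<div class="cartoon"><img src="strip.gif">'
            '<div>caption &amp; date</div></div><p>footer</p></body></html>')
    print(extract_div(page, 'cartoon'))
```

Run on the sample page, this keeps only the `cartoon` div (including its nested caption div) and drops the navigation and footer, which is the same narrowing that keep_only_tags is meant to achieve declaratively.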
Quote:
Originally Posted by kovidgoyal
The download log should tell you what the error is
Yes, it should. But I can't make head or tail of it, except for this oddity: without preprocess_html implemented, it starts with
Code:
Download nieuws van Fokke en Sukke - debug
InputFormatPlugin: Recipe Input running Downloading
Downloading
FetchingFetching http://foksuk.nl/nl?cm=79&ctime=1257807600&session=52dd92d33ef2789f432ec37762afe338http://foksuk.nl/nl?cm=79&ctime=1257721200&session=52dd92d33ef2789f432ec37762afe338
Processing images...
and with preprocess_html implemented, it starts with
Code:
Download nieuws van Fokke en Sukke - debug
InputFormatPlugin: Recipe Input running DownloadingDownloading
Fetching http://foksuk.nl/nl?cm=79&ctime=1257721200&session=89d846cf5dd9f1b85e89b24d566680ec
Fetching http://foksuk.nl/nl?cm=79&ctime=1257807600&session=89d846cf5dd9f1b85e89b24d566680ec
Processing images...
I have no idea what this might mean, if anything.
Anyway, whatever the problem is, it can be worked around. I'll post the finished recipe soon.