04-25-2011, 01:20 PM   #9
aerodynamik
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
Quote:
Originally Posted by kovidgoyal
That can happen in various ways when you are manipulating the HTML. To avoid it, I typically just strip all comments with a regexp in preprocess_regexps
Okay.
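
For reference, this is roughly what I understand the comment stripping to look like. It is a minimal sketch: the tuple shape (compiled pattern, function taking the match) is what preprocess_regexps expects, but the exact pattern is my own assumption, and strip_comments is just a made-up helper for trying the pattern outside calibre.

```python
import re

# Each entry is (compiled pattern, callable taking the match object).
# DOTALL lets a comment span several lines; the pattern itself is an assumption.
preprocess_regexps = [
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda match: ''),
]


def strip_comments(html):
    # Hypothetical stand-alone helper: applies the same substitutions
    # to a plain string so the pattern can be checked without calibre.
    for pattern, replacement in preprocess_regexps:
        html = pattern.sub(replacement, html)
    return html
```

In a recipe only the preprocess_regexps list would go into the class body; the helper is there purely for testing the regexp on a sample string.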

Gave postprocess_html a quick try. It receives the already processed page, i.e. the one that was downloaded because it was added to feeds. However, the additional pages of a multi-page article that I download within this method are not processed with remove_tags.

Not sure I understood your original comment correctly. Did you mean that I should implement "remove all tags in remove_tags" myself in postprocess_html, since the pages I download in preprocess_html would then also be processed in postprocess_html?
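
If that is what you meant, I imagine it would look roughly like the untested sketch below. apply_remove_tags is a helper name I made up; it assumes the soup is a BeautifulSoup-style object with findAll(), and that each entry in remove_tags is a dict of findAll() keyword arguments such as dict(name='div', attrs={'class': 'ad'}).

```python
def apply_remove_tags(soup, remove_tags):
    """Extract every tag that matches one of the remove_tags specs.

    Assumption: soup offers findAll(**kwargs) and the returned tags
    offer extract(), as BeautifulSoup objects do.
    """
    for spec in remove_tags:
        # findAll returns a separate result list, so extracting tags
        # while looping over it does not disturb the iteration.
        for tag in soup.findAll(**spec):
            tag.extract()
    return soup
```

Inside the recipe, postprocess_html(self, soup, first_fetch) could then call apply_remove_tags(soup, self.remove_tags) and return the soup.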