Quote:
Originally Posted by kovidgoyal
That can happen in various ways when you are manipulating the HTML. To avoid it, I typically just strip all comments with a regexp in preprocess_regexps
Okay.
I gave postprocess_html a quick try. As expected, it receives the already processed page that was downloaded because it was added to feeds. However, the additional pages of a multi-page article that I download myself within this method are not processed with remove_tags.
I'm not sure I understood your original comment correctly. Did you mean that I should apply the removals listed in remove_tags myself in postprocess_html, since the pages I download in preprocess_html would then also pass through postprocess_html?
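
For reference, here is roughly how I read the comment-stripping suggestion from the quote. This is only a sketch: the rule list has the same shape as preprocess_regexps in a recipe (pairs of a compiled pattern and a replacement callable), but apply_regexps is a hypothetical helper of my own, not part of calibre, and extra_page is made-up sample markup. The idea is that the same rules could also be run by hand over the extra pages I fetch myself.

```python
import re

# Same shape as a recipe's preprocess_regexps attribute:
# a list of (compiled pattern, replacement callable) pairs.
preprocess_regexps = [
    # Strip HTML comments, including ones that span several lines.
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda match: ''),
]


def apply_regexps(raw_html, rules=preprocess_regexps):
    """Hypothetical helper: run every (pattern, func) rule over an HTML string."""
    for pattern, func in rules:
        raw_html = pattern.sub(func, raw_html)
    return raw_html


# Made-up sample standing in for an extra page downloaded manually.
extra_page = '<div><!-- ad\nslot --><p>Page 2 text</p></div>'
cleaned = apply_regexps(extra_page)
```

Whether remove_tags can be reused the same way on those extra pages is exactly what I am unsure about.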