MobileRead Forums - View Single Post - Multipage questions (Sueddeutsche Magazin)

aerodynamik · 04-25-2011, 02:20 PM

Quote:

Originally Posted by kovidgoyal

That can happen in various ways when you are manipulating the HTML. To avoid it, I typically just strip all comments with a regexp in preprocess_regexps.

Okay.
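
For reference, a minimal sketch of what stripping comments through preprocess_regexps might look like. In a recipe the list would be a class attribute; the apply_regexps helper and the sample string are hypothetical and only there so the snippet runs outside calibre.

```python
import re

# In a calibre recipe this list would be a class attribute of the recipe.
# Each entry is a (compiled pattern, replacement function) pair.
preprocess_regexps = [
    # DOTALL lets '.' match newlines, so multi-line comments are removed too.
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda match: ''),
]


def apply_regexps(html, rules):
    # Hypothetical stand-in for what happens to the raw HTML before parsing;
    # it only exists so the rules can be tried on a plain string.
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = '<p>Text</p><!-- ad\nblock --><p>More</p>'
cleaned = apply_regexps(sample, preprocess_regexps)
```

The non-greedy `.*?` keeps each match to a single comment, so the text between two comments survives.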

I gave postprocess_html a quick try. As expected, it receives the already processed page that was downloaded because it is listed in the feeds. However, the additional pages of a multipage article that I download inside this method are not processed with remove_tags.

I'm not sure I understood your original comment correctly. Did you mean that I should implement "remove all tags in remove_tags" in postprocess_html, since the pages I download in preprocess_html would then also be processed in postprocess_html?
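
To make the question concrete, here is a rough sketch of what I imagine that would look like. The helper name apply_remove_tags is made up, and I am assuming that each entry in remove_tags is a dict of keyword arguments that can be passed to the soup's findAll, and that matching tags can be dropped with extract().

```python
def apply_remove_tags(soup, remove_tags):
    """Remove every element matching any spec in remove_tags from soup.

    Assumption: each spec is a dict such as dict(name='div', attrs={...})
    that can be passed as keyword arguments to soup.findAll().
    """
    for spec in remove_tags:
        for tag in soup.findAll(**spec):
            # extract() detaches the tag (and its children) from the tree.
            tag.extract()
    return soup


# Hypothetical use inside a recipe, applied to each extra page soup
# before it is merged into the main article:
#
#     def postprocess_html(self, soup, first_fetch):
#         ...
#         extra_soup = apply_remove_tags(extra_soup, self.remove_tags)
#         ...
#         return soup
```

The helper only relies on findAll and extract, so it should work on any soup that I parse myself from the extra pages.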