MobileRead Forums - View Single Post - Bad DOCTYPE declaration causes BS to crash

macpablus · 09-04-2011, 03:30 PM

Quote:

Originally Posted by kovidgoyal

Just stick the regexp in your recipe as

Code:

preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]

That should strip any doctype declarations from downloaded HTML.

Didn't work. Does "downloaded HTML" include the index file? Because that's the one actually causing the problem.
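For reference, here is a minimal standalone sketch of what that regexp does to a string of HTML, run outside calibre. The helper name strip_doctype and the sample markup are made up for illustration. It says nothing about whether a recipe applies preprocess_regexps to the index page; it only checks that the pattern itself matches a typical declaration.

```python
import re

# Same pattern as in the quoted post: "<!DOCTYPE", then everything up to the first ">".
DOCTYPE_RE = re.compile(r'<!DOCTYPE[^>]+>', re.I)


def strip_doctype(html):
    """Hypothetical helper (not part of calibre): remove doctype declarations."""
    return DOCTYPE_RE.sub('', html)


# Made-up sample page with a lowercase doctype, to exercise the re.I flag.
sample = (
    '<!doctype html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html><body><p>Hi</p></body></html>'
)

cleaned = strip_doctype(sample)
print(cleaned)
```

One caveat with the pattern: [^>]+ stops at the first ">", so a declaration that contains a ">" of its own (an internal subset with entity definitions, for example) would only be partly removed.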