Quote:
Originally Posted by kovidgoyal
Just stick the regexp in your recipe as
Code:
preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]
That should strip any doctype declarations from downloaded HTML.
That didn't work. Does "downloaded HTML" include the index file? That's the one actually causing the problem.
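
To rule out the pattern itself, here is a minimal sketch that runs the same regexp on a made-up index page using only Python's standard `re` module. The `apply_rules` helper and the `sample` string are hypothetical and only for illustration; this is not how calibre applies `preprocess_regexps` internally, and it says nothing about whether the index file goes through that step.

```python
import re

# Same pattern and replacement as in the quoted recipe line.
preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]


def apply_rules(html, rules):
    # Hypothetical helper (not calibre code): apply each
    # (compiled pattern, replacement function) pair in order.
    for pattern, repl in rules:
        html = pattern.sub(repl, html)
    return html


# Made-up sample page with a lowercase doctype, to exercise re.I.
sample = (
    '<!doctype html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html><body>Index</body></html>'
)

cleaned = apply_rules(sample, preprocess_regexps)
print(cleaned)
```

If the doctype is gone here, the regexp is probably fine and the open question is whether the index file is ever passed through it.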