09-04-2011, 02:30 PM   #5
macpablus
Enthusiast
macpablus once ate a cherry pie in a record 7 seconds.
Posts: 25
Karma: 1896
Join Date: Aug 2011
Device: Kindle 3
Quote:
Originally Posted by kovidgoyal
Just stick the regexp in your recipe as

Code:
preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]
That should strip any doctype declarations from downloaded HTML.
Didn't work. Does "downloaded HTML" include the index file? 'Cause that's the one actually causing the problem.
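For reference, here's a quick sketch of what that regexp does when run by hand in plain Python, outside calibre. The sample HTML string and the strip_doctype helper are made up for illustration, and this only shows the pattern matching a doctype; it says nothing about which files the recipe actually applies it to.

```python
import re

# Same pattern and replacement as the quoted preprocess_regexps entry.
pattern = re.compile(r'<!DOCTYPE[^>]+>', re.I)
replacement = lambda m: ''

# Made-up sample page, only for illustration.
sample = (
    '<!doctype html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html><body>Index</body></html>'
)

def strip_doctype(html):
    """Hypothetical helper: apply the pattern the way a (regexp, function) pair would be applied."""
    return pattern.sub(replacement, html)

cleaned = strip_doctype(sample)
print(cleaned)
```

So the pattern itself seems fine (re.I takes care of a lowercase "doctype"), which is why I suspect it just isn't being run against the index file.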