Thread: Bad DOCTYPE declaration causes BS to crash
09-04-2011, 02:30 AM
#4
kovidgoyal
creator of calibre
Just stick the regexp in your recipe as
Code:
preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]
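That should strip any DOCTYPE declarations from the downloaded HTML before it is parsed. The match is case-insensitive because of re.I, and the recipe needs "import re" at the top if it is not there already.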
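If you want to check what the regexp does outside calibre, here is a minimal standalone sketch. The apply_regexps helper and the sample string are made up for illustration, and the loop only assumes that each (pattern, function) pair gets applied with pattern.sub(); it is not calibre's actual code.

```python
import re

# Same (pattern, replacement function) pair as in the recipe line above.
preprocess_regexps = [(re.compile(r'<!DOCTYPE[^>]+>', re.I), lambda m: '')]


def apply_regexps(html, rules):
    """Hypothetical helper: apply each (pattern, function) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Made-up sample page with a lowercase, malformed doctype.
sample = (
    '<!doctype html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "bad">\n'
    '<html><body>Hi</body></html>'
)

cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned.strip())
```

Note that [^>]+ stops at the first ">", so a doctype with an internal subset (one that contains ">" inside square brackets) would only be partly removed. For ordinary one-line doctypes it is fine.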