12-14-2011, 01:50 PM   #2
Barty
Try using a preprocess regex:

Code:
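    # Needs "import re" at the top of the recipe file.
    preprocess_regexps = [
        # Strip every <p><strong>MORE:</strong> ... </p> paragraph.
        (re.compile(r'<p><strong>MORE:</strong>.+?</p>', re.I|re.DOTALL), lambda x: ''),
    ]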
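re.I means ignore case, so you can use just re.DOTALL instead of re.I|re.DOTALL if you know the case will always be exactly like that. Keep re.DOTALL either way: it lets . match newlines, so the pattern still works when the paragraph is split over several lines.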
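If you want to check the pattern outside calibre first, something like this should do as a rough test. The sample HTML is made up for illustration (it is not from the actual feed), and the variable names are just placeholders.

```python
import re

# Same pattern as in the recipe snippet above.
pattern = re.compile(r'<p><strong>MORE:</strong>.+?</p>', re.I | re.DOTALL)

# Made-up sample HTML (an assumption, not the real page).
# The "More:" paragraph uses different case and spans two lines.
sample_html = (
    '<p>First paragraph.</p>\n'
    '<p><strong>More:</strong> <a href="#">Related\n'
    'story</a></p>\n'
    '<p>Last paragraph.</p>'
)

# Replace every match with an empty string, like the lambda in the recipe.
cleaned = pattern.sub(lambda m: '', sample_html)
print(cleaned)
```

Only the "More:" paragraph should disappear. If you drop re.I here, the mixed-case "More:" no longer matches, which is why it is safer to leave it in unless you are sure about the case.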