Selective preprocess_regexps
Hi,
Is there a way to turn preprocess_regexps on or off selectively?
My recipe's parse_index() works as follows:
1. Visit the newspaper's main page and extract the section names and section URLs.
2. Visit each section URL and extract the articles in that section.
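The two-step flow above can be sketched as a standalone Python function. It returns the list of `(section title, list of article dicts)` tuples that a calibre `parse_index()` is expected to return. This is a minimal sketch, not the poster's recipe. `fetch` stands in for however the recipe downloads pages. The `section`/`headline` class names and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for <a> tags whose class equals `css_class`."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._href = None  # href of the <a> currently being read, if it matches
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('class') == self.css_class:
            self._href = attrs.get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((''.join(self._text).strip(), self._href))
            self._href = None


def collect_links(html, css_class):
    parser = LinkCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.links


def parse_index(fetch, main_url):
    """Main page -> sections -> articles.

    `fetch` is any callable returning raw HTML for a URL (hypothetical
    stand-in for the recipe's download step). The 'section' and
    'headline' class names are hypothetical.
    """
    feeds = []
    # Step 1: section names and URLs from the main page.
    for section_title, section_url in collect_links(fetch(main_url), 'section'):
        # Step 2: article titles and URLs from each section page.
        articles = [
            {'title': title, 'url': url, 'description': '', 'date': ''}
            for title, url in collect_links(fetch(section_url), 'headline')
        ]
        feeds.append((section_title, articles))
    return feeds
```

For example, with a dict of canned pages standing in for the network, `parse_index(pages.__getitem__, 'http://example.com/')` yields one `('World', [...])` entry per section link found on the main page.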
Since the latest update to the newspaper's site, step 2 fails because preprocess_regexps strips out the part of the HTML that contains the article titles and URLs.
I need preprocess_regexps because it strips all the junk out of the actual article content; however, I don't need or want it during the parse_index() stage.
Is there a solution for my problem?
Thanks!
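One possible workaround, sketched under assumptions: leave `preprocess_regexps` empty and apply the same regexes yourself, only to article pages. Recent calibre versions provide a `preprocess_raw_html(self, raw_html, url)` hook on `BasicNewsRecipe` that is documented to receive the source of each downloaded HTML file before parsing; check that your version has it. The standalone code below imitates that hook without importing calibre. `ARTICLE_REGEXPS`, `apply_regexps`, and `index_urls` are hypothetical names, and the patterns are placeholders.

```python
import re

# Hypothetical article-only cleanup rules, written in the same
# (compiled pattern, replacement function) shape as preprocess_regexps.
ARTICLE_REGEXPS = [
    (re.compile(r'<div class="ads">.*?</div>', re.DOTALL), lambda m: ''),
    (re.compile(r'<!--.*?-->', re.DOTALL), lambda m: ''),
]


def apply_regexps(raw_html, regexps):
    """Apply each (pattern, func) pair in order, like a preprocess_regexps list."""
    for pattern, func in regexps:
        raw_html = pattern.sub(func, raw_html)
    return raw_html


def preprocess_raw_html(raw_html, url, index_urls):
    """Clean article pages; return main/section pages unchanged.

    `index_urls` is a hypothetical set holding the main-page and section
    URLs visited by parse_index().
    """
    if url in index_urls:
        return raw_html  # index page: keep the titles and links intact
    return apply_regexps(raw_html, ARTICLE_REGEXPS)
```

Inside a recipe, `index_urls` would be a set that `parse_index()` fills in and stores on `self`, since the real hook only receives `raw_html` and `url`. If your calibre version never calls the hook for the section pages, the URL check is simply a harmless extra guard.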