Old 12-05-2011, 04:40 PM   #1
dasp
Selective preprocess_regexps

Hi,

Is there a way to selectively turn preprocess_regexps on or off?

My recipe's parse_index() works as follows:

1. Visit the newspaper's main page and extract the section names and section URLs.
2. Visit each section URL and extract the articles within that section.

Since the latest update to the newspaper's site, step 2 fails because preprocess_regexps strips out the part of the HTML that contains the article titles and URLs.

I need preprocess_regexps because it strips out all the crap in the actual article contents; however, I don't want it applied during the parse_index() stage.
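
To make the request concrete, here is a minimal sketch of the behaviour I'm after, written in plain Python rather than actual calibre code. The helper apply_regexps, its enabled flag, and the sample pattern and HTML are all hypothetical; only the (compiled regex, replacement function) shape mirrors preprocess_regexps.

```python
import re

# Same shape as a recipe's preprocess_regexps: (compiled pattern, replacement function).
# This pattern is a made-up example, not the one from my real recipe.
preprocess_regexps = [
    (re.compile(r'<ul class="headlines">.*?</ul>', re.DOTALL), lambda match: ''),
]


def apply_regexps(html, rules, enabled=True):
    """Hypothetical helper: apply each rule in order, or return html unchanged."""
    if not enabled:
        return html
    for pattern, repl in rules:
        html = pattern.sub(repl, html)
    return html


# Made-up section page containing the article list that parse_index() needs.
section_page = (
    '<h1>World</h1>'
    '<ul class="headlines"><li><a href="/a1">Title 1</a></li></ul>'
)

# Index stage: regexps switched off, so titles and URLs survive.
index_html = apply_regexps(section_page, preprocess_regexps, enabled=False)

# Article stage: regexps switched on, so the clutter is stripped as usual.
article_html = apply_regexps(section_page, preprocess_regexps, enabled=True)
```

In a real recipe, one route that might give the same effect (an assumption on my part, not something I have verified) is to fetch the section page in parse_index() with index_to_soup(url, raw=True) and build the soup from the raw data myself.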

Is there a solution for my problem?

Thanks!