I have not had any luck finding a solution in my searches (probably because I am not using the correct terminology).
I have successfully modified the Cincinnati Enquirer recipe as the basis for my Appleton Post Crescent recipe, since both sites use similar web templates. However, I am stuck on the following:
1. Remove the Additional Information box that appears a couple of paragraphs into each article. I have tried:
preprocess_regexps = [
(re.compile(r'<p></p><div*.</div>', re.IGNORECASE | re.DOTALL), lambda match : r''),
]
without success.
2. Remove any RSS feed items whose titles start with the word "Photo" or "Photos:"
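To make the two goals above concrete, here is a rough standalone sketch of what I am after. It is not tested against the real site. It assumes the box is a single, non-nested <div> that directly follows an empty <p></p>, and that each feed object has an `articles` list whose entries have a `title`. The names `INFO_BOX`, `strip_info_box`, `PHOTO_TITLE` and `drop_photo_articles` are my own and hypothetical, not part of calibre.

```python
import re
from types import SimpleNamespace

# Assumption: the box is one <div>...</div> right after an empty <p></p>.
# The pattern I tried contains 'div*.', which means 'di', any number of 'v',
# then one character; the non-greedy '<div.*?</div>' is probably what I meant.
INFO_BOX = re.compile(r'<p></p>\s*<div.*?</div>', re.IGNORECASE | re.DOTALL)


def strip_info_box(html):
    """Remove each empty-paragraph-plus-div block from an HTML string."""
    return INFO_BOX.sub('', html)


# Titles whose first word is "Photo" or "Photos" (a colon may follow).
PHOTO_TITLE = re.compile(r'^\s*photos?\b', re.IGNORECASE)


def drop_photo_articles(feeds):
    """Drop photo items from every feed's article list and return the feeds."""
    for feed in feeds:
        feed.articles = [a for a in feed.articles
                         if not PHOTO_TITLE.match(a.title)]
    return feeds


# Made-up sample data standing in for a downloaded article and a parsed feed.
sample = ('<p>First.</p><p></p><div class="box">Additional Information</div>'
          '<p>Second.</p>')
print(strip_info_box(sample))  # prints: <p>First.</p><p>Second.</p>

feeds = [SimpleNamespace(articles=[
    SimpleNamespace(title='Photos: Parade'),
    SimpleNamespace(title='Council votes'),
])]
print([a.title for a in drop_photo_articles(feeds)[0].articles])
# prints: ['Council votes']
```

If the box contains nested <div> elements, the non-greedy match would stop at the first </div> and leave the rest behind, so the pattern may need to key on the box's class or id instead. My understanding is that the compiled pattern could go into preprocess_regexps and the second helper could be called from an overridden parse_feeds in the recipe, but I have not confirmed that.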
Any guidance that you can give would be very helpful.