Quote:
Originally Posted by schnortz
I am not having luck with the following...
|
I enjoy answering these little puzzles, but it's a lot easier if you provide a link to the page that you are having trouble with, and a copy of the recipe you're using.
Here you are asking why this regexp doesn't match something. Usually that would be impossible to answer without a link to the "something," but I do see an error in it.
Quote:
1. Remove the Additional Information box that comes up after a couple of paragraphs of each article. I have tried
Code:
preprocess_regexps = [
(re.compile(r'<p></p><div*.</div>', re.IGNORECASE | re.DOTALL), lambda match : r''),
]
without success.
|
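I assume you wanted to delete everything in the <div> tag, but you reversed the "everything": it should be ".*", not "*.". As written, "<div*." means "<di", then zero or more "v" characters, then exactly one character of any kind, so it never reaches the closing </div>.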
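Here is a rough sketch of the corrected entry, so you can try it outside calibre first. I haven't seen your page, so the sample HTML and the apply_preprocess helper are made up for illustration; the helper only imitates running each (pattern, function) pair over the page source. I also used ".*?" rather than ".*": with re.DOTALL a plain ".*" is greedy and would swallow everything up to the last </div> on the page.

```python
import re

# What was posted: "*." is the reversed form, so this never matches the box.
original_pattern = re.compile(r'<p></p><div*.</div>', re.IGNORECASE | re.DOTALL)

# Corrected: ".*?" matches as little as possible up to the first "</div>".
# Assumption: the box is an empty <p></p> followed by a <div> with no nested <div>.
box_pattern = re.compile(r'<p></p><div.*?</div>', re.IGNORECASE | re.DOTALL)

preprocess_regexps = [
    (box_pattern, lambda match: ''),
]


def apply_preprocess(html):
    """Hypothetical helper: run each (pattern, function) pair over the HTML."""
    for pattern, func in preprocess_regexps:
        html = pattern.sub(func, html)
    return html


# Made-up sample; the real page markup may well differ.
sample = ('<p>Intro</p><p></p><div class="info">\n'
          'Additional Information\n</div><p>More text</p>')
cleaned = apply_preprocess(sample)
print(cleaned)
```

If the real box contains nested <div> tags, the non-greedy match will stop at the first inner </div>, and you would need something more specific to anchor on (a class or id on the box, for example).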
Quote:
2. Remove any RSS feeds that start with the word "Photo" or "Photos:"
Any guidance that you can give would be very helpful.
|
I suspect you want to remove any articles that start with those words, not "feeds" - correct? You control the list of feeds.
For articles, I used to think that filter_regexps would do that job, but I never got it to work. Maybe it only applies to links followed recursively, not to the main article link.
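If it comes down to filtering on the article title yourself, the matching part at least is simple. This is only a sketch: keep_article and the sample titles are made up, and where you would hook it into your recipe (for instance wherever you have the list of parsed articles in hand) is an assumption I haven't tested against your feeds.

```python
import re

# Titles that begin with the word "Photo" or "Photos" (so "Photos:" too).
# \b keeps words like "Photography" from being caught by accident.
PHOTO_TITLE = re.compile(r'^\s*Photos?\b', re.IGNORECASE)


def keep_article(title):
    """Hypothetical helper: True if the title does NOT start with Photo/Photos."""
    return PHOTO_TITLE.match(title) is None


# Made-up titles for illustration.
titles = [
    'Photos: Storm damage',
    'Photo of the day',
    'Council approves budget',
    'Photography tips',
]
kept = [t for t in titles if keep_article(t)]
print(kept)
```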