08-22-2010, 06:24 PM   #2494
TonytheBookworm
Quote:

Am I correct in thinking that your next question is "How?"

Spoiler:
Many ways. Let me send you here first, then get back to you.

Also remove_tags, preprocess_regexps, preprocess_html or postprocess_html. I've got to go - back later if you have Q's
Man, I wish there was a way I could ask questions without flooding this board.
Let's say that in every parse I get something that has a doubleclick.net ad in it.
I tried:
Code:
filter_regexps = [r'feedads\.g\.doubleclick\.net']
And yeah, I didn't see any indent errors this time.
Then I thought maybe I could use preprocess_regexps to remove all the instances of doubleclick first.
So I looked in the Beautiful Soup documentation, and after a big headache I'm still kind of lost.
I tried this as well:
Code:
preprocess_regexps = [(re.compile(r'feedads\.g\.doubleclick\.net', re.DOTALL), lambda m: '')]
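For what it's worth, here is a rough sketch of what a rule like that does, using only Python's re module outside of calibre. The sample HTML is made up (the real feed markup may differ), and apply_rules is a hypothetical stand-in for how each (compiled pattern, callable) pair gets substituted over the downloaded HTML. It suggests the pattern above only strips the hostname text and leaves the surrounding link and image tags in place, whereas a pattern spanning the whole <a>...</a> element removes the ad block itself.

```python
import re

# Hypothetical sample of feed HTML; the real markup may differ.
sample_html = (
    '<p>Story text.</p>'
    '<a href="http://feedads.g.doubleclick.net/~a/abc/0/da">'
    '<img src="http://feedads.g.doubleclick.net/~a/abc/0/di" border="0"></a>'
    '<p>More text.</p>'
)


def apply_rules(html, rules):
    """Hypothetical helper: apply each (compiled pattern, callable) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# The rule from the post: matches only the hostname text itself.
host_only = [
    (re.compile(r'feedads\.g\.doubleclick\.net', re.DOTALL), lambda m: ''),
]

# Assumed alternative: match the whole <a ...>...</a> element whose opening
# tag mentions the ad host, so the link and the image inside it both go.
whole_link = [
    (re.compile(r'<a\s[^>]*feedads\.g\.doubleclick\.net.*?</a>',
                re.DOTALL | re.IGNORECASE), lambda m: ''),
]

stripped_host = apply_rules(sample_html, host_only)
stripped_link = apply_rules(sample_html, whole_link)

print(stripped_host)  # the <a> and <img> tags are still there, with broken URLs
print(stripped_link)  # only the two paragraphs remain
```

If the ads in the real feed are wrapped differently (a bare <img>, or a <div> around the link), the second pattern would need adjusting to match that markup.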
Thanks again.