Quote:
Am I correct in thinking that your next question is "How?"
Spoiler:
Many ways. Let me send you here first, then get back to you.
Also remove_tags, preprocess_regexps, preprocess_html or postprocess_html. I've got to go - back later if you have Q's
Man, I wish there was a way I could ask questions without flooding this board.
Let's say every parse gives me something that has a doubleclick.net ad in it.
I tried
Code:
filter_regexps = [r'feedads\.g\.doubleclick\.net']
and yeah, I didn't see any indent errors this time.
Then I thought maybe I could use preprocess_regexps to remove all the instances of doubleclick first.
So I looked in the Beautiful Soup documentation, and after a big headache I'm still kind of lost.

I tried this as well...
Code:
preprocess_regexps = [(re.compile(r'feedads\.g\.doubleclick\.net', re.DOTALL), lambda m: '')]
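For what it's worth, here is a rough standalone sketch of what that pattern does by itself, outside of any recipe. The sample HTML is made up (the real feed markup may look different), and the wider pattern is only a guess at what would be needed to take the whole ad link out. As far as I can tell, the pattern above only deletes the hostname text and leaves the surrounding link and image tags in place.

```python
import re

# Made-up sample of feed HTML; the real markup may differ.
sample_html = (
    '<p>Story text</p>'
    '<a href="http://feedads.g.doubleclick.net/~a/abc/0/da">'
    '<img src="http://feedads.g.doubleclick.net/~a/abc/0/di" border="0"></a>'
)

# The pattern from the post: it matches only the hostname text itself.
host_only = re.compile(r'feedads\.g\.doubleclick\.net', re.DOTALL)
after_host_only = host_only.sub(lambda m: '', sample_html)

# A wider pattern (an assumption about the markup): match the whole
# <a ...>...</a> element whose opening tag mentions the ad host.
whole_link = re.compile(
    r'<a\s[^>]*feedads\.g\.doubleclick\.net.*?</a>',
    re.DOTALL | re.IGNORECASE,
)
after_whole_link = whole_link.sub(lambda m: '', sample_html)

print(after_host_only)   # the <a> and <img> tags are still there, with broken URLs
print(after_whole_link)  # the ad link is gone entirely
```

If that is right, the same (pattern, replacement) pair shape should drop into preprocess_regexps, but the pattern would have to be adjusted to whatever the ad markup really looks like.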
Thanks again.