10-13-2012, 01:00 PM   #1
BRGriff began at the beginning.
Join Date: May 2011
Location: Deland, Florida
Device: Kindle 3
USA Today "pageinfo data hidden"

I am trying to remove several tags from the USA Today recipe that I personally find annoying. Here is my remove_tags recipe:

remove_tags = [dict(name='aside', attrs={'class':['comp story-highlights','right partner','right']}),
               dict(name='span', attrs={'class':['last-updated']}),
               dict(name='div', attrs={'class':['pageinfo  data hidden']}),
              ]

The first two tag removals work fine, but for the life of me I cannot get the third one to work. I copied and pasted "pageinfo data hidden" directly from the source code, so if there appear to be two spaces between "pageinfo" and "data", that is how it was written. In fact, I have tried it both ways (one space and two spaces) without success.
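
One variation that might be worth trying, offered only as an untested sketch and not as a confirmed fix: match the class with a regular expression instead of an exact string, so the number of spaces between the class names no longer matters. This assumes the remove_tags entries are handed to BeautifulSoup's findAll, which accepts a compiled pattern as an attribute value. The name PAGEINFO_CLASS is made up for illustration.

```python
import re

# Hypothetical pattern: matches a class attribute that contains the class
# name "pageinfo", regardless of how much whitespace follows it.
PAGEINFO_CLASS = re.compile(r'(?:^|\s)pageinfo(?:\s|$)')

# Assumption: each dict is passed to BeautifulSoup's findAll, which accepts
# a compiled regular expression as an attribute value.
remove_tags = [
    dict(name='aside', attrs={'class': ['comp story-highlights', 'right partner', 'right']}),
    dict(name='span', attrs={'class': ['last-updated']}),
    dict(name='div', attrs={'class': PAGEINFO_CLASS}),
]

# Sanity check of the pattern alone (this does not exercise calibre itself):
# both spellings of the class attribute should match.
for value in ('pageinfo data hidden', 'pageinfo  data hidden'):
    print(repr(value), bool(PAGEINFO_CLASS.search(value)))
```

If the div still shows up with this pattern, that would suggest the problem is not the spacing in the class name at all.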

Here is the original source code from the RSS feed:

<div class="pageinfo data hidden">
"assetid": "1631253",
"aws": "tech",
"aws_id": "tech",
"blogname": "",
"contenttype": "story pages ",
"pagename": "Space shuttle Endeavour continues trek through L.A.",
"seotitle": "Shuttle-endeavour-los-angeles",
"seotitletag": "Space shuttle Endeavour continues trek through L.A.",
"ssts": "tech",

"taxonomykeywords":"Traffic congestion,Space exploration,Los Angeles,California,Manchester,Manchester,Long Beach,Manchester",

"templatename": "stories/default",

All of that garbage appears after every article, and I would very much appreciate any help removing it from the download.

Thank you,
BRGriff