MobileRead Forums - View Single Post - Having problems using h1 tag since website changed

clintiepoo · 12-28-2011, 03:25 PM

Quote:

Originally Posted by Barty

You could probably do it with preprocess_regexps, but why? Are you sure you want to remove all links? You're going to get missing text, e.g.,

As we (link)argued in this column last month(link), the current situation is...

becomes

As we, the current situation is...

If you want to remove certain links, then target them, for example

remove_tags = [dict(name='a', attrs={'href': re.compile(r'doubleclick\.net', re.I)})]

to remove doubleclick links
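
To see which links that pattern would target, here is a small standalone sketch using only Python's re module. The hrefs are made up for illustration, and it assumes the pattern is applied to each href with search semantics (a match anywhere in the URL counts), so treat it as a rough check rather than exactly what the recipe does internally.

```python
import re

# Same pattern as in the remove_tags entry above.
doubleclick = re.compile(r'doubleclick\.net', re.I)

# Hypothetical hrefs, for illustration only.
hrefs = [
    'http://ad.doubleclick.net/click?id=123',
    'http://AD.DOUBLECLICK.NET/click?id=456',
    'http://www.example.com/news/story.html',
]

# Links whose href matches would be removed; the rest are kept.
removed = [h for h in hrefs if doubleclick.search(h)]
kept = [h for h in hrefs if not doubleclick.search(h)]
```

Because of re.I, the upper-case variant is caught as well, while the ordinary story link is left alone.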

I understand your logic, and now I've been able to get rid of all the links individually by using tags within them. But, I'm still having trouble getting the image out of this:

Code:

<a href="http://bloximages.chicago2.vip.townnews.com/jg-tc.com/content/tncms/assets/v3/editorial/0/c1/0c16b29b-e8fc-55a6-8d20-f9ba420f8230/4ef38d670cd77.image.jpg" rel="facebox">
            
                <img id="img-holder" src="http://bloximages.chicago2.vip.townnews.com/jg-tc.com/content/tncms/assets/v3/editorial/0/c1/0c16b29b-e8fc-55a6-8d20-f9ba420f8230/4ef38d679d43b.preview-300.jpg" alt=" " width="300px">
            </a>

The image is buried beneath the link. I used to use "img-holder" to isolate it and just keep that tag, but I'm not sure how to do it now. If I try to keep this whole link (i.e., not remove it), the whole story blows up and fails to download. I'm getting close, just not there.
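
One possible way to pull the image out, sketched with only Python's re module: a (pattern, function) pair in the shape preprocess_regexps uses, which replaces the rel="facebox" link with the img tag it wraps. The sample markup is a shortened, made-up stand-in for the snippet above, and apply_regexps is a hypothetical helper that only imitates the preprocessing pass. Whether unwrapping the link also cures the download failure is an assumption I haven't verified.

```python
import re

# Matches an <a rel="facebox"> wrapper whose only content is one <img> tag.
FACEBOX_IMG = re.compile(
    r'<a\b[^>]*\brel="facebox"[^>]*>\s*(<img\b[^>]*>)\s*</a>',
    re.I | re.S,
)

# Same (pattern, replacement function) shape that preprocess_regexps uses.
preprocess_regexps = [(FACEBOX_IMG, lambda match: match.group(1))]


def apply_regexps(html, rules):
    """Hypothetical helper: apply each (pattern, function) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Shortened, hypothetical stand-in for the markup in the post.
sample = '''<a href="http://example.com/full.jpg" rel="facebox">

    <img id="img-holder" src="http://example.com/preview-300.jpg" alt=" " width="300px">
</a>'''

cleaned = apply_regexps(sample, preprocess_regexps)
```

In a recipe, the list would be assigned to the class-level preprocess_regexps attribute. With the wrapper gone, the img keeps its id="img-holder", so it should be possible to target it by that id again.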