MobileRead Forums - Remove hyperlink properties from inside <i> etc

thearr · 03-07-2011, 02:09 AM

Quote:

Originally Posted by mufc

BUT how do you convert links to text that are hidden in 'h2', 'strong', 'i', etc.?

<h2>
<a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a>
</h2>

I use regular expressions. If you just want to remove the hyperlinks and keep their text, you can do it like this, for example:

Code:

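import re
preprocess_regexps = [
        # remove opening <a ...> tags; \b keeps <abbr>, <aside> etc. intact
        (re.compile(r'<a\b[^>]*>', re.IGNORECASE), lambda match: ''),
        (re.compile(r'</a\s*>', re.IGNORECASE), lambda match: '')]  # closing tags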

<h2>
In Theaters
</h2>
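If you want to check the patterns before putting them into a recipe, here is a small standalone sketch. The `apply_preprocess` helper is my own hypothetical stand-in; it assumes calibre simply runs `pattern.sub(func, html)` for every pair in `preprocess_regexps`, in list order. It uses a slightly stricter `<a\b[^>]*>` pattern so that other tags starting with "a" (the made-up `<abbr>` line) are left alone.

```python
import re

# (pattern, replacement function) pairs, as in a recipe's preprocess_regexps
preprocess_regexps = [
    # opening <a ...> tags; \b stops it from matching <abbr>, <aside>, ...
    (re.compile(r'<a\b[^>]*>', re.IGNORECASE), lambda match: ''),
    # closing </a> tags
    (re.compile(r'</a\s*>', re.IGNORECASE), lambda match: ''),
]

def apply_preprocess(html, regexps):
    """Hypothetical helper. Assumption: calibre applies each pair
    as pattern.sub(func, html), in list order."""
    for pattern, func in regexps:
        html = pattern.sub(func, html)
    return html

sample = ('<h2>\n<a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a>\n</h2>\n'
          '<p><abbr title="example">abbr</abbr> stays untouched</p>')
print(apply_preprocess(sample, preprocess_regexps))
```

The link text inside `<h2>` survives and only the `<a>` tags disappear.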

If you want to keep the link's URL as text, you can do something like this:

Code:

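import re
preprocess_regexps = [
        # <a ... href="URL" ...>TEXT</a> -> TEXT (URL); non-greedy so links on one line stay separate
        (re.compile(r'<a\b[^>]*?\bhref="([^"]+)"[^>]*>(.*?)</a\s*>', re.IGNORECASE | re.DOTALL),
           lambda match: '%s (%s)' % (match.group(2), match.group(1)))]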

<h2>
In Theaters (http://www.filmcritic.com/reviews/in-theaters)
</h2>
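Here is a standalone sketch for testing the URL-preserving variant. `apply_preprocess` is a hypothetical helper that assumes calibre runs `pattern.sub(func, html)` for each pair in order. The pattern uses a non-greedy `(.*?)` and allows other attributes before `href`, so two links on one line become two separate "text (URL)" pairs; the second sample line is invented test input.

```python
import re

# Keep the link text and append the URL in parentheses
preprocess_regexps = [
    (re.compile(r'<a\b[^>]*?\bhref="([^"]+)"[^>]*>(.*?)</a\s*>', re.IGNORECASE | re.DOTALL),
     lambda match: '%s (%s)' % (match.group(2), match.group(1))),
]

def apply_preprocess(html, regexps):
    """Hypothetical helper. Assumption: calibre applies each pair
    as pattern.sub(func, html), in list order."""
    for pattern, func in regexps:
        html = pattern.sub(func, html)
    return html

sample = '<h2>\n<a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a>\n</h2>'
print(apply_preprocess(sample, preprocess_regexps))

# Two links on one line: the non-greedy (.*?) keeps them separate
line = '<p><a href="http://a.example/">A</a> and <a class="x" href="http://b.example/">B</a></p>'
print(apply_preprocess(line, preprocess_regexps))
```

Note that this pattern only handles double-quoted URLs; if the site uses `href='...'`, the pattern needs extending.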

Here is my DogHouseDiaries webcomic recipe, which uses something similar to what I wrote above:

Spoiler: