#1
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
Remove hyperlink properties from inside <i> etc
I know this small piece of code converts all links to text:
Code:
def preprocess_html(self, soup):
    for alink in soup.findAll('a'):
        # .string is None when the link wraps other tags, so those links are skipped
        if alink.string is not None:
            tstr = alink.string
            alink.replaceWith(tstr)
    return soup

BUT how do you convert links to text when the link sits inside 'h2', 'strong', 'i', etc.? For example:
Code:
<h2> <a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a> </h2>

OR when the tag sits inside the link, like this:
Code:
<a href="http://www.filmcritic.com/reviews/1937/snow-white-and-the-seven-dwarfs/"><i>Snow White</i></a>
#2
Member
Posts: 22
Karma: 1756
Join Date: Jan 2011
Location: Moscow, RU
Device: Kindle3, iPhone4, iPad2
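To strip the link tags and keep only the text:
Code:
# needs "import re" at the top of the recipe
preprocess_regexps = [
    # drop the opening <a ...> tag; \b avoids matching tags such as <abbr>
    (re.compile(r'<a\b[^>]*>'), lambda h1: ''),
    # drop the closing </a> tag
    (re.compile(r'</a>'), lambda h2: ''),
]
Result:
<h2> In Theaters </h2>

To keep the link text and append the URL in parentheses:
Code:
preprocess_regexps = [
    # group 2 is the URL, group 4 is the link text;
    # .*? is non-greedy so two links on one line are not merged
    (re.compile(r'(<a href=")([^"]+)(">)(.*?)(</a>)'),
     lambda h: '%s (%s)' % (h.group(4), h.group(2))),
]
Result:
<h2> In Theaters (http://www.filmcritic.com/reviews/in-theaters) </h2>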
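Both regex approaches can be tried outside calibre before putting them in a recipe. The following is only a minimal sketch using plain Python `re` on the two snippets from the question. The names `SAMPLE`, `STRIP_LINKS`, `KEEP_URL` and `apply_rules` are hypothetical, and the loop is an assumption about how a list of (pattern, function) pairs gets applied, not calibre's actual code. The patterns use `<a\b[^>]*>` and a non-greedy `.*?`, which are my own choices for safety.

```python
import re

# Sample markup taken from the question in post #1.
SAMPLE = (
    '<h2> <a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a> </h2>\n'
    '<a href="http://www.filmcritic.com/reviews/1937/snow-white-and-the-seven-dwarfs/">'
    '<i>Snow White</i></a>'
)

# Variant 1: drop the opening and closing <a> tags, keep whatever is inside.
# \b stops the pattern from also matching tags such as <abbr> or <article>.
STRIP_LINKS = [
    (re.compile(r'<a\b[^>]*>'), lambda m: ''),
    (re.compile(r'</a>'), lambda m: ''),
]

# Variant 2: keep the link content and append the URL in parentheses.
# Group 1 is the URL, group 2 is everything between <a ...> and </a>.
KEEP_URL = [
    (re.compile(r'<a href="([^"]+)">(.*?)</a>', re.DOTALL),
     lambda m: '%s (%s)' % (m.group(2), m.group(1))),
]


def apply_rules(html, rules):
    """Apply each (pattern, function) pair in order (hypothetical helper)."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


stripped = apply_rules(SAMPLE, STRIP_LINKS)
with_urls = apply_rules(SAMPLE, KEEP_URL)

print(stripped)
print(with_urls)
```

Because the patterns work on the raw HTML text, it does not matter whether the link is inside `<h2>` or wraps an `<i>`; the inner tags are left in place and only the `<a>` wrapper goes away. Variant 2 only matches links whose sole attribute is `href`, so anchors with extra attributes such as `class` or `target` would pass through unchanged.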