#1
Connoisseur
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
Remove hyperlink properties from inside <i> etc
I know this small piece of code, which converts all links to text:
Code:
def preprocess_html(self, soup):
    for alink in soup.findAll('a'):
        if alink.string is not None:
            tstr = alink.string
            alink.replaceWith(tstr)
    return soup
But how do you convert links to text when they are nested inside 'h2', 'strong', 'i', etc.? For example:
<h2>
<a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a>
</h2>
or like this:
<a href="http://www.filmcritic.com/reviews/1937/snow-white-and-the-seven-dwarfs/"><i>Snow White</i></a>
#2
Member
Posts: 22
Karma: 1756
Join Date: Jan 2011
Location: Moscow, RU
Device: Kindle3, iPhone4, iPad2
Code:
preprocess_regexps = [
(re.compile(r'<a.*?>'), lambda h1: ''),
(re.compile(r'</a>'), lambda h2: '')]
Result:
<h2>
In Theaters
</h2>
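The same idea can be tried outside calibre with plain `re` on the two samples from the first post. This is only a sketch under a few assumptions: `strip_links` is a hypothetical helper name, and the opening-tag pattern is a slightly stricter variant (`<a\b[^>]*>`) so that tags such as `<abbr>` are left untouched.

```python
import re

# Stricter variant of the opening-tag pattern: \b stops '<a' from matching
# tags like <abbr>, and [^>]* keeps the match inside a single tag.
OPEN_A = re.compile(r'<a\b[^>]*>', re.IGNORECASE)
CLOSE_A = re.compile(r'</a\s*>', re.IGNORECASE)

def strip_links(html):
    """Remove <a ...> and </a> tags, keeping everything between them."""
    return CLOSE_A.sub('', OPEN_A.sub('', html))

# The two samples from the question.
heading = '<h2> <a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a> </h2>'
title = '<a href="http://www.filmcritic.com/reviews/1937/snow-white-and-the-seven-dwarfs/"><i>Snow White</i></a>'

print(strip_links(heading))  # <h2> In Theaters </h2>
print(strip_links(title))    # <i>Snow White</i>
```

Because only the anchor tags themselves are removed, any nested markup such as `<i>` survives, which is what the question asks for.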
Code:
preprocess_regexps = [
(re.compile(r'(<a href=")([^"]+)(">)(.*?)(</a>)'),
lambda h: '%s (%s)' % (h.group(4), h.group(2)))]
Result:
<h2>
In Theaters (http://www.filmcritic.com/reviews/in-theaters)
</h2>
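To see how the "text (URL)" replacement behaves when several links sit on one line, here is a small standalone sketch using plain `re`. The name `links_to_text` and the exact pattern are my own assumptions, not part of calibre. The key detail is the non-greedy `(.*?)`, which stops at the first `</a>` instead of swallowing everything up to the last one.

```python
import re

# Non-greedy (.*?) keeps each match inside one link; DOTALL lets the link
# text span several lines. Group 1 is the URL, group 2 is the link body.
LINK = re.compile(r'<a\b[^>]*?\bhref="([^"]+)"[^>]*>(.*?)</a>',
                  re.IGNORECASE | re.DOTALL)

def links_to_text(html):
    """Replace each link with its body followed by the URL in parentheses."""
    return LINK.sub(lambda m: '%s (%s)' % (m.group(2), m.group(1)), html)

# Both samples from the question, joined onto one line.
heading = '<h2> <a href="http://www.filmcritic.com/reviews/in-theaters">In Theaters</a> </h2>'
title = '<a href="http://www.filmcritic.com/reviews/1937/snow-white-and-the-seven-dwarfs/"><i>Snow White</i></a>'

print(links_to_text(heading + ' ' + title))
```

With a greedy `(.*)`, the two links on that line would be merged into a single match, so the first URL would be paired with everything up to the final `</a>`.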