MobileRead Forums - View Single Post - Remove <br /> together with span, and only span

Bonex · 05-29-2011, 11:41 AM

Add the preprocess_regexps option:

Code:

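  # Needs "import re" at the top of the recipe file.
  preprocess_regexps = [
      # Strip <a ...> and </a> but keep the link text; leaves <abbr>, <address>, etc. alone
      (re.compile(r'</?a(?:\s[^>]*)?>'), lambda match: ''),
      # Remove each article-link-id span together with the two <br /> tags that follow it
      (re.compile(r'<span[^>]*article-link-id.*?<br\s*/?>\s*<br\s*/?>'), lambda match: ''),
  ]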

  keep_only_tags = [dict(name='div', attrs={'class':'article'})]

  remove_tags = [
   dict(name='p',attrs={'class':'meta links'}),
   dict(name='div',attrs={'class':'float-right'}),
   #dict(name='span',attrs={'class':'article-link-id'})
  ]

  feeds = [

The first regexp removes all <a> and </a> tags, leaving the text inside, which I think is what you wanted to do with the preprocess_html function. The second, uglier one removes every <span class="article-link-id">blabla</span> that is followed by two <br /> tags, and takes those two <br /> tags with it.
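If you want to check what the two regexps do before rebuilding the recipe, here is a rough standalone sketch. The sample HTML and the apply_rules helper are made up for illustration (the helper just runs each pattern's sub() in order, which is my understanding of how the pairs get used). The anchor pattern here requires whitespace or ">" right after the "a" so that tags like <abbr> are not stripped too, and whitespace is allowed between the two <br /> tags.

```python
import re

# (compiled pattern, replacement function) pairs, the same shape as preprocess_regexps
rules = [
    # <a ...> and </a>: drop the tags, keep the text between them
    (re.compile(r'</?a(?:\s[^>]*)?>'), lambda match: ''),
    # article-link-id span plus the two <br /> tags right after it
    (re.compile(r'<span[^>]*article-link-id.*?<br\s*/?>\s*<br\s*/?>'), lambda match: ''),
]


def apply_rules(html, rules):
    """Hypothetical helper: apply every (pattern, function) pair in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Invented sample markup, only for testing the patterns
sample = (
    '<p>Read <a href="http://example.com/x">the story</a> '
    'in <abbr title="t">brief</abbr>.</p>'
    '<span class="article-link-id">id 123</span><br /><br />'
    '<p>Next<br />line</p>'
)

cleaned = apply_rules(sample, rules)
print(cleaned)
```

The lone <br /> inside the last paragraph stays, since the second pattern only fires when an article-link-id span comes before a pair of them. Note that without re.DOTALL the span and both <br /> tags have to sit on the same line of the source for the second pattern to match.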
If you want a suggestion, you can also add an extra_css option to tweak the final appearance of the article when displayed.
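Something along these lines would do for extra_css. It is just a string of CSS that sits in the recipe class next to the other options. The selectors and values below are only guesses on my part (the .article class is taken from the keep_only_tags line above), so adjust them to whatever the page really uses.

```python
# Goes inside the recipe class, next to keep_only_tags and remove_tags.
# The rules are illustrative assumptions, not taken from the actual site.
extra_css = '''
    .article h1 { font-size: 1.4em; font-weight: bold; }
    .article p  { text-align: justify; margin-bottom: 0.5em; }
'''
```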
