MobileRead Forums - View Single Post

Perkin · 04-07-2014, 03:35 PM

I suppose you could...
In strip_span_for_page(), add the line

Code:

html_text = re.sub(r'<([^>]+)></\1>', '', html_text)

OR

Code:

html_text = re.sub(r'<([\w:-]+)((?:\s[^>]*)?)></\1>', r'<\1\2/>', html_text)

before the line

Code:

            entities = re.split(r'(<.+?>)', html_text)

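The first will strip the empty tags completely; the second turns them into self-closing tags, which you could then catch later with your 'if equals...' check. Note that, as written, the first only matches empty tags with no attributes, because \1 has to repeat everything between the angle brackets in the closing tag.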
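
To see the difference between the two approaches, here is a rough sketch run against a made-up sample string. The name EMPTY_TAG and the sample are only for illustration, and the attribute-tolerant pattern is one possible way to write it, not necessarily what the plugin should use.

```python
import re

# Made-up sample: one bare empty tag, one empty tag with an attribute,
# and one non-empty tag that must survive.
sample = '<p>text<b></b><span class="x"></span><i>kept</i></p>'

# First pattern from the post: \1 must repeat everything inside <...>,
# so only attribute-less empty tags are matched.
strip_posted = re.sub(r'<([^>]+)></\1>', '', sample)

# Attribute-tolerant pattern: group 1 is the tag name, group 2 is the
# optional attribute text, and the close tag must repeat only the name.
EMPTY_TAG = re.compile(r'<([\w:-]+)((?:\s[^>]*)?)></\1>')

# Option 1: strip empty tags completely.
strip_any = EMPTY_TAG.sub('', sample)

# Option 2: turn empty tags into self-closing tags, keeping attributes.
selfclose_any = EMPTY_TAG.sub(r'<\1\2/>', sample)

print(strip_posted)
print(strip_any)
print(selfclose_any)
```

One thing to keep in mind: a single pass will not catch a tag that only becomes empty after its empty child has been removed, so nested empties would need the substitution run again.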

I'm trying to think whether there are any tags this would strip that you shouldn't strip.
Are there any?
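
A few candidates that come to mind: an empty anchor with an id (or name) is often a link target, empty td/th cells hold a table's shape, and script, iframe and textarea elements are legitimately empty. If that turns out to matter, a guarded version might look something like the sketch below. The function name strip_empty_tags, the KEEP_EMPTY set and the attribute check are all assumptions for illustration, and the id/name test is deliberately rough.

```python
import re

# Tag name, optional attributes, then the matching close tag.
EMPTY_TAG = re.compile(r'<([\w:-]+)((?:\s[^>]*)?)></\1>')

# Illustrative list of elements that may legitimately be empty.
KEEP_EMPTY = {'td', 'th', 'script', 'iframe', 'textarea'}

# Rough check for an id= or name= attribute; it does not parse quoting,
# so it could be fooled by similar text inside another attribute's value.
HAS_TARGET = re.compile(r'\s(?:id|name)\s*=', re.IGNORECASE)


def strip_empty_tags(html_text):
    """Remove empty tags, except ones that look worth keeping."""
    def repl(match):
        name, attrs = match.group(1).lower(), match.group(2)
        if name in KEEP_EMPTY or HAS_TARGET.search(attrs):
            return match.group(0)  # leave the tag untouched
        return ''                  # strip it
    return EMPTY_TAG.sub(repl, html_text)


cleaned = strip_empty_tags(
    '<p><a id="ch1"></a><span></span><td></td><b class="x"></b></p>'
)
print(cleaned)
```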
