MobileRead Forums - View Single Post

scissors · 03-24-2012, 11:21 AM

Quote:

Originally Posted by JayKindle

I am fetching some news from a website, but it has this HTML code between the paragraphs, which makes my Kindle show a large gap between them.

Here is the HTML code:

Code:

<p>&nbsp;</p>

How can I write the remove_tags code to avoid this HTML code?

Maybe use preprocess_regexps instead of remove_tags:

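# Needs "import re" at the top of the recipe.
preprocess_regexps = [
    # Strip paragraphs that contain only a non-breaking space entity.
    (re.compile(r'<p>\s*&nbsp;\s*</p>', re.IGNORECASE), lambda match: '')]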

and just dump it?
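
If you want to check the pattern outside calibre first, here is a rough standalone sketch. The sample string and the names EMPTY_PARA and strip_empty_paragraphs are made up for illustration, and I am assuming the empty paragraphs might also show up as &#160; or plain whitespace; the real feed may look different.

```python
import re

# Assumption: "empty" paragraphs contain only whitespace, &nbsp; or &#160;.
# In Python 3, \s also matches a literal non-breaking space character.
EMPTY_PARA = re.compile(r'<p>(?:\s|&nbsp;|&#160;)*</p>', re.IGNORECASE)

def strip_empty_paragraphs(html):
    """Return html with spacer-only <p> elements removed."""
    return EMPTY_PARA.sub('', html)

# Hypothetical sample input, not taken from the actual site.
sample = '<p>First.</p><p>&nbsp;</p><P> </P><p>Second.</p>'
cleaned = strip_empty_paragraphs(sample)
print(cleaned)
```

If that prints only the two real paragraphs, the same pattern should be safe to drop into preprocess_regexps with lambda match: '' as the replacement.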