Quote:
Originally Posted by JayKindle
I am fetching some news from a website, but it puts this HTML code between each pair of paragraphs, which makes my Kindle show a large gap between them.
Here is the HTML code:
How can I write the remove_tags code to get rid of this HTML code?
|
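Maybe put something like this in your recipe (it needs "import re" at the top of the file):
    preprocess_regexps = [
        # each entry is (compiled pattern, function returning the replacement text)
        (re.compile(r'<p> </p>', re.IGNORECASE | re.DOTALL), lambda match: ''),
    ]
and just dump it?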
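If you want to check what a rule like that does before running the whole recipe, here is a rough standalone sketch. It assumes each (pattern, function) pair is simply run as a regex substitution over the raw HTML. The apply_rules helper, the sample string, and the looser second pattern are all made up for illustration and are not part of calibre. The looser pattern is only a guess at what the empty paragraph might look like on the real page (plain whitespace or &nbsp;), so adjust it to whatever the site actually sends.

```python
import re

# The rule from the post: (compiled pattern, replacement function) pairs.
preprocess_regexps = [
    (re.compile(r'<p> </p>', re.IGNORECASE | re.DOTALL), lambda match: ''),
]

# Hypothetical looser rule (an assumption, not from the original post):
# also matches attributes on the tag, any run of whitespace, and &nbsp;.
loose_regexps = [
    (re.compile(r'<p[^>]*>(?:\s|&nbsp;)*</p>', re.IGNORECASE), lambda match: ''),
]

def apply_rules(html, rules):
    """Hypothetical helper: run every (pattern, function) pair over the HTML in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html

# Made-up sample page with two kinds of empty paragraph.
sample = '<p>First paragraph.</p><p> </p><P>&nbsp;</P><p>Second paragraph.</p>'

strict = apply_rules(sample, preprocess_regexps)
loose = apply_rules(sample, loose_regexps)

print(strict)  # only the exact '<p> </p>' is removed
print(loose)   # both empty paragraphs are removed
```

If the strict rule leaves the gap in place, the empty paragraph on the page is probably not literally a single space, and a looser pattern along these lines may be worth trying.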