View Single Post
Old 01-22-2017, 05:26 PM   #171
Doitsu
Grand Sorcerer
Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.Doitsu ought to be getting tired of karma fortunes by now.
 
Doitsu's Avatar
 
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Teom@n View Post
could you share the script/commands that you were using for converting html to text? I have a html file(124mb) which I took it from a mobi file. I installed the python and beautifulsoap.
Unfortunately, I can't help you with that, because scripts and commands will vary depending on the exact input and output formats. However, BS4 is well documented. For example:

Code:
soup.get_text()
will strip all tags from an HTML file.

If you're not a Python programmer, you could also use a text editor with regular expressions support, e.g. Notepad++, to remove unwanted tags or convert them to a different format.
Doitsu is offline   Reply With Quote