How to clean/strip HTML from an epub file?
I want to clean up a book whose formatting is an absolute mess. I know how to convert it to an epub file, open it in Sigil, and extract the text files into a directory. Now I want to strip essentially all the HTML except the paragraph tags, then fix up the text without changing the paragraph structure. (Afterward, I will add chapter, header, and other markup as needed, then run it through Calibre to recreate the ebook.) What is the easiest way to get rid of everything except the paragraph structure?
Sigil is useless beyond extracting the text files from the epub file.
I had a problem with Word making every line a separate paragraph.
I use a Mac.
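To make the goal concrete, here is a minimal sketch of the kind of stripping I mean, using only Python's standard library (Python 3 can be installed on a Mac). It is an illustration under assumptions, not a tested recipe: the names `ParagraphOnly`, `strip_to_paragraphs`, and `clean_directory` are made up for this example, and it assumes the extracted files end in `.html`, `.xhtml`, or `.htm`. It keeps the text and bare `<p>...</p>` tags, drops every other tag and all attributes, and skips the contents of `head`, `script`, and `style`.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path


class ParagraphOnly(HTMLParser):
    """Keep text and bare <p>...</p> tags; drop all other markup."""

    # Elements whose contents are discarded entirely (assumed unwanted).
    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p" and not self.skip_depth:
            self.out.append("<p>")  # attributes such as class/style are dropped

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p" and not self.skip_depth:
            self.out.append("</p>\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, <, > so the output is still valid markup.
            self.out.append(escape(data, quote=False))


def strip_to_paragraphs(markup):
    """Return markup with everything removed except text and <p> tags."""
    parser = ParagraphOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


def clean_directory(src_dir, dst_dir):
    """Write a stripped copy of each HTML file in src_dir into dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).iterdir()):
        if src.suffix.lower() in {".html", ".xhtml", ".htm"}:
            text = src.read_text(encoding="utf-8")
            (dst / src.name).write_text(strip_to_paragraphs(text), encoding="utf-8")
```

Something like `clean_directory("extracted", "cleaned")` (both directory names hypothetical) would leave the originals untouched and write the stripped copies to a separate folder. The output is a fragment of paragraphs only, so the surrounding document markup would have to be added back before rebuilding the book.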