12-11-2012, 10:51 PM   #1
Jimbo724
Device: Kindle Touch
How to Clean/Strip HTML from epub file?

I want to clean up a book whose formatting is an absolute mess. I know how to convert it to an epub file, open it in Sigil, and extract the text files into a directory. Now I want to strip essentially all the HTML except the paragraph tags and then fix up the text without changing the paragraph structure. (Afterward, I will add chapter, heading, and other markup as needed, then run it through Calibre to recreate the ebook.) What is the easiest way to get rid of everything except the paragraph structure?

Sigil is useless beyond extracting the text files from the epub file.

When I tried Word, it made every line a separate paragraph.

I use a Mac.
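
To make the goal concrete, here is a rough sketch of the kind of stripping I mean, written in Python using only its standard library. It is one possible approach, not a tested tool: the class name `ParagraphOnly` and the function `strip_to_paragraphs` are made up for illustration, and it assumes the extracted files are ordinary HTML/XHTML in which every paragraph is marked with `<p>`. It keeps the text and bare `<p>...</p>` tags, drops every other tag and all attributes, and discards whatever sits inside `<head>`, `<style>`, and `<script>`.

```python
from html import escape
from html.parser import HTMLParser


class ParagraphOnly(HTMLParser):
    """Keep text and bare <p>...</p> tags; drop every other tag."""

    # Everything inside these elements is discarded (an assumption:
    # titles, CSS and scripts are not wanted in the cleaned text).
    SKIP = {"head", "style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p" and not self.skip_depth:
            # Attributes such as class= or style= are deliberately dropped.
            self.out.append("<p>")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p" and not self.skip_depth:
            self.out.append("</p>\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, < and > so the output is still valid markup.
            self.out.append(escape(data, quote=False))


def strip_to_paragraphs(html_text):
    """Return html_text reduced to text wrapped in plain <p> tags."""
    parser = ParagraphOnly()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = (
        '<html><head><title>T</title></head><body><div class="x">'
        '<p class="c1"><span style="font:x">Hello</span> <b>world</b></p>'
        "</div></body></html>"
    )
    print(strip_to_paragraphs(sample))
```

Something like this could be run over each file in the extracted directory, writing the result to a new file so the originals stay untouched. It would not fix the one-line-per-paragraph problem, since it leaves the existing paragraph breaks exactly where they are.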