Quote:
Originally Posted by Fabe
PatNY - I use Word but never its HTML. After I do the editing and add the tags I want in Word, I clear all the formatting, select all, copy, and paste the text into a simple text editor, where I save the file as Unicode (UTF-8) with an HTML extension. I then open this file in Sigil and finish my ePub work there.
Does this idea help? - Fabe
I do exactly the same thing. The attached file will unzip into a Word template file that contains several macros. Use this template for the document you want to convert to HTML, then run the macro called Word2HTML. It will clean up the double-paragraph markers and end-of-line paragraph markers you commonly get in text documents, mark paragraphs styled with Word's Heading 1 through Heading 5 with <h1> through <h5>, replace special characters with escape codes, replace double-hyphens with em dashes, and more. (If you don't want all of these operations, for the reasons given under CAUTION below, you can run the individual macros one at a time instead.)
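For anyone who wants to see roughly what that kind of cleanup does outside of Word, here is a minimal Python sketch. It is not the Word2HTML macro and makes no claim to match it; the function name text_to_html_paragraphs is made up for illustration. It assumes blank lines separate paragraphs, and it does not handle headings, because plain text carries no Word styles. It also escapes &, < and >, so any tags you typed by hand beforehand would be escaped too.

```python
import html
import re


def text_to_html_paragraphs(text: str) -> str:
    """Turn blank-line-separated plain text into <p> elements (illustrative only)."""
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join hard-wrapped lines and collapse runs of white space.
        para = " ".join(block.split())
        if not para:
            continue
        # Escape &, < and > first so entities added below are not re-escaped.
        para = html.escape(para, quote=False)
        # Double hyphens become an em dash entity.
        para = para.replace("--", "&mdash;")
        # Remaining non-ASCII characters become numeric character references.
        para = para.encode("ascii", "xmlcharrefreplace").decode("ascii")
        paragraphs.append(f"<p>{para}</p>")
    return "\n".join(paragraphs)
```

Calling text_to_html_paragraphs on the contents of a Gutenberg-style text file should give one <p> element per paragraph, ready to be wrapped in the usual <html> and <body> tags.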
Now save the document as a text file, add the proper <html>, <body>, etc. tags at the top and bottom, and you'll have something fit for a clean import into Sigil. Hope this helps.
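If you would rather script that last step than type the tags by hand, something like the following Python sketch would do it. The function names wrap_html and save_html and the default title are my own placeholders, and the skeleton is just one reasonable minimal page, not anything Sigil specifically requires.

```python
import html
from pathlib import Path


def wrap_html(body: str, title: str = "Untitled") -> str:
    """Wrap already-tagged body text in a minimal HTML page."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8" />\n'
        # The title is plain text, so escape it; the body is assumed to be HTML already.
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


def save_html(path: str, body: str, title: str = "Untitled") -> Path:
    """Write the wrapped page to disk as UTF-8 and return the path."""
    target = Path(path)
    target.write_text(wrap_html(body, title), encoding="utf-8")
    return target
```

For example, save_html("book.html", "<p>Chapter text</p>", "My Book") writes a UTF-8 file with an HTML extension that can then be opened in Sigil.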
CAUTION: This is quite useful, but not perfect and so is provided as-is, no warranty, use at your own risk, and the other usual disclaimers. It assumes you have a double-paragraph mark between paragraphs, as is common for Gutenberg and other text files. If you actually have just a single paragraph marker at the end of each paragraph, it'll turn the whole document into one huge paragraph. It also clears "unnecessary" white space, so if you have a table or tabs/spaces at the start of a paragraph, or other such formatting, you'll lose it. This is basically intended for documents that are paragraphs of text with chapter headings.
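A quick way to check for the single-paragraph-marker problem before converting is to look for blank lines in the text. The Python sketch below is only an illustration of that check, with a made-up function name; it is not part of the attached macros.

```python
import re


def would_merge_everything(text: str) -> bool:
    """Return True if the text has line breaks but no blank lines between them.

    Such a file uses a single marker per paragraph, so a cleanup that joins
    lines within blank-line-separated blocks would produce one huge paragraph.
    """
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    has_line_breaks = "\n" in text
    has_blank_lines = re.search(r"\n\s*\n", text) is not None
    return has_line_breaks and not has_blank_lines
```

If this returns True for your document, the paragraph cleanup step is probably the one to skip.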