Old 10-04-2010, 09:13 AM   #6
DTM
Intentionally Left Blank
Posts: 172
Karma: 300106
Join Date: Feb 2006
Location: Royal Oak, MI, USA
Device: Nook STR
Quote:
Originally Posted by Fabe
PatNY - I use Word but never its HTML. After I do the editing and add the tags I want in Word, I clear all the formatting, select all, copy, and paste it into a simple text editor, where I save the file as Unicode (UTF-8) with an HTML extension. I then open this file in Sigil and finish my ePub work there.

Does this idea help? - Fabe
I do exactly the same thing. The attached file unzips into a Word template file that contains several macros. Use this template for the document you want to convert to HTML, then run the macro called Word2HTML. It cleans up the doubled paragraph marks and end-of-line paragraph marks you commonly get in text documents, tags Word's Heading 1 through Heading 5 styles with <h1> through <h5>, replaces special characters with escape codes and double hyphens with em dashes, and more. (If you don't want every operation applied, see CAUTION below; you can run the individual macros one at a time instead.)

Now save the document as a text file, add the proper <html>, <body>, etc. tags at the top and bottom, and you'll have something fit for a clean import into Sigil. Hope this helps.
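For anyone who would rather script that last step than paste the tags in by hand, here is a minimal Python sketch of one way to do it. It is not part of the attached template; the function names, the default title, and the exact set of wrapper tags are my own assumptions, so adjust them to whatever Sigil imports cleanly for you.

```python
from pathlib import Path


def wrap_html(body: str, title: str = "Untitled") -> str:
    """Wrap already-tagged body text in a minimal HTML skeleton.

    Hypothetical helper: the tag set below is an assumption, not
    something the Word2HTML template produces.
    """
    return (
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        f"<title>{title}</title>\n"
        "</head>\n"
        "<body>\n"
        f"{body.strip()}\n"
        "</body>\n"
        "</html>\n"
    )


def save_as_html(text_path: str, html_path: str, title: str = "Untitled") -> None:
    """Read the saved text file and write it back out as UTF-8 HTML."""
    body = Path(text_path).read_text(encoding="utf-8")
    Path(html_path).write_text(wrap_html(body, title), encoding="utf-8")
```

Something like save_as_html("book.txt", "book.html", "My Book") would then give a file to open in Sigil; both file names are placeholders.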

CAUTION: This is quite useful but not perfect, so it is provided as-is, with no warranty, use at your own risk, and the other usual disclaimers. It assumes you have two paragraph marks between paragraphs, as is common for Gutenberg and other text files. If you actually have just a single paragraph mark at the end of each paragraph, it'll turn the whole document into one huge paragraph. It also clears "unnecessary" white space, so if you have a table, tabs or spaces at the start of a paragraph, or other such formatting, you'll lose it. This is basically intended for documents that consist of paragraphs of text with chapter headings.
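To show why the blank-line assumption matters, here is a rough Python sketch of the same kind of cleanup. It is not the macro code from the attachment, just an illustration under my own assumptions: the function name is made up, a blank line is treated as the only paragraph break, and only &, < and > are escaped.

```python
import html
import re


def text_to_paragraphs(text: str) -> str:
    """Turn blank-line-separated text into <p> elements (illustrative only)."""
    text = text.replace("\r\n", "\n")
    # A blank line (two paragraph marks in a row) is the paragraph boundary.
    blocks = re.split(r"\n\s*\n", text)
    paragraphs = []
    for block in blocks:
        # Join hard-wrapped lines and collapse runs of spaces and tabs.
        joined = " ".join(block.split())
        if not joined:
            continue
        # Escape &, < and > first, then swap double hyphens for em dashes.
        joined = html.escape(joined, quote=False)
        joined = joined.replace("--", "&mdash;")
        paragraphs.append(f"<p>{joined}</p>")
    return "\n".join(paragraphs)
```

With this logic, text that has only a single line break after each paragraph never hits a blank line, so everything is joined into one <p>, which is the "one huge paragraph" problem described in the caution. Leading tabs and spaces vanish for the same reason: all whitespace is collapsed.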
Attached Files
File Type: zip Word2HTML.zip (34.8 KB, 394 views)