MobileRead Forums

Greg Anos · 01-24-2010, 06:24 PM

Quote:

Originally Posted by Skydog

Indeed, but aren't most word processors, particularly Word, known for terrible HTML output?

First, let us define "terrible." Seriously. I define bad in two ways: one, it doesn't function; two, the other programs I need to read the HTML won't read it.

I don't insist on pretty, tight HTML. All I ask is: does it work, can it be converted to Epub and Mobi, and can it be read in FBReader from a zipped file? (I use OpenInkPot on my Hanlin readers.)

Now, this is a very loose standard, since I will go in and add my own bits of HTML to the output anyway (usually images and internal hyperlinks).

By this standard, Word 97 works somewhat: its output won't convert into anything in Calibre, but it will convert to Epub in Sigil, and Calibre can then convert that Epub into Mobi. Word 2000 works well, but produces XML (XHTML), which is needlessly wordy for e-book use compared to plain HTML. Atlantis doesn't work: it can't handle RTF conversion properly and sets some lines of text to zero height. Open Office mangles the RTF formatting while editing it, which leads to badly formatted HTML output.

So currently I scan to RTF; spell-check in Open Office despite its problems, because it highlights hyphenated words nicely; clean up the formatting mangles in WordPad; load the file into Word 2000 and save it as a web page; and finally load that into Calibre, which zips up the XML (and converts it to Epub and Mobi, if required).
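For anyone who would rather script that last Calibre step than click through the GUI, here is a minimal sketch. It assumes calibre's command-line converter `ebook-convert` is installed and on the PATH (the workflow above uses the Calibre GUI, so this is an assumed alternative, not the exact process). `ebook-convert` takes an input file and an output file and picks the output format from the output file's extension. The function names and the file name `book.htm` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_commands(html_path, formats=("epub", "mobi")):
    """Build one ebook-convert command per requested output format.

    Hypothetical helper: the output file sits next to the input file and
    differs only in its extension, which ebook-convert uses to choose
    the output format.
    """
    src = Path(html_path)
    return [
        ["ebook-convert", str(src), str(src.with_suffix("." + fmt))]
        for fmt in formats
    ]


def convert(html_path, formats=("epub", "mobi")):
    """Run each conversion; raises if calibre is missing or a run fails."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("calibre's ebook-convert was not found on PATH")
    for cmd in build_convert_commands(html_path, formats):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands; call convert() to actually run them.
    for cmd in build_convert_commands("book.htm"):
        print(" ".join(cmd))
```

Building the command lists separately from running them makes it easy to check what would be executed before touching any files.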

Oh yes, I also use Hexedit 3.0 to clean up RTF control codes at the byte level, as needed.

Any way you look at it, it's lots of work...