Originally Posted by Leverpullr
Been a lurker for a while now, but after working on several ebook projects using Sigil I have a question for others regarding best practices / tools for getting an ebook manuscript into shape BEFORE importing into Sigil.
I know it isn't technically a Sigil question, but after struggling with _horrible_ html output from MSWORD 97 (really really bad), and WORD2003 (better, but so so ugly and bloated..) I figured there were better workflow options and tools that would help me avoid fixing hundreds of EPUB validation issues with every book.
My key questions are:
1) What word processors can export to html that is more EPUB/xhtml clean?
2) What is your workflow like: i.e. original manuscript in MS WORD (as most start there..) to application X to do Y -->Then use __ to ___ --> import to Sigil --> Save to .epub.
I find most of the "anti-Word" hysteria to be hyperbolic. Yes, it produces messy HTML, but so does OO, no matter what the evangelists for it say, ditto LO. WordPerfect's output is just as bad. Scrivener MOBIs, I think I read, are currently being rejected (which makes me wonder if they forked Calibre?), as are Calibre's, by and large. I was underwhelmed with Atlantis' output; I think Jutoh works pretty well for DIY'ers.
The bottom line for me is that you need to know regex. At least a little. At the end of the day, I haven't found a single magic bullet that will automagically clean up Word or any other word-processing output. We very simply clean up the Word file if necessary (we get a lotta, lotta crappy files--I mean, really awful), but mostly we clean the files in HTML, and the tool of choice here is NoteTab Pro. Not NotePad, NoteTab. We extract the HTML and then run a variety of standardized clips to clean it; then we clean up any residual oddities.
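For illustration only, here is a minimal Python sketch of what a regex-driven cleanup pass over Word-exported HTML can look like. These are not the NoteTab clips mentioned above: the rule list and the function name clean_word_html are hypothetical, and the patterns only cover a few common kinds of Word clutter (conditional comments, o: tags, Mso classes, inline styles, empty paragraphs). Real files will need more rules and a manual check afterwards.

```python
import re

# Hypothetical rule set for illustration; each entry is (pattern, replacement).
# Order matters: attributes are stripped before bare spans are unwrapped.
RULES = [
    # Word conditional comments: <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE), ""),
    # Office namespace tags such as <o:p> and </o:p>
    (re.compile(r"</?o:[^>]*>", re.IGNORECASE), ""),
    # class="MsoNormal" and other Mso* classes (quoted or unquoted)
    (re.compile(r'\s+class="?Mso\w+"?', re.IGNORECASE), ""),
    # Inline style attributes
    (re.compile(r'\s+style="[^"]*"', re.IGNORECASE), ""),
    # Attribute-less spans that wrap plain text only (nested spans are left alone)
    (re.compile(r"<span>([^<]*)</span>", re.IGNORECASE), r"\1"),
    # Paragraphs containing nothing but whitespace or &nbsp;
    (re.compile(r"<p>(?:\s|&nbsp;)*</p>", re.IGNORECASE), ""),
]


def clean_word_html(html):
    """Apply every cleanup rule in order and return the cleaned markup."""
    for pattern, replacement in RULES:
        html = pattern.sub(replacement, html)
    return html


if __name__ == "__main__":
    messy = (
        '<p class="MsoNormal" style="margin:0in">'
        '<span style="font-size:12pt">Hello</span><o:p></o:p></p>'
        '<p class=MsoNormal>&nbsp;</p>'
    )
    print(clean_word_html(messy))
```

Keeping the rules in one ordered list makes it easy to add or drop a pattern per project, which is roughly the role a library of saved clips plays in an editor.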
So, our process is:
Word (or other input source)-->HTML-->NoteTabPro-->Sigil.
From the ePUBs, we have custom Perl scripts and, again, NTP clips that we use to create an inline TOC from the ncx, as well as make some other mods (usually the guide), and then drop it on Kindle Previewer/Kindlegen for MOBI versions.
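As a rough idea of the ncx-to-inline-TOC step, here is a short Python sketch that stands in for the Perl scripts and clips, which are not shown here. It assumes a standard EPUB 2 toc.ncx using the usual NCX namespace; the function name ncx_to_inline_toc, the toc-level-N class names, and the div wrapper are all made up for the example, and link paths are copied from the ncx unchanged.

```python
import xml.etree.ElementTree as ET
from html import escape

# Standard namespace used by EPUB 2 toc.ncx files.
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def _emit(parent, depth, lines):
    """Append one linked paragraph per navPoint, recursing into nested entries."""
    for point in parent.findall("ncx:navPoint", NCX_NS):
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        src = content.get("src", "") if content is not None else ""
        lines.append(
            '<p class="toc-level-%d"><a href="%s">%s</a></p>'
            % (depth + 1, escape(src, quote=True), escape(label.strip()))
        )
        _emit(point, depth + 1, lines)


def ncx_to_inline_toc(ncx_text):
    """Build an HTML fragment listing every navPoint in the ncx navMap."""
    root = ET.fromstring(ncx_text)
    nav_map = root.find("ncx:navMap", NCX_NS)
    lines = ['<div class="toc">']
    if nav_map is not None:
        _emit(nav_map, 0, lines)
    lines.append("</div>")
    return "\n".join(lines)
```

The resulting fragment would still need to be pasted into its own XHTML page and referenced from the guide, and the hrefs adjusted if that page lives in a different folder than the ncx.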
That's it. So, our "magic bullet" is simply to work in a super HTML editor. We do the finalization in Sigil, and any post-production copyedits there as well. That's it.