Old 04-02-2009, 12:44 AM   #17
brewt
Boo-Frickety-Hoo-Erizer
Posts: 251
Karma: 686
Join Date: Oct 2007
Device: Kobo Glo HD!
Did someone say Word?

(Bahoo-hoohoo-haha-haha).

"Clean" html from Word isn't all that possible. Now, this isn't to say one can't use Word to produce "viable" files that can (and do) convert well into ebook formats. But "Clean"? Noo, not in my observations.

Personally, I got over being clean. Most of the time I am happy to let Word mangle the styles it wants to embed as "css" into the html file all it wants. There's just waaay too much other usefulness in Word to overcome my fear of evil.

Saving the file as [Web Page, Filtered] goes a long way toward stripping out the extra junk Word generically implants. That junk is all well and fine if you need to reconstruct an actual Word document, with all of Word's formatting tricks intact, from the html file. That isn't the usual goal here - MobiCreator, if you import a real Word document, converts it to a filtered html file before it converts it to a mobi file.

If you just have to use CSS in Word, remember, it's CSS 1 only, and there are weirdnesses in CSS you can't construct properly using Word (see my thread in the epub forum about using css in Word to make a drop cap work in an epub - it doesn't look like a drop cap in Word, but it works out ok in the epub). And Word is all too anxious to over-impose changes onto the styles embedded in the html file - just TRY to redefine "Normal" in css without forcing it into normal.dot and see what happens in your html file.

Unless you intend to hand-re-edit the htm file after you've made it in Word, what do you care if it's "clean"? Is it oh-so-much smaller? Is it really worth your time? Wouldn't better metatags be more useful in the long run when the formats change (again)? Or more care toward managing your stylesets and where to use them?
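
If you want an actual number for the "is it oh-so-much smaller?" question, here is a rough Python sketch that measures one html file's size and counts the Word-specific bits left in it. The function name and the two things it counts ("mso-" style declarations and "<!--[if" conditional comments) are my own picks for illustration, not an official definition of "clean". Run it on a regular save and a filtered save of the same document and compare.

```python
import re


def word_html_stats(html: str) -> dict:
    """Return rough size and Word-specific markup counts for an HTML string.

    Assumption: "mso-" style declarations and "<!--[if" conditional comments
    are used here as a stand-in for "Word junk". This is a heuristic only.
    """
    return {
        # Size of the text once encoded as UTF-8
        "bytes": len(html.encode("utf-8")),
        # Style declarations such as "mso-margin-top-alt:auto"
        "mso_declarations": len(
            re.findall(r"mso-[a-z-]+\s*:", html, flags=re.IGNORECASE)
        ),
        # Conditional comments such as "<!--[if gte mso 9]>"
        "conditional_comments": len(re.findall(r"<!--\[if\b", html)),
    }
```

To use it on a real file, read the file into a string first (Word usually saves these as windows-1252 on my machines, so something like open(path, encoding="cp1252", errors="replace").read(), but check the charset meta tag in your own file).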

pennypenny.

-bjc

p.s. be sure to check in on the Word document properties - you might be surprised that Word could be embedding your work computer name, company name, logon name, things you really maybe don't want in the html meta-info. If, you know, you use work machines to do any of this.
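
A quick way to see what actually ended up in the html is to dump its meta-info. Below is a minimal Python sketch, assuming the identifying bits show up either as ordinary named meta tags or as o:Author / o:LastAuthor / o:Company elements inside Word's document-properties block. Those three element names are the ones I'd expect in an unfiltered save; treat them as an assumption and check against your own files. The class and function names are made up for the example.

```python
import re
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            name = attr_map.get("name")
            if name:
                self.meta[name] = attr_map.get("content") or ""


def find_identifying_info(html: str):
    """Return (meta_tags, office_properties) found in an HTML string.

    Assumption: the office properties of interest are o:Author,
    o:LastAuthor and o:Company. They usually sit inside a conditional
    comment, so they are searched for in the raw text, not via the parser.
    """
    collector = MetaCollector()
    collector.feed(html)
    props = dict(
        re.findall(
            r"<o:(Author|LastAuthor|Company)>(.*?)</o:\1>", html, flags=re.S
        )
    )
    return collector.meta, props
```

Anything that comes back with your logon or employer in it can be fixed at the source, in the document properties in Word, before you save to html again.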