10-30-2012, 04:45 PM   #14
Hitch
Bookmaker & Cat Slave
Hitch ought to be getting tired of karma fortunes by now.
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor, 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by mncowboy
Toxaris is correct. If you make sure to assign styles to everything in the Word document, save it as a filtered web page. Then just delete everything inside the style element in the head. This also assumes that you have created a stylesheet that contains all the styles you assigned in Word.
This will give you a good clean file.
Bob
Yes, I have to stick my $.02 in about this perception, too. Everyone acts like getting clean HTML out of Word is the equivalent of holding off the Four Horsemen of the Apocalypse with a piece of wet spaghetti. But that's just not accurate. The trick, as with anything else, is to start with a clean Word file. Everything in computers is GIGO (garbage in, garbage out). It doesn't matter what it is: fundamental data entry, databases, programs, ebooks...GIGO. The same is true of Word and HTML.

What I find frustrating is that so many writers clearly have zero idea how Word really works. What we get here is a mishmash of ad hoc styles, enough to make your head explode. Hell, two weeks ago I got a file that was typed typewriter-style: type type type, ENTER-ENTER, type type type, ENTER-ENTER (it was supposed to be double-spaced, you see). Yeah, that would come out like trash.
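If you receive files like that, one quick sanity check before fixing anything in Word is to count the empty paragraphs in a filtered-HTML export: a document typed with ENTER-ENTER between paragraphs shows roughly one empty paragraph for every real one. The following is a minimal Python sketch, not a Word feature. It assumes Word writes a blank paragraph as something like `<p class=MsoNormal>&nbsp;</p>` (or with only whitespace inside), and the function names and the 0.3 threshold are hypothetical choices.

```python
import re

# A paragraph whose only content is whitespace or non-breaking spaces.
# Assumption: Word's filtered HTML writes an extra ENTER as something like
# <p class=MsoNormal>&nbsp;</p>.
EMPTY_P = re.compile(r"<p\b[^>]*>(?:\s|&nbsp;|&#160;)*</p>", re.IGNORECASE)
# Any opening <p> tag (the \b keeps <pre> and <param> from matching).
ANY_P = re.compile(r"<p\b", re.IGNORECASE)


def count_empty_paragraphs(html: str) -> int:
    """Return how many <p> elements contain no visible text."""
    return len(EMPTY_P.findall(html))


def looks_double_entered(html: str, threshold: float = 0.3) -> bool:
    """Guess whether paragraphs were separated with ENTER-ENTER.

    The threshold is a hypothetical cut-off: in a typewriter-style file
    roughly half of all paragraphs are empty.
    """
    total = len(ANY_P.findall(html))
    if total == 0:
        return False
    return count_empty_paragraphs(html) / total >= threshold


if __name__ == "__main__":
    sample = (
        "<p class=MsoNormal>Chapter One</p>\n"
        "<p class=MsoNormal>&nbsp;</p>\n"
        "<p class=MsoNormal>It was late.</p>\n"
        "<p class=MsoNormal> </p>\n"
    )
    print(count_empty_paragraphs(sample))  # 2
    print(looks_double_entered(sample))    # True
```

A high ratio is a signal to go back and fix the paragraph spacing with styles in Word, rather than patching the HTML afterward.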

The trick to getting a clean HTML file out of Word is not to disdain working in Word in favor of the relative ease of working in HTML. If you clean up the styles first, using Word's built-in Styles management, the output can be as clean as a whistle, and we do it all day, every day. It's by no means impossible, and, despite claims to the contrary, I have yet to see output from any other word-processing program that is one iota cleaner.
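Because this approach depends on your own stylesheet covering every style you assigned in Word, it can help to check the export for class names your CSS never defines. A leftover such as `MsoNormal` usually means some text is still sitting in Word's Normal style. Below is a minimal Python sketch, assuming the exported HTML and your stylesheet are available as strings. The regexes are deliberately simple and are not a full HTML or CSS parser, and the function names are hypothetical.

```python
from __future__ import annotations

import re

# class="a b", class='a', or class=MsoNormal (Word's filtered output often
# leaves attribute values unquoted).
CLASS_ATTR = re.compile(
    r"""\bclass\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""", re.IGNORECASE
)
# A ".ClassName" token inside a CSS selector.
CSS_CLASS = re.compile(r"\.(-?[A-Za-z_][\w-]*)")


def used_classes(html: str) -> set[str]:
    """Collect every class name referenced in the HTML."""
    found: set[str] = set()
    for double, single, bare in CLASS_ATTR.findall(html):
        found.update((double or single or bare).split())
    return found


def defined_classes(css: str) -> set[str]:
    """Collect class names that appear in the stylesheet's selectors."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # drop comments
    selectors = re.findall(r"([^{}]+)\{", css)  # text before each "{"
    return {name for sel in selectors for name in CSS_CLASS.findall(sel)}


def undefined_classes(html: str, css: str) -> list[str]:
    """Classes used in the export that the stylesheet never styles."""
    return sorted(used_classes(html) - defined_classes(css))
```

An empty result means every class in the export has a rule in your stylesheet. Anything listed points back to text whose style still needs cleaning in Word.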

Neither OpenOffice (OO) nor LibreOffice (LO) outputs better HTML, and WordPerfect certainly does not. Ditto Atlantis. I admit I don't think I've tried viewing Jutoh's HTML output, but I don't think of Jutoh as a word processor anyway. I don't really know what so-called "writing software" outputs; LSB XE outputs RTF, but I've had trouble getting that RTF to work in other programs (Word, etc.), which doesn't bode well. I've been told that Scrivener's output is groovy, but not (yet) by anyone who would really know. I like Simon's yWriter RTF output, but again, that's to be expected given his background.

So, really...I don't get all the complaining about Word's allegedly bad HTML output. It's no big deal: use styles correctly, export the HTML, nuke the internal stylesheet that gets crammed in there, and Bob's-yer-uncle. You can also do it with regex fairly easily, but it's FAR, FAR easier to clean the styles in Word first.
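For the regex route, here is a minimal Python sketch of the "nuke the internal stylesheet" step. It deletes every `<style>...</style>` block and links an external stylesheet just before `</head>`. It assumes the filtered export keeps its rules in `<style>` elements in the head (typically wrapped in `<!-- -->`), and `styles.css` and the function name are hypothetical placeholders. Regex is adequate here only because the pattern is narrow and predictable. It is not a general HTML parser, and it leaves any inline `style="..."` attributes untouched.

```python
import re

# Every <style>...</style> element, including the <!-- --> wrapper Word
# typically puts inside it, plus any trailing whitespace.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>\s*", re.IGNORECASE | re.DOTALL)
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def replace_internal_stylesheet(html: str, css_href: str = "styles.css") -> str:
    """Strip embedded stylesheets and link an external one instead."""
    cleaned = STYLE_BLOCK.sub("", html)
    link = f'<link rel="stylesheet" type="text/css" href="{css_href}">\n'
    # A function replacement avoids backslash-escape surprises in css_href.
    cleaned, count = HEAD_CLOSE.subn(lambda m: link + m.group(0), cleaned, count=1)
    if count == 0:
        raise ValueError("no </head> found; is this a complete HTML file?")
    return cleaned
```

Raising an error when there is no `</head>` is a deliberate choice: silently returning a file with no stylesheet at all would be easy to miss.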

Just my $.02, FWIW.

Hitch