08-10-2012, 11:48 AM   #18
Dillinquent
Pre-Sigil Workflow

I used to use Word (2010) to clean up the styles and then save as filtered HTML. (I don't know why MS call it 'filtered', as the output is invariably full of junk CSS and every font on my system gets embedded as panose definitions. WTF?)
I then used Dreamweaver's 'Clean up Word HTML' function (which has stagnated since Macromedia were Borged by Adobe) to get rid of most of the cruft, and converted the file to XHTML 1.1 to make the resultant code valid. Usually I would then have to use DW's excellent search & replace to clear up any anomalies before finally importing into Sigil.
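
For anyone who would rather script that clean-up stage than do it by hand, here is a rough Python sketch of the kind of cruft removal described above. It is only my guess at a reasonable minimum, not what Dreamweaver or any other tool actually does. The function name `clean_word_html` is made up for illustration, and because it relies on regular expressions rather than a real HTML parser, treat it as a starting point and check the output before importing into Sigil.

```python
import re


def clean_word_html(html: str) -> str:
    """Strip common Word 'filtered HTML' cruft (regex-based sketch, hypothetical helper)."""
    # Drop <style> blocks, which hold the @font-face/panose definitions and Mso* classes.
    html = re.sub(r"<style\b[^>]*>.*?</style>", "", html, flags=re.I | re.S)
    # Drop HTML comments, including Word's conditional comments.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.S)
    # Drop class attributes whose value is a single Mso* class (quoted or unquoted).
    html = re.sub(r"""\s+class\s*=\s*(["']?)Mso\w*\1""", "", html, flags=re.I)
    # Drop quoted inline style attributes.
    html = re.sub(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", "", html, flags=re.I)
    # Drop lang attributes (quoted or unquoted).
    html = re.sub(r"""\s+lang\s*=\s*(?:"[^"]*"|'[^']*'|[\w-]+)""", "", html, flags=re.I)
    # Unwrap attribute-less <span> wrappers that contain only text; repeat for nesting.
    previous = None
    while previous != html:
        previous = html
        html = re.sub(r"<span>([^<]*)</span>", r"\1", html, flags=re.I)
    return html


if __name__ == "__main__":
    sample = (
        "<html><head><style>@font-face {font-family:Calibri; "
        "panose-1:2 15 5 2 2 2 4 3 2 4;}</style></head>"
        "<body><p class=MsoNormal style='margin:0'>"
        "<span lang=EN-GB>Hello</span></p></body></html>"
    )
    print(clean_word_html(sample))
```

The result is still HTML, not XHTML 1.1, so a conversion and validation pass is needed afterwards, the same as in the manual workflow.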

Nowadays I usually use Toxaris' rather good Transform_HTML macro, or LibreOffice with Writer2ePub, and cut out the DW stage entirely.

Until 1999 MS used to have a tool called MSFilter, which was very good at clearing all the crap out of Word HTML files and returned very clean HTML 4, even from 'unfiltered' exports.
For some reason it is no longer available from MS, but it is so useful that I have kept it on my HDD ever since and have attached it to this post.
Somewhat surprisingly it still works on Windows 7; just follow the instructions in the .txt file. [please don't attach copyrighted files - moderator]

Last edited by dreams; 08-13-2012 at 02:14 PM.