My favourite tool for converting HTML books is a command-line program called demoroniser.
http://www.fourmilab.ch/webtools/demoroniser/
It processes HTML source and removes all the fancy characters that Microsoft tools insert into HTML code. The problem is that all those fancy curly quotes, non-breaking spaces, optional hyphens, em-dashes, en-dashes, ... in some HTML files are non-standard and display on my reader as very distracting combinations of two exotic characters. Such files are practically unreadable.
I know that quite a lot of people here are very fond of their properly formatted curly quotes, em-dashes, en-dashes, and other typographical sugar, but when such things display on my reader as unreadable characters, I resort to demoroniser or my own scripts written in vim.
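For anyone who wants to try this kind of cleanup without demoroniser or vim, here is a rough Python sketch of the idea. It is not how demoroniser itself works: the REPLACEMENTS table and the asciify name are my own assumptions for illustration, and the mapping covers only the characters mentioned above.

```python
# Hypothetical mapping from "fancy" typographic characters to plain ASCII.
# This is an illustrative table, not demoroniser's actual substitution list.
REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en-dash
    "\u2014": "--",   # em-dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
    "\u00ad": "",     # soft (optional) hyphen: simply dropped
}

# Build the translation table once; str.translate applies it in one pass.
_TABLE = str.maketrans(REPLACEMENTS)


def asciify(text: str) -> str:
    """Return text with the characters in REPLACEMENTS swapped for ASCII."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u201cHi\u201d \u2014 it\u2019s"
    print(asciify(sample))
```

This only works on text that has already been decoded correctly. If a file was saved by a Microsoft tool, it may need to be opened with encoding="cp1252" rather than UTF-8 before the function is applied, and HTML entities such as &nbsp; would need separate handling.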
By the way, I always use Nvu-generated HTML as a textbook example of how really well-written HTML code is supposed to look. I also use MSWord-generated HTML as the opposite example.