Old 10-13-2008, 06:11 AM   #2
kacir
Wizard
kacir ought to be getting tired of karma fortunes by now.
Posts: 3,463
Karma: 10684861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
My favourite tool for converting HTML books is a command-line program called demoroniser.
http://www.fourmilab.ch/webtools/demoroniser/
It processes HTML source and strips out the fancy characters that Microsoft tools insert into HTML code. The problem is that the curly quotes, non-breaking spaces, optional hyphens, em-dashes, en-dashes, ... in some HTML files are non-standard, and on my reader each one displays as a very distracting pair of exotic characters. Such files are practically unreadable.

I know that quite a lot of people here are very fond of their properly formatted curly quotes, em-dashes, en-dashes, and other typographical sugar, but when such things display on my reader as unreadable characters I resort to demoroniser or my own scripts written in vim.
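
For anyone who wants to try the same kind of cleanup without demoroniser or vim, here is a rough Python sketch of the idea. It is not demoroniser's actual replacement table and not one of the vim scripts mentioned above: the function name, the choice of characters, and the ASCII stand-ins are all assumptions made for illustration. It also assumes the file has already been decoded to text correctly.

```python
# Illustrative sketch only: the mapping below is an assumed, partial list,
# not the table used by demoroniser.
REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
    "\u00ad": "",     # soft (optional) hyphen, simply dropped
}

# str.maketrans accepts a dict of single characters mapped to strings.
_TABLE = str.maketrans(REPLACEMENTS)


def plain_ascii_punctuation(text: str) -> str:
    """Replace common typographic characters with plain ASCII stand-ins."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u201cHi\u201d\u2014it\u2019s\u00a0ok\u2026"
    print(plain_ascii_punctuation(sample))
```

This only handles literal characters. Named or numeric entities in the HTML source (for example `&nbsp;` or `&mdash;`) are left alone and would need their own replacement pass.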

By the way, I always use Nvu-generated HTML as a textbook example of how really well-written HTML code is supposed to look. I also use MS Word-generated HTML as the opposite example.