MobileRead Forums - View Single Post - Tool to easily clean and refurbish html-text before conversion

kacir · 10-13-2008, 06:11 AM

My favourite tool for converting HTML books is the command-line program demoroniser.
http://www.fourmilab.ch/webtools/demoroniser/
It processes HTML sources and removes the fancy characters that Microsoft tools insert into HTML code. The problem is that those curly quotes, non-breaking spaces, optional hyphens, em-dashes, en-dashes, ... are non-standard in some HTML files and display on my reader as very distracting combinations of two exotic characters. Such files are practically unreadable.
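
For anyone who wants to try the same idea without demoroniser, here is a minimal Python sketch of that kind of cleanup. It is not demoroniser's actual replacement table, only an illustration: the `REPLACEMENTS` mapping and the function name `plain_ascii_punctuation` are made up for this example, and it assumes the HTML has already been decoded to a Unicode string. It does not touch named entities such as `&nbsp;`.

```python
# Hypothetical replacement table (an assumption for illustration,
# not the mapping demoroniser really uses).
REPLACEMENTS = {
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2013": "-",    # en-dash
    "\u2014": "--",   # em-dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
    "\u00ad": "",     # optional (soft) hyphen: drop it
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(REPLACEMENTS)


def plain_ascii_punctuation(text: str) -> str:
    """Replace typographic characters with plain ASCII equivalents."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "\u201cHello\u201d \u2014 it\u2019s a test\u2026"
    print(plain_ascii_punctuation(sample))
```

Extend the table with whatever characters show up as garbage on your own reader.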

I know that quite a lot of people here are very fond of their properly formatted curly quotes, em-dashes, en-dashes, and other typographical sugar, but when such things display on my reader as unreadable characters I resort to demoroniser or to my own scripts written in vim.

By the way, I always use Nvu-generated HTML as a textbook example of how really well-written HTML is supposed to look. I use MS Word-generated HTML as the opposite example.
