Quote:
Originally Posted by DaleDe
Good tips, applicable to many formats these days. How are you getting your CHM to work on your Sony?
Essentially, it's a decompress of the CHM (via "Tubby" for the Mac), cleaning, and then html2epub specifying --breadth-first on the toc.html file.
The cleaning is the difficult part, and sometimes much more laborious than it's worth. Some books are dead simple, but books with a lot of <pre> code, tons of tables, etc. can be a real bear.
Honestly, my best tool for this (as with most things I do) is VIM (http://www.vim.org). I'm a seasoned VIM user and can make use of its vast power for editing text, but for the average user it's tough to get a hold of and make good use of. I've been using it for over 10 years, and I still have a lot to learn.
Two of the most powerful features I use are macro recording and the ex commands. Most of the files "look" the same (i.e. have the same structure), so removing the big tables, getting rid of the annoying "next" and "previous" buttons, and cleaning up the cruft is something you can do "once" and then repeat automatically for all files. Standard search and replace across files is also very easy: changing the 0x97 character (an em dash in Windows-1252) to a '-', getting rid of the '\r' characters, which screw up the <pre> tags, etc. All of this stuff is quite easy.
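For anyone who doesn't live in VIM, here is a rough Python sketch of the same kind of batch cleanup. It is not what I actually run (I do this with VIM macros), just an illustration. Assumptions: the files are in a single-byte encoding such as Windows-1252 (on UTF-8 files, replacing the raw 0x97 byte could corrupt other characters), and the nav buttons are plain <a> links whose text is "next" or "previous". The names clean_html, clean_tree and NAV_LINK are made up for this example.

```python
import re
from pathlib import Path

# Hypothetical pattern: assumes the nav buttons are simple <a> links
# whose visible text is just "next" or "previous" (any case).
NAV_LINK = re.compile(rb"<a\b[^>]*>\s*(?:next|previous)\s*</a>", re.IGNORECASE)


def clean_html(data: bytes) -> bytes:
    """Apply the repeatable fixes to one file's raw bytes."""
    # Drop carriage returns, which break the layout inside <pre> blocks.
    data = data.replace(b"\r", b"")
    # Assumes a single-byte encoding where 0x97 is a dash character.
    data = data.replace(b"\x97", b"-")
    # Remove the navigation links.
    data = NAV_LINK.sub(b"", data)
    return data


def clean_tree(root: str) -> int:
    """Clean every .html file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_bytes()
        cleaned = clean_html(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed
```

Run clean_tree on a copy of the decompressed CHM directory, not the original, since it rewrites files in place. Real books will need their own patterns for the tables and other cruft.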
I actually have been reversing the <pre> tag output to white on black instead of black on white and it looks a bit better.
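One way to get the reversed <pre> colours without touching every block by hand is to inject a style rule into each file. This is only a sketch: the exact colours and the add_pre_style name are my own choices for illustration, and whether a given reader honours the rule is something you'd have to test on the device.

```python
import re

# Assumed styling: white text on a black background for all <pre> blocks.
PRE_STYLE = "<style>pre { background-color: #000; color: #fff; }</style>"


def add_pre_style(html: str) -> str:
    """Insert the <pre> style rule just before </head>, if there is one."""
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        # No <head> section found: fall back to prepending the rule.
        return PRE_STYLE + html
    return html[:match.start()] + PRE_STYLE + html[match.start():]
```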
Fixing up <pre> tag data can be simpler if you set VIM up to treat the HTML as Java, or C++ or whatever the <pre> contains and then tell it to reformat the <pre> section automatically - again, with a pre-recorded macro.
It's a bit of a black art, voodoo kinda stuff that I haven't managed to script in any general way. I started by trying to write up some Ruby to do it, but the human element is hard to get rid of (i.e. does this table have a header that isn't declared as <thead>? Where's the "right" spot to break this line of C++ code? Is this table superfluous or can it stay? etc.).