Ok, I tried three... no four (Spanish Inquisition!) different possibilities. Here are the results.
1st: English TXT from Project Gutenberg, direct to .wol: OK
It did a good job, but then again, this is only TXT. The text runs on non-stop, with no separation between paragraphs.
No increase in file size.
2nd: English HTML from Project Gutenberg, direct to .wol: BAD
Output was 100% like the TXT. No respect for any tag whatsoever. Italics lost, lists laid out as tables (TR/TD) lost, paragraph justification lost. It looks like it just stripped everything that wasn't text out of the HTML.
No increase in file size.
3rd: Spanish DOC, printed with Hanlin printer: GOOD
Non-English characters (á, ñ, ¿) were respected, as well as all formatting (italics, bold, underline, centering, different font sizes). Spacing between paragraphs was respected, too.
I had to manually reduce all margins to 0 and manually increase the font size; otherwise the text is too small to read and the margins take up too much space.
The "printed" book closely resembles the original, as it is just like an image scan.
DOC file: 32 KB, WOL file: 319 KB (yes, a tenfold increase).
4th: Spanish TXT down-converted from DOC, direct to .wol: BAD
Interestingly enough, the space between paragraphs was respected when the TXT was converted from DOC/RTF. However, non-English characters became corrupted. I changed the encoding from Windows to various types of Unicode, but they kept disappearing.
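For anyone who wants to repeat the encoding experiment outside a text editor, here is a minimal Python sketch. It assumes the "Windows" encoding is cp1252 (that is a guess; the original file may use another code page), and the function names are made up for illustration. The second function is a lossy last resort: it strips accents so only plain ASCII reaches the converter. Whether the .wol converter handles any of these outputs better has not been tested here.

```python
import unicodedata


def reencode(raw: bytes, src: str = "cp1252", dst: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target.

    'cp1252' is an assumption about what "Windows" encoding means here.
    """
    return raw.decode(src).encode(dst)


def ascii_fallback(text: str) -> str:
    """Lossy last resort: split accented letters into base letter + accent,
    then drop everything that is not plain ASCII (accents, and also '¿')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    sample = "¿Qué año?".encode("cp1252")   # bytes as a Windows editor would save them
    print(reencode(sample))                  # same text as UTF-8 bytes
    print(ascii_fallback("¿Qué año?"))       # prints: Que ano?
```

Note that the fallback changes the text (the "¿" is dropped, and "año" turns into "ano"), so it is only worth it if the alternative is garbage characters.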
So, for English books I would go for method #4 (although I really don't like living without italics), while for non-English books I would have to stick with #3. The direct converter needs more work, especially a real HTML filter.
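Until there is a real HTML filter, a possible stopgap is to pre-process the HTML into plain text that at least keeps paragraph breaks and marks italics. The sketch below uses only Python's standard `html.parser`; the class name, the set of block tags, and the `_italic_` marking are my own choices for illustration, not anything the .wol converter knows about. It assumes the converter keeps blank lines between paragraphs, as it did for the TXT converted from DOC.

```python
import re
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect text from HTML, ending a paragraph at each block-level
    closing tag and wrapping italics in underscores."""

    BLOCK = {"p", "div", "tr", "li", "h1", "h2", "h3"}  # illustrative, not complete
    ITALIC = {"i", "em"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ITALIC:
            self.parts.append("_")

    def handle_endtag(self, tag):
        if tag in self.ITALIC:
            self.parts.append("_")
        elif tag in self.BLOCK:
            self.parts.append("\n\n")  # paragraph break

    def handle_data(self, data):
        # Collapse the hard line breaks and indentation found in HTML source.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html: str) -> str:
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    paragraphs = (p.strip() for p in "".join(parser.parts).split("\n\n"))
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    page = "<p>First <i>italic</i> paragraph.</p>\n<p>Second one.</p>"
    print(html_to_text(page))
```

This throws away tables and justification just like the current converter does, so it is only a partial workaround: you get paragraph separation and a visible hint of where the italics were.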