Old 07-28-2008, 01:24 PM   #1
monojohnny
Join Date: Jul 2008
Device: Kindle
Quick n' dirty Ruby Program: convert text files (Kindle - others?)

http://www.gutenberg.org is of course great for free books, which are readily viewable on a Kindle. However, anybody who has tried this will have noticed that the Kindle renders them rather awkwardly, something like this:

//
The text seems to
wrap
at funny places when
reading books downloaded from Gutenberg.org
and it
makes for a not-too-pleasant reading
experience...
//

The following Ruby program seems to do a decent job of pre-converting the Gutenberg texts so they look semi-decent on a Kindle:

--- 'split.rb' CUT HERE ---
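# $_ holds the current input line (supplied by ruby's -n loop).
if $_.size == 2
  # Exactly two characters: assumed to be a bare CR+LF, i.e. a blank line.
  printf("\n\n")
else
  # Strip the line ending so the paragraph keeps flowing on one line.
  $_.chomp!
  printf("%s ", $_)
end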
--- CUT HERE ---

Run like this:

ruby -an split.rb war_of_the_worlds.txt > converted_war_of_the_worlds.txt

I hope this helps other people!

Cheers

John
----


Notes:

The program above ASSUMES that any line of exactly two characters is a blank line (CR+LF, i.e. CTRL-M followed by CTRL-J, with no text), so we want a paragraph break here, hence the *double* newline. Otherwise the 'chomp!' just removes the end-of-line characters and lets the paragraph flow. Essentially each paragraph becomes one big line, which is what the Kindle seems to like. Normal text editors, incidentally, DON'T like this much (unless you turn on word-wrap, of course!).
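For anyone whose files don't match that two-character assumption (for example a text with plain LF endings, or blank lines that contain stray spaces), here is a rough sketch of the same idea as a standalone function. The name 'unwrap' and the file names in the usage comment are my own inventions, not part of the script above, and it assumes the text is valid in Ruby's default encoding:

```ruby
# Hypothetical helper (not from the original script): join the hard-wrapped
# lines of each paragraph into one long line and keep blank lines as
# paragraph breaks, whatever the line-ending style.
def unwrap(text)
  paragraphs = text.gsub(/\r\n?/, "\n")  # normalise CR+LF (or bare CR) to LF
                   .split(/\n\s*\n/)     # one or more blank lines = paragraph break
  paragraphs
    .map { |para| para.split("\n").map(&:strip).reject(&:empty?).join(" ") }
    .reject(&:empty?)
    .join("\n\n")
end

# Example usage (file names are placeholders):
#   ruby unwrap.rb input.txt > output.txt
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  puts unwrap(ARGF.read)
end
```

Unlike the one-liner version, this reads the whole file into memory first, which should be fine for a novel-sized text.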

I think I have worked out why: the Gutenberg texts (the ones I looked at, at least) seem to be pre-wordwrapped and terminated with a DOS-style line ending: CR+LF (CTRL-M/CTRL-J).
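If you want to check a particular file rather than take my word for it, something like the following should tell you how its lines are terminated and how long the longest line is (a short maximum suggests the text is pre-wrapped). This is only a sketch; 'line_ending_stats' is a made-up name and the file name in the usage comment is a placeholder:

```ruby
# Hypothetical helper: count the line-ending styles in a string and
# report the length of its longest line (excluding the line ending).
def line_ending_stats(data)
  crlf = data.scan("\r\n").size      # DOS-style endings
  lf   = data.count("\n") - crlf     # bare LF (Unix-style) endings
  cr   = data.count("\r") - crlf     # bare CR endings
  longest = data.each_line.map { |line| line.chomp.size }.max || 0
  { crlf: crlf, lf: lf, cr: cr, longest_line: longest }
end

# Example usage (file name is a placeholder):
#   ruby endings.rb war_of_the_worlds.txt
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  # binread avoids any newline or encoding translation
  p line_ending_stats(File.binread(ARGV[0]))
end
```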

The Kindle will automatically wrap text, so there's no need to have it pre-wrapped (in fact, because the font is not monospaced, it would be incredibly difficult to pre-wrap it correctly). When the Kindle sees any end of line (for instance the CR+LF above) it will honour that.

The result is the skewed text you see: the original wrapping is preserved and the Kindle applies its own on top.
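To see the effect without a Kindle, here is a toy simulation. It is NOT how the Kindle actually lays out text (the real device uses a proportional font); it just wraps greedily at a fixed character width while honouring every existing line break, which is the behaviour described above. 'device_wrap' and the width of 20 are purely illustrative:

```ruby
# Toy model of a reader that wraps at `width` characters but also keeps
# every line break already present in the text.
def device_wrap(text, width)
  text.each_line.flat_map do |line|
    words = line.split
    next [""] if words.empty?            # blank line stays a blank row
    rows = [words.shift]
    words.each do |word|
      if rows.last.size + 1 + word.size <= width
        rows[-1] = "#{rows.last} #{word}"  # word still fits on this row
      else
        rows << word                       # start a new row
      end
    end
    rows
  end
end

pre_wrapped = "The text seems to wrap at funny\nplaces when reading"
unwrapped   = pre_wrapped.tr("\n", " ")

puts device_wrap(pre_wrapped, 20)  # rows break early at the old line ending
puts "--"
puts device_wrap(unwrapped, 20)    # rows fill up evenly
```

In the first output the row "wrap at funny" is cut short by the original line ending; in the second the same words flow on as "wrap at funny places".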

The Ruby program above (Perl programmers should be able to convert this quite easily) seems to do a decent job of pre-converting: it cuts out all the CR+LF line endings within a paragraph and puts in a double newline to separate paragraphs.

Ruby Language is here: http://www.ruby-lang.org/en/

Last edited by monojohnny; 07-28-2008 at 01:45 PM. Reason: Correcting typo.