04-12-2007, 07:53 AM   #10
Fitzwaryn
Connoisseur
Posts: 54
Karma: 395
Join Date: Jul 2006
Use a Hex Editor.

If you can get the book dumped out in txt format, open it with a hex editor. There are plenty of free ones out on the net.

Look for the hard returns in the text window and find the matching hex chars. They are usually 0D 0A (carriage return followed by line feed). You can also scan for double returns, which are often paragraph breaks. Different word processors will produce different patterns. Personally, I prefer a double return between paragraphs.
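If you'd rather have a quick count before opening the editor, here is a minimal Python sketch that tallies the line-ending patterns in a file's raw bytes. The function name and the sample text are made up for illustration; a real file would be read with open(path, "rb").

```python
def count_line_endings(data: bytes) -> dict:
    """Count the line-ending patterns found in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,                                # 0D 0A pairs
        "lf_only": data.count(b"\n") - crlf,         # 0A without a leading 0D
        "cr_only": data.count(b"\r") - crlf,         # 0D without a trailing 0A
        "double_crlf": data.count(b"\r\n\r\n"),      # non-overlapping 0D 0A 0D 0A
    }


# Hypothetical sample standing in for the dumped book text.
sample = b"First line\r\nsecond line.\r\n\r\nNew paragraph.\r\n"

# Show the first line the way a hex editor would.
print(sample[:12].hex(" ").upper())
print(count_line_endings(sample))
```

If crlf is large and double_crlf is much smaller, the file is probably in the "every line broken, paragraphs double-broken" shape described next.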

The most common case I encounter is lines each broken by a 0D 0A and paragraphs broken by 0D 0A 0D 0A (double returns).

In that case I do a global replace of 0D 0A 0D 0A with a byte string that doesn't occur in the text, like FF EE FF EE, then do a global replace of 0D 0A with 20 (space). Then I replace 20 20 with 20 and repeat that global replace until there aren't any double spaces left. That takes care of spaces just before or after the hard returns you got rid of.

Once that's done I just replace FF EE FF EE with 0D 0A 0D 0A and I have the paragraph breaks back.
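For anyone who wants to script it, the same four replaces can be sketched in Python on the raw bytes. This is only a rough equivalent of the hex editor steps, not a replacement for eyeballing the file; the function name is hypothetical, and the check for an existing FF EE FF EE in the file is my own added safety step.

```python
CRLF = b"\r\n"                      # 0D 0A
PLACEHOLDER = b"\xff\xee\xff\xee"   # FF EE FF EE, assumed absent from the text


def unwrap_paragraphs(data: bytes) -> bytes:
    """Join hard-wrapped lines while keeping double-return paragraph breaks."""
    if PLACEHOLDER in data:
        raise ValueError("placeholder bytes already occur in the file")
    data = data.replace(CRLF + CRLF, PLACEHOLDER)   # protect paragraph breaks
    data = data.replace(CRLF, b" ")                 # turn line breaks into spaces
    while b"  " in data:                            # repeat until no 20 20 remains
        data = data.replace(b"  ", b" ")
    return data.replace(PLACEHOLDER, CRLF + CRLF)   # put paragraph breaks back


# Hypothetical sample: two wrapped paragraphs.
wrapped = b"The quick brown \r\nfox jumps.\r\n\r\nNext paragraph\r\nends here."
print(unwrap_paragraphs(wrapped))
```

To run it on a real file, read the bytes with open(path, "rb"), pass them through the function, and write the result with open(path, "wb") so the line endings are not translated.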

I do other cleanup while in the editor to get rid of indentations, odd characters (possibly Unicode) that display as question marks, and any other odd little irritants.
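That sort of cleanup can be sketched in Python too. The two helpers below are hypothetical names: one strips leading spaces and tabs from each line, the other only lists where the bytes outside plain ASCII sit, so you can decide by hand what to do with each one.

```python
import re


def strip_indentation(data: bytes) -> bytes:
    """Remove spaces and tabs at the start of every line."""
    return re.sub(rb"(?m)^[ \t]+", b"", data)


def find_non_ascii(data: bytes) -> list:
    """Return (offset, byte value) for every byte above 7F."""
    return [(offset, value) for offset, value in enumerate(data) if value > 0x7F]


# Hypothetical sample with an indented line, a tabbed line, and one odd byte.
messy = b"    Indented\r\n\tTabbed caf\xe9\r\nPlain"
cleaned = strip_indentation(messy)
print(cleaned)
for offset, value in find_non_ascii(cleaned):
    print(f"offset {offset}: {value:02X}")
```

The offsets are the same ones a hex editor shows, so it is easy to jump to each odd character and fix it there.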

Get comfortable with using a hex editor and you can do all sorts of fast and nice cleanups in txt files. Once that's done, save the file back out, open it with Word, set the font size you want, fill in the Info information and save it as an RTF. It's then ready to load onto the Reader.

I do that so often that sometimes I'll start reading a book, notice some irritating little anomaly that I missed before, re-edit the txt file with the hex editor to correct it, then reformat it as RTF again and reload it.

Once you get comfortable using a Hex Editor you can usually clean up a txt file in 20-30 seconds.

Even some really badly formatted files can be cleaned up globally with