Old 07-31-2007, 08:51 PM   #4
mogui is no ebook tyro.
Posts: 503
Karma: 1335
Join Date: Dec 2006
Location: The Philippines
Device: HTC G1 Android FBReader
Hi Peter.

The short answer is DO IT! The long answer follows:

I am using Acrobat 7 on my PC to read PDFs. Under the "File" menu there is an option to "Save as Text". Having done so, I now have a *.txt file I can load into my Sony Reader. The Reader will give you a choice of three font sizes to display. The txt file is my favorite file format because the font sizes come out very readable.

Sometimes saving from a PDF file can give you fixed line lengths. If this happens you can a/ ignore it, or b/ reformat the paragraphs using a programmer's editor. I pulled the following from an earlier post to show how this is done. There is a link to a utility as well, so it is not all that difficult. BTW, when I wrote this I was reformatting text to fit an MP4 player.

I converted the Palm TX manual from PDF to txt to use here as an example. It is freely downloadable. I saved it as text using Acrobat, then opened it in PSPad, an excellent freeware programmer's editor, and viewed the txt file in hex mode. The first little bit looks like this:

"User Guide 0D0A 0D0A 0D0A 0C 0D0A Copyright and"

The ASCII codes are:
0D0A = carriage return, line feed (CRLF).
0C = form feed (FF), or page break.

The CRLF is what you are calling a paragraph mark. Most editors show it as a paragraph symbol, but that symbol is not part of the ASCII character set. The form feed is a page break. Often the FF sits next to a page number or a repeated string such as a running header. This is a good clue for identifying text you might wish to remove.
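
If you do not have a hex-capable editor handy, a few lines of Python can show the same thing. This is only a minimal sketch: the sample bytes are typed in by hand to mimic the snippet above, and the function names are my own invention, not part of any tool mentioned here.

```python
def describe_control_bytes(data: bytes) -> dict:
    """Count CRLF pairs (0D0A) and form feeds (0C) in raw text bytes."""
    return {
        "crlf": data.count(b"\r\n"),
        "form_feed": data.count(b"\x0c"),
    }


def hex_view(data: bytes) -> str:
    """Return the bytes as space-separated uppercase hex, like an editor's hex mode."""
    return data.hex(" ").upper()


# Hand-made sample that imitates the start of the converted manual (an assumption).
sample = b"User Guide\r\n\r\n\r\n\x0c\r\nCopyright and"

print(describe_control_bytes(sample))
print(hex_view(sample))
```

To check a real file, read it in binary mode (open(path, "rb")) so Python does not translate the line endings before you count them.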

My problem was to reformat the text to fit a 24 character line and reflow the text at word breaks. I used these steps:
1/ Replace all CRLF sequences with <*>CRLF.
2/ Select all the text in my editor (PFE in this case) and reflow it.
3/ Replace all <*>CRLF sequences with CRLF.
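
The same idea can be scripted instead of done by hand in an editor. The sketch below is not what PFE does internally; it is a rough Python equivalent that assumes every CRLF in the input ends a paragraph, and it uses the standard textwrap module to break lines at word boundaries. The function name reflow and the default width of 24 are just for illustration.

```python
import textwrap


def reflow(text: str, width: int = 24) -> str:
    """Wrap each CRLF-terminated paragraph to `width` columns at word breaks.

    Assumption: every CRLF in the input marks the end of a paragraph.
    """
    out = []
    for paragraph in text.split("\r\n"):
        # textwrap.fill joins wrapped lines with "\n"; convert back to CRLF.
        out.append(textwrap.fill(paragraph, width=width).replace("\n", "\r\n"))
    return "\r\n".join(out)


original = "The quick brown fox jumps over the lazy dog\r\nNext paragraph"
result = reflow(original)
print(result)
```

If your file has hard line breaks inside paragraphs, this will not join them; you would first need some rule (blank lines, for instance) to tell real paragraph ends from wrapped lines.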

Some good free programmer's editors are: PSPad, ConTEXT and PFE32.

Now the text has the proper paragraph formatting and text breaks occur between words. In other words, it is readable. Now if I wish, I can replace CRLFCRLF sequences with CRLF to eliminate extra line spacing. The form feed "0C" character can be replaced by a space or a CRLF sequence -- your choice. I usually replace tabs with 2 spaces and later crunch spaces down by repeatedly replacing SPSP with SP.
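
For anyone who prefers a script to repeated search-and-replace, here is a hedged Python sketch of those cleanup passes. The function name tidy and its form_feed_replacement parameter are made up for this example. The regular expression collapses any run of spaces in one go, which has the same end result as replacing SPSP with SP over and over.

```python
import re


def tidy(text: str, form_feed_replacement: str = "\r\n") -> str:
    """Apply the cleanup passes described above to CRLF-delimited text."""
    # Form feed (0C): replace with a space or a CRLF, caller's choice.
    text = text.replace("\x0c", form_feed_replacement)
    # Tabs become two spaces.
    text = text.replace("\t", "  ")
    # Crunch runs of spaces down to a single space.
    text = re.sub(r" {2,}", " ", text)
    # One pass of CRLFCRLF -> CRLF to remove extra line spacing.
    text = text.replace("\r\n\r\n", "\r\n")
    return text


cleaned = tidy("a\tb    c\r\n\r\nd\x0ce", form_feed_replacement=" ")
print(repr(cleaned))
```

The CRLFCRLF pass runs only once, so three or more blank lines in a row are reduced but not removed entirely; run it again if you want tighter spacing.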

There is a small free conversion utility called Storymaster ASCII text reformatting tool that will do the above operations quite satisfactorily. The site will link you to a free download.
Reading txt files on the Reader is quite satisfying. If you wish, you can go through yet another conversion to RTF or LRF and play with font choices. I hope you enjoy your reader.

Last edited by mogui; 07-31-2007 at 09:05 PM.