Old 11-12-2007, 03:15 PM   #1
bob_ninja
Posts: 208
Karma: 582
Join Date: Aug 2006
Device: Zire71
Text tool for formatting Gutenberg text files


What "Cleaning Up" Do Project Gutenberg Texts Need Part 2

Here is the download link for the Txt4EBook tool.

The tool is written in Java, so you'll need the latest Java 6 software installed on your system. For downloads and more info, go to Sun's Java download page.

The program file is already configured to run, as long as Java is installed on your OS. In general, that means you can start it in either of two ways:

1) by double-clicking the txt4ebook.jar program file icon in a GUI file manager

2) by running this command in a console:

java -jar txt4ebook.jar

Either method should work on most machines. If you have problems, consult Sun's help pages. Again, you need the latest VERSION 6 of Java!!!

I only created it the other week, so it doesn't even have a version number yet. I'll try to incorporate more functionality based on your comments, but don't expect too much. It is only a side project for me, and my time is limited.

The tool simply processes a plain text file; it does not convert it to a more advanced format like HTML. So my goals are very modest. The primary goal is to do whatever processing is necessary to prepare a Gutenberg text file for a reader device (including text-to-speech software). That means simple manipulations.

Still, I will include the ability to add custom-defined manipulations so that you can process ANY text file for ANY purpose (keeping the processor more or less general purpose). However, the defaults are preset for Gutenberg texts based on my preferences. At some point I'll try to add other preferences and/or the ability to load and save user preferences.

Anyway, so much for now. This version simply reformats paragraphs by removing extra line breaks. There is also an optional paragraph indentation setting. Next I'll add tab processing and custom regular expression filters (for removing things like Page XXX).
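
For anyone curious what this kind of processing looks like, here is a rough sketch in Java. It is not the actual txt4ebook code; the class name ParagraphReflow, the method reflow, and the "Page <number>" pattern are all made up for illustration. It assumes that a blank line separates paragraphs and that a page marker sits alone on its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch only; names and behavior are assumptions, not the tool's real code.
final class ParagraphReflow {

    // Hypothetical filter: a line consisting only of "Page" followed by a number.
    private static final Pattern PAGE_MARKER = Pattern.compile("Page\\s+\\d+");

    /**
     * Joins the hard-wrapped lines of each paragraph into a single line.
     * Blank lines are treated as paragraph separators, and each output
     * paragraph is prefixed with the given indent (pass "" for none).
     */
    static String reflow(String text, String indent) {
        List<String> paragraphs = new ArrayList<String>();
        StringBuilder current = new StringBuilder();

        for (String rawLine : text.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (PAGE_MARKER.matcher(line).matches()) {
                continue; // drop standalone page markers
            }
            if (line.isEmpty()) {
                // A blank line closes the paragraph collected so far.
                if (current.length() > 0) {
                    paragraphs.add(indent + current);
                    current.setLength(0);
                }
            } else {
                // Replace the original line break with a single space.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line);
            }
        }
        if (current.length() > 0) {
            paragraphs.add(indent + current);
        }

        // Separate the reflowed paragraphs with one blank line.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < paragraphs.size(); i++) {
            if (i > 0) {
                out.append("\n\n");
            }
            out.append(paragraphs.get(i));
        }
        return out.toString();
    }
}
```

Calling ParagraphReflow.reflow(text, "  ") would give each paragraph a two-space indent, and passing an empty string leaves paragraphs flush left. A real filter would need a configurable pattern, since page markers vary from one text to another.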

I hope you find it useful.

P.S.: I am using the latest Cybook reader, so the default settings are geared toward it.

Last edited by bob_ninja; 11-12-2007 at 03:18 PM.