a.k.a. What "Cleaning Up" Do Project Gutenberg Texts Need, Part 2
Here is the download link for the Txt4EBook tool.
The tool is written in Java, so you'll need the latest Java 6 runtime installed on your system. For downloads and more info, go to Sun's Java download page.
The program file is already configured to run as long as Java is installed on your system. In general, that means you can start it in either of two ways:
1) by double-clicking the txt4ebook.jar program file icon in a GUI file manager, or
2) by running this command in a console:
java -jar txt4ebook.jar
Either method should work on most machines. If you have problems, consult Sun's help pages. Again, you need the latest VERSION 6 of Java!!!
I only created it the other week, so it doesn't even have a version number yet. I'll try to add more functionality based on your comments, but don't expect too much: it is only a side project for me, and my time is limited.
Its purpose is simply to process a plain text file, not to convert it to a more advanced format like HTML. So my goals are very modest: do whatever processing is necessary to prepare a Gutenberg text file for a reader device (including text-to-speech software). That means simple manipulations.
Still, I will include the ability to add custom-defined manipulations so that you can process ANY text file for ANY purpose (keeping the processor more or less general-purpose). However, the defaults are preset for Gutenberg texts based on my preferences. At some point I'll try to add other preference sets and/or the ability to load and save user preferences.
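Custom manipulations aren't in the tool yet, so the following is only a minimal sketch of one way they could work, assuming each user-defined rule is a regular expression plus a replacement string, applied in order. The class `RegexFilterChain`, its `add`/`apply` methods, and the "Page 123" line format are hypothetical illustrations, not Txt4EBook's actual code. The sketch sticks to Java 6 syntax (no diamond operator, no lambdas) to match the tool's requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a user-defined filter chain: each rule is a
 * regular expression plus a replacement string, applied in the order added.
 * Not Txt4EBook's actual code.
 */
class RegexFilterChain {

    /** One find-and-replace rule. */
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            // MULTILINE makes ^ and $ match at every line boundary,
            // not only at the start and end of the whole text.
            this.pattern = Pattern.compile(regex, Pattern.MULTILINE);
            this.replacement = replacement;
        }
    }

    private final List<Rule> rules = new ArrayList<Rule>();

    /** Adds a rule and returns this chain so calls can be chained. */
    public RegexFilterChain add(String regex, String replacement) {
        rules.add(new Rule(regex, replacement));
        return this;
    }

    /** Applies every rule, in order, to the whole text. */
    public String apply(String text) {
        String result = text;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        RegexFilterChain chain = new RegexFilterChain()
                // Remove whole lines such as "Page 12" (assumed page-marker format).
                // [ \t]* is used instead of \s* so a preceding blank line is not eaten.
                .add("^[ \\t]*Page \\d+[ \\t]*\\r?\\n", "");
        String sample = "First line.\nPage 12\nSecond line.\n";
        System.out.print(chain.apply(sample));
    }
}
```

Because the pattern is anchored with `^` in MULTILINE mode, a line consisting only of "Page 12" is removed, while a phrase like "see Page 12 for details" inside a sentence is left alone.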
Anyway, that's all for now. This version simply reflows paragraph lines by removing the extra line breaks inside each paragraph. There is also an optional paragraph indentation setting. Next I'll add tab processing and custom regular expression filters (for removing things like "Page XXX").
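The line-break removal described above can be sketched roughly as follows, assuming paragraphs are separated by blank lines (as in most Gutenberg texts) and that each reflowed paragraph should end up on a single line so the reader device can wrap it. `ParagraphReflow` and `reflow` are hypothetical names; this illustrates the technique and is not Txt4EBook's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of paragraph reflow for Gutenberg-style plain text:
 * paragraphs are separated by blank lines and hard-wrapped at a fixed width.
 * Not Txt4EBook's actual code.
 */
class ParagraphReflow {

    /**
     * Joins the hard-wrapped lines of each paragraph into a single line.
     *
     * @param text   input text with blank-line-separated paragraphs
     * @param indent string placed before each paragraph ("" for no indent)
     * @return one paragraph per line, each ending in '\n'
     */
    public static String reflow(String text, String indent) {
        // Normalize Windows (CRLF) and old Mac (CR) line endings to LF.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');

        List<String> paragraphs = new ArrayList<String>();
        StringBuilder current = new StringBuilder();

        for (String line : normalized.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.length() == 0) {
                // A blank line ends the current paragraph, if there is one.
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Join wrapped lines with a single space.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(trimmed);
            }
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }

        // Emit each paragraph on its own line, with the optional indent.
        StringBuilder out = new StringBuilder();
        for (String p : paragraphs) {
            out.append(indent).append(p).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "It was a dark and\nstormy night.\n\nThe rain fell\nin torrents.\n";
        System.out.print(reflow(sample, "    "));
    }
}
```

Trimming each line also discards leading spaces, so indented verse or tables would be flattened into running text; a fuller tool would need to detect and skip such blocks.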
I hope you find it useful.
P.S.: I am using the latest Cybook reader, so the default settings are geared toward it.