12-22-2007, 09:30 AM
dstampe
Join Date: Jan 2007
Location: Canada
Device: Sony PRS-500
Cleaning books--Book Designer or other?

I have a number of rather battered e-book files that I need to massage into something usable. The biggest problem is that the paragraphs have been broken up into individual lines, and they are not easy to rejoin into flowing paragraphs.

In many cases I've managed to build rather convoluted sequences of search-and-replace operations in Word that join the paragraphs back together, provided the original paragraphs were marked consistently in some way (a blank line after each one, or an indent made of spaces or tabs). However, some documents don't even have these markers. Plus, the process is interactive and time-consuming.
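For files where paragraph starts are marked, the Word search-and-replace sequence can be approximated by a short script. The sketch below is a hypothetical Python helper, not a feature of Word or Book Designer: join_paragraphs starts a new paragraph after a blank line or at an indented line. join_unmarked is a rougher heuristic for files with no markers at all; it assumes that a line ending in sentence punctuation and clearly shorter than the longest line closes a paragraph. The 0.8 ratio is an arbitrary starting point to tune per book.

```python
def join_paragraphs(text: str) -> list[str]:
    """Rejoin hard-wrapped lines where paragraph starts are marked.

    A blank line ends the current paragraph, and a line indented with
    spaces or a tab starts a new one; any other line is appended to the
    paragraph being built.
    """
    paragraphs: list[str] = []
    current: list[str] = []
    for raw in text.splitlines():
        if not raw.strip():
            # Blank line: close the paragraph in progress, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        if raw[0] in " \t" and current:
            # Indented line: the previous paragraph is finished.
            paragraphs.append(" ".join(current))
            current = []
        current.append(raw.strip())
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def join_unmarked(text: str, short_ratio: float = 0.8) -> list[str]:
    """Heuristic for text with no paragraph markers at all.

    A line that ends in sentence punctuation (or a closing quote) and is
    clearly shorter than the longest line is taken to end a paragraph.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    width = max(len(ln) for ln in lines)
    enders = (".", "!", "?", '"', "'", "\u201d", "\u2019")
    paragraphs: list[str] = []
    current: list[str] = []
    for ln in lines:
        current.append(ln)
        if ln.endswith(enders) and len(ln) < short_ratio * width:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Words hyphenated across a line break come out as "implemen- tation" with either function; they need a separate pass, since a soft hyphen can't be told apart from a real one like "well-known" without a dictionary.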

There is also the problem that the chapter headings are no longer distinguished from the body text, so each one has to be found and separated from the rest of the text. Search and replace sometimes works for this, but the right pattern varies from book to book.
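Chapter finding can be sketched as one regular expression that also covers the numbered and Roman-numeral labels asked about below. The patterns and the is_heading helper are hypothetical and meant to be tuned per book: they accept "Chapter/Part/Book" followed by a digit, a Roman numeral or a number word, a few named sections such as "Prologue", and a bare number or uppercase numeral alone on a line. That last case can collide with page numbers and with a lone "I" line of dialog, so it is worth stripping page numbers first. Lines longer than 60 characters (an arbitrary limit) are rejected as body text.

```python
import re

# Roman numerals up to MMMCMXCIX; the lookahead only lets a match start at a numeral letter.
ROMAN = r"(?=[MDCLXVI])M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"
WORDS = r"(?:one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)"

# "Chapter 12", "CHAPTER XII", "Chapter One: Title", "Part II", "Book 3"
KEYWORD = rf"(?:chapter|part|book)\s+(?:\d+|{ROMAN}|{WORDS})(?![a-z]).*"
# Named sections that usually stand on their own line
NAMED = r"(?:prologue|epilogue|preface|introduction|afterword)\b.*"
# A bare "12", "12." or "XII" alone on a line; (?-i:...) forces uppercase numerals
BARE = rf"(?:\d{{1,3}}|(?-i:{ROMAN}))\.?"

HEADING_RE = re.compile(rf"^\s*(?:{KEYWORD}|{NAMED}|{BARE})\s*$", re.IGNORECASE)


def is_heading(line: str, max_len: int = 60) -> bool:
    """True if the line looks like a chapter heading.

    Real headings are short, so long lines are rejected even when they
    happen to start with a word like "Chapter" or "Part".
    """
    return len(line.strip()) <= max_len and HEADING_RE.match(line) is not None
```

Usage would be something like [p for p in paragraphs if is_heading(p)] to list the candidates for a quick visual check before marking them as headings.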

Are there any tools that could automate or at least streamline this process?

I've considered Book Designer for this, as a few experiments have shown that it can sometimes recreate the original paragraphs acceptably (breaking dialog, etc.). But I'm not sure whether it can do what I want or will just create a new mess to clean up. I've tried to use it a number of times, but have always been stymied by one or more issues, and maybe someone can clear these up for me. Some of them may be obvious, but the sketchy help files, busy interface and tiny text make some things hard for me to see with my poor vision.

My ideal would be to use Book Designer as a single tool to extract the text from PDF, PDB, LIT, and text files without going through the Word conversion and pre-cleaning stage. I would prefer RTF output, as that is the format I read in. It would also be ideal if the original styles from DOC, RTF, and PDF files were left intact (italics seem most important). I am not interested in producing LRF eBooks, because they lack left justification, which is needed to use large fonts properly. I am also not interested in pretty formatting: page breaks before chapter headings would be fine if BD can't add 4 blank lines before them instead (which I gather it can't).
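If Book Designer can't be made to write clean RTF, a minimal RTF writer is not much code. The sketch below uses hypothetical names (to_rtf, rtf_escape) and is not how BD works: it writes each paragraph left-aligned with no first-line indent, switches automatic hyphenation off with the RTF \hyphauto0 control word, and puts a \page break, or optionally four empty paragraphs, before every heading except the first. The heading test is passed in as a function, and italics are not handled, since that depends on how the source marks them.

```python
from typing import Callable


def rtf_escape(text: str) -> str:
    """Escape text for an RTF body: backslash and braces get a backslash,
    non-ASCII characters become \\uN? (signed 16-bit UTF-16 code units)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536  # RTF expects values above 32767 as negatives
                out.append(f"\\u{unit}?")  # "?" is the fallback for old readers
    return "".join(out)


def to_rtf(paragraphs: list[str], is_heading: Callable[[str], bool],
           page_breaks: bool = True) -> str:
    """Write paragraphs as a minimal RTF document.

    Paragraphs are left-aligned with no indent, automatic hyphenation is
    off, and each heading after the first is preceded by a page break
    (or four empty paragraphs) and set in bold.
    """
    parts = [r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\hyphauto0"]
    for i, para in enumerate(paragraphs):
        heading = is_heading(para)
        if heading and i > 0:
            parts.append(r"\page" if page_breaks else r"\pard\par" * 4)
        # \pard resets paragraph formatting (no indents); \ql = left-aligned.
        if heading:
            parts.append(r"\pard\ql\b " + rtf_escape(para) + r"\b0\par")
        else:
            parts.append(r"\pard\ql " + rtf_escape(para) + r"\par")
    parts.append("}")
    return "\n".join(parts)
```

Writing the file with plain ASCII encoding is then safe, since rtf_escape already turns every non-ASCII character into a \u escape.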

Here are some of the issues that are keeping this from happening:

- I would prefer to save output in RTF format, but the result always has unwanted hyphenation and indentation added. I've enabled advanced RTF output, but it doesn't seem to do anything, and no options dialog appears from either "Save As" or the "Make eBooks" route.

- Identifying chapters by keywords seems fine for "Chapter" and "CHAPTER" keywords, but what about numbered and Roman numeral chapter labels? Does someone have a list of chapter keywords that will work for most cases?

- How well does the "reformat completely" option really recombine paragraphs? I have read that some users prefer to rejoin Project Gutenberg books in Word before using BD, which suggests it doesn't work very well.

- Does BD preserve italics originally present in PDF and DOC/RTF files?

- Is there an easy way to strip headers and footers from pages during import? I have some text, PDB, and PDF files where someone seems to have taken a perfectly good text document with flowed paragraphs, paginated it, and added headers and footers. These then have to be stripped again before the book can be adapted to a new reader or font size.
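Running headers and footers can often be detected because they repeat from page to page. A sketch, assuming the pages have already been split apart (text extracted from PDFs often separates pages with a form-feed character, so text.split("\f") may work): strip_running_lines is a hypothetical helper that looks only at the first and last non-blank line of each page, treats digits as wildcards so "Page 12" and "Page 13" compare equal, and drops any such line that recurs on at least half the pages, plus bare page numbers. Multi-line headers are not caught, and a heading like "Chapter 3" that opens many pages could be mistaken for a header, so the output is worth checking.

```python
import re
from collections import Counter


def strip_running_lines(pages: list[str], min_share: float = 0.5) -> list[str]:
    """Drop running headers/footers and bare page numbers from page texts.

    Only the first and last non-blank line of each page are candidates.
    Digits are replaced by '#' before comparing, so "Page 12" and
    "Page 13" count as the same line.
    """
    def key(line: str) -> str:
        return re.sub(r"\d+", "#", line.strip())

    counts = Counter()
    for page in pages:
        lines = [ln for ln in page.splitlines() if ln.strip()]
        if lines:
            # A set, so a page with a single line is counted only once.
            counts.update({key(lines[0]), key(lines[-1])})
    threshold = max(2, int(min_share * len(pages)))
    running = {k for k, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        lines = page.splitlines()
        nonblank = [i for i, ln in enumerate(lines) if ln.strip()]
        # Drop the top/bottom line if it repeats across pages or is just a number.
        drop = {i for i in nonblank[:1] + nonblank[-1:]
                if key(lines[i]) in running or key(lines[i]) == "#"}
        cleaned.append("\n".join(ln for i, ln in enumerate(lines) if i not in drop))
    return cleaned
```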


Anyway, all help would be appreciated. I'd like to get my tool set in better order before beginning another round of book cleanup, and a paragraph-joining tool and a chapter-finding tool are what I need most.