Quote:
Originally Posted by DaleDe
Well, to get back to the original theme, I just finished reading a Gutenberg book that was actually in fairly good shape. Even so, it still had some annoying problems in it after I had gone through and beautified it once.
The first problem was punctuation without spaces: two sentences run together with a period and no space after the period. Spell checkers are a great tool for finding problems in scanned books, but some of them won't catch this, since they have been programmed to ignore words of this kind because they might be filenames.
The second problem was paragraph splits where they didn't belong. The sentence was not over, and the new paragraph started with a lowercase letter. It should not have been a paragraph split.
Hopefully a program could detect this sort of thing.
Dale
|
Dale, GutenMark will take care of a lot of these types of problems with PG texts. Some of them can be fixed with a decent text editor with search/replace capability (regular expressions would work even better for some issues). No matter which software you use, a human being will still have to proofread the result if you want it perfect.
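For what it's worth, here is a rough sketch of the regular-expression approach in Python. It is not how GutenMark does it; the function names and patterns are my own assumptions, and they only cover the two problems you describe (no space after sentence punctuation, and a paragraph break followed by a lowercase letter). Expect some false positives, such as verse or deliberately lowercase openings, so treat the output as something to proofread rather than a finished text.

```python
import re

# Sentence punctuation squeezed between a lowercase letter and an
# uppercase letter, e.g. "late.The" (assumed pattern, not exhaustive).
MISSING_SPACE = re.compile(r"(?<=[a-z])([.!?])(?=[A-Z])")

# A blank-line paragraph break whose next paragraph starts with a
# lowercase letter, which suggests the sentence was split by mistake.
BAD_SPLIT = re.compile(r"[ \t]*\n\s*\n[ \t]*(?=[a-z])")


def fix_missing_spaces(text):
    """Insert a space after '.', '!' or '?' when two sentences run together."""
    return MISSING_SPACE.sub(r"\1 ", text)


def join_bad_paragraph_splits(text):
    """Replace a paragraph break with a single space when the next
    paragraph begins with a lowercase letter."""
    return BAD_SPLIT.sub(" ", text)


def clean(text):
    """Apply both fixes in sequence."""
    return join_bad_paragraph_splits(fix_missing_spaces(text))


if __name__ == "__main__":
    sample = "It was late.The rain fell on\n\nthe roof.\n\nA new paragraph."
    print(clean(sample))
```

Because the first pattern requires a lowercase letter before the punctuation and an uppercase letter after it, strings like "U.S.A" or "readme.txt" are left alone, which sidesteps the filename issue that trips up spell checkers.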