08-31-2010, 06:36 AM   #53
Iain
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
Flaws and time

Quote:
Originally Posted by Lady Fitzgerald
You do not edit after OCR?

On average, how much time did you spend on each book?

I do not edit after OCR. It's still early days and I'm refining the Word->ePub transformation. Also, it takes a good deal longer to READ the book than the whole rest of the process.

I'll report when I've read a dozen or so books, but so far I seem to have almost no character mis-recognitions. I'm talking about a handful per book.

The other flaws I'm encountering may be artefacts of my Word->ePub translation or of the OCR. I'm not sure which, yet. I expect to be able to fix many of these either by fixing my code or by applying a bit of intelligence to the process.

So far (and this is NOT statistically reliable), I'm seeing a missing space about every 4 pages, a space added after a correctly- hyphenated (sic!) term about as often, and a line break in the middle of a paragraph every 10 pages or so (I think I know what's causing this and *may* be able to fix it).
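For anyone wanting to try the "fix it in code" route on the last two of those flaws, here is a rough Python sketch. It is not my actual Word->ePub code: the function name tidy_ocr_text is made up for illustration, it assumes the text is available as a plain string with paragraphs separated by blank lines, and both rules are heuristics that will occasionally get it wrong. Missing spaces are not attempted, since those really need a dictionary.

```python
import re


def tidy_ocr_text(text: str) -> str:
    """Heuristic clean-up for two OCR flaws (hypothetical helper, not the author's code)."""
    # Flaw 1: a space added after a hyphen ("well- known" -> "well-known").
    # Leave suspended hyphens such as "pre- and post-war" alone by skipping
    # cases where the next word is "and", "or" or "to".
    text = re.sub(r"(\w)- (?!(?:and|or|to)\b)(\w)", r"\1-\2", text)

    # Flaw 2: a stray line break inside a paragraph. Assume a single newline
    # is spurious when the previous character does not end a sentence (or a
    # line) and the next line starts with a lower-case letter.
    text = re.sub(r"(?<![.!?:;\"'\n])\n(?=[a-z])", " ", text)
    return text


if __name__ == "__main__":
    sample = "He was a well- known man and the ship was\nlate.\n\nNext paragraph."
    print(tidy_ocr_text(sample))
```

The blank line between paragraphs is left untouched because the second rule only fires on a newline that follows ordinary text and precedes a lower-case letter.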

Actually, I'm delighted with the quality, though as I mentioned in my post I'm not the best person to proofread things.

As far as time is concerned, I've been doing some Hammond Innes this morning. It took me about 13 minutes to trim a dozen books. They are almost consistently sized and quite thin (280 pages or so) so they are about the easiest of all books to slice.

I've scanned about two whilst I've been writing this. One of my main objectives is to be able to scan whilst I work. If there are no issues with the scan, then for a book this size it probably takes a minute of my time to scan the ISBN (bar code), enter the pages (and subject) and feed the hopper.

Issues add some minutes. (I seem to be fumble-fingered this morning! I've been putting the covers in the wrong way round.)

I bought a Thomas Hardy (for 5 pence!) at a car boot sale yesterday and plan to scan that and compare it to a Gutenberg version to get a more formal comparison. At some point!
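One way such a comparison could be scripted, as a sketch only: the Python below uses the standard difflib module to estimate what fraction of the reference text fails to line up with the scan. The function name mismatch_rate is invented for illustration, both texts are assumed to be plain strings, and whitespace is collapsed first so that line wrapping doesn't count as an error. A Gutenberg text may come from a different edition than the paperback, so some mismatches would be genuine edition differences rather than OCR errors.

```python
import difflib


def mismatch_rate(scanned: str, reference: str) -> float:
    """Fraction of reference characters with no match in the scan (hypothetical helper)."""
    # Collapse all runs of whitespace so line wrapping is ignored.
    ref = " ".join(reference.split())
    scan = " ".join(scanned.split())
    if not ref:
        return 0.0
    # autojunk=False stops difflib from ignoring very common characters
    # in long texts, which would distort the count.
    matcher = difflib.SequenceMatcher(None, ref, scan, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - matched / len(ref)


if __name__ == "__main__":
    rate = mismatch_rate("thecat sat on the mat", "the cat sat on the mat")
    print(f"{rate:.1%} of the reference text did not match")
```

difflib gets slow on very long strings, so it is probably more practical to feed it a chapter at a time than a whole novel.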

Hope this is interesting...