09-01-2010, 04:10 AM #55
Iain
Location: Harrogate, England
Device: iPad
OK. Lots of questions there. I'll try to answer them all.

Firstly, my books are mainly of the 'pulp fiction' variety, so they tend to be light on posh formatting. I'm also still tuning the whole process, so there's a difference between what is being done now and what could be done.

For a paperback book the OCR takes roughly the same time as the scanning: somewhere between 4 and 10 minutes. That is with the latest FineReader running on a quad-core machine, so I can see how it could take 30 minutes on an older machine with an older version.

The system I've written makes the processing automatic so I can do it on another machine or even overnight.

The OCR does a good job of picking up italic and bold changes. It should also handle margin changes (the information is there in the Word doc), though I've not yet processed (or at least proofed) a book which uses them.

I think there are around half a dozen character misreads in the 300-page book I've just 'proofed' (though my disclaimer about my proofing skills still stands!).

The more complex material before and after the main text (decorative fonts mixed in with graphics) can be a mess, so I would imagine anything complex in the middle of a book will be a mess too. I'll deal with the messes as I come across them!

I actually deliberately discard headers and footers: if you want pages to reflow as the font size changes, they aren't helpful. Having said that, you've just made me realise I can use them to improve chapter detection.

I suspect that I've been lucky with the books I've proofed so far, and I also suspect I have a higher tolerance for errors!

Thanks for the advice on the guillotine. That all makes a good deal of sense. I too have lightly touched the blade (I had to remove the guard to see what was going on) and found it astonishingly sharp! I wish my kitchen knives were that sharp.

My thinking is that if there are serious problems in a book, I can go back to the original and tweak the OCR. I've also thought about writing an eBook reader for the iPad that lets me edit the text to fix the minor errors. However, I doubt I will ever have the time or energy to do this.

In a couple of weeks I'll have a much better idea of the quality and will keep you posted on what I discover!

Iain