01-23-2010, 01:25 PM   #1
RootlessAgrarian
Enthusiast
Posts: 48
Karma: 62
Join Date: Jan 2010
Device: HANLIN V3
Current state of OCR/scanner tech?

I have a number of Very Old Books which I'd like to scan non-destructively. These are collectible editions, long out of print (OOP) and out of copyright (OOC), which I'd like to preserve for my own records and contribute to Project Gutenberg (PG).

Looking around at the state of scanners etc., my half-baked assessment is this:

1) Automated book scanning requires very expensive industrial machines.

2) Artisanal book scanning requires a lot of time and effort, using either
(a) a standard or OpticBook-style bed scanner (the latter is better for fragile old editions), or
(b) a digital camera in some kind of offset stand (and possibly post-processing 100s of images for contrast, ugh).

3) OCR software at present is either (a) very costly or (b) very cheesy. There doesn't seem to be any really good GPL OCRware. (Why is that, I wonder? We have all kinds of other GPL/CC software that's often better than the commercial flavour.) Either way, it's also time-consuming to tend the OCR process and then fix the 5 to 20 percent error rate (depending on the cheesiness of the OCR).

So... has anyone actually dared to compare -- is it faster to set up a copy-holder stand and (if you are a fast typist) just re-type the book content? It seems an arduous task, yet I do wonder if it would take any more time than the lengthy, high-tech procedure of book scanning and OCR.
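To put rough numbers on that comparison, here is a back-of-envelope sketch in Python. Everything in it is a guess for illustration, not a measurement: the function names, the typing speed, the seconds per scanned page, the seconds per correction, and the 300-page, 90,000-word book are all made up. It also assumes the 5 to 20 percent error rate means wrong words (not wrong characters) and ignores proofreading of the retyped text.

```python
def retype_hours(words, wpm=60.0):
    """Hours needed to retype `words` words at a sustained `wpm` (assumed speed)."""
    return words / wpm / 60.0


def scan_ocr_hours(pages, words, error_rate, sec_per_page=20.0, sec_per_fix=5.0):
    """Hours to scan `pages` pages and then hand-correct the OCR output.

    error_rate   -- fraction of words the OCR gets wrong (assumed per-word)
    sec_per_page -- assumed seconds to position and scan one page
    sec_per_fix  -- assumed seconds to spot and fix one wrong word
    """
    scan_seconds = pages * sec_per_page
    fix_seconds = words * error_rate * sec_per_fix
    return (scan_seconds + fix_seconds) / 3600.0


if __name__ == "__main__":
    pages, words = 300, 90_000  # hypothetical book
    print(f"retype:        {retype_hours(words):.1f} h")
    for rate in (0.05, 0.20):  # the 5 to 20 percent range mentioned above
        hours = scan_ocr_hours(pages, words, rate)
        print(f"scan+OCR @{rate:.0%}: {hours:.1f} h")
```

With these particular guesses the answer swings on the error rate, so the useful thing is probably to time yourself on a few pages each way and plug your own numbers in.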

Have I got a grip on the basic situation, or am I missing some recent and exciting development like a blazing new GPL OCR app?