#1
Enthusiast
Posts: 48
Karma: 62
Join Date: Jan 2010
Device: HANLIN V3
Current state of OCR/scanner tech?
Looking around at the state of scanners etc., my half-baked assessment is this:
1) Automated book scanning requires very expensive industrial machines.
2) Artisanal book scanning requires a lot of time and effort, either with a flatbed scanner (standard or OpticBook-style, the latter better for fragile old editions) or with a digital camera in some kind of offset stand (and possibly post-processing hundreds of images for contrast, ugh).
3) OCR software at present is either (a) very costly or (b) very cheesy. There doesn't seem to be any really good GPL OCRware. (Why is that, I wonder? We have all kinds of other GPL/CC software that's often better than the commercial flavour.)
Either way, it's also time-consuming to tend the OCR process and then fix the 5 to 20 percent error rate (depending on the cheesiness of the OCR). So... has anyone actually dared to compare? Is it faster to set up a copy-holder stand and (if you are a fast typist) just retype the book content? It seems an arduous task, yet I do wonder whether it would take any more time than the lengthy, high-tech procedure of book scanning and OCR. Have I got a grip on the basic situation, or am I missing some recent and exciting development, like a blazing new GPL OCR app?
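The retype-versus-OCR question is mostly arithmetic. Here is a minimal Python sketch of that comparison; the page count, words per page, typing speed, seconds per scan and seconds per correction are all made-up placeholders, and treating the quoted 5 to 20 percent as a per-word error rate is an assumption (the post doesn't say whether it means characters or words).

```python
# Back-of-envelope comparison: retyping a book vs. scanning + OCR + correction.
# Every number in the demo is a hypothetical placeholder, not a measurement.


def retype_hours(pages: int, words_per_page: int, typing_wpm: float) -> float:
    """Hours to retype the whole book at a steady typing speed."""
    total_words = pages * words_per_page
    return total_words / typing_wpm / 60


def ocr_hours(pages: int, words_per_page: int, scan_sec_per_page: float,
              word_error_rate: float, sec_per_fix: float) -> float:
    """Hours to scan every page, then fix each misrecognized word.

    Assumes the OCR error rate is a per-word rate.
    """
    scan_seconds = pages * scan_sec_per_page
    errors = pages * words_per_page * word_error_rate
    fix_seconds = errors * sec_per_fix
    return (scan_seconds + fix_seconds) / 3600


if __name__ == "__main__":
    pages, wpp = 300, 350  # hypothetical paperback
    print(f"retype: {retype_hours(pages, wpp, typing_wpm=60):.1f} h")
    for rate in (0.05, 0.20):  # the two ends of the quoted 5-20 % range
        h = ocr_hours(pages, wpp, scan_sec_per_page=20,
                      word_error_rate=rate, sec_per_fix=5)
        print(f"scan+OCR at {rate:.0%} errors: {h:.1f} h")
```

With these placeholder inputs the 5 percent case comes out at about 9 hours against roughly 29 hours of typing, while the 20 percent case (about 31 hours) is slower than retyping. The model ignores the proofreading pass needed to find the errors in the first place, so real OCR times would be higher.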
|
|
|
|
|
#2
Booklegger
Posts: 1,681
Karma: 7050000
Join Date: Jun 2009
Location: Toronto, Ontario, Canada
Device: BeBook(1 & 2010), PEZ, PRS-505, Kobo BT, PRS-T1, Playbook
See my comments in the Scanner Recommendation thread. I would not call the ABBYY FineReader software cheesy, even if it doesn't run under Linux.
Sorry, I didn't figure out how to link to the thread...
|
|
|
|
#3
Enthusiast
Posts: 48
Karma: 62
Join Date: Jan 2010
Device: HANLIN V3
No support for OS X/Linux? Yep, cheesy :-) Just an idiosyncratic bias. More seriously, though, I do wonder why the free software community, which has produced GIMP and other very viable alternatives to ransomware for other applications, has not managed to produce good OCR. That seems worth a bit of research as an interesting question in its own right.
|
|
|
|
|
|
#4
Gentleman & Cynic
Posts: 4,832
Karma: 7205202
Join Date: Jan 2008
Location: 5 generation native Texan
Device: BeBook/Openinkpot, CYbook 3rd gen awaiting RTF software upgrade
Lack of interest in the Linux community, I suppose.
OS preferences are basically passé, from my perspective. I have a job to do: how effective is the answer, and can I afford it? A good scanning setup for reflowable text will cost around $600, and that's buying it for single-purpose use. What do you get for the money?
ACER REVO mini PC with Windows OS - $200
Optiscan 3600 scanner - $300
ABBYY Express 10.0 (I run 9.0 on my setup) - $60
Shipping - $40
You don't have to do anything else with the Windows PC; treat it as an embedded machine. (It's 15 cm x 15 cm x 4 cm, i.e. smaller than a typical hardback.) You'll get a defect rate of 1 for every 4-5 pages for hardbacks and 1-2 per page for paperbacks, although it will vary with font type and size. The snap scan method is for PDFs, and you lose reflowability with it, which causes problems with texts that don't fit the screen.
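The numbers in this post can be tallied in a few lines. A minimal Python sketch using the post's prices and defect rates; the 300-page book is a hypothetical example, and reading "1 for every 4-5 pages" as 1/5 to 1/4 defects per page is my interpretation.

```python
# Tally of the setup quoted above plus a rough count of OCR defects to fix.
# Prices and defect rates come from the post; the 300-page book is hypothetical.

SETUP_USD = {
    "ACER REVO mini PC (Windows)": 200,
    "Optiscan 3600 scanner": 300,
    "ABBYY Express 10.0": 60,
    "shipping": 40,
}

# Defects per page as (low, high):
#   hardbacks: 1 defect every 4-5 pages -> 1/5 to 1/4 per page
#   paperbacks: 1-2 defects per page
DEFECTS_PER_PAGE = {
    "hardback": (1 / 5, 1 / 4),
    "paperback": (1.0, 2.0),
}


def expected_defects(pages: int, binding: str) -> tuple[float, float]:
    """Return the (low, high) number of defects expected for a whole book."""
    low, high = DEFECTS_PER_PAGE[binding]
    return pages * low, pages * high


if __name__ == "__main__":
    print(f"total setup cost: ${sum(SETUP_USD.values())}")
    for binding in DEFECTS_PER_PAGE:
        low, high = expected_defects(300, binding)
        print(f"300-page {binding}: {low:.0f}-{high:.0f} defects")
```

For a 300-page book that works out to roughly 60-75 defects for a hardback versus 300-600 for a paperback, so the binding makes a big difference to cleanup time.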
|
|
|
|
|
#5
la gauche
Posts: 1,437
Karma: 10557981
Join Date: Oct 2007
Location: Sapporo, Japan
Device: OPUS/PB360,Nexus 7,GzONE, Kobo Mini
I recommend VueScan: http://www.hamrick.com/ It works on Linux, Mac, or Windoze and costs US$40. I have used it for several years.
|
|
|
|
|
|
#6
Enthusiast
Posts: 48
Karma: 62
Join Date: Jan 2010
Device: HANLIN V3
Point well taken about the embedded-machine aspect (using a Windoze laptop or palmtop as a controller, like the MS-DOS machines that still run many CNC mills). I just don't have a few hundred bucks to throw at the project; I'm hoping to do it on the cheap with what I already have (Leopard, a laptop, a digicam, and coding skills). Thanks for the thread! I hope some more folks will pop up and explain their own scanning/OCR setups.
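For the digicam route, the "post-processing hundreds of images for contrast" step mentioned at the start of the thread is mostly a contrast stretch. A minimal pure-Python sketch, assuming each photo has already been converted to grayscale and loaded as rows of 0-255 ints; loading and saving the files is left to whatever image tool you have, and the 2/98 percentile clip points are arbitrary choices.

```python
# Linear contrast stretch for a grayscale page photo, pure Python.
# Pixels are a list of rows of ints in 0..255 (assumed already loaded).


def percentile(values: list[int], pct: float) -> int:
    """Value at the rounded pct position (0..100) of the sorted list."""
    ordered = sorted(values)
    idx = round(pct / 100 * (len(ordered) - 1))
    return ordered[idx]


def stretch_contrast(pixels: list[list[int]], low_pct: float = 2.0,
                     high_pct: float = 98.0) -> list[list[int]]:
    """Map the low_pct..high_pct intensity range onto 0..255, clipping outside it.

    Clipping a couple of percent at each end stops a few very dark or very
    bright specks from dominating the stretch.
    """
    flat = [p for row in pixels for p in row]
    lo = percentile(flat, low_pct)
    hi = percentile(flat, high_pct)
    if hi <= lo:  # uniform image: nothing to stretch
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[min(255, max(0, round((p - lo) * scale))) for p in row]
            for row in pixels]


if __name__ == "__main__":
    # A dull, low-contrast "page": grey text (90) on grey paper (160).
    page = [[160, 160, 90, 160],
            [160, 90, 90, 160]]
    print(stretch_contrast(page))  # [[255, 255, 0, 255], [255, 0, 0, 255]]
```

Running the same function over every page in a loop is what makes the "ugh" part cheap once the photos are in a readable format.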
|
|
|
|
|
|
#7
|
Fanatic
Posts: 594
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-505
|
|
|
|
|
#8
|
Addict
Posts: 393
Karma: 546196
Join Date: Mar 2009
Location: UK canal boat
Device: sony prs505, prs650 liseuses
I have retyped a couple of books this way and decided that, despite being a tolerably good touch-typist, the pain was too much. As for the time taken, I've found that it's the "editorial" processes required *after* acquiring a digital text that are time-consuming: layout, chapter management, images, front and end matter, and proofreading. I'll be experimenting with using my digital camera for the job in the not-too-distant future; I have some aged paperbacks that *have* to be digitised before they disintegrate!
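Some of that editorial work can be scripted. For chapter management, here is a minimal Python sketch that splits OCR'd text at chapter headings; the heading pattern (a line holding only "Chapter" plus an arabic or roman number) is a hypothetical assumption that will need adjusting per book.

```python
import re

# Hypothetical heading pattern: a line holding only "Chapter" plus an arabic
# or roman number, in any letter case. Real books vary; adjust per title.
CHAPTER_RE = re.compile(r"^[ \t]*chapter[ \t]+(?:[0-9]+|[ivxlcdm]+)[ \t]*$",
                        re.IGNORECASE | re.MULTILINE)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Split OCR'd text into (heading, body) pairs at each chapter heading.

    Anything before the first heading (title page, dedication, etc.) is
    returned under the heading "Front matter".
    """
    matches = list(CHAPTER_RE.finditer(text))
    if not matches:
        return [("Front matter", text.strip())]
    chapters = []
    front = text[:matches[0].start()].strip()
    if front:
        chapters.append(("Front matter", front))
    for i, m in enumerate(matches):
        # Each chapter body runs up to the next heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((m.group(0).strip(), text[m.end():end].strip()))
    return chapters


if __name__ == "__main__":
    sample = ("My Book\nby Someone\n\nChapter 1\nIt was a dark night.\n\n"
              "CHAPTER II\nMorning came.\n")
    for heading, body in split_chapters(sample):
        print(heading, "->", body)
```

Each (heading, body) pair can then be written out as a separate chapter file or section for whatever e-book tool does the final assembly.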
|
|
|
|
|
|
#9
|
Enthusiast
Posts: 48
Karma: 62
Join Date: Jan 2010
Device: HANLIN V3
I built Tesseract with no difficulty; it works fine on its included test images, which probably doesn't say much. The stable 2.04 version can only read TIFF, but the slightly unstable v3, with some additional libraries, can read JPEGs and other formats. It looks like a pretty good OCR engine, and I think this is where I'd put my effort if I get serious about scanning my old books.
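For batch work, Tesseract's command line takes an input image and an output base name (`tesseract image outbase`) and writes `outbase.txt`. A minimal Python sketch that runs it over a folder of page scans; the folder names, the one-text-file-per-page scheme, and the v3 extension list (which depends on the image libraries Tesseract was built with) are assumptions.

```python
import shutil
import subprocess
from pathlib import Path

# File types each Tesseract line can read, per the post: 2.04 is TIFF-only.
# The v3 set is an assumption about a typical build; it depends on which
# image libraries Tesseract was compiled against.
READABLE = {
    "2.04": {".tif", ".tiff"},
    "3": {".tif", ".tiff", ".jpg", ".jpeg", ".png"},
}


def ocr_command(image: Path, out_dir: Path) -> list[str]:
    """Build `tesseract <image> <outbase>`; Tesseract appends .txt to outbase."""
    return ["tesseract", str(image), str(out_dir / image.stem)]


def pages_to_ocr(scan_dir: Path, version: str) -> list[Path]:
    """Page images in scan_dir that this Tesseract version can read, in name order."""
    exts = READABLE[version]
    return sorted(p for p in scan_dir.iterdir()
                  if p.is_file() and p.suffix.lower() in exts)


def run_batch(scan_dir: Path, out_dir: Path, version: str = "3") -> None:
    """OCR every readable page image into out_dir, one .txt file per page."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in pages_to_ocr(scan_dir, version):
        subprocess.run(ocr_command(image, out_dir), check=True)
```

Usage would be something like `run_batch(Path("scans"), Path("text"), version="2.04")` (hypothetical folder names); naming page files with zero-padded numbers such as `p001.tif` keeps the sorted order equal to page order.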
|
|
|
|
|
Similar Threads
|
| Thread | Thread Starter | Forum | Replies | Last Post |
| Low budget scanner + OCR: Test and results | Madmanden | Workshop | 4 | 09-13-2010 01:37 AM |
| OCR to use | pepak | Workshop | 17 | 05-26-2008 05:30 PM |
| What is an OCR Cradle? | JackieFrost | Which one should I buy? | 4 | 05-21-2008 08:10 PM |