|10-30-2010, 04:26 AM||#1|
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Amazon Kindle 3 Wi-Fi & B&N Nook Tablet & B&N Nook HD+
General scanning/OCR advice?
I am preparing to start my first major scanning/conversion process and am curious what tools most of you use.
From what I've seen and read, FineReader seems to be pretty much the standard for OCR work. Unfortunately, I can't afford $400 for an OCR tool no matter how awesome it is. It seems like I may have a very old version lying around though, possibly v5.
What file format do most of you find gives the best results for OCR work? I'm sure TIFFs are great, but they can take up a ton of space. JPEGs are much smaller, but I worry about compression artifacts causing bad results. I've heard PNGs give fairly decent results at a reasonable size.
Are there any good OCR tools that you can just point at a directory of page scan images and let work through everything automatically, or do you tend to go through page by page?
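A minimal sketch of the batch approach asked about here, assuming the open-source Tesseract command-line tool is installed and on the PATH (it is not one of the tools named in this thread). The function names and the directory layout are hypothetical. It collects the page images in a directory, sorts them so that page10 comes after page9, and runs `tesseract <image> <output base>` once per page:

```python
import re
import subprocess
from pathlib import Path

# Image types treated as page scans (an assumption; adjust to taste).
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def natural_key(path):
    """Sort key that compares runs of digits as numbers, not as text."""
    parts = re.split(r"(\d+)", path.name)
    # re.split with a capture group alternates text, digits, text, ...
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(parts)]


def find_pages(scan_dir):
    """Return the page images in scan_dir in natural page order."""
    pages = [p for p in Path(scan_dir).iterdir()
             if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES]
    return sorted(pages, key=natural_key)


def ocr_command(image, out_dir, lang="eng"):
    """Build the Tesseract call; it writes <out_dir>/<image stem>.txt."""
    out_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(scan_dir, out_dir, lang="eng"):
    """Run OCR on every page image, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for image in find_pages(scan_dir):
        subprocess.run(ocr_command(image, out_dir, lang), check=True)
```

Calling `ocr_directory("scans", "text")` would then leave one plain-text file per page in `text`. This only captures unformatted text, so italics and other formatting would still have to be added afterwards.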
Finally, do you try to scan in such a way that your OCR tool will recognize italics and other special formatting, or do you pretty much try to capture plain text and then add the special formatting later?
Thanks for any information or advice any of you may be able to offer.
- Byron Followell
|10-30-2010, 04:51 AM||#2|
Join Date: Oct 2010
Device: iRiver Story, iPad 2
I have used FreeOCR. It both scans and does the OCR, then shows the result in its own window, where you can edit and save it. It's a free program. The only bad thing is that I haven't gotten good results with non-English letters such as ä, ö and å.
But if you only do English, it works pretty well for the price.
The program does not save the scanned images anywhere, but do you really need them after OCR?
|10-31-2010, 06:08 AM||#3|
Join Date: Jul 2010
Location: Harrogate, England
I've written a long post on my experience with scanning here.
Briefly, I used FineReader 10 which cost around 60 quid ($100), a guillotine ($200) and a Fujitsu fi6130 (£600) which was the largest cost.
You have to make a few decisions. Are you prepared to destroy your books (cutting the spines off)? This allows a vastly quicker process. How important are errors to you (if you hate typos, output to PDF; otherwise ePub makes sense)? How do you value your time against your spending?
On output, if you pick PDF (or PDF/A) you will get a book of about 80MB (from about 1GB of TIFF files) which is a good copy of the original. If you get the book into ePub format then it will be about 1MB. I personally don't like reading PDFs - I want to be able to set the font size and reflow the book.
Finally, even with FineReader 10, the quality varies from book to book. Mainly it is very good (character errors in the 1 in 10,000 range at a guess - formatting is less good). With some books though (probably font related) it makes more or less consistent errors ("little" -> "lidle", perhaps). With decorative fonts, especially in chapter headings, drop caps and initial paragraph text, it can get things wrong more often.
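Because errors of the "lidle" kind repeat consistently within a book, they lend themselves to a scripted clean-up pass over the exported text. The following is a minimal sketch of that idea, not something FineReader itself provides; the function name and the corrections table are hypothetical, and the table would have to be built by hand for each book:

```python
import re


def fix_consistent_errors(text, corrections):
    """Replace whole-word OCR misreadings, keeping the original casing."""
    # Normalise keys so lookups work whatever the case of the match.
    table = {wrong.lower(): right for wrong, right in corrections.items()}
    if not table:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(wrong) for wrong in table) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        found = match.group(0)
        fixed = table[found.lower()]
        if found.isupper():
            return fixed.upper()
        if found[0].isupper():
            return fixed[:1].upper() + fixed[1:]
        return fixed

    return pattern.sub(replace, text)


sample = "A lidle boat. Lidle did he know."
print(fix_consistent_errors(sample, {"lidle": "little"}))
```

Matching on word boundaries keeps the pass from touching longer words that merely contain the misreading. It is only safe for misreadings that are not real words themselves, so the table needs checking against the book before a bulk replace.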
Also, if you use the 'cut the spines off' approach you will get feed errors, so you need to think about how to repair or re-process books which have stuck, missing, angled or torn pages.
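One way to spot missing pages after a feed error is to compare the printed page numbers in scan order. The sketch below assumes you already have those numbers as a list, one entry per scanned image, with `None` where no number could be read (how they are extracted is left open, and the function name is hypothetical). It reports every place where more pages were skipped than the unreadable images in between can explain:

```python
def find_feed_gaps(page_numbers):
    """Return (last_seen, next_seen, pages_unaccounted_for) tuples."""
    gaps = []
    previous = None
    unread = 0  # images since `previous` whose number could not be read
    for number in page_numbers:
        if number is None:
            unread += 1
            continue
        if previous is not None:
            skipped = number - previous - 1
            if skipped > unread:
                gaps.append((previous, number, skipped - unread))
        previous = number
        unread = 0
    return gaps


# Pages 4 and 5 went through the feeder stuck to another sheet.
print(find_feed_gaps([1, 2, 3, 6, 7]))
```

This only catches missing pages. Angled or torn pages, and pages fed out of order, would still need a visual check of the images.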