Old 10-30-2010, 04:26 AM   #1
bfollowell
Fanatic
 
Posts: 541
Karma: 1152752
Join Date: Aug 2010
Location: Evansville, IN, USA
Device: Samsung Galaxy Tab 4 Nook & Samsung Galaxy Tab S 10.5
General scanning/OCR advice?

I am preparing to start my first major scanning/conversion process and am curious what tools most of you use.

From what I've seen and read, FineReader seems to be pretty much the standard for OCR work. Unfortunately, I can't afford $400 for an OCR tool, no matter how awesome it is. It seems like I may have a very old version lying around though, possibly v5.

What file format do most of you find gives you the best results for OCR work? I'm sure TIFFs are great, but they can take up a ton of space. JPEGs are much smaller, but I worry about compression artifacts causing bad results. I've heard PNGs give fairly decent results at a decent size.

Are there any good OCR tools that you can just point to a directory of page scan images and let work through everything automatically, or do you tend to go through page by page?

Finally, do you try to scan in such a way that your OCR tool will recognize italics and other special formatting, or do you pretty much try to capture dumb text and then add the special formatting later?

Thanks for any information or advice any of you may be able to offer.

Sincerely,
- Byron Followell
Old 10-30-2010, 04:51 AM   #2
hernep
Enthusiast
 
Posts: 30
Karma: 42
Join Date: Oct 2010
Location: Finland
Device: iRiver Story, iPad 2
I have used FreeOCR. It scans and does OCR. It shows the result in its own window, where you can edit and save it. It is a free program. The only bad thing is that somehow I haven't got good results with quirky letters, like ä ö å.
But if you do English only, it works pretty well for the price:
http://www.paperfile.net/

The program does not save the scanned files anywhere, but do you really need them after OCR?
Old 10-31-2010, 06:08 AM   #3
Iain
Enthusiast
 
Posts: 49
Karma: 14
Join Date: Jul 2010
Location: Harrogate, England
Device: iPad
I've written a long post on my experience with scanning here.
Briefly, I used FineReader 10, which cost around 60 quid ($100), a guillotine ($200) and a Fujitsu fi6130 (£600), which was the largest cost.

You have to make a few decisions. Are you prepared to destroy your books (cutting the spines off)? This allows a vastly quicker process. How important are errors to you (if you hate typos, output to PDF; otherwise ePub makes sense)? How do you value your time over your spending?

On output, if you pick PDF (or PDF/A) you will get a book out in 80MB (TIFF file 1GB), which is a good copy of the original. If you get the book into ePub format then it will be 1MB. I personally don't like PDF for reading - I want to be able to set the font size and reflow the book.

Finally, even with FineReader 10, the quality varies from book to book. Mainly it is very good (character errors in the 1 in 10,000 range at a guess; formatting is less good). With some books though (probably font related) it makes more or less consistent errors (little -> lidle, perhaps). With decorative fonts, especially in chapter headings, drop caps and initial paragraph text, it can get things wrong more often.
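
Because errors like that are consistent within a book, they can be cleaned up in one pass over the OCR text rather than by hand. A minimal sketch in Python, assuming you keep a small correction table per book; the table contents and the function name are made up for illustration and are not part of FineReader:

```python
import re

# Hypothetical per-book correction table. Keys are lowercase misreadings
# the OCR engine makes consistently; values are the intended words.
CORRECTIONS = {
    "lidle": "little",
}


def fix_consistent_errors(text, corrections=CORRECTIONS):
    """Replace whole-word OCR misreadings, keeping a leading capital."""
    if not corrections:
        return text

    # \b limits matches to whole words so longer words are left alone.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in corrections) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        found = match.group(0)
        fixed = corrections[found.lower()]
        # Preserve sentence-initial capitalisation.
        if found[0].isupper():
            return fixed[0].upper() + fixed[1:]
        return fixed

    return pattern.sub(replace, text)
```

This only helps with misreadings that are not themselves real words, so each table entry is worth checking against the book before running it over the whole text.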

Also, if you use the 'cut the spines off' approach you will get feed errors, so you need to think about how to repair or re-process books which have stuck, missing, angled or torn pages.
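
Missing pages at least are easy to check for automatically. A rough sketch in Python, assuming the scans are saved one file per page with names like page_0001.tif; that naming scheme and the function name are assumptions for illustration, not something the scanner software necessarily produces:

```python
import re

# Assumed naming scheme: page_<number>.<image extension>
PAGE_NAME = re.compile(r"page_(\d+)\.(?:tif|tiff|png|jpg)", re.IGNORECASE)


def missing_pages(filenames, expected_count):
    """Return the page numbers in 1..expected_count that have no scan file."""
    found = set()
    for name in filenames:
        match = PAGE_NAME.fullmatch(name)
        if match:
            found.add(int(match.group(1)))
    return [n for n in range(1, expected_count + 1) if n not in found]
```

Calling it as missing_pages(os.listdir(scan_dir), page_count) with your own directory and page count gives the list of pages to re-scan. It will not catch angled or torn pages, which still need a visual check.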

Iain
All times are GMT -4.

