#1 |
Enthusiast
Posts: 25
Karma: 412584
Join Date: Feb 2014
Device: IPAD, KF8 & Tablet
|
OCR engine
Hi,
Can anyone suggest an OCR engine that gives good text accuracy? |
#2 |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
ABBYY FineReader is an excellent OCR package.
|
#3 |
Media Bloke
Posts: 2,382
Karma: 113956855
Join Date: Sep 2010
Location: NSW - Australia
Device: iOS
|
I use Acrobat XI Pro. You can download a trial of any Adobe software free for 30 days, or rent all 55 programs for 50 bucks a month.
|
#4 |
Nameless Being
|
|
#5 | |
Guru
Posts: 815
Karma: 1029784
Join Date: May 2008
Location: Nebraska, USA
Device: PEZ, Color Libre, 2@Sony T1, Onyx i62HD
|
Quote:
AJ |
|
#6 |
Wizard
Posts: 3,054
Karma: 18821071
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
Maybe you're just guesstimating the accuracy, but 95% is not good. 95% for characters is terrible, and 95% for words is marginally acceptable. A typical printed page has something like 50 characters per line and 40 lines per page, so about 2000 characters per page. A 95% success rate per character would result in about 100 bad characters per page. A 95% success rate per word would bring that down to about 20 or 25 bad words per page. Even 99% accuracy produces more errors than most people like. You'd have to get to about 99.9% accuracy before you could think about not proofing the text afterwards.
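To make that arithmetic easy to rerun for other accuracy figures, here is a minimal Python sketch. The 50 x 40 page size comes from the estimate above; the function name `expected_errors` is just an illustrative choice.

```python
def expected_errors(units_per_page: int, accuracy: float) -> float:
    """Expected number of misrecognised units (characters or words) on one page."""
    return units_per_page * (1.0 - accuracy)


# Page size taken from the estimate above: 50 characters per line, 40 lines per page.
chars_per_page = 50 * 40

for accuracy in (0.95, 0.99, 0.999):
    errors = expected_errors(chars_per_page, accuracy)
    print(f"{accuracy:.1%} character accuracy: about {errors:.0f} bad characters per page")
```

Even at 99.9% that still leaves a couple of wrong characters on every page, which is why proofing only becomes optional around that level.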
|
#7 |
Wizard
Posts: 2,251
Karma: 3720310
Join Date: Jan 2009
Location: USA
Device: Kindle, iPad (not used much for reading)
|
I would never NOT proof an OCRed document.
|
#8 |
Banned
Posts: 488
Karma: 1080260
Join Date: Sep 2012
Device: sony prs t1 kindle dx ipad
|
I would never proofread an OCRed document. For documents that need 100% exactness I would use either the exact PDF image (with the OCR layer in the background) in ABBYY FineReader or ClearScan in Acrobat. For novels and other documents that allow for a few mistakes here and there, I would use the plain OCRed text from ABBYY FineReader.
Last edited by markom; 03-20-2014 at 08:44 PM. |
#9 | |
Guru
Posts: 815
Karma: 1029784
Join Date: May 2008
Location: Nebraska, USA
Device: PEZ, Color Libre, 2@Sony T1, Onyx i62HD
|
Quote:
I often got "1" instead of "I" or "l", "m" instead of "rn", and odd hard returns on the last line of a paragraph instead of soft returns. So for an entire novel, 95% or better is more than acceptable to me. (My epubs come out great!) |
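For anyone who wants to hunt down the "1" for "l" or "I" swaps mechanically, here is a rough Python sketch. It is only a heuristic and rests on an assumption: a digit 1 touching a letter is almost never intended in ordinary prose. It will not catch a standalone "1" standing in for "I", and the "rn"/"m" swap needs a word list, so neither is covered here. The name `suspicious_tokens` is made up for the example.

```python
import re

# Assumption: in ordinary prose a digit 1 directly next to a letter
# ("to1d", "1t") is an OCR slip rather than intended text.
ONE_NEXT_TO_LETTER = re.compile(r"\b\w*(?:[A-Za-z]1|1[A-Za-z])\w*\b")


def suspicious_tokens(text: str) -> list[str]:
    """Return the tokens in which a digit 1 sits next to a letter."""
    return ONE_NEXT_TO_LETTER.findall(text)


sample = "1 to1d him the bi1l was 12 dollars"
print(suspicious_tokens(sample))
```

Pure numbers such as "12" are left alone, so the list stays short enough to check by eye.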
|
#10 | |
Guru
Posts: 897
Karma: 950683
Join Date: Oct 2009
Device: Kobo Libra2
|
Quote:
The only way I see to do this with any speed is to take apart the book so the pages could be put through an ADF instead of having to turn the pages and flatten the book each time. Is that what you're doing or do you have an alternative? Thanks, Marcy |
|
#11 |
Media Bloke
Posts: 2,382
Karma: 113956855
Join Date: Sep 2010
Location: NSW - Australia
Device: iOS
|
Oh! I've never used OCR for an entire book. Mainly for documents that are easily proofed. Acrobat mixes up "1"s and "l"s too.
I remember OmniPage would flag any words not in the dictionary and provide a list of probables for you to choose from. That made proofing pretty easy. I think it's pretty expensive though. |
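The flag-and-suggest idea is easy to approximate with Python's standard library. This is only a sketch of the general approach, not how OmniPage actually works. The word list `known` is a tiny stand-in for a real dictionary, and `flag_unknown_words` is a made-up name.

```python
import difflib
import re


def flag_unknown_words(text, known_words, max_suggestions=3):
    """Map each word missing from known_words to a list of close dictionary matches."""
    flagged = {}
    for word in re.findall(r"[A-Za-z']+", text):
        lower = word.lower()
        if lower not in known_words and lower not in flagged:
            # get_close_matches ranks candidates by similarity and drops weak ones.
            flagged[lower] = difflib.get_close_matches(
                lower, known_words, n=max_suggestions
            )
    return flagged


# Tiny stand-in word list; a real run would load a full dictionary file.
known = {"the", "government", "will", "return", "money"}
print(flag_unknown_words("The govemment will retum the money", known))
```

A check like this only catches slips that produce non-words. An error that happens to spell another real word, such as "modem" for "modern", passes straight through and still needs a human read.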
#12 | |
350 Hoarder
Posts: 3,574
Karma: 8281267
Join Date: Dec 2010
Location: Midwest USA
Device: Sony PRS-350, Kobo Glo & Glo HD, PW2
|
Quote:
The time pretty much flies if you're busy doing something else, so find a good TV show you're interested in and you'll be done before you know it. I did my first book just standing there next to my computer and could barely stand doing 25 pages a day. |
|
#13 | ||
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
There is pretty much:
Quote:
Then on top of the OCR, you have to fix broken paragraphs, add in proper indentation, check for missing quotation marks, add in blockquotes, check for actual typos/errors in the physical/PDF book, etc.
I do book conversion professionally, and mostly work with non-fiction economics books (lots of footnotes). Other types of books might be easier/faster, but if I want to completely proof a book and get a completed/finalized EPUB out of it, it takes me ~8-15 hours of work (although when I first started it used to take me ~2 weeks to convert a book).
I explained a lot of the method in here: https://www.mobileread.com/forums/sho...d.php?t=223817 and in here: https://www.mobileread.com/forums/sho...d.php?t=234146
I personally use ABBYY FineReader (because in my testing it has been the most accurate). But the same methods should apply no matter what OCR program you are using.
Last edited by Tex2002ans; 03-22-2014 at 10:09 AM. |
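As an illustration of the "fix broken paragraphs" step, here is a rough Python sketch of one possible heuristic. It is not the method from the linked threads. It assumes the OCR text has roughly one paragraph per line, so a line that does not end in sentence-final punctuation was probably cut by a stray hard return. The name `join_broken_paragraphs` is made up.

```python
import re

# Assumption: a finished paragraph ends in . ! ? or : optionally followed by a
# closing quote or bracket. Anything else is treated as a stray hard return.
SENTENCE_END = re.compile(r"[.!?:][\"')]*$")


def join_broken_paragraphs(lines):
    """Merge lines that look like one paragraph split by a spurious hard return."""
    paragraphs = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        current = f"{current} {line}" if current else line
        if SENTENCE_END.search(line):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


ocr_lines = ["He opened the door and", "walked in.", "", '"Who is there?"', "Nobody"]
for paragraph in join_broken_paragraphs(ocr_lines):
    print(paragraph)
```

Headings and chapter titles usually lack end punctuation, so a pass like this will wrongly glue them to the next paragraph. The output still has to be checked against the page images.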
||
#14 | |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Quote:
|
|
#15 | |
Guru
Posts: 815
Karma: 1029784
Join Date: May 2008
Location: Nebraska, USA
Device: PEZ, Color Libre, 2@Sony T1, Onyx i62HD
|
Quote:
I had an old (HP, I think) flatbed scanner that would OCR the text and let me take it directly to my WP. On that one it didn't take long, even for paperbacks. However, it was an old XP-compatible scanner and it did not get upgrades with the new OS. (I'm on Win 7 now.)
My current scanner is an all-in-one, and it scans nicely but not to OCR. So... I flatten the pb on the glass, set the preview to identify the two different pages (making sure they are in order), then scan. I always put weights on the pb to hold it flat. It lets me continue to scan into a multi-page PDF until I save the file. I scan a chapter at a time and save it at that point. A chapter scan, depending on the number of pages and on how often light leaks in and I have to rescan, takes me about 5 to 30 minutes. I will rescan a page many times if needed to get the lettering clear.
But that is just the first step for me. Then I convert the PDF to WP, edit each chapter for errors and formatting, then convert to epub. So anywhere from at least a full day to a couple of weeks, depending on how much time I spend on it each session.
Hope this answers your question.
AJ
I have been looking at a portable double-sided ASF scanner, a Brother 720D, lately, but haven't purchased it yet. It would necessitate taking my pbs apart and scanning page by page. Does anyone have one? |
|