Old 10-11-2009, 11:57 PM   #76
HauntedAttic
Junior Member
Posts: 8
Karma: 10
Join Date: Oct 2009
Device: Sony PRS-505, iRex DR800SG
After another day of using OmniPage, I'm starting to get frustrated. Getting good results from OmniPage takes hours, while Abbyy gives good results in minutes. OmniPage is turning into a huge time-sink.
Old 10-12-2009, 12:38 AM   #77
ascherjim
Addict
Posts: 260
Karma: 274
Join Date: Apr 2006
Location: Gig Harbor, Washington
Device: BeBook One, PocketBook 360, Kindle Paperwhite, Kobo Aura One
Quote:
Originally Posted by HauntedAttic View Post
After another day of using OmniPage, I'm starting to get frustrated. Getting good results from OmniPage takes hours, while Abbyy gives good results in minutes. OmniPage is turning into a huge time-sink.
Thanks for the further information, and possibly saving me a lot of heartache. ABBYY it'll have to continue to be, in whichever version I ultimately find I need.
Old 10-13-2009, 10:50 AM   #78
ascherjim
Addict
Posts: 260
Karma: 274
Join Date: Apr 2006
Location: Gig Harbor, Washington
Device: BeBook One, PocketBook 360, Kindle Paperwhite, Kobo Aura One
Quote:
Originally Posted by Elfwreck View Post
There's a better way.

If your pages are the same size and layout, or close to it, you can save the text blocks you use, and load them on all the pages at once.

I have FineReader 7 Pro. How I'd do this:
-Go to a standard-looking page of your document
-Ctrl-E to place zones on the page. Delete unwanted text/image blocks.
-Shape wanted text block(s) to just a bit bigger than the main text of the page; give a bit of margin in case of pages that are shifted a bit to one side or the other.
-Image-->Save Blocks: save blocks out (usually with the name of the book, so you remember which one it is).
-Select all pages in your book (or all besides the cover page & TOC, which may need different zoning)
-Image-->Load Blocks; apply to selected pages.

This will only work if your pages are substantially identical--but it'll save hours if they are. And it can be done to all pages, and then you can quickly flip through and look for any that need to be zoned differently.
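As a rough illustration of the idea behind Save Blocks / Load Blocks, here is a minimal Python sketch. It is not FineReader's file format or API: the `Block` type, `padded`, and `apply_template` are hypothetical names, and the pixel coordinates are made up. The point is only that one padded rectangle, drawn once, gets reused on every selected page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A rectangular text zone in page pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    def padded(self, margin: int, page_width: int, page_height: int) -> "Block":
        """Grow the zone by `margin` pixels on every side, clamped to the page."""
        return Block(
            max(0, self.left - margin),
            max(0, self.top - margin),
            min(page_width, self.right + margin),
            min(page_height, self.bottom + margin),
        )


def apply_template(template, page_numbers, skip=()):
    """Give every selected page a copy of the saved blocks, except pages in `skip`."""
    skipped = set(skip)
    return {n: list(template) for n in page_numbers if n not in skipped}


# Zone drawn on one representative page, widened a little for shifted pages.
main_text = Block(120, 150, 1480, 2250).padded(40, page_width=1600, page_height=2400)

# Pages 1 and 2 stand in for a cover and TOC that are zoned by hand.
layout = apply_template([main_text], range(1, 11), skip=(1, 2))
```

The margin is what absorbs small page-to-page shifts; pages with a different layout are simply left out of the template and zoned separately.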
I've been using Elfwreck's block scanning method with considerable success. I use ABBYY's Sprint 6 version, which accomplishes in effect the same thing, but not with the same keystrokes. However, the instructions with my version are sadly scant and often erroneous. Does anyone who also uses this Sprint version know how to actually save the defined blocks once they're created?
Old 10-13-2009, 11:17 AM   #79
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
How is ABBYY (What the hell kind of name is that?!) for OCR-ing older books filled with long s characters and other such delights?

- Ahi
Old 10-13-2009, 12:10 PM   #80
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Quote:
Originally Posted by ahi View Post
How is ABBYY (What the hell kind of name is that?!) for OCR-ing older books filled with long s characters and other such delights?
ABBYY is the company name; the OCR tool is called FineReader.
It has a pattern training tool ("user patterns") which can be quite effective. There is also a special version for old texts:
Quote:
On top of FineReader's basic OCR functions, FineReader XIX is capable of reading old texts that feature elaborate type prints. This includes text with ornamental curls that break the continuous line of the word and roman type characters no longer in use such as the elongated “s” used in early English or French text. FineReader XIX support for Fraktur includes:
Languages: German, English, French, Italian, and Spanish
Fonts: Fraktur, Schwabacher, and a majority of Textura (Gothic) fonts
Old 10-13-2009, 12:17 PM   #81
Hellmark
Wizard
Posts: 2,592
Karma: 4290425
Join Date: Jun 2009
Location: Foristell, Missouri, USA
Device: Nokia N800, PRS-505, Nook STR Glowlight, Kindle 3, Kobo Libra 2
This is from the company's FAQ
Quote:
The company's name – ABBYY – can be translated as keen eye. This word comes from the hypothetical (reconstructed) parent language for Miao-Yao, Nu, Hmong-Mien, Hmong, and Kim Mun groups of the Sino-Tibetan language family. This name and its meaning (keen eye) reflect the key fields of company's activity and research: document recognition and linguistic technologies.
Old 10-13-2009, 12:43 PM   #82
Elfwreck
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Quote:
Originally Posted by ahi View Post
How is ABBYY (What the hell kind of name is that?!) for OCR-ing older books filled with long s characters and other such delights?

- Ahi
I believe it can't do long s. Its languages are also limited; I gather it has good Cyrillic alphabet coverage, but doesn't yet do right-to-left languages. It also doesn't deal with en-dashes (although it'll accept them if added, it never reads them) or smart quotes (same issue).

There's a review at http://capecodhistory.us/books/PrestoOCR.htm (scroll past the Presto review) describing some of the areas where ABBYY doesn't work well.

"Best on the market today" doesn't mean "works for all purposes." FineReader's terrific for OCRing modern novels and most textbooks; it's got problems for more complex works. The best that can be said about it is that the viewing window arrangement allows easy editing while looking at the scanned page.
Old 10-13-2009, 12:53 PM   #83
Ea
Wizard
Posts: 3,490
Karma: 5239563
Join Date: Jan 2008
Location: Denmark
Device: Kindle 3|iPad air|iPhone 4S
Quote:
Originally Posted by Elfwreck View Post
... It also doesn't deal with en-dashes (although it'll accept them if added, it never reads them) or smart quotes (same issue).
And em-dashes as well, according to the review. It's not quite clear whether it just skips them. OmniPro does this, and it can be highly annoying.
Old 10-13-2009, 01:44 PM   #84
Elfwreck
Grand Sorcerer
Posts: 5,187
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
Quote:
Originally Posted by Ea View Post
And em-dashes as well, according to the review. It's not quite clear whether it just skips them. OmniPro does this, and it can be highly annoying.
It doesn't skip them, although it's prone to reading them as hyphens. I thought I remembered FR reading em-dashes, though. It might catch some and not others.

It reads section symbols (§) but not paragraph symbols (¶). Paragraph symbols tend to be read as either "ff" or "fl." (I OCR a lot of legal documents; court rulings like to use § and ¶.) It's good with hyphenated words at the ends of lines. It's awful with poetry--the text is accurate, but where it puts hard & soft returns is weird.

It will indeed assign variable font sizes, so one page is 10.93 pt, and the next is 11.02 pt. Also, it sets the leading at a specific point level, rather than "single spaced" or "double spaced."
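One way to tame those near-identical sizes after export, sketched in Python. This is post-processing on the exported text, not a FineReader setting; `snap_font_size` and the half-point step are assumptions chosen for illustration.

```python
def snap_font_size(size_pt: float, step: float = 0.5) -> float:
    """Round a font size to the nearest multiple of `step` points."""
    return round(size_pt / step) * step


# Sizes like the ones mentioned above, plus one more made-up value.
sizes = [10.93, 11.02, 10.97]
normalized = [snap_font_size(s) for s in sizes]
```

With a half-point step, all three sizes collapse to 11.0 pt, so consecutive pages end up with the same body size.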
Old 10-13-2009, 02:42 PM   #85
Daithi
Publishers are evil!
Posts: 2,418
Karma: 36205264
Join Date: Mar 2008
Location: Rhode Island
Device: Various Kindles
I just scanned a couple of books this weekend and wish I had read this thread first. Lots of good information, and I think I've been sold on ABBYY.

Here is what doesn't work: Scanning each page with a cheap scanner and then printing the images to PDF, followed by using the PDF's software for OCR conversion.

Actually, I think my biggest problem was not the OCR conversion itself but poor scanning, which led to poor OCR conversion. I was using a flatbed scanner, and anywhere the page wasn't lying flat resulted in tons of OCR errors.

In my case, I'm scanning old books that I don't want to destroy. This means cutting off the binding and using a flatbed scanner is out of the question. I don't even want to press real hard on the book to get it to lie flat, because I'm afraid I will break the binding and have my pages falling out.

I looked at the DIY book scanner projects, but most of them looked like a huge amount of work, and I'm not so sure I have the room for the resulting scanner. However, one set of instructions I found for a cheap scanner looked promising. I suppose I'll give something like this a shot and see if it works.
Old 10-13-2009, 03:03 PM   #86
slayda
Retired & reading more!
Posts: 2,764
Karma: 1884247
Join Date: Sep 2006
Location: North Alabama, USA
Device: Kindle 1, iPad Air 2, iPhone 6S+, Kobo Aura One
Quote:
Originally Posted by ascherjim View Post
I've been using Elfwreck's block scanning method with considerable success. I use ABBYY's Sprint 6 version, which accomplishes in effect the same thing, but not with the same keystrokes. However, the instructions with my version are sadly scant and often erroneous. Does anyone who also uses this Sprint version know how to actually save the defined blocks once they're created?
This doesn't work very well with cheap paperback books. The text tends to move around from page to page. The better the original, the easier the OCR.
Old 10-13-2009, 04:45 PM   #87
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Quote:
Originally Posted by Elfwreck View Post
I believe it can't do long s. Its languages are also limited; I gather it has good Cyrillic alphabet coverage, but doesn't yet do right-to-left languages.
Actually, since last year it can do Hebrew (RTL), Thai and Chinese/Japanese/Korean. That's in addition to basically any Latin- or Cyrillic-based language. Probably the only major script it's still missing is Arabic.
As for long s, see my previous post.
Old 10-13-2009, 04:58 PM   #88
igorsk
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Quote:
Originally Posted by Daithi View Post
Actually, I think my biggest problem was not the OCR conversion itself but poor scanning, which led to poor OCR conversion. I was using a flatbed scanner, and anywhere the page wasn't lying flat resulted in tons of OCR errors.

In my case, I'm scanning old books that I don't want to destroy. This means cutting off the binding and using a flatbed scanner is out of the question. I don't even want to press real hard on the book to get it to lie flat, because I'm afraid I will break the binding and have my pages falling out.
There are two kinds of flatbed scanners.
Contact Image Sensor (CIS) ones are usually cheaper since they mount sensors directly on the scan head. That gets rid of some optics but results in what you describe: anything that's more than two millimeters away from the glass is basically not registered.
Charge-Coupled Device (CCD) scanners use some optics to direct the image from the head to the fixed sensor and thus can pick up pretty much anything above the glass.
The latest versions of FineReader have some sophisticated algorithms to straighten the lines of two-page book scans, so if you can get a scanner to at least register the part close to the binding, it should do a fair job.
Old 10-13-2009, 06:38 PM   #89
Daithi
Publishers are evil!
Posts: 2,418
Karma: 36205264
Join Date: Mar 2008
Location: Rhode Island
Device: Various Kindles
Quote:
Originally Posted by igorsk View Post
There are two kinds of flatbed scanners.
Contact Image Sensor (CIS) ones are usually cheaper since they mount sensors directly on the scan head. That gets rid of some optics but results in what you describe: anything that's more than two millimeters away from the glass is basically not registered.
Charge-Coupled Device (CCD) scanners use some optics to direct the image from the head to the fixed sensor and thus can pick up pretty much anything above the glass.
The latest versions of FineReader have some sophisticated algorithms to straighten the lines of two-page book scans, so if you can get a scanner to at least register the part close to the binding, it should do a fair job.
After doing a bit more research, mostly here on MobileRead (I just love this place), I decided to buy an OpticBook 3600 scanner. From what I understand it scans right to the edge of the scanner, and it also comes with an older version of ABBYY FineReader. My plan is to upgrade the FineReader software once the scanner arrives.

Here is the cool part. I was planning on buying FineReader, which is $400, but instead I bought the 3600 scanner for $215 and will spend $179 for the FineReader upgrade. So the way I see it, I'm getting the scanner for free.
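The arithmetic behind that, using only the prices quoted in this post (the variable names are just for illustration):

```python
full_price = 400   # FineReader bought outright, in USD
scanner = 215      # OpticBook 3600, which bundles an older FineReader
upgrade = 179      # FineReader upgrade from the bundled version

bundle_total = scanner + upgrade      # cost of scanner plus upgrade
savings = full_price - bundle_total   # what is left over versus buying outright
```

So the scanner-plus-upgrade route comes to $394, $6 under the $400 standalone price.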
Old 10-13-2009, 06:56 PM   #90
ascherjim
Addict
Posts: 260
Karma: 274
Join Date: Apr 2006
Location: Gig Harbor, Washington
Device: BeBook One, PocketBook 360, Kindle Paperwhite, Kobo Aura One
Quote:
Originally Posted by Daithi View Post
After doing a bit more research, mostly here on MobileRead (I just love this place), I decided to buy an OpticBook 3600 scanner. From what I understand it scans right to the edge of the scanner, and it also comes with an older version of ABBYY FineReader. My plan is to upgrade the FineReader software once the scanner arrives.

Here is the cool part. I was planning on buying FineReader, which is $400, but instead I bought the 3600 scanner for $215 and will spend $179 for the FineReader upgrade. So the way I see it, I'm getting the scanner for free.
I'd like to offer a cautionary note here. I myself a week or so ago purchased the OpticBook 3600 scanner, with which I am VERY pleased. I've been scanning about a book a day. I have been using the FineReader Sprint version 6 that came bundled with it, and have worked out (with of course some trial-and-error) a wonderfully efficient (for me) methodology for converting my scans to the Mobipocket format (my preferred format). I won't go into details.

Today, then, I downloaded the trial version of FineReader Pro version 10, and produced a book using it. I know, and can readily see, that FR 10 has far more features than the simple and basic Sprint version I have "come to love." But I definitely will not be upgrading. For MY purposes, I don't need the upgrade, or the additional complexity it would ask (if not require) of me.

So, my cautionary note is this: Before ordering the upgrade, try out the Sprint. It may just suit your needs fine, as it does mine.