Old 03-10-2013, 02:39 PM   #1
nlundberg
Connoisseur
Posts: 56
Karma: 43090
Join Date: Jan 2013
Location: Sweden
Device: Cybook Odyssey HD Frontlight, Calibre, Macbook pro, OSX 10.6.8
Advice on OCRing, please.

I have a scanned book with two pages per landscape scan, illustrations, warped text, and shadows across the pages.

The OCR tests I have done grab the text pretty well. The problem is that I cannot find a way to rearrange it into a single-page layout, and that messes up the whole document.

I figure I either have to find an application that splits the pages beforehand, or a smart OCR program.

I am on OSX. I want to be able to read it on my e-reader.

Thankful for input!
Old 03-10-2013, 05:23 PM   #2
DSpider
Evangelist
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
Scan Tailor is an excellent tool. Google it. It's available on Windows and Linux, so you may need VirtualBox/VMware/Parallels/etc., and a copy of Windows XP or Ubuntu. Maybe that will get you the desired result with whatever OCR program you currently use.

The top OCR program right now is ABBYY FineReader Professional 11, but it doesn't do layout. Think of it as an "extraction" tool: you get the text and the images, which you then process using the Adobe Creative Suite, Microsoft Word, or various open-source tools. Vectorizing the graphics and tracking down the fonts is an optional step that adds a very nice touch to the final result. Don't forget to proofread it at least once.
Old 03-12-2013, 04:25 PM   #3
nlundberg
Thanks. I installed Scan Tailor but I forgot to mention that my scanned book is in the form of a PDF, and Scan Tailor does not accept PDF. I guess I can split it up somehow somewhere else.

But would ABBYY FineReader Professional 11 work with two-page scans?
Old 03-13-2013, 03:03 AM   #4
DSpider
Don't ever scan straight to PDF. Always scan as images, process them, and then pack them in a PDF, if you want. The way I see it, you can export the PDF as images using one of the many virtual printer drivers on the net, then run the images through Scan Tailor.
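If Scan Tailor is not available, the page-splitting step on its own can be sketched in a few lines of Python. This is only a rough illustration, not a replacement for Scan Tailor: it cuts each landscape scan down the middle and does nothing about warped text or shadows. The function names and the `overlap` margin are made up for this example, and the file-writing part assumes the third-party Pillow library is installed.

```python
from pathlib import Path


def split_boxes(width, height, overlap=0):
    """Return (left, upper, right, lower) crop boxes for the two halves of a spread.

    `overlap` is a hypothetical safety margin in pixels kept on both sides of
    the centre line, in case the gutter is not exactly in the middle.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    middle = width // 2
    left_page = (0, 0, min(width, middle + overlap), height)
    right_page = (max(0, middle - overlap), 0, width, height)
    return left_page, right_page


def split_spread(scan_path, out_dir, overlap=20):
    """Cut one two-page landscape scan into two single-page PNG files."""
    from PIL import Image  # third-party dependency (assumption): pip install Pillow

    scan_path, out_dir = Path(scan_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with Image.open(scan_path) as img:
        boxes = split_boxes(*img.size, overlap=overlap)
        # "a" is the left-hand page, "b" the right-hand page
        for suffix, box in zip(("a", "b"), boxes):
            target = out_dir / f"{scan_path.stem}_{suffix}.png"
            img.crop(box).save(target)
            written.append(target)
    return written
```

Calling `split_spread("scan-001.png", "single_pages")` (hypothetical file names) would write `scan-001_a.png` and `scan-001_b.png`. The small overlap is there because the gutter of a hand-scanned book is rarely dead centre.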

FineReader will import the PDF as images and process them more or less the way Scan Tailor does, except with OCR. I don't like doing this because it basically takes a screenshot of each page from the PDF and loses resolution.

There are two options:

1. The "quick and dirty" way, using Scan Tailor + FineReader. Images on top, positional OCR underneath the images. The text won't reflow, but at least it's searchable (to roughly 95% accuracy, if you don't proofread it). You won't have to worry about fonts or the layout, but you also won't be able to easily correct typos or other mistakes from the book, since the pages are just images.

2. The "quality way": use FineReader to extract the text, proofread it in FineReader, track down the fonts, process the graphics, redo the layout, and proofread the final product again. It takes a lot of time and effort, and not many people stick with it... but believe me, it's always a pleasure to read such a book. Make sure that the material is worth it and that it's not already available as an e-book. Training videos on how to use InDesign, Word, Acrobat, etc., can be very helpful.

Last edited by DSpider; 03-13-2013 at 03:06 AM.
Old 03-13-2013, 03:23 AM   #5
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
In this case the PDF is just a container for all the images. You can extract the images from the PDF with the tool 'pdfimages', which is part of poppler-utils. Although it is Linux based, it might also exist for OSX.

The result is a set of separate images, which can then be processed with Scan Tailor.
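For a book with many pages, the extraction could be wrapped in a small script. The following is a minimal sketch, assuming `pdfimages` is already installed and on the PATH (on OSX that presumably means installing poppler through a package manager). The helper names and the `page` prefix are invented for the example. The `-j` flag asks pdfimages to write embedded JPEGs out as `.jpg` files instead of converting them.

```python
import shutil
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, out_dir, prefix="page"):
    """Build the argument list: pdfimages -j <pdf> <image-root>."""
    # pdfimages appends -000, -001, ... and the file extension to the image root
    image_root = Path(out_dir) / prefix
    return ["pdfimages", "-j", str(pdf_path), str(image_root)]


def extract_images(pdf_path, out_dir):
    """Run pdfimages and return the files now in out_dir, sorted by name."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found on PATH (install poppler first)")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_command(pdf_path, out_dir), check=True)
    return sorted(out_dir.iterdir())
```

`extract_images("book.pdf", "scans")` (hypothetical names) would leave files such as `scans/page-000.jpg` ready to load into Scan Tailor. Because the images are copied out of the PDF rather than re-rendered, they keep the resolution they were scanned at.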
Old 03-13-2013, 06:27 AM   #6
nlundberg
Ideally I want it to end up as an EPUB, but in my tests the "quick and dirty" way in FineReader worked only for PDF. That seems to be just enough, though: now that it is in single-page layout, it is readable on my e-ink device.
Although it would be nice to edit the book down to OCR text only, I would probably have read it twice in the same time...
Old 03-13-2013, 06:29 AM   #7
nlundberg
Actually, the e-pub OCR turned out pretty well. But the images are missing.
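One way to check whether the images were really left out of the EPUB, rather than just not displayed by the reader, is to look inside the file. An EPUB is a ZIP archive, so Python's standard `zipfile` module can list its contents. This is a small sketch. The function name and the list of suffixes are assumptions made for the example, not anything FineReader or Calibre defines.

```python
import zipfile

# File extensions treated as images in this sketch (an assumption, extend as needed)
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif", ".svg")


def list_epub_images(epub_file):
    """Return the names of image entries inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(IMAGE_SUFFIXES)
        )
```

If `list_epub_images("book.epub")` (hypothetical file name) comes back empty, the illustrations were never exported and would have to be added from the scanned images afterwards.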