Old 09-02-2014, 06:02 AM   #4
Tex2002ans
Quote:
Originally Posted by tsolignani
Doing 400 DPI means that if I would use, say, 600 DPI, or anyway a «better» resolution, would lead to worse results? Or rather are you choosing 400 DPI as a good compromise between file size and quality?
400 DPI is an OK compromise between file size and quality. 300 DPI is the lowest I would go; that is still good enough for accurate OCR.

Anything higher would be icing on the cake (at the cost of a much larger file size). If your book is full of images, you may want to scan those at a higher DPI, so you have more to work with if you are fixing/editing them.
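
To put rough numbers on the file size side of that tradeoff, here is a minimal sketch. The 6 x 9 inch page and 8-bit grayscale are just assumptions for illustration, and these are raw, uncompressed sizes; real TIFF/PNG/JPEG files will be smaller, but the scaling with DPI is the point:

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=1):
    """Raw size of one scanned page in bytes, before any compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Hypothetical 6 x 9 inch page, 8-bit grayscale (1 byte per pixel).
for dpi in (300, 400, 600):
    size = uncompressed_scan_bytes(6, 9, dpi)
    print(f"{dpi} DPI: {size / 1_000_000:.2f} MB per page")
```

Pixel count grows with the square of the DPI, so under these assumptions a 600 DPI page is 4x the raw size of a 300 DPI page, while 400 DPI is a bit under 2x.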

Just last month, there was this topic, "DPI to use when scanning images":

https://www.mobileread.com/forums/sho...d.php?t=243418

Quote:
Originally Posted by tsolignani
How come you suggest ABBY over Acrobat? For the OCR performance alone or else?
ABBYY FineReader is the most accurate OCR. More accurate OCR means far fewer man-hours spent in "post-processing", fixing the mistakes.

If you already own Adobe Acrobat Pro, then meh, its OCR is probably fine, but I would make a strong case for going with FineReader over all others.

There was also this topic from about a month ago, in which OCR was discussed. I would also recommend visiting the topic I linked to in Post #6 (which leads to even more sets of in-depth topics on the subject):

https://www.mobileread.com/forums/sho...d.php?t=243327

Quote:
Originally Posted by tsolignani
And for post processing you mean solving OCR problems or what?
Yep. PDF is an abysmal input format, and a lot of work has to be done to get the text into good shape (see the topics linked above for the details).
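
As one small illustration of the kind of cleanup involved (my own example, not something taken from the linked topics): text pulled out of a PDF usually keeps the printed line breaks, so words hyphenated at the end of a line come out split in two. A minimal sketch that rejoins them, with `rejoin_hyphenated` being a made-up helper name:

```python
import re


def rejoin_hyphenated(text):
    """Join words that were split by a hyphen at the end of a line."""
    # letter, hyphen, newline, lowercase letter -> drop the hyphen and newline
    return re.sub(r"([A-Za-z])-\n([a-z])", r"\1\2", text)


sample = "A good scan makes the conver-\nsion much easier."
print(rejoin_hyphenated(sample))
```

Note that this blindly removes the hyphen, so a genuinely hyphenated word that happened to break at the line end (like "well-\nknown") would come out wrong. That is exactly the sort of thing that still needs a human check afterwards.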

Quote:
Originally Posted by tsolignani
Please forgive me for asking so many questions, I just would like to get it right before doing a great batch of books.
So say we all!

I don't know how anyone else feels, but it seems like every few weeks we get the exact same "How do I convert a PDF to an ebook?" question. So I started just cross-linking to the previous topics, which have my earlier tomes of answers + everyone else's discussions/ideas.

I think those topics + the mountain of other linked material will answer almost all of your PDF -> OCR -> text questions. If you have any more, of course, feel free to ask.

There is also this topic in the MobileRead Wiki, although some of that info might be a tiny bit dated:

https://wiki.mobileread.com/wiki/Digi...ooks_to_Ebooks

Last edited by Tex2002ans; 09-02-2014 at 06:22 AM.