07-12-2020, 07:48 PM #12
Tex2002ans
Quote:
Originally Posted by Shohreh
I'd like to turn an out-of-print paper book I have into an EPUB.

[...]

Are there tips you would recommend before I go ahead with the whole 250 pages and turn them into an EPUB (and PDF as well)?

I've written extensively about this over the years.

For cleaning up your images, I recommend Scan Tailor Advanced. It crops your images, fixes distortion caused by curved pages, and can convert them to black-and-white.
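If you're curious what "convert to black-and-white" means under the hood, here is a minimal sketch of one common approach, a global threshold chosen with Otsu's method. This is an assumption for illustration only, not Scan Tailor Advanced's actual code (its B&W mode has its own threshold slider and may use a different algorithm). The function names `otsu_threshold` and `binarize` are hypothetical, and pixels are assumed to be grayscale values from 0 to 255.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the 0-255 cutoff that maximizes between-class variance (Otsu's method).

    Pixels <= threshold form the "dark" class (ink), pixels above it the
    "light" class (paper).
    """
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark = 0      # running sum of gray levels in the dark class
    weight_dark = 0   # running pixel count of the dark class
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to compare
        weight_light = total - weight_dark
        if weight_light == 0:
            break     # every pixel is now "dark"; no split left to try
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: large when the two groups are well separated.
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: list[int], threshold: int) -> list[int]:
    """Map each pixel to pure black (0) or pure white (255)."""
    return [255 if p > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy "page": 50 dark ink pixels and 50 light paper pixels.
    page = [10] * 50 + [200] * 50
    t = otsu_threshold(page)
    print(t, binarize(page, t)[:3], binarize(page, t)[-3:])
```

On a real scan you could feed it the grayscale pixels from Pillow (for example `list(Image.open(path).convert("L").getdata())`), but a single global cutoff struggles with uneven lighting or shadows near the spine. That's one reason a dedicated tool that also dewarps and lets you tweak the threshold per page is worth using.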

Just a few months ago, I wrote a tutorial with more details on this: "Optimize PDFs from archive.org for E-Ink devices" (especially Posts #2 and #14).

For OCR and any other errors or situations that may crop up, I recommend my detailed posts in the 2014 topic, "Delicate text digitalizing + scanning issues".

Not too much has changed since then... most of the steps and issues are still exactly the same in 2020.

Quote:
Originally Posted by Shohreh
I tried Abbyy FineReader, and it worked much better than gImageReader (i.e., Tesseract).


Back in 2014, I wrote another post discussing all the ins-and-outs of free vs. proprietary OCR:

"Can you OCR the images inside of .pdf files?"

Most of the free tools get you the plain text, but do a poorer job of carrying over the actual formatting (italics/bold, footnotes, superscript, tables, etc.).

For fiction, you would probably be okay... but the more complex the book, the more time you'll spend correcting and re-adding all the formatting.

Last edited by Tex2002ans; 07-12-2020 at 08:08 PM.