Quote:
Originally Posted by Shohreh
I'd like to turn an out-of-print paper book I have into an EPUB.
[...]
Are there tips you would recommend before I go ahead with the whole 250 pages and turn them into an EPUB (and PDF as well)?
|
I've written extensively about this over the years.
For cleaning up your images, I would recommend Scan Tailor Advanced. It crops the pages, fixes distortion from curved pages, and can convert them to B&W.
I wrote a tutorial with more details about this just a few months ago:
"Optimize PDFs from archive.org for E-Ink devices" (especially Post #2+#14).
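For a rough idea of what the "turn them B&W" step involves, here is a minimal sketch of global thresholding (Otsu's method) in plain Python. This is only an illustration: Scan Tailor Advanced's own algorithm may well differ, and the names `otsu_threshold` and `binarize` are made up for this example. A page is assumed to be a list of rows of grey levels from 0 (black) to 255 (white).

```python
def otsu_threshold(pixels):
    """Pick the grey level (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate threshold
    sum_dark = 0    # sum of their grey levels
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # everything is already on the dark side
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor); bigger is better.
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(page):
    """Return a pure black (0) / white (255) copy of a greyscale page."""
    t = otsu_threshold([p for row in page for p in row])
    return [[0 if p <= t else 255 for p in row] for row in page]


# Hypothetical 2x3 "scan": dark ink on the left, light paper on the right.
page = [[30, 40, 200],
        [35, 210, 220]]
bw_page = binarize(page)
```

A single global threshold like this tends to struggle with uneven lighting or shadows near the spine, which is one reason to leave the job to a dedicated tool rather than rolling your own.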
On OCRing and the other errors/situations that may crop up, I recommend my detailed posts in the
2014 topic, "Delicate text digitalizing + scanning issues".
Not too much has changed since then... most of the steps and issues are still exactly the same in 2020.
Quote:
Originally Posted by Shohreh
I tried Abbyy FineReader, and it worked much better than gImageReader (ie. Tesseract).
|
Back in 2014, I wrote another post discussing all the ins-and-outs of free vs. proprietary OCR:
"Can you OCR the images inside of .pdf files?"
Most of the free tools get you the plain text, but do a poorer job of carrying over the actual formatting (italics/bold, footnotes, superscript, tables, etc.).
With fiction, you would probably be okay... but the more complicated the book, the more time you're going to spend correcting/re-adding all the formatting.