MobileRead Forums - View Single Post

roger64 · 07-13-2020, 04:40 AM

Quote:

Originally Posted by Shohreh

Hello,

I'd like to turn an out-of-print paper book I have into an EPUB.

I just tried taking pictures of a few pages using my smartphone, and fed them to gImageReader (a GUI to Tesseract).

The text only has a few errors, and I'll have to manually remove mid-line carriage returns, but it's pretty good.

Are there tips you would recommend before I go ahead with the whole 250 pages and turn them into an EPUB (and PDF as well)?

Thank you.
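The mid-line carriage returns mentioned in the quoted post do not have to be removed by hand. Below is a minimal sketch, assuming the OCR output is plain text in which a blank line separates paragraphs (that layout is an assumption, not something the post states). The function name `unwrap_lines` is made up for illustration. It does not rejoin words that were hyphenated across a line break.

```python
import re


def unwrap_lines(text):
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by at least one blank line.
    """
    # Normalize Windows and old Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on blank lines (possibly containing stray whitespace).
    paragraphs = re.split(r"\n\s*\n", text)
    unwrapped = []
    for paragraph in paragraphs:
        # Drop empty fragments and trim each line before joining with a space.
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        if lines:
            unwrapped.append(" ".join(lines))
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    sample = "The text only has\na few errors.\n\nSecond\r\nparagraph."
    print(unwrap_lines(sample))
```

It is worth running this on a copy of one page first, since headings and verse are also joined into a single line.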

I also use gImageReader (the Qt5 build) on Arch Linux. Mine looks slightly different.

See screenshot

I process only .tif images coming from Scan Tailor.
I recognize the text in hOCR format, in blocks of 70 pages at most.
I save each block as an HTML file (see red arrow).
I insert the block file into LibreOffice and save it as .odt.
Each block is 3 MB at most.
I remove all bookmarks and sections, block by block.
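
To plan the blocks before opening gImageReader, the 70-page limit can be applied with a short script. This is only a sketch: the folder name `out` and both function names are hypothetical, and it assumes the .tif files have zero-padded names so that sorting by name keeps the page order. It only groups the files. The recognition itself is still done in gImageReader.

```python
from pathlib import Path

MAX_PAGES_PER_BLOCK = 70  # limit used in the post


def split_into_blocks(pages, max_pages=MAX_PAGES_PER_BLOCK):
    """Return consecutive slices of `pages`, each at most `max_pages` long."""
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    return [pages[i:i + max_pages] for i in range(0, len(pages), max_pages)]


def list_tif_pages(folder):
    """Return the .tif/.tiff files in `folder`, sorted by file name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in {".tif", ".tiff"}
    )


if __name__ == "__main__":
    # "out" is a hypothetical folder holding the Scan Tailor output.
    folder = Path("out")
    pages = list_tif_pages(folder) if folder.is_dir() else []
    for number, block in enumerate(split_into_blocks(pages), start=1):
        print(f"block {number}: {block[0].name} .. {block[-1].name} "
              f"({len(block)} pages)")
```

For the 250-page book in the question this gives four blocks: three of 70 pages and one of 40.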

The result is a clean enough ODT file that will later be converted using ODTImport (a Sigil plugin).