07-13-2020, 04:40 AM   #13
roger64
Wizard
 
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by Shohreh
Hello,

I'd like to turn an out-of-print paper book I have into an EPUB.

I just tried taking pictures of a few pages using my smartphone, and fed them to gImageReader (a GUI for Tesseract).

The text only has a few errors, and I'll have to manually remove mid-line carriage returns, but it's pretty good.

Are there tips you would recommend before I go ahead with the whole 250 pages and turn them into an EPUB (and PDF as well)?

Thank you.
I also use gImageReader-qt5 on Arch Linux, although my version looks slightly different.

See the attached screenshot.

I process only .tif images produced by Scan Tailor.
I run recognition with hOCR output, in batches of at most 70 pages.
I save each batch as an HTML file (see the red arrow in the screenshot).
I insert each batch file into LibreOffice and save it as ODT.
Each batch is at most 3 MB.
I remove all bookmarks and sections, batch by batch.
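To plan the 70-page batches for a book of, say, 250 pages, a small Python sketch can list the page ranges to recognize one after another. `page_ranges` and `tif_batches` are hypothetical helpers for illustration only, not part of gImageReader or Scan Tailor, and `tif_batches` assumes the Scan Tailor output files sort by name in page order.

```python
from pathlib import Path

BATCH_SIZE = 70  # maximum pages per recognition batch, as in the steps above


def page_ranges(total_pages, batch_size=BATCH_SIZE):
    """Return 1-based (first, last) page numbers for each batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def tif_batches(folder, batch_size=BATCH_SIZE):
    """Group the .tif pages in `folder` into batches.

    Assumption: sorting by file name gives page order.
    """
    pages = sorted(Path(folder).glob("*.tif"))
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]


if __name__ == "__main__":
    # Print the batches for a 250-page book.
    for first, last in page_ranges(250):
        print(f"pages {first}-{last}")
```

For 250 pages this gives four batches: three of 70 pages and a last one of 40.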

The result is a reasonably clean ODT file, which I later convert with ODTImport (a Sigil plugin).
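The question mentions having to remove mid-line carriage returns by hand. If you work from plain-text OCR output instead of going through LibreOffice, a minimal Python sketch like the following could rejoin them. `join_broken_lines` is a hypothetical helper, not part of gImageReader or Sigil, and it assumes paragraphs are separated by blank lines. It also rejoins a word split by an end-of-line hyphen, which will wrongly merge a genuine hyphenated compound that happens to break at a line end, so the result still needs proofreading.

```python
import re


def join_broken_lines(text):
    """Rejoin lines that OCR split in the middle of a paragraph.

    Assumption: paragraphs are separated by at least one blank line, so any
    single line break inside a paragraph is treated as an OCR artifact.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        out = ""
        for line in lines:
            if out.endswith("-") and line[:1].islower():
                # Line ends with a hyphen and the next starts lowercase:
                # treat it as one word split across two lines.
                out = out[:-1] + line
            elif out:
                out += " " + line
            else:
                out = line
        joined.append(out)
    return "\n\n".join(joined)


if __name__ == "__main__":
    sample = "The text only has a few\nerrors and some hyphen-\nated words.\n\nSecond para-\ngraph here."
    print(join_broken_lines(sample))
```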
Attached thumbnail: ksnip.png

Last edited by roger64; 07-13-2020 at 04:47 AM. Reason: image