MobileRead Forums - View Single Post - book scanning

nezih · 01-21-2025, 09:35 AM

Postprocess the scanned pages with ScanTailor (https://github.com/4lex4/scantailor-advanced). It makes it pretty easy to fix the skew you mentioned, among other things.
Merge the ScanTailor output files with Adobe Acrobat, then OCR them via ClearScan (named "Editable Text and Images" in newer Acrobat DC versions). This will basically vectorize the OCRed text.
gImageReader is the only usable Tesseract GUI IMO. However, if you can use FineReader, it can output the OCRed text in many formats, EPUB being one of them. Since OCR is not 100% accurate, creating good-looking, proofread EPUBs is a very exhausting process, but at least FineReader's EPUB output eases the chore a bit.
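
If you would rather script the Tesseract step than click through a GUI, a rough Python sketch along these lines might do. It assumes the ScanTailor pages are TIFF files in one folder and that the `tesseract` command-line tool is installed and on your PATH. The function names and folder names here are made up for illustration. It writes one searchable PDF per page, so the merge step still has to happen afterwards.

```python
from pathlib import Path
import shutil
import subprocess


def build_ocr_commands(page_dir, out_dir, lang="eng"):
    """Return one tesseract command per TIFF page, sorted by file name."""
    page_dir, out_dir = Path(page_dir), Path(out_dir)
    pages = sorted(
        p for p in page_dir.iterdir()
        if p.suffix.lower() in {".tif", ".tiff"}
    )
    # "tesseract <image> <output base> -l <lang> pdf" writes <output base>.pdf
    return [
        ["tesseract", str(p), str(out_dir / p.stem), "-l", lang, "pdf"]
        for p in pages
    ]


def run_ocr(page_dir, out_dir, lang="eng"):
    """Run tesseract on every page; stops on the first failing page."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_ocr_commands(page_dir, out_dir, lang):
        subprocess.run(cmd, check=True)
```

Calling `run_ocr("out", "ocr_pdf")` would process every TIFF in a folder named `out` (a placeholder, point it at wherever your ScanTailor output actually lives). The output still needs proofreading, same as with any other OCR route.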
