01-21-2025, 09:35 AM   #5
nezih
Enthusiast
Posts: 44
Karma: 14828
Join Date: Feb 2023
Device: Boox Page, Kobo Aura SE
  • Postprocess the scanned pages with ScanTailor (https://github.com/4lex4/scantailor-advanced). It makes it pretty easy to fix the skew you mentioned, among other things.
  • Merge the ScanTailor output files with Adobe Acrobat, then OCR them via ClearScan (named "Editable text and images" in newer Acrobat DC versions). This will basically vectorize the OCRed text.
  • gImageReader is the only usable Tesseract GUI imo. However, if you can use FineReader, it can output the OCRed text in many formats, ePub being one of them. Since OCR is not 100% accurate, creating good-looking, proofread ePubs is a very exhausting process, but at least FineReader's ePub output eases the chore a bit.
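
If you would rather script the Tesseract step than click through a GUI, here is a rough sketch of how that could look. It assumes the `tesseract` executable is on your PATH and that ScanTailor wrote its pages as `.tif` files. The folder names `scans/out` and `scans/ocr` and both function names are made up for illustration. It only produces one searchable PDF per page, so merging and proofreading are still up to you.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    """Return the Tesseract CLI call that writes <out_base>.pdf with a text layer."""
    # CLI shape: tesseract <image> <outputbase> -l <lang> pdf
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_pages(in_dir: Path, out_dir: Path, lang: str = "eng") -> list[Path]:
    """OCR every .tif in in_dir (sorted by name) into one searchable PDF per page."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for image in sorted(in_dir.glob("*.tif")):
        out_base = out_dir / image.stem
        # check=True raises CalledProcessError if Tesseract fails on a page
        subprocess.run(build_tesseract_cmd(image, out_base, lang), check=True)
        results.append(out_dir / (image.stem + ".pdf"))
    return results


if __name__ == "__main__":
    # "scans/out" is a hypothetical ScanTailor output folder; adjust to your project.
    for pdf in ocr_pages(Path("scans/out"), Path("scans/ocr")):
        print(pdf)
```

Swap `eng` for whatever language data you have installed for Tesseract.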