[SOLVED] [OCR] Extract text layer, fix errors, re-import?
Hello,
I noticed some typos in the text layer that OCR added to a "bitmap" PDF, i.e. a PDF whose pages are actually scanned images.
I first tried opening the EPUB generated by ABBYY FineReader, but LibreOffice couldn't open it at all. Sigil could open it after showing an error message, but it lacks a French dictionary for spellchecking (as far as I can tell).
As an alternative, pdftotext or mutool (convert) can extract the text layer from such a PDF, but can they put it back after I fix the typos?
Thank you.
--
Edit: An easy solution is to convert the PDF to EPUB using ABBYY FineReader, and then run the HTML files inside the EPUB through a spellchecker.
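In case it helps someone, here is a rough sketch of the spellcheck step in Python. It relies on the fact that an EPUB is a ZIP archive containing HTML/XHTML files. The function names (epub_words, suspicious_words) and the word list are my own placeholders, not part of any tool mentioned above, and I'm assuming the files are UTF-8 encoded. A real spellchecker would do better than a plain word-list lookup; this only flags candidates to review by hand.

```python
import re
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def epub_words(epub_path):
    """Yield (member_name, word) for each word in the EPUB's HTML/XHTML files."""
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            parser = _TextCollector()
            # Assumption: the files are UTF-8; undecodable bytes are replaced.
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            text = " ".join(parser.chunks)
            # [^\W\d_]+ matches runs of Unicode letters (accents included).
            for word in re.findall(r"[^\W\d_]+", text):
                yield name, word


def suspicious_words(epub_path, known_words):
    """Return (member_name, word) pairs whose word is not in known_words."""
    known = {w.lower() for w in known_words}
    return [(n, w) for n, w in epub_words(epub_path) if w.lower() not in known]
```

To use it, load a French word list (one word per line, from whatever dictionary file you have) into a list, call suspicious_words("book.epub", words), and then fix the flagged words in the corresponding HTML files.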
Last edited by Shohreh; 08-30-2024 at 03:28 AM.