MobileRead Forums - View Single Post - [OCR] Extract text layer, fix errors, re-import?

Shohreh · 08-29-2024, 01:26 PM

Hello,

I noticed some typos in the text layer that OCR added to a "bitmap" PDF, i.e. a PDF whose pages are actually scanned images.

I first tried opening the EPUB generated by Abbyy Finereader. LibreOffice couldn't open it at all. Sigil could, after showing an error message, but it lacks a French dictionary to run the spellcheck (as far as I can tell).

As an alternative, pdftotext or mutool (convert) can extract the text layer from such a PDF, but can they put it back after I have fixed the typos?

Thank you.

--
Edit: An easy solution is to convert the PDF to EPUB using Abbyy Finereader, and then run the HTML files within through a spellchecker.
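
For anyone who wants to script that last step, here is a minimal sketch of what "run the HTML files within through a spellchecker" could look like. It is not what Finereader or any particular spellchecker does. It assumes the EPUB is an ordinary ZIP archive whose chapters end in .html, .xhtml or .htm, and that you supply your own set of known lowercase words (for example loaded from a French word list). The function names are made up for illustration.

```python
import re
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the text content of one HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent paragraphs apart, but it also
    # splits a word that has inline markup in the middle (e.g. a drop cap).
    return " ".join(parser.chunks)


# Runs of letters only (Unicode-aware), so "l'homme" yields "l" and "homme".
WORD_RE = re.compile(r"[^\W\d_]+")


def unknown_words(text, known_words):
    """Return the sorted lowercase words of `text` missing from `known_words`."""
    found = {word.lower() for word in WORD_RE.findall(text)}
    return sorted(found - known_words)


def check_epub(epub_path, known_words):
    """Map each HTML file inside the EPUB to its list of unknown words."""
    report = {}
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                html = epub.read(name).decode("utf-8", errors="replace")
                misses = unknown_words(extract_text(html), known_words)
                if misses:
                    report[name] = misses
    return report
```

To use it, build `known_words` as a set of lowercase words from whatever dictionary file you have, call `check_epub("book.epub", known_words)` (the file name is a placeholder), and review the reported words by hand. OCR errors that happen to be valid words will not be flagged.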

Last edited by Shohreh; 08-30-2024 at 04:28 AM.