05-29-2025, 05:25 PM   #1
MarjaE
Guru
Posts: 940
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Alternatives to Ocrmypdf?

I've been using ocrmypdf to ocr or re-ocr pdf books and articles. But:

1. It will crash if the book contains blank pages, or other problem pages.

2. It rasterizes everything. So even if the original was a clean pdf which just had buggy text encoding, the output will be a rasterized pdf.

3. It doesn't like spaces in the file names or file path. So I have to rename and move pdfs before processing.
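For anyone scripting around points 1 and 3, here is a minimal Python sketch. It assumes the trouble with spaces comes from shell word-splitting rather than from ocrmypdf itself, which I haven't confirmed. Each path is passed as its own argument, and one failing file is recorded instead of stopping the whole batch. The names `build_command`, `ocr_folder`, and the `run` parameter are made up for illustration; `-l` is ocrmypdf's language option, and other flags such as `--skip-text` or `--redo-ocr` can go in `extra`.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, dst: Path, lang: str = "eng", extra: tuple = ()) -> list:
    """Build an ocrmypdf argument list (hypothetical helper).

    Each path is a single list item, so no shell ever splits it on spaces.
    """
    return ["ocrmypdf", "-l", lang, *extra, str(src), str(dst)]


def ocr_folder(in_dir: Path, out_dir: Path, run=subprocess.run) -> dict:
    """Run ocrmypdf on every PDF in in_dir and return {file name: exit code}.

    A non-zero exit code is recorded and the loop moves on to the next file.
    `run` is only a hook so the loop can be tried without ocrmypdf installed.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in sorted(in_dir.glob("*.pdf")):
        proc = run(build_command(src, out_dir / src.name))
        results[src.name] = proc.returncode
    return results


if __name__ == "__main__":
    # Example folders (assumptions); spaces in the names are intentional.
    codes = ocr_folder(Path("to ocr"), Path("ocr done"))
    for name, code in codes.items():
        print(name, "ok" if code == 0 else f"failed ({code})")
```

This doesn't fix whatever makes ocrmypdf fail on a given page. It only keeps one bad file from blocking the rest and leaves a list of which files need another look.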

I know k2pdfopt can ocr pdfs, but in my experience, trying to do everything at once can make k2pdfopt crash too. So I tend to run it, and *then* run ocrmypdf. I tried the reverse, but it sometimes scrambled the ocr.

Are there other scriptable ocr options, with decent language support, which are less likely to crash?