MobileRead Forums - View Single Post

MarjaE · 05-29-2025, 06:25 PM

I've been using ocrmypdf to ocr or re-ocr pdf books and articles. But:

1. It will crash if the book contains blank pages, or other problem pages.

2. It rasterizes everything. So even if the original was a clean pdf which just had buggy text encoding, the output will be a rasterized pdf.

3. It doesn't like spaces in the file names or file path. So I have to rename and move pdfs before processing.
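
For anyone scripting around points 2 and 3, here is a minimal Python sketch, not a tested fix. It assumes ocrmypdf is on the PATH and relies on its documented `--skip-text`, `--redo-ocr` and `-l` options; as far as I know, the full rasterizing is tied to `--force-ocr`, so the sketch avoids that flag. The helper names `build_ocr_command` and `ocr_folder` are made up for illustration. Passing the arguments as a list, without a shell, keeps a path with spaces as one argument, and a failure on one file is recorded instead of stopping the batch. Whether this avoids the blank-page crash is untested.

```python
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, language="eng", redo=False):
    """Return an ocrmypdf argument list for one file (hypothetical helper)."""
    # --skip-text leaves pages that already contain text untouched;
    # --redo-ocr replaces an existing OCR layer instead.
    mode = "--redo-ocr" if redo else "--skip-text"
    return ["ocrmypdf", mode, "-l", language, str(src), str(dst)]


def ocr_folder(in_dir, out_dir, **options):
    """Run ocrmypdf on every PDF in in_dir; return the inputs that failed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(in_dir).glob("*.pdf")):
        cmd = build_ocr_command(src, out_dir / src.name, **options)
        # An argument list (no shell=True) is passed through without
        # word splitting, so spaces in paths need no quoting or renaming.
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Record the failure and carry on with the next file.
            failed.append(src)
    return failed
```

Something like `ocr_folder("My Books", "My Books/ocr", language="eng", redo=True)` would then return the list of files that still need attention.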

I know k2pdfopt can ocr pdfs, but in my experience, trying to do everything at once can make k2pdfopt crash too. So I tend to run it, and *then* run ocrmypdf. I tried the reverse, but it sometimes scrambled the ocr.

Are there other scriptable ocr options, with decent language support, which are less likely to crash?

Thread: Alternatives to Ocrmypdf? (post #1)