Alternatives to OCRmyPDF?
I've been using ocrmypdf to OCR or re-OCR PDF books and articles. But:
1. It will crash if the book contains blank pages, or other problem pages.
2. It rasterizes everything. So even if the original was a clean PDF that just had buggy text encoding, the output will be a rasterized PDF.
3. It doesn't like spaces in the file names or file path. So I have to rename and move pdfs before processing.
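The rename-and-move workaround in point 3 could be scripted along these lines. This is only a rough sketch: the helper names (`safe_name`, `build_command`, `ocr_via_temp_copy`) are made up for illustration, it assumes `ocrmypdf` is on the PATH, and it assumes the system temp directory itself has no spaces in its path. It also returns False instead of stopping when one file fails, so a batch can carry on past a problem PDF.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path


def safe_name(name: str) -> str:
    """Replace runs of whitespace with underscores (hypothetical naming rule)."""
    return re.sub(r"\s+", "_", name.strip())


def build_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build the ocrmypdf call as an argument list, so no shell quoting is involved."""
    return ["ocrmypdf", "-l", language, str(src), str(dst)]


def ocr_via_temp_copy(pdf: Path, out_dir: Path, language: str = "eng") -> bool:
    """OCR one PDF through a space-free temporary copy; return True on success."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        tmp_in = Path(tmp) / safe_name(pdf.name)
        tmp_out = Path(tmp) / ("ocr_" + tmp_in.name)
        shutil.copy2(pdf, tmp_in)
        try:
            result = subprocess.run(build_command(tmp_in, tmp_out, language))
        except OSError:
            # ocrmypdf is missing or could not be started.
            return False
        if result.returncode != 0 or not tmp_out.exists():
            # Report the failure to the caller instead of aborting the batch.
            return False
        # The result goes back under the original name, spaces included.
        shutil.move(str(tmp_out), str(out_dir / pdf.name))
        return True
```

Looping over a folder and collecting the files for which `ocr_via_temp_copy` returns False would give a list of PDFs to retry or inspect by hand.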
I know k2pdfopt can ocr pdfs, but in my experience, trying to do everything at once can make k2pdfopt crash too. So I tend to run it, and *then* run ocrmypdf. I tried the reverse, but it sometimes scrambled the ocr.
Are there other scriptable ocr options, with decent language support, which are less likely to crash?