OCRthisPDF (not yet published)
I have been using a plugin for this purpose for a while now. I haven't released it yet because I still want to add a proofreading step. It is based on OCRmyPDF, which in turn relies on Tesseract for the OCR itself. If anyone wants to try it out, I could release it even without the proofreading feature (which uses the hOCR files generated by Tesseract). For high-quality scans, the results are already excellent out of the box.
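The post doesn't say how the proofreading step works. As a rough illustration of how Tesseract's hOCR output *could* feed such a step (an assumption, not the plugin's actual code), here is a minimal Python sketch. It reads the per-word confidence that Tesseract stores as `x_wconf` in the `title` attribute of each `ocrx_word` element and flags the words below a cutoff for manual review. The names `HocrWordParser` and `low_confidence_words`, as well as the default threshold of 80, are hypothetical choices made for this example.

```python
import re
from html.parser import HTMLParser

# Tesseract writes per-word confidence into the title attribute, e.g.
# title="bbox 10 20 60 40; x_wconf 91"
WCONF_RE = re.compile(r"x_wconf\s+(\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, confidence) for every ocrx_word element in an hOCR file."""

    def __init__(self):
        super().__init__()
        self.words = []      # list of (text, confidence or None)
        self._in_word = False
        self._depth = 0      # tags nested inside the current word element
        self._conf = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if self._in_word:
            self._depth += 1  # e.g. <strong> or <em> wrapping the word text
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "ocrx_word" in classes:
            match = WCONF_RE.search(attr_map.get("title") or "")
            self._in_word = True
            self._depth = 0
            self._conf = int(match.group(1)) if match else None
            self._parts = []

    def handle_endtag(self, tag):
        if not self._in_word:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # This is the closing tag of the ocrx_word element itself.
        self.words.append(("".join(self._parts).strip(), self._conf))
        self._in_word = False

    def handle_data(self, data):
        if self._in_word:
            self._parts.append(data)


def low_confidence_words(hocr_text, threshold=80):
    """Return (word, confidence) pairs whose confidence (0-100) is below threshold.

    Words without an x_wconf value are skipped rather than flagged.
    """
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return [(text, conf) for text, conf in parser.words
            if conf is not None and conf < threshold]


sample = """<div class='ocr_page'>
<span class='ocrx_word' id='word_1_1' title='bbox 10 10 50 30; x_wconf 96'>Hello</span>
<span class='ocrx_word' id='word_1_2' title='bbox 60 10 110 30; x_wconf 42'><strong>wor1d</strong></span>
</div>"""
print(low_confidence_words(sample))  # [('wor1d', 42)]
```

A proofreading UI could then show only the flagged words, together with their `bbox` regions on the page image, instead of asking the user to reread the whole document.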