01-25-2018, 08:40 PM | #1 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
How to run Tesseract (to OCR PDFs) on the Mac?
What's a good way to run Tesseract to OCR PDFs on the Mac?
I've previously used Elucidate, which uses Tesseract but reformats PDFs in Quartz, which limits compatibility with k2pdfopt later on. I've also tried k2pdfopt, which can use Tesseract if requested and reformats PDFs in MuPDF. It's really useful for reformatting PDFs, but not as good as Elucidate for OCR. I've tried plain Tesseract, installed with MacPorts, but I can't get it to work. Apparently I have to create a config file to actually use Tesseract? The MacPorts installation doesn't include the required config files. |
01-26-2018, 05:23 PM | #2 |
Veteran Linux user
Posts: 144
Karma: 678910
Join Date: Mar 2017
Location: Barcelona/Spain
Device: Boyue Likebook Note & Mimas, Hisense A5, hopefully soon a PineNote
|
I'm not sure what the final aim behind "to ocr pdfs" is, but this may be of interest to you:
github.com/jbarlow83/OCRmyPDF |
01-27-2018, 02:53 PM | #3 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
Okay, thanks. I'll try Homebrew and OCRmyPDF.
I want to be able to (a) search PDFs, (b) in some cases copy text to translation tools, and (c) still be able to process the PDFs in k2pdfopt so they will load on the Kindle and load faster on the Mac.
P.S. Had some trouble with Homebrew, but it installed on the 2nd try.
P.P.S. Having more trouble with OCRmyPDF. I followed the installation instructions at https://github.com/jbarlow83/OCRmyPDF, but when I run ocrmypdf --help, I get a message stating "-bash: ocrmypdf: command not found". Last edited by MarjaE; 01-27-2018 at 03:32 PM. |
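For anyone hitting the same "command not found" message: it means the shell could not find an executable with that name on its PATH. Below is a minimal sketch in Python, not part of the original posts, that reports which of the tools mentioned in this thread are visible on PATH and builds a basic ocrmypdf command line. The helper names `locate` and `ocr_command` and the file names are made up for illustration; the only thing assumed about ocrmypdf is that it takes an input path, an output path, and `-l` for the Tesseract language.

```python
import shutil


def locate(tool):
    """Return the full path to `tool` if it is on PATH, else None."""
    return shutil.which(tool)


def ocr_command(src, dst, language="eng"):
    """Build (but do not run) an ocrmypdf command line as a list of arguments."""
    # Assumption: ocrmypdf is called as `ocrmypdf -l LANG input.pdf output.pdf`.
    return ["ocrmypdf", "-l", language, src, dst]


if __name__ == "__main__":
    # Report which of the tools discussed in this thread the shell can see.
    for tool in ("brew", "port", "tesseract", "ocrmypdf"):
        path = locate(tool)
        print(f"{tool}: {path or 'not found on PATH'}")

    # Hypothetical file names, shown only to illustrate the argument order.
    print(" ".join(ocr_command("scan.pdf", "scan-ocr.pdf")))
```

If `ocrmypdf` shows up as not found while `brew` is found, the install probably did not complete or its bin directory is not on PATH; the script only diagnoses, it does not fix anything.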
01-27-2018, 06:15 PM | #4 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
I had to move MacPorts out of the way to install OCRmyPDF using Homebrew... I haven't tested OCRmyPDF yet.
P.S. OCRmyPDF can OCR a few files which Elucidate and k2pdfopt had balked at. On its own it leaves good images, but the images turn out bad when its output is combined with k2pdfopt. Last edited by MarjaE; 01-27-2018 at 09:15 PM. |
02-05-2018, 06:46 PM | #5 |
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
|
I also installed cpdf, using Homebrew this time. Some success with individual files, but it refuses to work with folders; it interprets the invisible .DS_Store system files as malformed PDFs.
|
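One way around the .DS_Store problem is to build the file list yourself and pass only real PDFs to the tool, one at a time. A minimal sketch in Python, assuming the folder is flat (no subfolders to descend into); `pdf_files` is a hypothetical helper name, and the folder path in the usage part is only an example.

```python
from pathlib import Path


def pdf_files(folder):
    """Return the visible .pdf files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and not p.name.startswith(".")   # skips .DS_Store and other hidden files
        and p.suffix.lower() == ".pdf"   # accepts .pdf and .PDF alike
    )


if __name__ == "__main__":
    # Example folder; replace with the folder that holds the PDFs.
    folder = Path.home() / "Documents"
    if folder.is_dir():
        for pdf in pdf_files(folder):
            print(pdf)
```

Each path printed can then be handed to cpdf, ocrmypdf, or k2pdfopt individually, so the hidden system files never reach the tool.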