Old 01-25-2018, 08:40 PM   #1
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
How to run Tesseract (to OCR PDFs) on the Mac?

What's a good way to run Tesseract to OCR PDFs on the Mac?

I've previously used Elucidate, which uses Tesseract but reformats PDFs in Quartz, which limits compatibility with k2pdfopt later on. I've also tried k2pdfopt, which can use Tesseract if requested and reformats PDFs in MuPDF. It's really useful for reformatting PDFs, but not as good as Elucidate for OCR.

I've tried straight-up Tesseract, installed with MacPorts, but I can't get it to work. Apparently I have to create a config file to actually use Tesseract? The MacPorts installation doesn't seem to include the required config files.
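For anyone hitting the same wall: the Tesseract command line reads raster images (PNG, TIFF, ...) rather than PDFs, so PDF pages have to be rendered to images first. The "config file" is most likely the stock `pdf` config that ships in Tesseract's tessdata/configs directory; naming it at the end of the command asks for a searchable PDF as output. A minimal sketch of the invocation, assuming English language data is installed (on MacPorts that is, as far as I know, a separate port such as tesseract-eng) and using the made-up file name page-001.png:

```python
import shutil


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build argv for: tesseract IMAGE OUTPUTBASE -l LANG pdf

    The trailing "pdf" names a Tesseract config file; with it,
    Tesseract writes OUTPUTBASE.pdf with a searchable text layer.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


# "page-001.png" is a hypothetical page image rendered from the PDF.
cmd = tesseract_pdf_command("page-001.png", "page-001")
print(" ".join(cmd))

# Quick check that the binary is visible before trying to run it.
print("tesseract on PATH:", shutil.which("tesseract") is not None)
```

Passing the list to `subprocess.run(cmd, check=True)` would actually run it; the sketch only prints the command so it is safe to try without Tesseract installed.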
Old 01-26-2018, 05:23 PM   #2
orebmur
Veteran Linux user
Posts: 144
Karma: 678910
Join Date: Mar 2017
Location: Barcelona/Spain
Device: Boyue Likebook Note & Mimas, Hisense A5, hopefully soon a PineNote
Not sure what the final aim behind "to ocr pdfs" is, but maybe this is of interest to you:

github.com/jbarlow83/OCRmyPDF
Old 01-27-2018, 02:53 PM   #3
MarjaE
Okay, thanks. I'll try Homebrew and OCRmyPDF.

I want to be able to (a) search PDFs, (b) in some cases copy text into translation tools, and (c) still be able to process the PDFs in k2pdfopt so they will load on the Kindle and load faster on the Mac.

P.S. Had some trouble with Homebrew, but it installed on the second try.

P.P.S. Having more trouble with OCRmyPDF. I followed the installation instructions here, but when I try to run ocrmypdf --help, I get the message "-bash: ocrmypdf: command not found".

https://github.com/jbarlow83/OCRmyPDF
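"command not found" means bash did not find an executable named ocrmypdf in any directory listed in PATH. With both package managers installed it is worth checking which directories are actually searched: at the time of this thread Homebrew normally linked programs into /usr/local/bin and MacPorts into /opt/local/bin. A small sketch of that check, with hypothetical helper names:

```python
import os
import shutil


def find_on_path(command, path=None):
    """Return the full path of `command`, or None if no PATH entry has it."""
    return shutil.which(command, path=path)


def path_entries(path=None):
    """Split a PATH string (default: the current PATH) into its directories."""
    raw = os.environ.get("PATH", "") if path is None else path
    return [entry for entry in raw.split(os.pathsep) if entry]


location = find_on_path("ocrmypdf")
if location:
    print("ocrmypdf found at", location)
else:
    print("ocrmypdf is not on PATH; directories searched:")
    for entry in path_entries():
        print("  ", entry)
```

If the Homebrew directory is missing from the list, or the install step itself failed, the same error appears, so this only narrows things down.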

Last edited by MarjaE; 01-27-2018 at 03:32 PM.
Old 01-27-2018, 06:15 PM   #4
MarjaE
I had to move MacPorts out of the way to install OCRmyPDF using Homebrew... I haven't tested OCRmyPDF yet.

P.S. OCRmyPDF can OCR a few files that Elucidate and k2pdfopt had balked at. On its own it leaves good images, but the images come out badly when it's combined with k2pdfopt.
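For reference, a basic OCRmyPDF run takes an input PDF and an output PDF. The sketch below builds such a command with a few options that, to my understanding of the OCRmyPDF documentation, exist: -l for the language, --skip-text to leave pages that already have text alone, and --sidecar to also write the recognized text to a plain-text file (handy for pasting into translation tools). The file names are made up, and `ocrmypdf --help` on the installed version is the authority on the flags:

```python
def ocrmypdf_command(input_pdf, output_pdf, lang="eng",
                     sidecar=None, skip_text=False):
    """Build argv for a basic OCRmyPDF run (options first, then files)."""
    cmd = ["ocrmypdf", "-l", lang]
    if skip_text:
        # Do not OCR pages that already contain a text layer.
        cmd.append("--skip-text")
    if sidecar:
        # Also write the recognized text to a separate text file.
        cmd += ["--sidecar", sidecar]
    cmd += [input_pdf, output_pdf]
    return cmd


# "scan.pdf", "scan-ocr.pdf" and "scan.txt" are hypothetical names.
print(" ".join(ocrmypdf_command("scan.pdf", "scan-ocr.pdf",
                                sidecar="scan.txt", skip_text=True)))
```

Running k2pdfopt on the result is a separate step and is not covered here.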

Last edited by MarjaE; 01-27-2018 at 09:15 PM.
Old 02-05-2018, 06:46 PM   #5
MarjaE
I also installed cpdf, using Homebrew this time. Some success with individual files, but it refuses to work with folders; it interprets the invisible .DS_Store system files as malformed PDFs.
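One possible workaround is to build the file list yourself and hand the tool one PDF at a time, skipping hidden files such as .DS_Store. A minimal sketch with hypothetical helper names; it only lists the files and leaves the actual cpdf call out, since that depends on which operation is wanted:

```python
from pathlib import Path


def is_pdf_name(name):
    """True for visible files ending in .pdf (any case); hidden files are skipped."""
    return not name.startswith(".") and name.lower().endswith(".pdf")


def list_pdfs(folder):
    """Return the PDFs directly inside `folder`, sorted by path."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and is_pdf_name(p.name))


# Lists the PDFs in the current directory; each path could then be
# passed to the tool individually instead of passing the folder.
for pdf in list_pdfs("."):
    print(pdf)
```

The leading-dot test also skips the "._name" companion files that macOS sometimes creates on non-Mac volumes.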
Tags
ocr software, tesseract