Old 01-20-2018, 07:16 PM   #1
orebmur began at the beginning.
Posts: 23
Karma: 10
Join Date: Mar 2017
Device: Onyx Boox i62HD, Onyx Boox i86ML Plus
OCRmyPDF adds OCR text layer to scanned PDF files

Just stumbled over this wonderful tool:

* Generates a searchable PDF/A file from a regular PDF
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a "lossless" operation without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Uses Tesseract OCR engine
* Supports more than 100 languages recognized by Tesseract
* Battle-tested on thousands of PDFs, with a test suite and continuous integration

For those on Linux, there is an official package in Debian.

So far I have used it to postprocess two PDFs of my own making, one in Spanish and one in English, and I am very happy with the results.
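
For anyone who wants to script this, here is a rough sketch in Python of how a run over a mixed Spanish/English scan could be put together. It only assembles the command line and, optionally, hands it to the ocrmypdf program. The helper names build_ocrmypdf_command and run_ocr and the file names scan.pdf / scan_ocr.pdf are made up for illustration; the flags used (-l, --deskew, --clean, --jobs) are the ones I know from the ocrmypdf command line, but check ocrmypdf --help for your installed version before relying on them.

```python
import shutil
import subprocess
from typing import List, Optional


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: List[str],
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an ocrmypdf command line as a list of arguments.

    Hypothetical helper for illustration, not part of OCRmyPDF itself.
    """
    cmd = ["ocrmypdf"]
    if languages:
        # Tesseract language codes are joined with "+", e.g. "spa+eng".
        cmd += ["-l", "+".join(languages)]
    if deskew:
        # Straighten crooked pages before OCR.
        cmd.append("--deskew")
    if clean:
        # Clean the page image before OCR.
        cmd.append("--clean")
    if jobs is not None:
        # Limit how many pages are processed in parallel.
        cmd += ["--jobs", str(jobs)]
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(cmd: List[str]) -> int:
    """Run the command if the program is on PATH and return its exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd).returncode
```

Calling build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf", ["spa", "eng"], deskew=True, jobs=2) gives the argument list for a deskewed two-language run on two cores, and passing that list to run_ocr starts the actual job. The Tesseract language data for each code has to be installed separately.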