01-20-2018, 06:16 PM   #1
orebmur
Veteran Linux user
OCRmyPDF adds OCR text layer to scanned PDF files

Just stumbled over this wonderful tool:

github.com/jbarlow83/OCRmyPDF

* Generates a searchable PDF/A file from a regular PDF
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a "lossless" operation without rendering vector information
* Keeps file size about the same
* If requested deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Uses Tesseract OCR engine
* Supports more than 100 languages recognized by Tesseract
* Battle-tested on thousands of PDFs, a test suite and continuous integration

For Linux users, there is an official package in Debian.

So far I have used it to postprocess two PDFs of my own making, one in Spanish and one in English, and I am very happy with the results.
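For anyone who wants to script this, here is a minimal sketch in Python that builds an `ocrmypdf` command line for a Spanish/English scan. The helper names (`build_ocrmypdf_command`, `run_ocrmypdf`) and the file names are made up for illustration. The flags used (`-l`, `--deskew`, `--clean`, `--jobs`) are the ones I know from the tool's command line, but check `ocrmypdf --help` on your installed version before relying on them.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Build an argument list for the ocrmypdf CLI (hypothetical helper)."""
    # Tesseract language codes are joined with "+", e.g. "spa+eng".
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten tilted pages before OCR
    if clean:
        cmd.append("--clean")  # clean page images before OCR
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # number of parallel workers
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocrmypdf(cmd: List[str]) -> int:
    """Run the command if ocrmypdf is on PATH (hypothetical helper)."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ocrmypdf not found; install it first")
    return subprocess.run(cmd, check=True).returncode


# Example with placeholder file names; only prints the command line.
example = build_ocrmypdf_command(
    "scan.pdf", "scan_ocr.pdf", languages=("spa", "eng"), deskew=True, jobs=4
)
print(" ".join(example))
```

Pass the resulting list to `run_ocrmypdf` to actually process the file. The matching Tesseract language data has to be installed for every language code you pass to `-l`.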