01-20-2018, 06:16 PM   #1
orebmur
Veteran Linux user
OCRmyPDF adds OCR text layer to scanned PDF files

Just stumbled over this wonderful tool:

github.com/jbarlow83/OCRmyPDF

* Generates a searchable PDF/A file from a regular PDF
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a "lossless" operation without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Uses Tesseract OCR engine
* Supports more than 100 languages recognized by Tesseract
* Battle-tested on thousands of PDFs, with a test suite and continuous integration

For Linux users, there is an official package in Debian.

So far I have used it to postprocess two PDFs of my own making, one in Spanish and one in English, and I am very happy with the results.
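
For anyone who wants to script this, here is a minimal sketch of how a run like the ones above could be driven from Python. It only assembles and launches the ocrmypdf command line; the helper names build_ocrmypdf_command and run_ocr, the file names, and the default language are my own assumptions for illustration and are not part of OCRmyPDF. It also assumes ocrmypdf is on the PATH, that the Tesseract language data for the requested languages is installed, and that unpaper is available if cleaning is requested.

```python
import subprocess
from typing import List, Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an ocrmypdf command line.

    Hypothetical helper for illustration; not part of OCRmyPDF itself.
    """
    cmd = ["ocrmypdf"]
    if languages:
        # Tesseract language codes are joined with "+", e.g. "spa+eng".
        cmd += ["-l", "+".join(languages)]
    if deskew:
        # Straighten crooked scans before OCR.
        cmd.append("--deskew")
    if clean:
        # Clean the page image before OCR (assumes unpaper is installed).
        cmd.append("--clean")
    if jobs is not None:
        # Limit the number of pages processed in parallel.
        cmd += ["--jobs", str(jobs)]
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(input_pdf: str, output_pdf: str, **options) -> None:
    """Run ocrmypdf and raise CalledProcessError if it exits with an error."""
    cmd = build_ocrmypdf_command(input_pdf, output_pdf, **options)
    subprocess.run(cmd, check=True)
```

Something like run_ocr("scan.pdf", "scan_ocr.pdf", languages=("spa", "eng"), deskew=True) would then process a mixed Spanish/English scan. Keeping the command builder separate from the subprocess call makes it easy to print the command and check it before running anything.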