Veteran Linux user
Posts: 150
Karma: 1000000
Join Date: Mar 2017
Location: Barcelona/Spain
Device: Boyue Likebook Note & Mimas, Hisense A5, hopefully soon a PineNote
OCRmyPDF adds OCR text layer to scanned PDF files
Just stumbled over this wonderful tool:
github.com/jbarlow83/OCRmyPDF

* Generates a searchable PDF/A file from a regular PDF
* Places OCR text accurately below the image to ease copy / paste
* Keeps the exact resolution of the original embedded images
* When possible, inserts OCR information as a "lossless" operation without rendering vector information
* Keeps file size about the same
* If requested, deskews and/or cleans the image before performing OCR
* Validates input and output files
* Provides debug mode to enable easy verification of the OCR results
* Processes pages in parallel when more than one CPU core is available
* Uses Tesseract OCR engine
* Supports more than 100 languages recognized by Tesseract
* Battle-tested on thousands of PDFs, a test suite and continuous integration

There is an official package in Debian for those using Linux. So far I have used it to postprocess a Spanish-language and an English-language PDF of my own making, and I am very happy with the results.
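For anyone who wants to script this, here is a minimal Python sketch that assembles an `ocrmypdf` command line covering a few of the features listed above (language selection, deskewing, cleaning, parallel jobs). The helper name `build_ocrmypdf_command` and the file names `scan.pdf` / `scan_ocr.pdf` are made up for illustration, and it assumes the `ocrmypdf` executable is on your PATH. As far as I know `--clean` also needs the separate `unpaper` program installed, so treat that flag as optional.

```python
import subprocess
from typing import List, Optional


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: List[str],
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for the ocrmypdf command-line tool.

    Hypothetical helper for illustration; it is not part of OCRmyPDF.
    """
    cmd = ["ocrmypdf"]
    if languages:
        # Tesseract language codes are joined with "+", e.g. "spa+eng".
        cmd += ["-l", "+".join(languages)]
    if deskew:
        # Straighten crooked pages before OCR.
        cmd.append("--deskew")
    if clean:
        # Clean the page image before OCR (assumed to require unpaper).
        cmd.append("--clean")
    if jobs is not None:
        # Limit how many pages are processed in parallel.
        cmd += ["--jobs", str(jobs)]
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocr(cmd: List[str]) -> None:
    """Run the assembled command; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names; only prints the command, does not run it.
    command = build_ocrmypdf_command(
        "scan.pdf", "scan_ocr.pdf", ["spa", "eng"], deskew=True
    )
    print(" ".join(command))
```

To actually process a file, pass the list to `run_ocr(command)`. Each language you name needs the matching Tesseract language data installed.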