#1 |
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
[SOLVED] Tool to OCR an "image" PDF → add text as extra layer?
Hello,
Is there a tool that can:
1. OCR an "image" PDF, and
2. include the text output as an additional layer in the PDF, so that the user can search it, and possibly select, copy, and paste the text elsewhere, as if it were a "text" PDF?
Thank you.
Last edited by Shohreh; 11-10-2020 at 06:44 PM.
#2 | |
Grand Sorcerer
Posts: 5,700
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
There are also a couple of free Linux tools that can do this, e.g. pdfsandwich, but most of them are neither easy to install nor exactly user-friendly.
Last edited by Doitsu; 11-10-2020 at 04:45 PM.
#3 |
Addict
Posts: 205
Karma: 304158
Join Date: Jan 2016
Location: France
Device: none
|
Thanks for the info.
I tried a couple of open-source apps (NAPS2 and OCRmyPDF), and the output is pretty good.
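For anyone who wants to script the OCRmyPDF route, here is a minimal sketch that builds and runs the command line from Python. It assumes `ocrmypdf` is installed and on the PATH; the file names `scan.pdf` and `scan_ocr.pdf` and the helper names are placeholders for illustration, not anything taken from this thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_cmd(src, dst, language="eng"):
    """Return the argument list for an ocrmypdf run that adds a text layer."""
    return [
        "ocrmypdf",
        "-l", language,   # Tesseract language code(s), e.g. "eng" or "eng+fra"
        "--skip-text",    # do not OCR pages that already contain text
        str(src),
        str(dst),
    ]


def add_text_layer(src, dst, language="eng"):
    """Run ocrmypdf if it is installed; return True only if it exits with 0."""
    if shutil.which("ocrmypdf") is None:
        return False  # ocrmypdf is not on the PATH
    result = subprocess.run(build_ocrmypdf_cmd(src, dst, language))
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder file names (assumption): replace with your own scan.
    ok = add_text_layer(Path("scan.pdf"), Path("scan_ocr.pdf"))
    print("done" if ok else "ocrmypdf not found or OCR failed")
```

The output file should keep the page images and gain a hidden, searchable text layer, which is what the first post asks for. The language pack for the chosen code has to be installed for Tesseract separately.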
#4 |
Fuzzball, the purple cat
Posts: 1,299
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Thanks for the tips on NAPS2 and OCRmyPDF. Great-looking utilities. k2pdfopt will do this too, and it also uses Tesseract:
k2pdfopt -mode copy -n- -ocr t file.pdf
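To run that same command over several files, a small Python wrapper along these lines might help. This is only a sketch: it assumes `k2pdfopt` is on the PATH, it passes exactly the flags quoted above and nothing else, and the helper names are made up for illustration. k2pdfopt may still ask for input in its own menu, depending on your setup.

```python
import shutil
import subprocess
import sys


def build_k2pdfopt_cmd(pdf_path):
    """Argument list for the k2pdfopt call quoted in the post above."""
    return ["k2pdfopt", "-mode", "copy", "-n-", "-ocr", "t", str(pdf_path)]


def ocr_files(pdf_paths):
    """Run k2pdfopt on each file; return the paths whose run exited with 0."""
    if shutil.which("k2pdfopt") is None:
        return []  # k2pdfopt is not on the PATH
    done = []
    for path in pdf_paths:
        if subprocess.run(build_k2pdfopt_cmd(path)).returncode == 0:
            done.append(path)
    return done


if __name__ == "__main__":
    # Usage: python ocr_batch.py first.pdf second.pdf
    for path in ocr_files(sys.argv[1:]):
        print("processed", path)
```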
#5 |
Member
Posts: 10
Karma: 10
Join Date: May 2019
Location: Pakistan
Device: kindle4/kobo touch
|
#6 |
Fuzzball, the purple cat
Posts: 1,299
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
With the MS Windows GUI, you can set them as shown in the attached screenshot. The OCR option will automatically turn off native mode.