11-10-2020, 03:07 PM | #1 |
Zealot
Posts: 148
Karma: 192898
Join Date: Jan 2016
Device: none
|
[SOLVED] Tool to OCR an "image" PDF → add text as extra layer?
Hello,
Is there a tool that can:
1. OCR an "image" PDF, and
2. Include the text output as an additional layer in the PDF, so that the user can search it, and possibly select, copy, and paste text elsewhere, as if it were a "text" PDF?
Thank you.
Last edited by Shohreh; 11-10-2020 at 06:44 PM.
11-10-2020, 04:42 PM | #2 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
There are also a couple of free Linux tools that can do this, e.g. pdfsandwich, but most of them are neither easy to install nor exactly user-friendly.
Last edited by Doitsu; 11-10-2020 at 04:45 PM.
|
11-10-2020, 06:44 PM | #3 |
Zealot
Posts: 148
Karma: 192898
Join Date: Jan 2016
Device: none
|
Thanks for the info.
I tried a couple of open-source apps (NAPS2 and OCRmyPDF), and the output is pretty good.
11-14-2020, 09:23 AM | #4 |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Thanks for the tips on NAPS2 and OCRmyPDF. Great-looking utilities. k2pdfopt can also do this, and it uses Tesseract as well:
k2pdfopt -mode copy -n- -ocr t file.pdf
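For anyone who wants to script this, here is a rough Python sketch that wraps the two command-line tools mentioned in this thread. It is only an illustration, not an official recipe for either tool: the helper names (`ocrmypdf_command`, `k2pdfopt_command`, `run_if_installed`) and the file names are made up for the example. The k2pdfopt flags are copied from the command above, and the OCRmyPDF call is assumed to use its basic `ocrmypdf input.pdf output.pdf` form.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def ocrmypdf_command(src: str, dst: str) -> list[str]:
    """Basic OCRmyPDF call: read src, write a copy with a text layer to dst."""
    return ["ocrmypdf", src, dst]


def k2pdfopt_command(src: str) -> list[str]:
    """The k2pdfopt invocation quoted in this thread, as an argument list."""
    return ["k2pdfopt", "-mode", "copy", "-n-", "-ocr", "t", src]


def run_if_installed(cmd: list[str]) -> bool:
    """Run cmd only if its executable is on PATH; True means it ran and exited 0."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]} not found on PATH; would have run: {' '.join(cmd)}")
        return False
    return subprocess.run(cmd).returncode == 0


if __name__ == "__main__":
    scan = "scan.pdf"  # hypothetical input file
    if Path(scan).exists():
        run_if_installed(ocrmypdf_command(scan, "scan_ocr.pdf"))
```

Building the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.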
12-15-2020, 10:23 AM | #5 |
Member
Posts: 10
Karma: 10
Join Date: May 2019
Location: Pakistan
Device: kindle4/kobo touch
|
|
12-19-2020, 12:47 PM | #6 |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
In the MS Windows GUI, you can set these options as shown in the attached screenshot. Enabling the OCR option automatically turns off native mode.
|