Tool to OCR an "image" PDF → add text as extra layer?

Shohreh · 11-10-2020, 03:07 PM

Hello,

Is there a tool that can…
1. OCR an "image" PDF, and
2. Include the text output as an additional layer in a PDF, so that the user can search it and possibly select, copy and paste the text elsewhere, as if it were a "text" PDF?

Thank you.

Doitsu · 11-10-2020, 04:42 PM

Quote:

Originally Posted by Shohreh

Is there a tool that can…
1. OCR an "image" PDF, and
2. Include the text output as an additional layer in a PDF, so that the user can search it and possibly select, copy and paste the text elsewhere, as if it were a "text" PDF?

Besides Adobe Acrobat, pretty much any commercial OCR tool, e.g. ABBYY FineReader, can do this.
There are also a couple of free Linux tools that can do this, e.g. pdfsandwich, but most of them are neither easy to install nor exactly user-friendly.

Shohreh · 11-10-2020, 06:44 PM

Thanks for the info.

I tried a couple of open-source apps (Naps2 and ocrmypdf), and the output is pretty good.
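For anyone who wants to take the ocrmypdf route from the command line on a whole folder, here is a minimal sketch. It assumes `ocrmypdf` (and, for the optional check, Poppler's `pdftotext`) is on the PATH. The function names `ocr_all` and `has_text_layer`, the `_ocr.pdf` suffix and the `DRY_RUN` switch are illustrative choices, not part of either tool; change `-l eng` to the language of your scans.

```shell
#!/bin/sh
# Hypothetical helper: OCR each PDF given as an argument with OCRmyPDF,
# writing scan_ocr.pdf for scan.pdf. OCRmyPDF keeps the page images and
# adds a searchable text layer on top of them.
# Set DRY_RUN=1 to print the commands instead of running them.
ocr_all() {
    for f in "$@"; do
        # Skip an unmatched glob such as "*.pdf" in an empty directory.
        [ -e "$f" ] || continue
        out="${f%.pdf}_ocr.pdf"
        # --skip-text leaves pages that already contain text untouched.
        if [ -n "$DRY_RUN" ]; then
            echo ocrmypdf -l eng --skip-text "$f" "$out"
        else
            ocrmypdf -l eng --skip-text "$f" "$out"
        fi
    done
}

# Hypothetical check: succeeds only if pdftotext extracts some
# non-whitespace text, i.e. the PDF now has a searchable text layer.
has_text_layer() {
    pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}
```

Usage: load the functions with `. ./ocr_all.sh` (any file name works), preview with `DRY_RUN=1 ocr_all *.pdf`, run with `ocr_all *.pdf`, then spot-check a result with `has_text_layer scan_ocr.pdf && echo searchable`.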

willus · 11-14-2020, 09:23 AM

Thanks for the tips on naps2 and ocrmypdf. They look like great utilities. k2pdfopt can also do this, and it uses Tesseract too.

k2pdfopt -mode copy -n- -ocr t file.pdf
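The command above can be wrapped in a loop to process several files. This is only a sketch: it reuses the options from the post unchanged and adds `-ui-` and `-x` on the assumption that they skip k2pdfopt's interactive menu and its closing "press Enter" prompt so the loop runs unattended (check `k2pdfopt -?` for your version). The function name `k2_ocr_all` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run k2pdfopt with the OCR options from the post
# on each PDF argument; k2pdfopt chooses the output file name itself.
# ASSUMPTION: -ui- disables the interactive menu and -x exits without
# waiting for Enter. Verify with `k2pdfopt -?` before relying on them.
# Set DRY_RUN=1 to print the commands instead of running them.
k2_ocr_all() {
    for f in "$@"; do
        [ -e "$f" ] || continue   # skip an unmatched "*.pdf" glob
        if [ -n "$DRY_RUN" ]; then
            echo k2pdfopt -ui- -x -mode copy -n- -ocr t "$f"
        else
            k2pdfopt -ui- -x -mode copy -n- -ocr t "$f"
        fi
    done
}
```

For example, `DRY_RUN=1 k2_ocr_all *.pdf` lists what would run, and `k2_ocr_all *.pdf` processes every PDF in the current folder.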

charsee · 12-15-2020, 10:23 AM

Quote:

Originally Posted by willus

k2pdfopt -mode copy -n- -ocr t file.pdf

Do these options go in the "Additional options" box?

willus · 12-19-2020, 12:47 PM

Quote:

Originally Posted by charsee

Do these options go in the "Additional options" box?

With the MS Windows GUI, you can set them as shown in the attached screenshot. Selecting the OCR option automatically turns off native mode.
