MobileRead Forums - View Single Post - (Open-source) application to extract text layer?

Shohreh · 02-09-2022, 09:40 AM

Hello,

If a PDF consists of a scanned (picture) layer plus an OCRed text layer that lets the user copy text, I assume it's possible either to extract the text layer for use elsewhere, or to remove the picture layer and keep just the text layer in the PDF, so as to get a much smaller file.

Here's an example.

I can't find an application to do this, preferably open-source.

Thank you.

---
Edit: Done, using Ghostscript:

Code:

gs -sDEVICE=txtwrite -o output.txt input.pdf
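
The command above covers the first half of the question (pulling the text out). For the second half (keeping the text in the PDF and dropping the scanned picture), a possible sketch is below. It assumes a reasonably recent Ghostscript that supports the `-dFILTERIMAGE` switch; the function names `extract_text` and `strip_images` are made up for illustration. Note that OCR text layers are often stored as invisible text, so the resulting pages may look blank even though the text can still be selected and searched.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical helpers, not part of Ghostscript.

# Extract the text layer to a plain-text file (same command as in the post).
# $1 = input PDF, $2 = output text file
extract_text() {
    gs -q -sDEVICE=txtwrite -o "$2" "$1"
}

# Rewrite the PDF without its raster images, keeping the text layer.
# Assumes a Ghostscript version that understands -dFILTERIMAGE.
# $1 = input PDF, $2 = output PDF
strip_images() {
    gs -q -sDEVICE=pdfwrite -dFILTERIMAGE -o "$2" "$1"
}
```

Usage would look like `extract_text input.pdf output.txt` or `strip_images input.pdf text-only.pdf`; comparing the file sizes of `input.pdf` and `text-only.pdf` shows how much of the original was the scan.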

Last edited by Shohreh; 02-09-2022 at 10:57 AM.