02-09-2022, 08:40 AM   #1
Shohreh
[SOLVED] (Open-source) application to extract text layer?

Hello,

If a PDF consists of a scanned (picture) layer plus an OCRed text layer that lets the user copy text, I assume it's possible either to extract the text layer for use elsewhere, or to remove the picture layer and keep only the text layer in the PDF, which should give a much smaller file.

Here's an example.

I can't find an application that does this. Open-source would be preferred.

Thank you.

---
Edit: Done. Ghostscript's txtwrite device extracts the text layer to a plain-text file:

Code:
gs -sDEVICE=txtwrite -o output.txt input.pdf
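
For the second option (keeping the text layer in the PDF and dropping the scanned pictures), Ghostscript's pdfwrite device accepts -dFILTERIMAGE, which leaves images out of the output. Below is a minimal sketch of both routes, not verified against the example file. The function names and file names are placeholders of mine, not part of Ghostscript. Note that OCR text is usually stored as invisible text, so the stripped PDF may display blank pages while the text stays selectable and searchable.

```shell
#!/bin/sh
# Hypothetical helper names; only the gs options come from Ghostscript.

# Dump the text layer to a plain-text file (same txtwrite command as above,
# with -q added to silence the startup banner).
extract_text() {
    # $1 = input PDF, $2 = output text file
    gs -q -sDEVICE=txtwrite -o "$2" "$1"
}

# Rewrite the PDF without its raster images, keeping the text layer.
# Assumes a Ghostscript build recent enough to support -dFILTERIMAGE.
strip_images() {
    # $1 = input PDF, $2 = output PDF
    gs -q -sDEVICE=pdfwrite -dFILTERIMAGE -o "$2" "$1"
}

# Usage:
#   extract_text input.pdf output.txt
#   strip_images input.pdf text-only.pdf
```

Wrapping the two commands in functions keeps the input and output names explicit, which makes it easy to loop over a folder of scans.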

Last edited by Shohreh; 02-09-2022 at 09:57 AM.