Hello,
If a PDF consists of a scanned image layer plus an OCRed text layer (so the user can copy text), I assume it's possible to either extract the text layer for use elsewhere, or remove the image layer and keep only the text layer in the PDF to get a much smaller file.
Here's an example.
I can't find an application that does this, preferably an open-source one.
Thank you.
---
Edit: Done. Ghostscript's txtwrite device extracts the text layer to a plain-text file.
Code:
gs -sDEVICE=txtwrite -o output.txt input.pdf
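
For the other option in the question (keep the text layer in the PDF, drop the scanned images), here is a rough sketch using Ghostscript's pdfwrite device with the -dFILTERIMAGE switch. Assumptions: your Ghostscript build is recent enough to support -dFILTERIMAGE, and strip_image_layer is just a helper name made up for this post, not something Ghostscript provides. I have not tried it on the example file.

```shell
#!/bin/sh
# Sketch: keep the OCR text layer, drop the scanned images.
# strip_image_layer is a hypothetical helper name, not part of Ghostscript.
strip_image_layer() {
    in_pdf=$1
    out_pdf=$2
    # pdfwrite rewrites the PDF; -dFILTERIMAGE leaves image objects out
    # (assumes a Ghostscript version that supports this switch).
    gs -o "$out_pdf" -sDEVICE=pdfwrite -dFILTERIMAGE "$in_pdf"
}

# Usage: strip_image_layer input.pdf text-only.pdf
```

One caveat: OCR text layers are usually drawn as invisible text, so the resulting pages may look blank in a viewer even though the text can still be selected, searched, and copied.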