Is there a way to detect buggy pdfs without manually checking each pdf?
Some pdfs have corrupt text encoding to begin with. I pre-process pdfs for my Kindle, and some end up with corrupt text encoding after pre-processing in Ghostscript.
If I select text from these pdfs, I get either gibberish, or blank spaces punctuated with ... well, occasional punctuation.
I usually find this out by trying to search in a pdf, or by selecting text in a pdf. Is there an easy way to detect pdfs with malformed or missing text, without manually opening and selecting passages from each pdf?
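To make the manual check concrete, here is a rough Python sketch of the test being done by eye: is the selected text mostly real words, or mostly blanks and stray punctuation? It assumes the text has already been pulled out of the pdf by some other tool (Poppler's pdftotext is one possibility). The names text_quality and looks_corrupt and the 0.5 threshold are made up for illustration, not taken from any existing tool.

```python
import re


def text_quality(text: str) -> float:
    """Fraction of non-space characters that belong to runs of 3+ letters."""
    non_space = [c for c in text if not c.isspace()]
    if not non_space:
        # Nothing but whitespace (or nothing at all): treat as zero quality.
        return 0.0
    # [^\W\d_] matches any Unicode letter; require runs of at least three.
    in_words = sum(len(run) for run in re.findall(r"[^\W\d_]{3,}", text))
    return in_words / len(non_space)


def looks_corrupt(text: str, threshold: float = 0.5) -> bool:
    """Flag text whose word-like content falls below an arbitrary threshold."""
    return text_quality(text) < threshold


if __name__ == "__main__":
    print(looks_corrupt("  .   ,    ;  . "))
    print(looks_corrupt("This is an ordinary sentence with normal words."))
```

This would only catch the "blank spaces and occasional punctuation" case. Gibberish made of ordinary letters would pass, and scanned pdfs with no text layer would be flagged the same as corrupt ones.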
Last edited by MarjaE; 03-27-2020 at 04:07 PM.