03-27-2020, 03:24 PM   #1
MarjaE
Guru
Posts: 939
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Is there a way to detect buggy pdfs without manually checking each pdf?

Some pdfs have corrupt text encoding to begin with. I have to pre-process pdfs for my Kindle, and some of them end up with corrupt text encoding after pre-processing in Ghostscript.

If I select text from these pdfs, I get either gibberish, or blank spaces punctuated with ... well, occasional punctuation.

I usually find this out by trying to search in a pdf, or by selecting text in a pdf. Is there an easy way to detect pdfs with malformed or missing text, without manually opening and selecting passages from each pdf?
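
One possible way to automate the check, as a rough sketch rather than a tested solution: dump each pdf's text layer and score it instead of selecting passages by hand. The sketch below assumes Poppler's pdftotext is installed and on the PATH, and that the documents are mostly English prose. The function names, the short common-word list, and the thresholds (200 characters, 0.6, 0.05) are illustrative guesses and would need tuning on real files.

```python
import re
import subprocess
from pathlib import Path

# Assumption: mostly English prose. Real text is full of these words;
# text with a broken encoding almost never contains them.
COMMON_WORDS = {"the", "of", "and", "to", "a", "in", "is", "that", "for", "it"}


def extract_text(pdf_path, last_page=10):
    """Return the text layer of the first pages, using Poppler's pdftotext."""
    result = subprocess.run(
        ["pdftotext", "-l", str(last_page), "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=False,  # a damaged file should not stop the whole scan
    )
    return result.stdout.decode("utf-8", errors="replace")


def classify(text, min_chars=200):
    """Label extracted text as 'ok', 'suspect', or 'little or no text'."""
    visible = [c for c in text if not c.isspace()]
    if len(visible) < min_chars:
        # Blank spaces with occasional punctuation, or no text layer at all.
        return "little or no text"
    letter_share = sum(c.isalpha() for c in visible) / len(visible)
    words = re.findall(r"[a-z]+", text.lower())
    common_share = (
        sum(w in COMMON_WORDS for w in words) / len(words) if words else 0.0
    )
    # Hypothetical thresholds: mostly symbols, or letters that never
    # form everyday words, both point at a broken text encoding.
    if letter_share < 0.6 or common_share < 0.05:
        return "suspect"
    return "ok"


def scan_folder(folder):
    """Map each pdf file name in a folder to its verdict."""
    return {
        path.name: classify(extract_text(path))
        for path in sorted(Path(folder).glob("*.pdf"))
    }
```

Calling scan_folder on the folder of processed pdfs returns a verdict per file, so only the "suspect" and "little or no text" entries need a manual look. Scanned pdfs without any text layer will also land in "little or no text", and non-English documents would need a different word list.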

Last edited by MarjaE; 03-27-2020 at 04:07 PM.