Easy way to check for PDFs with no text or buggy text?
Sometimes PDFs have no text at all and need OCR. Sometimes they start out with text but lose it to pre-processing bugs. Is there some kind of quality-check tool for PDFs that can find the ones which lack text or have seriously garbled text?
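To make the kind of check I mean concrete, here is a rough sketch. It assumes the text has already been extracted from each PDF by some other tool (for example `pdftotext` from Poppler) and only classifies the resulting string. The function name `text_quality` and both thresholds are made up for illustration and would need tuning on real files.

```python
import string


def text_quality(text, min_chars=50, min_good_ratio=0.8):
    """Classify extracted PDF text as 'no_text', 'garbled', or 'ok'.

    Both thresholds are arbitrary guesses, not tested values.
    """
    # Drop all whitespace so blank pages and layout padding do not count.
    stripped = "".join(text.split())

    # Almost nothing extracted: the PDF is probably image-only and needs OCR.
    if len(stripped) < min_chars:
        return "no_text"

    # Count characters that look like ordinary text: letters, digits
    # (any script) and ASCII punctuation. Control characters and the
    # Unicode replacement character do not count.
    good = sum(1 for ch in stripped if ch.isalnum() or ch in string.punctuation)

    # A low share of ordinary characters suggests a broken encoding or
    # a bad font mapping.
    if good / len(stripped) < min_good_ratio:
        return "garbled"
    return "ok"
```

Something like this catches empty files and obvious mojibake, but it would miss text that is made of valid letters in the wrong order or mapped to the wrong glyphs, which is why I am hoping a proper tool exists.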