Quote:
Originally Posted by dgatwood
Better to use something like pdftotext and see if it returns nothing. PDF files might contain both images *and* text, and I'm assuming you probably want to convert those as well.
Oh yes, that's a good point. What I wrote above only finds pure-text PDFs, not mixed text/image ones like the PDFs from Archive.org. I don't think I have many, if any, mixed text/image PDFs, but all my DJVU books are that way. PDFs from Google Books or HathiTrust are mostly images, but they do have a small amount of text (copyright notices, etc.), so a script would have to ignore that text, which makes it more complex.
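Something like the following might be a starting point for the pdftotext approach while still ignoring a stray copyright page. It's only a rough sketch: the function names and the 200-characters-per-page threshold are my own guesses, not anything tested on a real collection, and it assumes pdftotext (from poppler or xpdf) is on the PATH and ends each page with a form feed, which is its usual behaviour.

```python
import subprocess
import sys

# Hypothetical threshold: below this many non-space characters per page,
# treat the PDF as "mostly images". Tune it for your own files.
MIN_CHARS_PER_PAGE = 200


def looks_image_only(text, min_chars_per_page=MIN_CHARS_PER_PAGE):
    """Return True if the extracted text is too sparse to be a real text layer."""
    # pdftotext normally ends each page with a form feed, so count those as pages.
    pages = max(text.count("\f"), 1)
    chars = sum(1 for c in text if not c.isspace())
    return chars / pages < min_chars_per_page


def extract_text(pdf_path):
    """Run pdftotext and return its output ('-' sends the text to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-q", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    for path in sys.argv[1:]:
        kind = "mostly images" if looks_image_only(extract_text(path)) else "has text"
        print(f"{path}: {kind}")
```

Averaging over pages means a Google Books style scan with one page of boilerplate still counts as mostly images, while a mixed text/image PDF with a full text layer counts as having text.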