12-24-2012, 03:33 PM   #5
Geremia
Quote (originally posted by dgatwood):
Better to use something like pdftotext and see if it returns nothing. PDF files might contain both images *and* text, and I'm assuming you probably want to convert those as well.
Oh yes, that's a good point. What I wrote above only finds pure-text PDFs, not mixed text/image ones like the PDFs from Archive.org. I don't think I have many, if any, mixed text/image PDFs, but all my DJVU books are that way. PDFs from Google Books or HathiTrust are mostly images, but they do have a small amount of text for copyright, etc., so making a script ignore that would be more complex.
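A rough sketch of how the pdftotext check could cope with the "small amount of text" case: rather than testing whether pdftotext returns nothing at all, count how many pages carry a meaningful amount of text. This assumes pdftotext (from Poppler or Xpdf) is on the PATH and that it separates pages with form feeds, which is its default behavior. The function names and both thresholds (200 characters per page, half of the pages) are my own guesses for illustration and would need tuning on a real collection.

```python
import subprocess
import sys


def extract_pages(pdf_path):
    """Run pdftotext and return a list holding the text of each page."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        check=True,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    pages = text.split("\f")  # pdftotext ends each page with a form feed
    if pages and not pages[-1].strip():
        pages.pop()  # drop the empty piece after the final form feed
    return pages


def classify(pages, min_chars=200, min_text_fraction=0.5):
    """Label a PDF by how much of it has a usable text layer.

    min_chars and min_text_fraction are assumed values, not tested ones.
    """
    if not pages:
        return "image-only"
    # Count non-whitespace characters on each page.
    counts = [len("".join(page.split())) for page in pages]
    if sum(counts) == 0:
        return "image-only"
    text_pages = sum(1 for c in counts if c >= min_chars)
    if text_pages / len(pages) < min_text_fraction:
        # e.g. a scan whose only text is a copyright notice page
        return "mostly-image"
    return "text"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, classify(extract_pages(path)))
```

Files reported as "image-only" or "mostly-image" would be the candidates for OCR. A PDF that already has a text layer on every page on top of the scans would come out as "text", so it would be skipped.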