Quote:
Originally Posted by Dullahir
With me, nope. For example, one book is fine, the other has ugly footers about PDFCompression. Therefore, no results found, even if you check 'Ignore Content', which I find strange.
It's not looking at the book's text, it's looking at the file size and checksum. The size comes from the file system directory, and the checksum is calculated over the file's raw bytes, not from anything read out of the book itself. If just one byte in a file is changed, added or removed, the checksum will change (and usually the size too), and two otherwise identical files will not be regarded as duplicates.
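To show what I mean, here is a rough Python sketch of a size-plus-checksum comparison. I don't know which checksum your tool actually uses, so SHA-256 is my assumption, and `file_checksum` and `find_duplicates` are names I made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file's raw bytes (assumed algorithm)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large books are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root that share both size and checksum."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_checksum(path)].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return groups
```

Calling `find_duplicates` on a folder returns a list of groups of paths whose bytes match. A copy of a book with an added footer has different bytes, so it lands in no group, which is the behaviour you are seeing.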
Here is a review of duplicate file finders
https://www.techsupportalert.com/bes...e-detector.htm
I can't recommend one because I don't have one...
Do you have OPF files for your 100,000 books? And what OS are you on: Windows, OS X or Linux? There may be a specialist product like the one I have for image files, but I wouldn't hold my breath.
If it were me, I'd bite the bullet and load them into Calibre, in batches. Once Calibre has an author and title database, I think you could delete the format files, as I don't think Find Duplicates needs them... unless you're planning on doing a binary compare, which on 100,000 books could take quite a long time.
BR