02-09-2011, 03:55 AM   #1
jekkii
.... and again duplicates ....

I am using Calibre to organize and sort my library, which until now has been stored on CDs, and I keep finding new duplicates.
I don't know how Calibre's Python code detects duplicates (by comparing titles and authors?).
Earlier I used a program written in Visual Basic 6 and based on MS Access (I have already mentioned it: http://depositfiles.com/de/files/xmgx3g3nr). It is fairly rudimentary, but one good thing about it was that duplicates were filtered out really well. During the scan it calculated an MD5 hash of each book file, and files with the same hash were identified as duplicates. I have only a vague idea of what a hash is and how complicated it would be to integrate this process into Calibre, but the result was very good.
So if I have, e.g., a book with the title "Nice world" and the same book with the title "World nice" (because the scanner somehow didn't do its job properly), Calibre treats them as two different books although they are the same. With the other method (by hash) they would have been identified as duplicates.
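
To make the hash idea concrete, here is a minimal sketch in Python of how such a scan could work. It is not Calibre's code and not the code of the old VB6 tool; the function names `file_md5` and `find_duplicates` are made up for illustration. It assumes the duplicates are byte-for-byte identical files: two copies of the same book with different embedded metadata, or in different formats, would get different hashes and would not be caught.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group all files under root by content hash; return only groups of 2 or more."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[file_md5(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Because only the file contents are hashed, the file name and title play no role: "Nice world" and "World nice" would land in the same group as long as the files themselves are identical. Calling `find_duplicates` on a library folder returns one list of paths per set of identical files.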
Best regards