02-09-2011, 04:44 AM   #2
kacir
Quote:
Originally Posted by jekkii
Until now I have been organizing and sorting my library on CDs with Calibre, and I keep finding new duplicates.
I don't know how the Python code finds duplicates (by comparing titles and authors?).
Earlier I used a program written in Visual Basic 6 and based on MS Access (I already mentioned it: http://depositfiles.com/de/files/xmgx3g3nr). It is fairly rudimentary, but one good thing was that duplicates were filtered really well. During the scan it calculated an MD5 hash of each book file, and files with the same hash were identified as duplicates. I have only a vague idea of what a hash is and how complicated it would be to integrate this process into Calibre, but the result was very good.
So if I have, e.g., a book with the title "Nice world" and the same book with the title "World nice" (because the scanner somehow didn't do its job properly), Calibre treats them as two different books although they are the same. With the hash approach they would have been identified as duplicates.
Best regards
See this thread for a better discussion:
https://www.mobileread.com/forums/sho...d.php?t=118013
A system for duplicate detection is in the making.

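There is a problem with your hash theory.
You can have a duplicate book where the version in Calibre is in txt format and the version you are adding is in epub. The two files have different hashes, so a hash comparison will not flag them as the same book. And in that case you actually want the program to add the epub next to the txt.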
I have many, many texts in Calibre that are in several formats.
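
To make both points concrete, here is a rough sketch in Python of the hash approach described in the quote. This is not how Calibre does it, and the function names md5_of_file and find_hash_duplicates are made up for illustration. It groups files by the MD5 of their bytes, so identical files are caught whatever they are named, but a txt and an epub of the same book end up in different groups.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large book files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_hash_duplicates(paths):
    """Group files by MD5 digest and return the groups with 2+ files.

    Hypothetical helper, not part of Calibre. Only byte-identical files
    share a digest, so the same book in two formats is not matched.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(Path(path))
    return [group for group in groups.values() if len(group) > 1]
```

Calling find_hash_duplicates on a list of file paths returns lists of byte-identical files. Two copies of the same file named "Nice world.txt" and "World nice.txt" land in one group, while "Nice world.epub" stays out of it, which is the limitation mentioned above.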