02-09-2011, 03:55 AM | #1 |
Member
Posts: 13
Karma: 10
Join Date: Jan 2011
Device: none
|
.... and again duplicates ....
I am using Calibre to organize and sort my library, which until now was stored on CDs. I keep finding new duplicates.
I don't know how the Python code finds duplicates (by comparing titles and authors?). Previously I used a program written in Visual Basic 6 and based on MS Access (I have already mentioned it: http://depositfiles.com/de/files/xmgx3g3nr). It is fairly rudimentary, but one good thing was that duplicates were filtered really well. During the scan, an MD5 hash was calculated for each book file, and files with the same hash were identified as duplicates. I have only a vague idea of what a hash is and how complicated it would be to integrate this process into Calibre, but the result was very good. So if I have, for example, a book titled "Nice world" and the same book titled "World nice" (because the scanner somehow didn't do its job properly), Calibre treats them as two different books although they are the same. With the hash approach, they would have been identified as duplicates. Best regards |
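For readers unsure what hash-based detection involves, here is a minimal sketch of the idea described in the post above: compute an MD5 digest of each file's bytes and group files that share a digest. The names `file_md5` and `find_duplicate_files` are hypothetical and made up for illustration; this is not how Calibre or the Visual Basic tool is implemented.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_files(root):
    """Group files under root by MD5 and keep only groups of two or more."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_md5(path)].append(path)
    return {md5: paths for md5, paths in groups.items() if len(paths) > 1}
```

Because only the file contents are hashed, two byte-identical files named "Nice world" and "World nice" end up in the same group regardless of their names.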
02-09-2011, 04:44 AM | #2 | |
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
Quote:
https://www.mobileread.com/forums/sho...d.php?t=118013 A system for duplicate detection is in the making. There is a problem with your hash theory. You can have a duplicate book where the version in Calibre is in txt format and the version you are adding is in epub, so their hashes differ. In that case you want the program to add the epub alongside the txt. I have many, many texts in Calibre that are in several formats. |
02-09-2011, 06:08 AM | #3 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
There is also the fact that a book can be a duplicate even though it is not byte-identical to an existing file. For instance, it might just have different metadata stored inside it.
The key point is that Calibre is working at the 'book' level and not the 'file' level when considering duplicates. |
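To illustrate the difference between 'book' level and 'file' level matching, here is a rough sketch that compares books by the words in their title and author rather than by file bytes. The helpers `book_key` and `is_same_book` are hypothetical, and the word-order-insensitive rule is an assumption chosen for illustration; it is not Calibre's actual matching logic.

```python
import re


def book_key(title, author):
    """Build an order-insensitive key from the words of a title and author.

    Hypothetical helper for illustration; not Calibre's actual matching rule.
    """
    def words(text):
        # Lowercase, split on non-word characters, drop empties, then sort.
        return tuple(sorted(w for w in re.split(r"\W+", text.lower()) if w))

    return words(title), words(author)


def is_same_book(a, b):
    """Compare two (title, author) pairs at the book level."""
    return book_key(*a) == book_key(*b)
```

With this rule, `is_same_book(("Nice world", "A. Author"), ("World nice", "a author"))` returns `True`, whatever format or embedded metadata the underlying files have. The trade-off is that a rule this loose would also merge genuinely different books whose titles use the same words in a different order.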
02-09-2011, 07:07 AM | #4 | ||
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
|
||
02-09-2011, 08:20 AM | #5 | ||
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Personally I am very much in the "before you add to Calibre" camp. Why waste your time cleaning up filenames (or fixing up metadata inside Calibre)? Just run a hash comparison on your source folder and Calibre first, using any one of a number of free utilities out there on the internet, then delete the duplicates from the source folder. Don't delete directly from Calibre's folders though - or if you do, you will need to run one of the repair database options to get Calibre's internal database to match the fact that a book format is no longer present. Quote:
In the ebook viewer preferences if you disable "Remember the current page when quitting" and don't add bookmarks then your EPUB should remain untouched - or at least that was the hope. |
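The pre-import hash comparison suggested in this post could be sketched roughly as follows, for anyone who prefers a script to a ready-made utility. The function names and the choice of SHA-256 are assumptions made for illustration. The sketch only reports matches; it deletes nothing and never writes to the Calibre library folder.

```python
import hashlib
from pathlib import Path


def content_hash(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_in_library(source_dir, library_dir):
    """List source files whose exact bytes already exist under library_dir."""
    library_hashes = {
        content_hash(p) for p in Path(library_dir).rglob("*") if p.is_file()
    }
    return [
        p for p in sorted(Path(source_dir).rglob("*"))
        if p.is_file() and content_hash(p) in library_hashes
    ]
```

The returned list is a set of candidates to remove from the source folder before adding the rest to Calibre. It only catches byte-identical files, so the same book in another format will not be flagged.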
||