| 
			
			 | 
		#31 | 
| 
			
			
			
			 Calibre Plugins Developer 
			
			Posts: 4,735 
				Karma: 2208556 
				Join Date: Oct 2010 
				Location: Australia 
				
				
				Device: Kindle Oasis 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Actually just had another thought on the "scope" of the duplicates search. Presumably you could have an option allowing you to choose "books added today", "this week", "this month", "all books" and use that as your "start set" for comparison, rather than comparing every book in your database against every other book every time...
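
To make the "start set" idea concrete, here is a minimal sketch. It assumes each book is a plain dict with hypothetical `id` and `added` fields (this is not calibre's actual data model), and it treats "today", "this week" and "this month" as the last 1, 7 and 30 days. Only books in the start set are compared against the rest of the library, and each pair is visited once.

```python
from datetime import datetime, timedelta

# Hypothetical scope names mapped to a look-back window.
WINDOWS = {
    "today": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
}


def start_set(books, scope, now):
    """Return the books that seed the comparison for the given scope."""
    if scope == "all":
        return list(books)
    cutoff = now - WINDOWS[scope]
    return [b for b in books if b["added"] >= cutoff]


def candidate_pairs(books, scope, now):
    """Yield (start_book, other_book) pairs, each unordered pair only once."""
    seen = set()
    for a in start_set(books, scope, now):
        for b in books:
            if a["id"] == b["id"]:
                continue  # never compare a book with itself
            pair = frozenset((a["id"], b["id"]))
            if pair in seen:
                continue  # already yielded in the other order
            seen.add(pair)
            yield a, b
```

With a small start set the work is roughly (size of start set) x (size of library) comparisons instead of every book against every other book.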
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#32 | 
| 
			
			
			
			 Well trained by Cats 
			
			Posts: 31,267 
				Karma: 61916422 
				Join Date: Aug 2009 
				Location: The Central Coast of California 
				
				
				Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I worry about "title only" matching. I have 2 or 3 'duplicate titles' that are not duplicates (different authors, different books).
		
	
		
		
		
		
		
		
		
		
		
		
	
	Then there is the case of different 'editions' of a book, when it changes publisher and gets an edit job. I prefer an 'always ask' option (toss, make new entry, merge), with the issues held in a queue so as not to interrupt the rest of the batch, and presented to the user near the end (like the current problem status, only allowing the user to browse the library before marking what to do).  | 
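
A rough sketch of the "hold it in a queue and ask at the end" idea, under some loud assumptions: books are plain dicts with hypothetical `title`, `author` and `formats` fields, a clash is detected on case-insensitive title alone, and "merge" simply folds the new formats into the first existing record. None of this is calibre's real add-books code.

```python
from dataclasses import dataclass, field


@dataclass
class AddBatch:
    library: dict                               # title key -> list of book records
    pending: list = field(default_factory=list)  # clashes deferred until the end

    def add(self, book):
        """Add a book, or queue it if the title already exists."""
        key = book["title"].casefold()
        if key in self.library:
            self.pending.append(book)   # do not interrupt the batch
        else:
            self.library[key] = [book]

    def resolve(self, decide):
        """Ask decide(book, existing_records) for each queued book.

        decide must return 'toss', 'new' or 'merge'.
        """
        for book in self.pending:
            existing = self.library[book["title"].casefold()]
            action = decide(book, existing)
            if action == "toss":
                continue
            elif action == "new":
                existing.append(book)
            elif action == "merge":
                # Assumption: merge into the first matching record.
                existing[0]["formats"] |= book["formats"]
            else:
                raise ValueError(f"unknown action: {action!r}")
        self.pending.clear()
```

In a real dialog `decide` would be the user's choice per book; here it is just a callback so the batch logic can be tried on its own.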
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#33 | |
| 
			
			
			
			 US Navy, Retired 
			
			Posts: 9,897 
				Karma: 13806776 
				Join Date: Feb 2009 
				Location: North Carolina 
				
				
				Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 What they're talking about here is well past the "title only" matching level.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#34 | 
| 
			
			
			
			 Well trained by Cats 
			
			Posts: 31,267 
				Karma: 61916422 
				Join Date: Aug 2009 
				Location: The Central Coast of California 
				
				
				Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A 
				
				
				 | 
	
	|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#35 | |
| 
			
			
			
			 Addict 
			
			Posts: 239 
				Karma: 237 
				Join Date: Jun 2010 
				Location: OH USA 
				
				
				Device: Sony PRS 900(gave it to my sister); Sony PRS-T1; onyx book note air 
				
				
				 | 
	
	
	
		
		
			
			 
				
				we all have our addictions
			 
			Quote: 
	
 My goal is to eventually have every one of those books plus whatever I buy new on my reader so 30,000 does not seem unreasonable to me.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#36 | |||||||
| 
			
			
			
			 Wizard 
			
			Posts: 4,004 
				Karma: 177841 
				Join Date: Dec 2009 
				
				
				
				Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 You could just as easily check one of three options stored near the automerge option and handle all incoming books according to that option (ignore, overwrite, or add as a new dupe record), or you could present that question for each book (preferably with an option to do the selected thing for all the rest of the books). It's not too hard, as each book is being handled individually.

 Duplicate detection seems to me to be the harder case. All books are compared against all other books, and you have to make groups of duplicates. You may have 3 copies of book 1, two copies of book 2 and 4 copies of book 3, but one of the 4 copies of book 3 isn't really a dupe and needs to be excluded from the merge, etc. I suppose you could do duplicate detection the same way, individually checking each book against the entire dataset, but that would be comparable to adding the entire library to itself, and that does take a lot of time.  | 
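
One way to build the groups without comparing every book against every other book is to bucket books by a normalised key in a single pass. The sketch below assumes plain dicts with hypothetical `id`, `title` and `author` fields and an explicit set of exempted ids for the "not really a dupe" case; it illustrates the grouping idea only and is not how calibre does it.

```python
from collections import defaultdict


def title_author_key(book):
    """Hypothetical matching key: case-insensitive title plus author."""
    return (book["title"].casefold(), book["author"].casefold())


def duplicate_groups(books, key=title_author_key, exemptions=frozenset()):
    """Return lists of book ids that share a key, skipping exempted ids."""
    buckets = defaultdict(list)
    for book in books:
        if book["id"] in exemptions:
            continue  # user marked this one as not a real duplicate
        buckets[key(book)].append(book["id"])
    # Only buckets with more than one book are duplicate groups.
    return [ids for ids in buckets.values() if len(ids) > 1]
```

This is one pass over the library, but it only finds exact matches on the chosen key; anything fuzzier still needs pairwise comparison within or across buckets.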
|||||||
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#37 | |||||
| 
			
			
			
			 Calibre Plugins Developer 
			
			Posts: 4,735 
				Karma: 2208556 
				Join Date: Oct 2010 
				Location: Australia 
				
				
				Device: Kindle Oasis 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Personally I think a popup dialog window will be the way to go, to focus the dialog on the task at hand, with custom colouring to indicate the groups of duplicate books, a few columns more useful to duplicate resolution etc. However, my point was that doing that will mean a lot of functionality users may take for granted in the library view (such as customisable column displays, right-clicks for other actions etc) will not be available, initially at least.
	
 Quote: 
	
 (1) If people wanted it (and Kovid etc were too busy on other things) I could develop it completely independently of any changes to the Calibre source, unlike the changes automerge would require. (2) There will be many users out there who have never found, or have intentionally not used, the automerge option and have a library with duplicates they want help identifying. (3) Once (if) the automerge suboptions get added and a user chooses the "duplicate format" suboption, they will be creating duplicates and will not have a tool to help identify them. Of course if you and Kovid happened to like the proposal enough to implement the automerge changes so they appeared in Calibre first, that would be just marvellous. As you say, those changes are far less work to implement.
	
 Quote: 
	
 And quite frankly, if it is just you and me showing any interest in the idea here, it won't be very high on my priority list to implement it. I would love more people to comment on whether they think it is a flawed/bad idea, or whether they would love to see it in Calibre. I won't be offended if they think it's a rubbish idea; on the contrary, it would save me many hours of wasted effort. There is always "another way", but today with Calibre your only choice for ensuring you don't accidentally throw away a better format of a book when adding is either to have automerge off (with the various issues that creates) or to intentionally give the book a different name (requiring you to "know" it was a duplicate first).  | 
|||||
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#38 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Just so you know, my current development priorities are unlikely to include working on automerge/duplicates development, so don't wait for me. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	And I vote for a separate dialog for duplicate detection, but my vote is not a veto for doing it in the book list, I just think it will be cleaner to code and have more functionality in a separate dialog.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#39 | |
| 
			
			
			
			 Wizard 
			
			Posts: 3,466 
				Karma: 10684861 
				Join Date: May 2006 
				
				
				
				Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 I can imagine that an initial search of each book against all other books would take forever and a bit, but with large libraries users could do this in batches (marking all already-checked ebooks) or let it run overnight. The real challenge would be inventing a user interface that offers the user groups of identified duplicates and lets the user accept or reject merging. Also, the "fuzziness" of the search would have to be carefully balanced so that it finds duplicates where author name and title differ somewhat (Stephen_King_-_Pet_cemetery_The vs. King_s._-_The_pet_cemetery) and yet doesn't come up with too many false positives.  | 
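
To illustrate the fuzziness trade-off on the example above, here is a small sketch using token-set (Jaccard) similarity. The stop-word list, the dropping of single-letter initials and the 0.7 threshold are all assumptions picked for illustration; a real matcher would need tuning against a real library.

```python
import re

# Assumed list of leading/trailing articles to ignore.
STOPWORDS = {"the", "a", "an"}


def tokens(name):
    """Split a file-name style string into a set of significant words."""
    words = re.split(r"[^a-z0-9]+", name.lower())
    # Drop empty strings, single-letter initials and articles.
    return {w for w in words if len(w) > 1 and w not in STOPWORDS}


def similarity(a, b):
    """Jaccard similarity of the two token sets, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def probably_same(a, b, threshold=0.7):
    """True if the names are similar enough to offer as duplicates."""
    return similarity(a, b) >= threshold
```

Under these assumptions the two Pet Cemetery names share three of four significant words, so they are offered as a match, while a different Stephen King title scores well below the threshold. Lowering the threshold finds more variants at the cost of more false positives.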
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#40 | ||
| 
			
			
			
			 Calibre Plugins Developer 
			
			Posts: 4,735 
				Karma: 2208556 
				Join Date: Oct 2010 
				Location: Australia 
				
				
				Device: Kindle Oasis 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 If you have any further thoughts on what you would/would not want to see on this feel free to drop me an email or PM here if not on the thread. I'll be looking for further comments and feedback before I start coding anything anyways.  | 
||
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#41 | |
| 
			
			
			
			 Wizard 
			
			Posts: 4,004 
				Karma: 177841 
				Join Date: Dec 2009 
				
				
				
				Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Any comments?  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#42 | 
| 
			
			
			
			 creator of calibre 
			
			Posts: 45,609 
				Karma: 28549044 
				Join Date: Oct 2006 
				Location: Mumbai, India 
				
				
				Device: Various 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Fine by me
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#43 | |
| 
			
			
			
			 Calibre Plugins Developer 
			
			Posts: 4,735 
				Karma: 2208556 
				Join Date: Oct 2010 
				Location: Australia 
				
				
				Device: Kindle Oasis 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 (1) Kovid's legacy code is an interactive prompted option, which I think it has to be if you are only matching on title. Personally I would never use it due to all the false positives from not comparing authors, but fair enough if others find it useful. However, my question is: are you saying it will be an "automerge" option to automatically merge on title, or an "automerge" option that does not actually automerge and instead prompts interactively? (2) So am I right in saying your list will *not* (as yet at least) include the option that sparked this thread and several others, namely creating a duplicate book entry when a duplicate format is encountered, but merging formats where they are missing?  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#44 | ||
| 
			
			
			
			 Wizard 
			
			Posts: 4,004 
				Karma: 177841 
				Join Date: Dec 2009 
				
				
				
				Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			OK 
		
	
		
		
		
		
		
		
		
		
		
		
	
	Quote: 
	
 (The interface is done, and overwrite and ignore are done. I've still got some work to do on New Record creation for duplicate records.)
		 | 
||
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#45 | 
| 
			
			
			
			 Wizard 
			
			Posts: 4,004 
				Karma: 177841 
				Join Date: Dec 2009 
				
				
				
				Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I'm amazed you can follow all these threads and still write code! 
		
	
		
		
		
		
		
		
		
		
		
		
	
	A question: If I write the automerge option box this way:
	Code: 
	choices = [(_('Ignore'), 'ignore'), (_('Overwrite'), 'overwrite'),
	           (_('New Record'), 'new record')]
	r('automerge', gprefs, choices=choices)
	Code: 
	if gprefs['automerge'] == 'overwrite':  | 
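
For anyone following along, this is roughly how a three-valued setting like that could drive the add logic. It is a standalone sketch, not calibre code: the `handle_duplicate` function and the format-to-path dicts are hypothetical, and the behaviour assumed here is the one discussed earlier in the thread (formats missing from the existing record are always merged in; the setting only decides what happens to formats that clash).

```python
def handle_duplicate(existing, incoming, mode):
    """Apply an automerge mode to one incoming duplicate book.

    existing / incoming: dicts mapping format name -> file path (assumed shape).
    Returns (updated_existing_formats, new_record_formats_or_None).
    """
    merged = dict(existing)
    clashes = {}
    for fmt, path in incoming.items():
        if fmt not in merged:
            merged[fmt] = path      # missing format: always merged in
        else:
            clashes[fmt] = path     # same format already present

    if not clashes or mode == "ignore":
        return merged, None         # keep the existing files
    if mode == "overwrite":
        merged.update(clashes)      # incoming files replace existing ones
        return merged, None
    if mode == "new record":
        return merged, clashes      # clashing formats go to a new book entry
    raise ValueError(f"unknown automerge mode: {mode!r}")
```

Comparing against the stored string values ('ignore', 'overwrite', 'new record') rather than the translated labels keeps the check independent of the UI language.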
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 