MobileRead Forums - View Single Post

kiwidude · 02-01-2011, 04:20 PM

Quote:

Originally Posted by Starson17

Dialog window or GUI plugin - I haven't enough experience with the latter to know if one is better or not. I find plugins to be sort of a pain to find and install.

It wasn't so much a "dialog window or plugin" choice as a "popup dialog window or library view" one. What I had in mind was a right-click action called something like "Find duplicates", which means it must be implemented as a plugin. Of course if Kovid liked the plugin enough, eventually he might include it with Calibre, which would remove the find/install issues you allude to.

Personally I think a popup dialog window will be the way to go. It keeps the focus on the task at hand, and allows custom colouring to indicate the groups of duplicate books, a few columns more useful for duplicate resolution, etc. However my point was that doing that will mean a lot of functionality users may take for granted on the library view (such as customisable column displays, right-clicks for other actions etc) will not be available, initially at least.

Quote:

It seems fast enough, even on libraries of more than 15K books.

Sorry, I meant that comparing every book in the entire library as a possible duplicate will be slow (which you agreed with later in the post), not that comparing one book at a time was. As I put in a later post, rather than comparing "all books" all of the time, the user could be prompted to compare only a subset, such as those added today, this week, this month etc. Once they have done an initial "all books" cleanup, it could be done incrementally.
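To put rough numbers on the "subset" idea, here is a minimal sketch. It assumes each book is a plain dict with an "added" timestamp; the function names are made up for illustration and are not anything in Calibre.

```python
from datetime import datetime, timedelta


def recently_added(books, days, now=None):
    """Return the books whose (hypothetical) 'added' timestamp is within the last `days` days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    return [b for b in books if b["added"] >= cutoff]


def comparisons_needed(library_size, subset_size=None):
    """Count pairwise comparisons for a full scan, or for checking only a subset.

    Full scan: every book against every other book, each pair counted once.
    Subset scan: each subset book against every other book in the library,
    with pairs inside the subset counted once.
    """
    if subset_size is None:
        return library_size * (library_size - 1) // 2
    others = library_size - subset_size
    return subset_size * others + subset_size * (subset_size - 1) // 2


if __name__ == "__main__":
    print(comparisons_needed(15000))      # full "all books" scan
    print(comparisons_needed(15000, 20))  # only 20 recently added books
```

For a 15K library the full scan is roughly 112 million pairs, while checking 20 newly added books is roughly 300 thousand, which is why an incremental run after the initial cleanup should be much more bearable.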

Quote:

I think you have the order wrong. Automerge is easier to play with than duplicate detection. In automerge, you have one book at a time being considered. Currently, it just checks if the automerge option is on, then does the automerge thing for each book, checking to see if there are any near dupes.

In terms of the "order" I was thinking about the find duplicates plugin as "first" for a number of reasons.
(1) If people wanted it (and Kovid etc was too busy on other things) I could develop it completely independently of any changes to the Calibre source, unlike the automerge changes, which would require them.
(2) There will be many users out there who have never found, or have intentionally not used, the automerge option and have a library with duplicates they want help identifying.
(3) Once (if) the automerge suboptions get added and a user chooses the "duplicate format" suboption, they will be creating duplicates and not have a tool to help them identify them.

Of course if you and Kovid happened to like the proposal enough to implement the automerge changes so they appeared in Calibre first, that would be just marvellous. As you say, those changes are far less work to implement.

Quote:

You could just as easily check one of three options stored near the automerge option, and handle all incoming books according to that option (ignore, overwrite, or add as new dupe record) or you can present that question for each book (preferably with an option to do the selected thing for all the rest of the books). It's not too hard, as each book is being handled individually.

Totally agree, that is what I would like to see. If automerge is off, you get prompted with a dialog per book with the three options and the ability to "apply to all". If automerge is on, it silently applies whatever suboption you specified in preferences.
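The behaviour described above could be sketched roughly like this. Everything here is hypothetical: the DupeAction names, the resolve_incoming function and the prompt callback are invented for illustration and do not correspond to Calibre's actual add-books code.

```python
from enum import Enum


class DupeAction(Enum):
    """The three proposed suboptions (names are assumptions)."""
    IGNORE = "ignore"
    OVERWRITE = "overwrite"
    ADD_NEW = "add_new"


def resolve_incoming(duplicates, automerge_on, suboption, prompt):
    """Pick an action for each incoming book that matches an existing record.

    prompt(book) stands in for the per-book dialog and must return a tuple
    (DupeAction, apply_to_all). It is only called when automerge is off and
    no earlier answer was marked "apply to all".
    """
    decisions = {}
    sticky = None  # remembered answer once the user ticks "apply to all"
    for book in duplicates:
        if automerge_on:
            action = suboption           # silently use the preference
        elif sticky is not None:
            action = sticky              # reuse the earlier "apply to all" answer
        else:
            action, apply_to_all = prompt(book)
            if apply_to_all:
                sticky = action
        decisions[book] = action
    return decisions
```

With automerge on, the prompt callback is never invoked; with it off, the user is asked once per book until they choose "apply to all".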

Quote:

Duplicate detection seems to me to be the harder case. All books are compared against all other books. You have to make groups of duplicates.

You may have 3 copies of book 1, two copies of book 2, 4 copies of book 3, but one of the 4 copies of book 3 isn't really a dupe and needs to be excluded from the merge, etc. I suppose you could do duplicate detection the same way - individually check each book against the entire dataset, but that would be comparable to adding the entire library to itself - that does take a lot of time.

Agree again it is the harder case. It will be a fair bit of development work.
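As a very rough sketch of the grouping step, one option is to bucket books by a normalised title/author key instead of comparing every pair. This is an assumption on my part, not how Calibre or automerge matches books, and it would miss fuzzier near-duplicates. The dict fields ("id", "title", "author") and the excluded_ids parameter are hypothetical.

```python
import re
from collections import defaultdict


def normalise(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def group_duplicates(books, excluded_ids=frozenset()):
    """Return lists of book ids that share a normalised (title, author) key.

    excluded_ids holds books the user has marked as "not really a dupe",
    so they are left out of every group. Keys with one book are dropped.
    """
    groups = defaultdict(list)
    for book in books:
        if book["id"] in excluded_ids:
            continue
        key = (normalise(book["title"]), normalise(book["author"]))
        groups[key].append(book["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```

Bucketing like this is a single pass over the library, so the cost grows with the number of books rather than the number of pairs. The trade-off is that only exact matches on the normalised key are grouped, so the matching rule itself would still need a lot of thought.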

And quite frankly if it is just you and me showing any interest in the idea here it won't be very high in my priority list to implement it. I would love more people to comment on whether they think it is a flawed/bad idea, or they would love to see it in Calibre. I won't be offended if they think it's a rubbish idea - on the contrary it would save me many hours of wasted effort.

There is always "another way" - but today with Calibre your only choice for ensuring you don't accidentally throw away a better format of a book when adding is to either have automerge off (with various issues that creates) or intentionally give it a different name (requiring you to "know" it was a duplicate first).