Old 01-30-2012, 04:40 PM   #196
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
This plugin cannot "auto-select" for the same reason it cannot "auto-merge" - it is impossible for the plugin to decide which of multiple versions the user wants to keep. So it is a case of "you made the mess, you clean it up", I'm afraid.
Old 01-30-2012, 04:46 PM   #197
grantshoarma
Member
Posts: 10
Karma: 10
Join Date: May 2010
Device: kindle 2
Well again, I must be misunderstanding something... If they are all exact duplicates, in the case of a binary duplicate, why does it matter which one you choose to keep?
Old 01-30-2012, 04:53 PM   #198
theducks
Grand Sorcerer
Posts: 14,274
Karma: 5495472
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by grantshoarma
Well again, I must be misunderstanding something... If they are all exact duplicates, in the case of a binary duplicate, why does it matter which one you choose to keep?
It probably doesn't matter, but the plugin does not have the smarts (that is why humans are still in use) to pick which one to keep out of however many you have.
Old 01-30-2012, 04:57 PM   #199
grantshoarma
Member
Posts: 10
Karma: 10
Join Date: May 2010
Device: kindle 2
It could arbitrarily choose one from each group.
Old 01-30-2012, 05:21 PM   #200
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
A specific file format might be identical - but the other metadata frequently will not be. You might have variations of the author and title - which is what stopped you identifying it as a duplicate in the first place, remember! Even if title and author match - one of those records may have other formats on it which you want to keep. One or both may have had different metadata downloads done to them, with differing covers, descriptions etc.

There are way too many variables to arbitrarily decide which book to delete from a library.
Old 01-30-2012, 05:36 PM   #201
grantshoarma
Member
Posts: 10
Karma: 10
Join Date: May 2010
Device: kindle 2
Quote:
A specific file format might be identical - but the other metadata may frequently not be. You might have variations of the author and title - which caused you to not identify it as a duplicate in the very first place remember!
I thought this was the reason for the binary duplicate search - regardless of the metadata, it does a bit-by-bit comparison of the actual file, no? Maybe this is my confusion.

Quote:
Even if title and author match - one of those records may have other formats on it which you want to keep.
Right, I would want to keep all the formats, just not books with identical formats that are also identical books.

Quote:
One or both may have different metadata downloads done to them, with differing covers, descriptions etc.
Can't you do a comparison of ALL data, including metadata? The duplicates I have are 100% duplicates. Same books, same metadata. If I could get rid of those in batch and then hand-delete the handful left over that are the same books with different metadata, it would be quite useful...

Sorry for all the questions, it seems like this probably won't be implemented, but at this point I'm just curious...
Old 01-30-2012, 06:17 PM   #202
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
You are confusing metadata stored in a file with metadata stored for a book in the calibre database, which is what you see on screen. Think about how you ended up with binary duplicate files in calibre in the first place - how did they get in there? Either the books had the same title and you ignored the warnings given by calibre, or they were originally the same but you added one to calibre, slightly tweaked something about the author/title, then added the other book and didn't get warned. Or the book was completely mislabelled when you added it, and its content isn't actually what the metadata describes it to be.

These were likely added over time. Maybe you started cleaning up your library for a book - you downloaded metadata for it. Maybe on one of those rows you did a conversion to another format. All of these actions are going to change calibre's database metadata about that particular row in your library, but none of them affect the actual book content that binary comparison is comparing (not until you do a conversion at least).

So calibre now must have at least two rows each containing the same physical file. We agree one of them has to be deleted. But it is *impossible* for the plugin to guarantee to know which one it should keep. It might delete the one with the "wrong" author name/spelling (it cannot know which is "correct"). The plugin cannot know which cover you might prefer (perhaps one was read from the file, and another came from a metadata download). Maybe one has series information, and the other doesn't (or even has a different series name/index).

There are just too many variables that only a human eyeballing the two can make a decision on. So it ain't gonna happen.
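To illustrate the distinction above, here is a minimal sketch (not the plugin's actual code). It assumes each library row can be reduced to a hypothetical tuple of book id, format, file bytes and database metadata, and it uses a SHA-256 digest as a stand-in for a byte-by-byte comparison. The point is that the grouping looks only at file content, so rows with different database metadata still land in the same group.

```python
import hashlib
from collections import defaultdict

def group_binary_duplicates(rows):
    """Group book ids whose stored file has identical bytes.

    rows: iterable of (book_id, fmt, file_bytes, db_metadata).
    db_metadata is deliberately ignored: only file content is hashed.
    """
    groups = defaultdict(list)
    for book_id, fmt, file_bytes, _db_metadata in rows:
        digest = hashlib.sha256(file_bytes).hexdigest()
        groups[(fmt, digest)].append(book_id)
    # Keep only groups where more than one row holds the same file.
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical rows: 1 and 2 share a file but differ in database metadata.
rows = [
    (1, "EPUB", b"same bytes", {"author": "Frank Herbert", "series": "Dune"}),
    (2, "EPUB", b"same bytes", {"author": "Herbert, Frank", "series": ""}),
    (3, "EPUB", b"other bytes", {"author": "Frank Herbert", "series": "Dune"}),
]
duplicates = group_binary_duplicates(rows)
```

Rows 1 and 2 are reported together even though their author spelling and series differ, which is exactly why the group alone cannot say which row to keep.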
Old 01-30-2012, 06:30 PM   #203
grantshoarma
Member
Posts: 10
Karma: 10
Join Date: May 2010
Device: kindle 2
Quote:
You are confusing metadata stored in a file, with metadata stored for a book in the calibre database that you see on screen.
But I'm saying, when both of these are identical, shouldn't we be able to delete one with no problem? As you guessed, I ignored the warnings given by calibre about duplicate titles. I did this because some files may have identical metadata but have different file quality (formatting, typos, etc.), but I didn't anticipate that the vast majority of these files would be absolutely identical...

Bah. Stretching my mouse fingers out as we speak.
Old 01-30-2012, 10:52 PM   #204
DoctorOhh
US Navy, Retired
Posts: 8,781
Karma: 12516053
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by grantshoarma
But I'm saying, when both of these are identical, shouldn't we be able to delete one with no problem?
The Find Duplicates plugin acts after the files are already in the calibre library. The duplicate files might be identical, but the data for each calibre record may or may not be identical. If you arbitrarily delete one record from the database, the one deleted may be the record whose calibre metadata or cover the user has already cleaned up.

Last edited by DoctorOhh; 01-31-2012 at 04:31 AM.
Old 01-31-2012, 03:16 AM   #205
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
@grantshoarma - when every single thing is identical - yes, it would be possible to arbitrarily decide to delete one of them. However in my opinion that is just too small a scenario to try to code for. Remember it only takes one tiny detail to *not* make them identical (an extra format, or the slightest variation on the title/author/series/cover/description/published date/...).

For most users coming to this plugin they have been using calibre for a long time. Which means they have a large database of books, which they have been working on. Likely over that time period they learnt far more about using calibre, downloaded other metadata plugins, cleaned up some books, changed various settings about how their files were imported, etc, etc. So the likelihood that they have two identical versions of a format isolated in non-touched rows is absolutely minimal (remember calibre would have warned them about duplicates if the titles were identical!).
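The "every single thing is identical" case can be sketched as below. This is only an illustration under assumptions: the `Record` type and its fields are hypothetical and are not calibre's database schema. A group qualifies for an arbitrary delete only if every field of every record matches.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """Hypothetical snapshot of one library row (not calibre's schema)."""
    title: str
    authors: tuple
    series: str = ""
    cover_digest: str = ""
    formats: frozenset = frozenset()  # set of (format, file digest) pairs

def all_identical(group):
    """True only if every record matches the first on every field."""
    first = group[0]
    return all(record == first for record in group[1:])

epub = frozenset({("EPUB", "d1")})
a = Record("Dune", ("Frank Herbert",), formats=epub)
b = Record("Dune", ("Frank Herbert",), formats=epub)
c = Record("Dune", ("Herbert, Frank",), formats=epub)  # one tiny difference
```

Here `all_identical([a, b])` holds, but adding `c` to the group breaks it because of the author spelling alone.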
Old 01-31-2012, 07:50 AM   #206
Noughty
Addict
Posts: 352
Karma: 103850
Join Date: Apr 2011
Device: Kindle NT
I think the biggest problem is for new users, like I once was. I also had a lot of dupes and it took me a while to clean up. I was only suggesting pre-selecting the dupes, so you don't need to click every one of them. After that you just need to edit some of the selections by clicking/unclicking (I saw this in a duplicate file finder for the computer: all dupes were selected, but if you wanted to change which one to keep you just clicked on the other). The user should then look through them all and, as you said, unclick the ones with different formats (although these could be detected during selection and skipped as not dupes), click if they want to choose another, look through the metadata, etc. (of course holding CTRL while clicking). Once that's all done, you can choose what to do with the dupes.
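The pre-selection idea could look roughly like this. It is a sketch only, with hypothetical function names and no connection to the plugin's real UI code: every record except one per group starts out marked for removal, and the user can move the "keep" mark before anything is deleted.

```python
def preselect_for_removal(groups, keep_index=0):
    """Mark every record except one per group as selected for removal.

    groups: list of lists of book ids. Returns {book_id: selected}.
    Nothing is deleted here; this is only the initial selection.
    """
    selection = {}
    for group in groups:
        for index, book_id in enumerate(group):
            selection[book_id] = index != keep_index
    return selection

def keep_instead(selection, group, keep_id):
    """User override: keep keep_id and select the rest of its group."""
    for book_id in group:
        selection[book_id] = book_id != keep_id

groups = [[1, 2], [5, 6, 7]]
selection = preselect_for_removal(groups)
keep_instead(selection, groups[1], 7)  # user prefers record 7 in group two
```

Keeping the first record is an arbitrary default, which is the objection raised elsewhere in this thread; the override step is what leaves the final decision with the user.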
Old 01-31-2012, 08:16 AM   #207
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
@Noughty - those are all good sensible comments.

Although in this particular case there is the slightly "unique" situation of using binary compare. For any other kind of duplicate comparison, you are going to have to open each of the conflicting formats to decide which you want to keep (if you care about keeping your "best" version of a format). Which by its nature means it is a very manual, time-consuming operation, as you suggest above.

However for binary duplicates, the user is able to "with impunity" remove all but one of those conflicting book formats, no need to open, the files are binary identical. The problem is that by definition these duplicated files are going to have to be on different calibre book records, so just doing "Remove specific format" on all but one record is not going to be sufficient. Instead what the user *actually* wants is the whole book record deleted.

Now as soon as you start talking about deleting book records, you are back in the scenario I have been trying to explain: having to decide *which* book record should be kept. Because in reality in the vast majority of cases (imo) there is probably "something" different about them, hence you can't automate the decision as to which should be deleted.

I come back to my questions several posts ago - how did the user get themselves into a situation where they apparently have hundreds or thousands of book rows that are binary duplicates? That has the smell to me of someone making a massive mistake of some kind with their calibre library?

I would expect that a binary comparison for most users would at most produce only a handful of matches, since there are other safeguards in calibre such as duplicate dialog warnings. While it would be possible (via automerge settings set to create new book on duplicate) to circumvent that, how could you possibly manage to do it for hundreds of books? Maybe if you have added books from lots of different sources, with slightly different filenames? In which case we are back again to my point that if the title or author differ the plugin cannot decide which to keep for you.

Those questions are not meant in any way as ridicule btw - they are just curiosity. And because I consider it so rare and the suggested "fix" not being relevant to what I consider the majority of scenarios it isn't something I will change the plugin for.
Old 01-31-2012, 09:18 AM   #208
Noughty
Addict
Posts: 352
Karma: 103850
Join Date: Apr 2011
Device: Kindle NT
I don't have thousands of dupes, my library isn't that big. I have a lot of books/fic/etc. which I failed to organize properly in the beginning, never thinking it would be necessary (I never thought I'd go from paperbacks to ebooks/electronic versions). All the files are scattered around and many are saved multiple times, since I couldn't find them when I needed them (now I use the Everything search program, which is amazing). I am still adding them, since many have silly names (now I always save with correct names). Just recently I found another folder with books and added it; most of them turned out to be dupes (probably an old backup), so I had to remove most of them from calibre. Basically, all the trouble is because of past mistakes - lack of organizing.
Clicking every second entry (mostly) gets tiring quickly. I just always look for ways to automate things.
And because of my paranoia I never delete dupes - I just send them to another library. So if I accidentally choose the wrong entry I can always go and get it from the dupes library. That's why automatic selection doesn't scare me.

I don't get how all the plugins work, how they are made etc. So I don't know when something is simple to do and when something is basically impossible. So thanks for explaining.

I have a way bigger problem with comics... I need to mentally prepare to go and straighten that mess out. And figure out how to.
Old 02-06-2012, 11:38 AM   #209
oventura
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2012
Device: Sony PRS-T1
I also think it would be great to have the plugin able to merge on its own.
I have thousands of books, and some of them are duplicated.
It would take a lot of time to fix this manually.
I would prefer to have some errors made by the plugin than to have as many duplicates as I have now.
What it should do is merge the metadata and the ebooks.
When there is a conflict, perhaps use the first one for the metadata, and for the books keep the larger one... or something like this.
Old 02-07-2012, 05:29 AM   #210
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
@oventura - sorry but I have no intention of adding functionality to automatically merge duplicate books, for all the reasons outlined in the posts above and more. The subset of book records that could be safely merged (when the formats do not conflict and neither do any of the metadata) is just too small.

You suggest making a decision based on book size - well size generally means nothing. A book with a high resolution cover will be larger than one without - but just having a better cover means nothing about the quality of the content which could be a crappy PDF conversion with broken paragraphs.

Quite apart from that, this plugin reporting a duplicate is not actually a cast-iron guarantee that it is a duplicate (well, unless you do a binary check). You could have a book with the wrong metadata that got reported as a match but isn't one, due to the title or author being wrong. You also might have used one of the "fuzzier" duplicate matching algorithms. Forcing a user to go through their match results gives them the opportunity to identify such cases.

So putting in such a feature would inevitably result in people having higher quality copies or formats of books being automatically deleted as the result of the merge. And there is no way I want responsibility for encouraging people to do that with such a feature - it is a sure-fire way to a crappy reading experience.
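For what the "safely merged" subset might mean in practice, here is a hedged sketch. The dict layout and the definition of "conflict" are assumptions made for illustration, not the plugin's logic: two records conflict if they both hold the same format with different file content, or if any metadata field they both carry has differing values.

```python
def can_merge_safely(a, b):
    """Return True if two hypothetical records have no conflicts.

    Each record is a dict with:
      "formats":  {format name: file digest}
      "metadata": {field name: value}
    """
    # A format present on both sides must be the same file.
    for fmt in a["formats"].keys() & b["formats"].keys():
        if a["formats"][fmt] != b["formats"][fmt]:
            return False
    # A metadata field present on both sides must have the same value.
    for key in a["metadata"].keys() & b["metadata"].keys():
        if a["metadata"][key] != b["metadata"][key]:
            return False
    return True

first = {"formats": {"EPUB": "d1"}, "metadata": {"title": "Dune", "author": "Frank Herbert"}}
extra_format = {"formats": {"MOBI": "d2"}, "metadata": {"title": "Dune"}}
other_epub = {"formats": {"EPUB": "d9"}, "metadata": {"title": "Dune"}}
other_author = {"formats": {"EPUB": "d1"}, "metadata": {"title": "Dune", "author": "Herbert, Frank"}}
```

Only `first` with `extra_format` passes. A differing EPUB or a differing author spelling is enough to fail the check, and a check like this says nothing about which copy is the better one to read.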
Tags
cross library duplicates, in library duplicates

All times are GMT -4. The time now is 12:00 PM.

