Old 02-07-2011, 01:22 PM   #9
Starson17
Wizard
Quote:
Originally Posted by kiwidude
I will butt in since this is a topic of great interest to me currently.
As far as I'm concerned - your help is always appreciated.
Quote:
Firstly, have you read the Duplicate Detection thread in this forum? That discusses some changes and additions to Calibre we are in the process of making. Feedback on that thread as to what sounds useful or not is always welcomed (particularly as the plugin which will "find" duplicates has not been written yet and there's a few ways we can approach it).
Yes, comments are welcome. The new automerge code is now in Kovid's hands, but he'll always consider improvements.
Quote:
As to "instructions", from a Calibre perspective Starson has given you what you need to do if you decide to try that approach. You just need to be aware of the implications:
- It will only find duplicates where the authors exactly match. There is no "fuzzy matching" on authors.
Correct.
Quote:
- You really have very little control over which version will be kept if you have duplicates of a format.
I'm not completely sure how much control you have with Copy to Library (CTL). We'd have to test it or ask Kovid. I suspect, however, that Copy to Library processes books in the sorted selection order. As each selected book is processed, each of its formats is compared against the contents of the new library being built by the CTL code. Merging is done by the automerge code (assuming automerge is on). In that case, the first book the CTL code handles will be the final book. (Unlike the manual Merge code I wrote, automerge has never merged metadata.)

Note that the new code in the linked thread only applies to automerge of incoming books. I did not replicate that code to automerge for Copy to Library. My suggestion to Calliastra (the OP) was to consider using Copy to Library. In that scenario, there is always only one "identical book", since automerge in its current form silently ignores duplicate formats. It's guaranteed to merge up all books with identical titles and similar titles. It can't currently make duplicate records with automerge on.
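
To make the behaviour I'm guessing at more concrete, here is a rough Python sketch. It is not Calibre's actual code: the function names, the dict layout for a book, and the crude title normalization are all made up for illustration. It only models what is described above: books handled in selection order, exact match on authors, a loose match on title, the first book becoming the surviving record, new formats added, and duplicate formats silently ignored.

```python
def normalize_title(title):
    """Hypothetical stand-in for a 'similar title' test:
    lowercase, drop a leading article, collapse whitespace."""
    words = title.lower().split()
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    return " ".join(words)


def copy_with_automerge(selected_books):
    """Copy books into a new library in selection order, merging formats only."""
    library = {}  # (authors, normalized title) -> record in the new library
    for book in selected_books:
        # Authors must match exactly; titles only after normalization.
        key = (tuple(book["authors"]), normalize_title(book["title"]))
        record = library.get(key)
        if record is None:
            # The first book seen for this key becomes the final record.
            library[key] = {
                "title": book["title"],
                "authors": list(book["authors"]),
                "formats": dict(book["formats"]),
            }
            continue
        for fmt, path in book["formats"].items():
            # A format that already exists is silently ignored;
            # a format the record lacks is added. Metadata is never merged.
            record["formats"].setdefault(fmt, path)
    return list(library.values())


books = [
    {"title": "The Hobbit", "authors": ["J. R. R. Tolkien"], "formats": {"EPUB": "a.epub"}},
    {"title": "Hobbit", "authors": ["J. R. R. Tolkien"], "formats": {"EPUB": "b.epub", "MOBI": "b.mobi"}},
    {"title": "The Hobbit", "authors": ["JRR Tolkien"], "formats": {"EPUB": "c.epub"}},
]
merged = copy_with_automerge(books)
```

In this toy run the second book's EPUB is dropped because the first book already supplied one, its MOBI is added, and the third book stays separate because the author string does not match exactly.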

Quote:
As Starson says above it is done by order of "selection" - but if you are doing a bulk library all at once that "selection order" won't mean too much. You could maybe sort by date or something but unless you investigate each book one by one you won't know which version to keep and it could be pot luck.
Yes.

Quote:
There's a few other threads in the forum if you look around at approaches people have taken. At the moment I have my own tool outside of Calibre that does fuzzy matches of authors and/or titles, doing direct sql queries against the Calibre database. Other people have their own tools/scripts, some of which were made available. Hopefully we will have a Calibre plugin soon (I've offered to write it but anyone is welcome to beat me to it), but we need to make decisions about it before I start and that discussion should be kept to the other thread.
I assume you've seen the other duplicate finding threads I contributed to. They all have simple SQL queries for duplicate titles or duplicate author/title.
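
For anyone who hasn't seen those threads, queries of that kind look roughly like the sketch below. This is not copied from any of them. It runs against a throwaway in-memory SQLite database with a cut-down schema that I'm assuming here (`books`, `authors`, `books_authors_link` with only the columns shown); a real Calibre database has many more columns, so check your own schema, and work on a copy, before trying anything like this.

```python
import sqlite3

# Toy database with an assumed, simplified schema (not the full Calibre one).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books_authors_link (book INTEGER, author INTEGER);
INSERT INTO books VALUES (1, 'Dune'), (2, 'Dune'), (3, 'Dune'), (4, 'Emma');
INSERT INTO authors VALUES (1, 'Frank Herbert'), (2, 'Someone Else'), (3, 'Jane Austen');
INSERT INTO books_authors_link VALUES (1, 1), (2, 1), (3, 2), (4, 3);
""")

# Titles that appear on more than one book record (exact match only).
DUPLICATE_TITLES = """
SELECT title, COUNT(*) AS copies
FROM books
GROUP BY title
HAVING COUNT(*) > 1
ORDER BY title
"""

# Author/title pairs that appear on more than one book record.
DUPLICATE_AUTHOR_TITLE = """
SELECT a.name, b.title, COUNT(*) AS copies
FROM books b
JOIN books_authors_link l ON l.book = b.id
JOIN authors a ON a.id = l.author
GROUP BY a.name, b.title
HAVING COUNT(*) > 1
ORDER BY a.name, b.title
"""

duplicate_titles = conn.execute(DUPLICATE_TITLES).fetchall()
duplicate_author_titles = conn.execute(DUPLICATE_AUTHOR_TITLE).fetchall()
```

With the toy data, the title-only query reports all three "Dune" records, while the author/title query reports only the two that share an author. Both are exact matches, with no fuzzy logic at all.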

Quote:
Certainly the 1.0 version may "only" have the exact same comparison logic Starson's automerge functionality has - of exact match on author, fuzzy on title.
That's fine by me, but remember that the reason I didn't fuzzy match more aggressively is that automerge is AUTOmatic. An error in automerge meant a lost book format. If we are doing duplicate finding with manual review on a dialog screen, where merging is controlled manually so the user can pick the best format or decline to merge, we can be much more aggressive.
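
As an illustration of what "more aggressive" could mean when a person reviews every hit, here is a small sketch using Python's standard `difflib`. Everything in it (the function names, the 0.8 threshold, the book dicts) is an assumption for the example, not a proposal for the plugin's actual logic. The point is only that fuzzy matching on both author and title is tolerable when the output is a list of candidates to look at, not a merge that happens on its own.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def duplicate_candidates(books, threshold=0.8):
    """Return id pairs whose author AND title both look similar.

    Nothing is merged here; the pairs are meant for manual review.
    The threshold is an arbitrary value chosen for this example.
    """
    pairs = []
    for first, second in combinations(books, 2):
        if (similarity(first["author"], second["author"]) >= threshold
                and similarity(first["title"], second["title"]) >= threshold):
            pairs.append((first["id"], second["id"]))
    return pairs


books = [
    {"id": 1, "author": "Jane Austen", "title": "Pride and Prejudice"},
    {"id": 2, "author": "Jane Austin", "title": "Pride & Prejudice"},
    {"id": 3, "author": "Frank Herbert", "title": "Dune"},
]
candidates = duplicate_candidates(books)
```

Here books 1 and 2 are flagged even though neither the author nor the title matches exactly, which is the kind of hit automerge must not act on by itself. Comparing every pair is quadratic, so a large library would need something smarter than this.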