02-11-2011, 10:28 AM | #91 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Yes, I initially thought it would be done this way, and didn't fully appreciate kiwidude's comments about the problems of that approach vs. the power of the library view, but I've come to appreciate the issue/problem and the advantages of keeping things in the library view as much as possible.
|
02-11-2011, 11:24 AM | #92 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I've noticed that this thread is in Library Management, not Plugins. Duplicate finding strikes me as a universally desirable function, not something that only a few people need. It's also more likely to be used by the new user, who's cleaning up the newly (badly?) entered data, and who may be intimidated by the steps needed to get a plugin installed. I don't spend much time in the plugin subforum, but are there any written criteria for what should be a plugin vs. a code enhancement in the trunk?
|
02-11-2011, 11:56 AM | #93 | |
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
One issue that affects the choice is whether or not the code will be hacked by a significant number of people. If so, it should remain as a plugin. I can see this happening if people tune matching algorithms to their situation. Of course, this presupposes that a person has that ability. Many do, but many more do not. It may be that the test functions themselves become plugins, while the framework migrates to base functionality. Another issue is Kovid's view of the future. I have no idea how he feels about all of this. Charles |
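To make the "test functions as plugins, framework in base" split concrete, here is a minimal sketch. It is not calibre code: the `MATCHERS` registry, `register_matcher`, `find_duplicate_groups` and the `(book_id, title, authors)` tuple are all hypothetical names chosen for illustration. The framework part only buckets books by whatever key a registered test function returns, so a user could tune or add a matcher without touching the grouping logic.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical book record for illustration: (book_id, title, authors)
Book = Tuple[int, str, str]

# Registry of pluggable "test functions"; each maps a book to a match key.
MATCHERS: Dict[str, Callable[[Book], str]] = {}


def register_matcher(name: str):
    """Decorator that adds a key function to the registry under `name`."""
    def wrap(fn: Callable[[Book], str]) -> Callable[[Book], str]:
        MATCHERS[name] = fn
        return fn
    return wrap


@register_matcher("exact_title")
def exact_title(book: Book) -> str:
    # Case- and whitespace-insensitive title only.
    return book[1].strip().lower()


@register_matcher("title_author")
def title_author(book: Book) -> str:
    # Title plus author, so same-titled books by different authors differ.
    return book[1].strip().lower() + "|" + book[2].strip().lower()


def find_duplicate_groups(books: List[Book], matcher: str) -> List[List[int]]:
    """Framework part: bucket books by the chosen matcher's key and
    return every bucket that holds more than one book id."""
    key_fn = MATCHERS[matcher]
    buckets: Dict[str, List[int]] = defaultdict(list)
    for book in books:
        buckets[key_fn(book)].append(book[0])
    return [ids for ids in buckets.values() if len(ids) > 1]
```

With `books = [(1, "Dune", "Frank Herbert"), (2, "dune ", "Frank Herbert"), (3, "Dune", "Someone Else")]`, the `exact_title` matcher groups all three ids, while `title_author` groups only books 1 and 2. Keeping the key functions separate from the grouping loop is what would let them live in a plugin while the loop moves into the trunk.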
|
02-11-2011, 12:33 PM | #94 |
creator of calibre
Posts: 43,863
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
This should certainly move into trunk eventually. I don't really care if it is developed as a plugin or directly in trunk, with the proviso that if it is in trunk it needs to be committed only once it is fairly complete.
|
02-12-2011, 04:54 PM | #95 |
Enthusiast
Posts: 39
Karma: 10
Join Date: Jan 2011
Device: Nook Color
|
Thank you kiwidude, you really took the time on this one.
|
02-12-2011, 06:30 PM | #96 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
yw. I don't mind posting my workflow if nothing else to see if someone else would jump in and tell me that there are better approaches for certain steps.
These to/fro conversations on algorithms are excellent. I apologise for not adding my own thoughts as yet, but I do read each post and will revisit it all again when I can. Currently my only time allocated to thinking about duplicate detection is when I sit and write a rambling post. As for the "way to approach development" of it, I 100% agree with Chaley that we need to agree an approach before we begin, certainly if it comes down to me to do the plugin development. My relative unfamiliarity with Python/Calibre code means I develop at snail pace, so calling it "RAD" is a lie in my case. I would be gutted to spend the considerable development time required for this only to find a fundamental flaw requiring a total rewrite, for example if I had started coding it as a popup dialog, as was our favoured approach in this thread for a while. Basic agreement on how duplicate results will be presented, navigated and maintained looks to be the fundamental issue for the plugin. I am less concerned at this point about the "identification algorithms", as they can be added/tweaked over time. A final comment, repeating one I made a while ago that is relevant again given recent posts: my hope was that we would develop it as a plugin initially so that we could get the kinks out without interfering with Calibre releases, and that one day it might get included for distribution with Calibre if Kovid deemed it useful. Confirmation from Kovid that he would be interested in including it when ready is great. We just need to produce something worthwhile to be included, of course... |
02-14-2011, 10:04 AM | #97 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
If there are multiple algorithms, one approach is to use Charles' idea about multiple columns, one for each algorithm, to track and avoid false positives when/if that algorithm is run again. Another approach would be to store is_multiple tag keys for each book: algorithm1#-book2id-book3id-book4id, algorithm2#-book2id-book5id, algorithm3#-book2id-book3id For this book (it's book1), three duplicate/matching algorithms have been run. When the first (identified as algorithm1#) was run, it found book2, book3 and book4 as matches, but the user said they were not matches, and that info on false positives was stored against algorithm1# for book1. When algorithm2# was run, it found book2 and book5 as false positives (any other dupes it found would have been merged into book1). Presumably this algorithm did not think that book3 or book4 were dupes of book1, because if it had, the user presumably would have marked them as false positives, too. When algorithm3# was run, it found book2 and book3 (but not book4 or book5). I'm inclined to think that offering multiple search algorithms is a necessary feature. Avoiding false positives on multiple runs of each algorithm would be nice, but could be added later, provided we structure things in a way that doesn't exclude adding that feature. |
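A rough sketch of how the per-book is_multiple keys described above could be written and read back. This is only an illustration, not calibre's API: the helper names (`encode_fp_tag`, `decode_fp_tags`, `filter_known_false_positives`) are hypothetical, book ids are assumed to be integers, and algorithm names are assumed never to contain the `#-` separator.

```python
from typing import Dict, Iterable, List, Set


def encode_fp_tag(algorithm: str, book_ids: Iterable[int]) -> str:
    """Build one tag value such as 'algorithm1#-2-3-4', recording the book
    ids the user rejected as false positives for this algorithm."""
    return algorithm + "#-" + "-".join(str(i) for i in sorted(set(book_ids)))


def decode_fp_tags(tags: Iterable[str]) -> Dict[str, Set[int]]:
    """Parse a book's tag values back into {algorithm: {rejected book ids}}."""
    result: Dict[str, Set[int]] = {}
    for tag in tags:
        algorithm, _, ids = tag.partition("#-")
        result[algorithm] = {int(i) for i in ids.split("-") if i}
    return result


def filter_known_false_positives(
    algorithm: str, candidate_ids: List[int], tags: Iterable[str]
) -> List[int]:
    """Drop candidates this algorithm already flagged and the user rejected."""
    known = decode_fp_tags(tags).get(algorithm, set())
    return [i for i in candidate_ids if i not in known]
```

Using the example from the post, book1 would carry `["algorithm1#-2-3-4", "algorithm2#-2-5", "algorithm3#-2-3"]`. Re-running algorithm2 and getting candidates `[2, 3, 5, 6]` would then leave only `[3, 6]` to show the user, while an algorithm with no stored key filters nothing.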
|
02-14-2011, 10:16 AM | #98 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I think the main worry is that once it's in the trunk, we don't want to fundamentally change the appearance, move too many options around or change defaults. Users get accustomed to new features very quickly. In a plugin, we can play with drastically different approaches. In the trunk, we need to worry about "The Calibre Experience", something that is already hard enough to dance without changing the dance floor underneath everyone. |
|
02-18-2011, 08:17 AM | #99 |
Member
Posts: 16
Karma: 10
Join Date: Jul 2010
Device: Bebook Neo
|
Maybe a bit off topic, but I noticed that the option 'copy to clipboard' is gone in the window that lists the duplicates after adding new books to the library. Why is it gone? I found it very useful. I always chose to not add the books, copy the results to the clipboard and look up the titles manually to see whether they really were duplicates. (I sometimes have books with the same title but from different authors. They are marked as duplicates, but aren't.)
Anyway, I was just curious why this option was removed. |
02-18-2011, 08:43 AM | #100 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I don't know why for sure, but perhaps because it was duplicative? You can just select "Show details," then highlight and copy the list to the clipboard that way.
|
02-18-2011, 10:23 AM | #101 |
Member
Posts: 16
Karma: 10
Join Date: Jul 2010
Device: Bebook Neo
|
|
02-18-2011, 10:26 AM | #102 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
03-20-2011, 01:09 PM | #103 |
Klaas
Posts: 2
Karma: 10
Join Date: Mar 2011
Location: Germany
Device: Kindle DXG (International)
|
Is there any progress on the plugin/feature? I like the idea of a nice duplicate detection feature in calibre. So far a lot of people are doing their own duplicate detection; maybe together we can get a nice, working solution.
Has anyone started anything yet? I have never done anything in Python so far (I did everything in Java and Ruby), but maybe I can help with Python as well. If someone has started, it would be great to be pointed to the general decisions made so far, the source code repository and anything else useful. I would like to look at the stuff, learn and try to help. Regards Klaas |
03-20-2011, 02:14 PM | #104 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@klaas. In a word, no. Or rather no progress by myself, and as chaley/Starson17 both indicated they had no interest in writing it themselves my guess is that no-one else has either.
At the time of volunteering to write this I had both a very strong personal need and plenty of time on my hands. However, the time got swallowed up by writing and enhancing at least 15 other plugins over the last few months. Right now I am, to be honest, rather burnt out from the hundreds of hours spent on those, plus I need to spend time on some other more important things in life for the next month or two. Also, I circumvented my personal need for this by "starting again" with a second library, which I am very slowly building up author by author in a very controlled manner, bringing in just the best possible EPUB I have (or a format I can convert to EPUB) and processing as I go. If you add to your library this way there is no duplicate problem to solve. It is very slow going, but the reality is that I already have way more books in the new library than I can possibly read in the short term. And my library contains perfect metadata with quality, ready-to-read versions rather than many thousands of books just for the sake of having them in Calibre. I will continue to work through my preferred/favourite authors first and just prioritise the rest based on recommendations etc. When any friends/family ask for someone I haven't processed as yet, I just do a search using Windows Explorer on the many GBs of raw files and pull them into a working directory to find the best and chuck the rest. When my time frees up again, if there is still no sign of this plugin from others, then I will take another look at it, as I know there are lots of people who would use it. I've written a number of other plugins now that use parts of the Calibre API that this would require, such as working with custom columns, the underlying search/data caches etc., so I feel more "prepared" than a few months ago. But it is many hours of work which I can't spare at the moment. If you or someone else gives it a go, good luck and go for it! 
I would suggest feeding back the approach being taken etc. either in here or, preferably, in the Developers forum. There are some very smart Calibre developers out there like chaley/Starson17 and of course Kovid who can help point you in the right direction, though I suggest you invest a lot of time looking through the Calibre code and some of the other plugins first. |
03-20-2011, 10:50 PM | #105 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
|
|
Tags |
duplicate |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Duplicate Detection | albill | Calibre | 2 | 10-26-2010 02:21 PM |
Help with Chapter detection | ubergeeksov | Calibre | 0 | 09-02-2010 04:56 AM |
Device Detection doom | Alberto Franches | Calibre | 6 | 06-24-2010 05:38 PM |
Device detection? | totanus | ePub | 1 | 12-17-2009 07:05 AM |
Structure detection v5.5 and v6.2 | AlexBell | Calibre | 2 | 07-29-2009 10:11 PM |