Old 02-11-2011, 10:28 AM   #91
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by chaley View Post
I am suspicious of building a navigation dialog box. My feeling is that there would be a lot of pressure to make it as capable as the library view, including all the metadata edit features.
Yes, I initially thought it would be done this way, and didn't fully appreciate kiwidude's comments about the problems of that approach vs. the power of the library view, but I've come to appreciate the issue/problem and the advantages of keeping things in the library view as much as possible.
Old 02-11-2011, 11:24 AM   #92
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
I've noticed that this thread is in Library Management, not Plugins. Duplicate finding strikes me as a universally desirable function, not something that only a few people need. It's also more likely to be used by the new user, who's cleaning up the newly (badly?) entered data, and who may be intimidated by the steps needed to get a plugin installed. I don't spend much time in the plugin subforum, but are there any written criteria for what should be a plugin vs. a code enhancement in the trunk?
Old 02-11-2011, 11:56 AM   #93
chaley
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Starson17 View Post
I've noticed that this thread is in Library Management, not Plugins. Duplicate finding strikes me as a universally desirable function, not something that only a few people need. It's also more likely to be used by the new user, who's cleaning up the newly (badly?) entered data, and who may be intimidated by the steps needed to get a plugin installed. I don't spend much time in the plugin subforum, but are there any written criteria for what should be a plugin vs. a code enhancement in the trunk?
I think that this 'project' could have two distinct phases in its life-cycle. The first is discovery: what needs to be done and how should it be done. This phase is best done largely in a plugin with API support as needed in the trunk, for even faster turnaround than weekly. The second phase is maturity: this phase concentrates on performance enhancement, discoverability, documentation, and tuning. The code could move to trunk, and perhaps should.

One issue that affects the choice is whether or not the code will be hacked by a significant number of people. If so, it should remain as a plugin. I can see this happening if people tune matching algorithms to their situation. Of course, this presupposes that a person has that ability. Many do, but many more do not. It may be that test functions themselves become plugins, while the framework migrates to base functionality.
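
To make the "test functions as plugins, framework in the base" split concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `MATCHERS` registry, the `matcher` decorator, the dictionary-based book records and the two example test functions are illustration only, not anything that exists in Calibre. The idea is simply that the framework groups books by whatever key a test function returns, so a user can tune or replace a test function without touching the framework.

```python
import re

# Hypothetical registry: maps a matcher name to a function that turns a
# book record into a comparison key. Not part of any real Calibre API.
MATCHERS = {}


def matcher(name):
    """Register a key function under the given name."""
    def register(func):
        MATCHERS[name] = func
        return func
    return register


def _normalise(text):
    # Lower-case and collapse punctuation/whitespace runs to single spaces.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


@matcher("title_only")
def title_only(book):
    return _normalise(book["title"])


@matcher("title_and_author")
def title_and_author(book):
    return (_normalise(book["title"]), _normalise(book["author"]))


def find_duplicate_groups(books, matcher_name):
    """Framework part: group book ids by the key of the chosen matcher."""
    key_func = MATCHERS[matcher_name]
    groups = {}
    for book in books:
        groups.setdefault(key_func(book), []).append(book["id"])
    # Only groups with more than one book are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample records for illustration.
books = [
    {"id": 1, "title": "The Long Road", "author": "A. Writer"},
    {"id": 2, "title": "the long road!", "author": "A. Writer"},
    {"id": 3, "title": "The Long Road", "author": "B. Author"},
]

loose_groups = find_duplicate_groups(books, "title_only")
strict_groups = find_duplicate_groups(books, "title_and_author")
```

With these sample records the title-only test lumps all three books together, while the title-and-author test keeps the book by the second author out of the group. That difference is the kind of per-user tuning that would argue for leaving the test functions easy to swap.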

Another issue is Kovid's view of the future. I have no idea how he feels about all of this.

Charles
Old 02-11-2011, 12:33 PM   #94
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This should certainly move into trunk eventually. I don't really care if it is developed as a plugin or directly in trunk, with the proviso that if it is in trunk it needs to be committed only once it is fairly complete.
Old 02-12-2011, 04:54 PM   #95
vitalichka
Enthusiast
Posts: 39
Karma: 10
Join Date: Jan 2011
Device: Nook Color
Thank you kiwidude, you really took the time on this one.
Old 02-12-2011, 06:30 PM   #96
kiwidude
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Quote:
Originally Posted by vitalichka View Post
Thank you kiwidude, you really took the time on this one.
yw. I don't mind posting my workflow if nothing else to see if someone else would jump in and tell me that there are better approaches for certain steps.

These to/fro conversations on algorithms are excellent. I apologise for not adding my own thoughts as yet, but I do read each post and will revisit it all again when I can. Currently my only time allocated to thinking about duplicate detection is when I sit and write a rambling post.

As for the "way to approach development" of it, I 100% agree with chaley that we need to agree an approach before we begin, certainly if it comes down to me to do the plugin development. My relative unfamiliarity with Python/Calibre code means I develop at a snail's pace, so calling it "RAD" is a lie in my case. I would be gutted to spend the considerable development time required for this only to find a fundamental flaw requiring a total rewrite, such as if I had started coding it as a popup dialog, which was our favoured approach in this thread for a while. Basic agreement on how duplicate results will be presented, navigated and maintained looks to be the fundamental issue for the plugin. I am less concerned at this point about the "identification algorithms" as they can be added/tweaked over time.

Final comment repeating one I made a while ago that is relevant again given recent posts. My hope was that we would develop it as a plugin initially so that we could get the kinks out without interfering with Calibre releases. And that one day it might get included for distribution with Calibre if Kovid deemed it useful. Confirmation from Kovid that he would be interested in including it when ready is great. We just need to produce something worthwhile to be included of course...
Old 02-14-2011, 10:04 AM   #97
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiwidude View Post
I am less concerned at this point about the "identification algorithms" as they can be added/tweaked over time.
I agree we don't need to think about any specific "identification of duplicates" algorithm now. However, we probably do need to think about whether there will be multiple different algorithms (presumably selected by the user) or a single one selected by the code author(s). If there's only one algorithm, then avoiding false positives in multiple runs is probably easier than if there are multiple algorithms.

If there are multiple algorithms, one approach is to use Charles' idea about multiple columns, one for each algorithm, to track and avoid false positives when/if that algorithm is run again. Another approach would be to store is_multiple tag keys for each book:
algorithm1#-book2id-book3id-book4id,
algorithm2#-book2id-book5id,
algorithm3#-book2id-book3id

For this book (it's book1), three duplicate/matching algorithms have been run. When the first (identified as algorithm1#) was run, it found book2, book3 and book4 as matches, but the user said they were not matches, and that info on false positives was stored against algorithm1# for book1.

When algorithm2# was run, it found book2 and book5 as false positives (any other dupes it found would have been merged into book1). Presumably this algorithm did not think that book3 or book4 were dupes of book1, because if it had, the user presumably would have marked them as false positives, too.

When algorithm3# was run, it found book2 and book3 (but not book4 or book5) as false positives.

I'm inclined to think that offering multiple search algorithms is a necessary feature. Avoiding false positives on multiple runs of each algorithm would be nice, but could be added later, provided we structure things in a way that doesn't exclude adding that feature.
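
A rough Python sketch of the tag-key idea above, to show how little is needed to honour stored false positives on a later run. The helper names (`encode_exemptions`, `decode_exemptions`, `filter_candidates`) are made up for illustration, and I am assuming integer book ids and algorithm names that contain neither "#" nor "-"; none of this is existing Calibre code.

```python
from collections import defaultdict


def encode_exemptions(algorithm, book_ids):
    """Build one tag key such as 'algorithm1#-2-3-4' (hypothetical format)."""
    return "-".join([algorithm + "#"] + [str(b) for b in sorted(book_ids)])


def decode_exemptions(tag_keys):
    """Parse a book's tag keys into {algorithm name: set of exempt book ids}."""
    result = defaultdict(set)
    for key in tag_keys:
        name, _, rest = key.partition("#")
        # rest looks like '-2-3-4'; drop the empty piece before the first '-'.
        ids = [part for part in rest.split("-") if part]
        result[name].update(int(i) for i in ids)
    return dict(result)


def filter_candidates(algorithm, candidates, tag_keys):
    """Drop candidates already marked as false positives for this algorithm."""
    exempt = decode_exemptions(tag_keys).get(algorithm, set())
    return [c for c in candidates if c not in exempt]


# The three keys stored against book1 in the example above.
book1_keys = [
    encode_exemptions("algorithm1", [2, 3, 4]),
    encode_exemptions("algorithm2", [2, 5]),
    encode_exemptions("algorithm3", [2, 3]),
]

# A second run of algorithm1 proposes books 2, 3, 4 and a new book 6.
remaining = filter_candidates("algorithm1", [2, 3, 4, 6], book1_keys)
```

Because the exemptions are keyed by algorithm, a book rejected under one algorithm can still be proposed by another, which matches the book1 example: book5 is exempt for algorithm2 only.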
Old 02-14-2011, 10:16 AM   #98
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiwidude View Post
My hope was that we would develop it as a plugin initially so that we could get the kinks out without interfering with Calibre releases. And that one day it might get included for distribution with Calibre if Kovid deemed it useful. Confirmation from Kovid that he would be interested in including it when ready is great. We just need to produce something worthwhile to be included of course...
I think Kovid was pretty clear: he agrees it will be useful, so it should eventually be added to the trunk. I have no objection to development as a plugin (and that's what Kovid agrees is a good approach), but once the basic structure is settled in the plugin it should move to the trunk code. Features like this don't interfere with Calibre releases, as long as they don't blow up other parts of the code. Putting them in the trunk draws lots of users with their valuable feedback and gives everyone a tool, even when they aren't comfortable installing plugins.

I think the main worry is that once it's in the trunk, we don't want to fundamentally change the appearance or move too many options around or change defaults. Users get accustomed to new features very quickly. In a plugin, we can play with drastically different approaches. In the trunk, we need to worry about "The Calibre Experience": something that is already hard enough to dance, without changing the dance floor underneath everyone.
Old 02-18-2011, 08:17 AM   #99
Schaapje82
Member
Posts: 16
Karma: 10
Join Date: Jul 2010
Device: Bebook Neo
Maybe a bit off topic, but I noticed that the option 'copy to clipboard' is gone in the window that lists the duplicates after adding new books to the library. Why is it gone? I found it very useful. I always chose not to add the books, copied the results to the clipboard and looked up the titles manually to see whether they really were duplicates. (I sometimes have books with the same title but from different authors. They are marked as duplicates, but aren't.)
Anyway, I was just curious why this option was removed.
Old 02-18-2011, 08:43 AM   #100
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Schaapje82 View Post
I noticed that the option 'copy to clipboard' is gone in the window that lists the duplicates after adding new books to the library. Why is it gone?
I don't know why for sure, but perhaps because it was duplicative? You can just select "Show details," then highlight and copy the list to the clipboard that way.
Old 02-18-2011, 10:23 AM   #101
Schaapje82
Member
Posts: 16
Karma: 10
Join Date: Jul 2010
Device: Bebook Neo
Quote:
Originally Posted by Starson17 View Post
I don't know why for sure, but perhaps because it was duplicative? You can just select "Show details," then highlight and copy the list to the clipboard that way.
I tried that, but I got an error. Will try again some time when I have more books to add.
Old 02-18-2011, 10:26 AM   #102
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Schaapje82 View Post
I tried that, but I got an error. Will try again some time when I have more books to add.
Not that it helps you, but it worked for me.
Old 03-20-2011, 01:09 PM   #103
klaas
Klaas
Posts: 2
Karma: 10
Join Date: Mar 2011
Location: Germany
Device: Kindle DXG (International)
Is there any progress on the plugin/feature? I like the idea of a nice duplicate detection feature in calibre. So far a lot of people are doing their own duplicate detection; maybe together we can get a nice, working solution.

Has someone started anything yet? I have never done anything in Python; so far I have done everything in Java and Ruby, but maybe I can help with Python as well. If someone has started, it would be great to be pointed to the general decisions made so far, the source code repository, and everything else useful. I would like to look at the stuff, learn and try to help.

Regards Klaas
Old 03-20-2011, 02:14 PM   #104
kiwidude
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@klaas. In a word, no. Or rather no progress by myself, and as chaley/Starson17 both indicated they had no interest in writing it themselves my guess is that no-one else has either.

At the time of volunteering to write this I had both a very strong personal need and plenty of time on my hands. However the time got swallowed up by writing and enhancing at least 15 other plugins over the last few months. Right now I am to be honest rather burnt out from the hundreds of hours spent on those, plus I need to spend time on some other more important things in life for the next month or two.

Also I circumvented my personal need for this by "starting again" with a second library, which I am very slowly building up author by author in a very controlled manner, bringing in just the best possible epub I have (or format I can convert to EPUB) and processing as I go. If you add to your library this way there is no duplicate problem to have to solve. It is very slow going, but the reality is that I still already have way more books in the new library than I can possibly read in the short term. And my library contains perfect metadata with quality ready to read versions rather than many thousands of books just for the sake of having them in Calibre. I will continue to work through my preferred/favourite authors first and just prioritise the rest based on recommendations etc. When any friends/family ask for someone I haven't processed as yet I just do a search using Windows Explorer on the many GBs of raw files and pull them into a working directory to find the best and chuck the rest.

When my time frees up again, if there is still no sign of this plugin by others then I will take another look at it, as I know there are lots of people who would use it. I've written a number of other plugins now that use parts of the Calibre API that this would require, such as working with custom columns, the underlying search/data caches etc., so I feel more "prepared" than a few months ago. But it is many hours' work which I can't spare at the moment.

If you or someone else gives it a go, good luck and go for it! I would suggest either feeding back in here or preferably in the Developers forum the approach being taken etc. There are some very smart Calibre developers out there like chaley/Starson17 and of course Kovid who can help point you in the right directions, though I suggest you invest a lot of time looking through the Calibre code and some of the other plugins first.
Old 03-20-2011, 10:50 PM   #105
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by kiwidude View Post
And my library contains perfect metadata with quality ready to read versions rather than many thousands of books just for the sake of having them in Calibre. I will continue to work through my preferred/favourite authors first and just prioritise the rest based on recommendations etc. When any friends/family ask for someone I haven't processed as yet I just do a search using Windows Explorer on the many GBs of raw files and pull them into a working directory to find the best and chuck the rest.
This process sounds very familiar in that it is exactly how I am tackling my calibre library.