05-09-2011, 08:48 AM | #1 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Metadata API - prioritising results
Having spent about 3 hours adding print statements and tracing through this code, I am pulling my hair out, so perhaps someone (most likely Kovid) can enlighten me.
I have a book which returns three ISBN matches from B&N:
9781101135570
9780451229939
9780594256113 - has no cover
These results then get sorted by the InternalMetadataCompareKeyGen to:
9780451229939
9781101135570
9780594256113
All good so far - the results are sorted in order of "preferred match".
Then the lookup via XISBN takes place, and it finds that the ISBNs ending in 939/570 are in the same "pool". So it takes the first result and discards the second. So now we have:
9780451229939
9780594256113
Now the final merge takes place. HOWEVER the merge of identifiers in ISBNMerge.merge() is done by this code:
Code:
for r in results:
    ans.identifiers.update(r.identifiers)
So now my final result being given back is given the ISBN of 9780594256113 - which is the ISBN that does NOT have a cover, and is my least preferred match. Is this a bug or am I missing something? |
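A minimal sketch of why the last result wins in the loop quoted above, assuming `identifiers` behaves like a plain dict keyed by identifier type. The `Result` class and both helper functions are hypothetical stand-ins for illustration, not calibre's own objects:

```python
class Result:
    """Hypothetical stand-in for a metadata result; only identifiers matter here."""
    def __init__(self, identifiers):
        self.identifiers = identifiers


def merge_last_wins(results):
    # Same shape as the loop quoted above: each update() overwrites
    # keys that are already set, so the last result in the list wins.
    merged = {}
    for r in results:
        merged.update(r.identifiers)
    return merged


def merge_first_wins(results):
    # Alternative: only fill in keys that are still missing, so the
    # first (highest priority) result keeps its identifiers.
    merged = {}
    for r in results:
        for key, value in r.identifiers.items():
            merged.setdefault(key, value)
    return merged


# The two results left after the XISBN pool step, in preferred order.
results = [
    Result({'isbn': '9780451229939'}),
    Result({'isbn': '9780594256113'}),
]
last = merge_last_wins(results)
first = merge_first_wins(results)
print(last['isbn'])   # 9780594256113
print(first['isbn'])  # 9780451229939
```

With `setdefault`, identifier types that only a lower-priority result supplies are still picked up; they simply never displace a value from a better match.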
05-09-2011, 09:17 AM | #2 |
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Just for grins, what happens if you change that code to be:
Code:
for r in reversed(results):
    ans.identifiers.update(r.identifiers)
EDIT: Never mind. 'results' is not a list.
Last edited by chaley; 05-09-2011 at 09:21 AM. |
05-09-2011, 09:28 AM | #3 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Hi Charles - I'm not sure what you mean by "identifiers is not ordered"?
Certainly that now gives me the result I want. I guess the question is whether this was intentional or an accidental oversight by Kovid?
It seems "all bets are off" when it comes to merge time. It isn't a case of taking the first result and then merging the rest into it, which is what I would have expected from how the "old code" used to work. Instead each field is merged in isolation, and its value could come from any result left at that point (after XISBN pool rationalising). So your net result could be a "mish-mash" of data from the results you have returned.
Perhaps most of the time this isn't a problem, it just wasn't quite what I assumed would be the effect of prioritising results.
Last edited by kiwidude; 05-09-2011 at 09:32 AM. |
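To illustrate the "mish-mash" effect of merging each field in isolation, here is a small sketch. It assumes results are plain dicts and that each field takes the first non-empty value found; `merge_fields` and the sample values are hypothetical and not taken from calibre's merge code:

```python
def merge_fields(results, fields):
    # Each field is resolved on its own: take the first non-empty value
    # found in any result, regardless of where the other fields came from.
    merged = {}
    for field in fields:
        for r in results:
            value = r.get(field)
            if value:
                merged[field] = value
                break
    return merged


# Placeholder data: the preferred result lacks a publisher and comments.
results = [
    {'title': 'Example Title', 'publisher': None, 'comments': ''},
    {'title': 'Example Title (Hardcover)', 'publisher': 'Example Press',
     'comments': 'Short blurb'},
]
merged = merge_fields(results, ['title', 'publisher', 'comments'])
print(merged)
```

The merged record takes its title from the first result and its publisher and comments from the second, so no single source result matches what the user finally sees.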
05-09-2011, 09:36 AM | #4 |
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Never mind again. I thought that "results" was a dict, but it is instead a list, so reversing it may make sense.
The problem with reversing dicts is that the key traversal order is not defined. It isn't particularly useful reversing an undefined order.
Your original comment, "most likely Kovid", is probably right. It isn't clear to me that the order of items in 'results' is significant. It might be accidental that the one you want is or isn't first.
Edit: I edited, then you edited, then I edited. Is this a multi-threaded conversation? |
05-09-2011, 10:04 AM | #5 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Haha, yeah hard to keep track of isn't it?
My initial response would automatically be to say that yes, order is significant. However, that response is based on an assumption which, on inspection of the code as per my edits, is incorrect. Right now it seems to me that the only place order is being "respected" is in the part of the code that creates the XISBN pools - anything but your first result for a pool will be discarded.
I would have "thought" that priority should continue to play a part when it came to merging identifiers as well. As per my example above of the two results I am left with, one is my "excellent" match with a high quality cover and lots of good metadata, the other is a lower quality match because it has no cover (it could also be the case that it has very little metadata). So as a user you would want the hyperlinked ids of ISBN/B&N/Goodreads or whatever in the book details panel to go to your "best match". Likewise, if you did ctrl+d on it again to get fresh metadata, it will now use the ISBN as a lookup, so again you want your "best".
I never found this issue with my Goodreads plugin because I only return one result (well, unless you enable the option to search multiple editions, but as that is slower I turned it off by default, so I tested it less).
Perhaps it is just a simple Kovid oversight. It is not often I have the confidence in my understanding of the code to call it officially as a bug. |
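The pool step described above, where only the first result per XISBN pool survives, can be sketched roughly as follows. The `pool_of` mapping and the function name are hypothetical; the real code looks pools up via XISBN rather than from a hard-coded dict:

```python
def keep_first_per_pool(sorted_isbns, pool_of):
    # sorted_isbns is assumed to be in "preferred match" order already.
    seen_pools = set()
    kept = []
    for isbn in sorted_isbns:
        # An ISBN with no XISBN match ends up in a pool of its own.
        pool = pool_of.get(isbn, isbn)
        if pool in seen_pools:
            continue  # a better result from this pool was already kept
        seen_pools.add(pool)
        kept.append(isbn)
    return kept


sorted_isbns = ['9780451229939', '9781101135570', '9780594256113']
# Assumed pool membership, matching the behaviour reported in this thread.
pool_of = {'9780451229939': 'pool-a', '9781101135570': 'pool-a'}
kept = keep_first_per_pool(sorted_isbns, pool_of)
print(kept)  # ['9780451229939', '9780594256113']
```

Priority is respected inside a pool, but nothing here ranks one pool against another, which is why the coverless edition can still influence the final merge.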
05-09-2011, 10:56 AM | #6 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I'm confused, I cannot look at the code right now, but IIRC, merging only happens for results in the same pool. Either ISBN pool or title/author pool.
In the first case what you describe cannot happen, since the results are in separate pools. Is it happening in the second case? If it is, then I'm not sure what can be done about it, since in general the results come from merging metadata from different metadata sources and therefore comparing priorities is meaningless. The only fix I can see for this is to have a pre ISBN merge filter that throws away lower priority results from each source when a result with the same title and author exists that has a higher priority. |
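A rough sketch of the pre-merge filter proposed above, assuming each result is a dict with `source`, `title` and `authors` keys and that the list is already sorted best-first. The function name, dict layout and placeholder values are all hypothetical, not calibre's implementation:

```python
def drop_lower_priority_duplicates(results):
    # results is assumed to be sorted best-first within each source.
    seen = set()
    kept = []
    for r in results:
        key = (
            r['source'],
            r['title'].strip().lower(),
            tuple(a.strip().lower() for a in r['authors']),
        )
        if key in seen:
            continue  # same source, title and authors: lower priority, drop it
        seen.add(key)
        kept.append(r)
    return kept


# Placeholder title/author; the ISBNs are the three from this thread.
results = [
    {'source': 'B&N', 'title': 'Example Title', 'authors': ['Example Author'],
     'isbn': '9780451229939'},
    {'source': 'B&N', 'title': 'Example Title', 'authors': ['Example Author'],
     'isbn': '9781101135570'},
    {'source': 'B&N', 'title': 'Example Title', 'authors': ['Example Author'],
     'isbn': '9780594256113'},
]
filtered = drop_lower_priority_duplicates(results)
print([r['isbn'] for r in filtered])  # ['9780451229939']
```

Because the key includes the source, results from different metadata sources are never compared against each other, which matches the point that priorities are only meaningful within a single source.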
05-09-2011, 11:22 AM | #7 |
Calibre Plugins Developer
Posts: 4,637
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
It is a title/author search, with multiple results from the same source (I only have one source enabled atm).
That third result (which ends up being the "chosen result") has no matches in XISBN, so it ends up in a "pool of its own" rather than being merged with the other two. Quite why that is I don't know; it is just a hardback edition of the same book, so perhaps either B&N have the wrong ISBN or the XISBN database is out of date. Or maybe that is expected. Here are the search results on B&N.
I confess to not entirely understanding all the voodoo going on underneath or its intentions. I can only tell you what my print statements say is getting merged at various points in the process. And the net result is the "wrong" one in this case imho. |
05-09-2011, 11:31 AM | #8 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Well, like I said, the only solution is to throw away results with the same title/author and lower priority from the same source, before merging. Open a ticket for it and I will implement it when I return.
|