05-09-2011, 09:48 AM   #1
kiwidude
calibre/Sigil Developer
Posts: 4,230
Karma: 1345754
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Metadata API - prioritising results

Having spent about 3 hours adding print statements and tracing through this code I am pulling my hair out, so perhaps someone (most likely Kovid) can enlighten me.

I have a book which returns three ISBN matches from B&N:
9781101135570
9780451229939
9780594256113 - has no cover

These results then get sorted by the InternalMetadataCompareKeyGen to:
9780451229939
9781101135570
9780594256113

All good so far - the results are sorted in order of "preferred match".

Then the lookup via XISBN takes place, and it finds the ISBNs ending in 939/570 are in the same "pool". So it takes the first result and discards the second. So now we have:
9780451229939
9780594256113
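
The pool step described above can be sketched roughly as follows. This is only an illustration of "keep the first result per pool", not calibre's actual code: `pool_of` is a hypothetical mapping standing in for the XISBN lookup, and the pool label is made up.

```python
# Hypothetical stand-in for the XISBN lookup: maps an ISBN to a pool label.
# ISBNs missing from the mapping are treated as a pool of their own.
pool_of = {
    "9780451229939": "pool-a",
    "9781101135570": "pool-a",
}


def first_per_pool(sorted_isbns, pool_of):
    """Keep only the first (highest-priority) ISBN seen for each pool."""
    seen = set()
    kept = []
    for isbn in sorted_isbns:
        pool = pool_of.get(isbn, isbn)  # unknown ISBN: its own pool
        if pool not in seen:
            seen.add(pool)
            kept.append(isbn)
    return kept


# Results already sorted in order of "preferred match".
sorted_isbns = ["9780451229939", "9781101135570", "9780594256113"]
kept = first_per_pool(sorted_isbns, pool_of)
print(kept)  # ['9780451229939', '9780594256113']
```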

Now the final merge takes place. HOWEVER the merge of identifiers in ISBNMerge.merge() is done by this code:
Code:
        for r in results:
            ans.identifiers.update(r.identifiers)
Which effectively says: take the LAST ISBN value from the results it is given, as each "update" will overwrite the ISBN set previously?

So now my final result being given back is given the ISBN of 9780594256113 - which is the ISBN that does NOT have a cover, and is my least preferred match.
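
The overwrite behaviour can be reproduced in isolation. The sketch below uses plain dicts as stand-ins for each result's `identifiers` (an assumption made for illustration; the real objects are calibre metadata results), and shows that repeated `dict.update` calls leave the value from the last result in place.

```python
# Plain dicts standing in for r.identifiers of the two remaining results,
# in priority order (preferred match first).
results = [
    {"isbn": "9780451229939"},  # preferred match
    {"isbn": "9780594256113"},  # least preferred match, no cover
]

merged = {}
for identifiers in results:
    # update() overwrites any key that is already present,
    # so the last result processed wins.
    merged.update(identifiers)

print(merged["isbn"])  # 9780594256113
```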

Is this a bug or am I missing something?
05-09-2011, 10:17 AM   #2
chaley
"chaley", not "charley"
Posts: 5,818
Karma: 1216136
Join Date: Jan 2010
Location: France
Device: Many android devices
Just for grins, what happens if you change that code to be:
Code:
for r in reversed(results):
    ans.identifiers.update(r.identifiers)
My thought is that if results is sorted highest-priority first, then processing them in reverse will leave the one you want. However, if identifiers is not ordered, then this suggestion is bogus.
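
As a rough check of that idea, here is a minimal sketch with plain dicts standing in for each result's `identifiers` (an assumption for illustration only). Processing a highest-priority-first list in reverse means the preferred result is applied last and so wins. A second loop shows an alternative that gives the same outcome without reversing, by only filling in keys that are not set yet.

```python
# Plain dicts standing in for r.identifiers, highest priority first.
results = [
    {"isbn": "9780451229939"},  # preferred match
    {"isbn": "9780594256113"},  # least preferred match
]

# Variant 1: reverse the list so the preferred result is applied last.
merged = {}
for identifiers in reversed(results):
    merged.update(identifiers)

# Variant 2: keep the original order, but never overwrite an existing key.
first_wins = {}
for identifiers in results:
    for key, value in identifiers.items():
        first_wins.setdefault(key, value)

print(merged["isbn"])      # 9780451229939
print(first_wins["isbn"])  # 9780451229939
```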

EDIT: Never mind. 'results' is not a list.

Last edited by chaley; 05-09-2011 at 10:21 AM.
05-09-2011, 10:28 AM   #3
kiwidude
calibre/Sigil Developer
Hi Charles - I'm not sure what you mean by "identifiers is not ordered"?

Certainly that now gives me the result I want. I guess the question is was this intentional or an accidental oversight by Kovid?

It seems "all bets are off" when it comes to merge time. It isn't a case of taking the first result and then merging the rest into it, which is what I would have expected from how the "old code" used to work. Instead each field is merged in isolation, and could come from any result left at that point (after XISBN pool rationalising). So your net result could be a "mish-mash" of data from the results you have returned. Perhaps most of the time this isn't a problem, it just wasn't quite what I assumed would be the effect of prioritising results.
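
To make the "mish-mash" effect concrete, here is a small sketch of a field-by-field merge. Everything in it is hypothetical: the field names, the sample values, and the rule that simple fields take the first non-empty value are assumptions for illustration, not a description of what ISBNMerge.merge() does for those fields. Only the identifiers loop mirrors the code quoted earlier in the thread.

```python
# Two hypothetical results left after pooling, best match first.
results = [
    {"title": "Example Title", "comments": None,
     "identifiers": {"isbn": "9780451229939"}},
    {"title": "Example Title", "comments": "Hardback edition blurb",
     "identifiers": {"isbn": "9780594256113"}},
]


def merge_fieldwise(results):
    """Merge each field in isolation rather than preferring one result."""
    merged = {"identifiers": {}}
    for field in ("title", "comments"):
        # Assumed rule: the first non-empty value wins for simple fields.
        merged[field] = next((r[field] for r in results if r[field]), None)
    for r in results:
        # Same pattern as the quoted loop: the last result wins.
        merged["identifiers"].update(r["identifiers"])
    return merged


merged = merge_fieldwise(results)
print(merged)
```

With this sample data the title comes from the first result, while the comments and the ISBN come from the second, so the merged record does not correspond to any single result.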

Last edited by kiwidude; 05-09-2011 at 10:32 AM.
05-09-2011, 10:36 AM   #4
chaley
"chaley", not "charley"
Never mind again. I thought that "results" was a dict, but it is instead a list, so reversing it may make sense.

The problem with reversing dicts is that the key traversal order is not defined. It isn't particularly useful reversing an undefined order.

Your original comment, "most likely Kovid" is probably right. It isn't clear to me that the order of items in 'results' is significant. It might be accidental that the one you want is or isn't first.

Edit: I edited, then you edited, then I edited. Is this a multi-threaded conversation?
05-09-2011, 11:04 AM   #5
kiwidude
calibre/Sigil Developer
Haha, yeah hard to keep track of isn't it?

My initial response would be automatically to say that yes, order is significant. However, that response rests on an assumption which, on inspecting the code (as per my edits above), turns out to be incorrect. Right now it seems to me that the only place order is being "respected" is in the part of the code that creates the XISBN pools - anything but your first result for a pool will be discarded.

I would have "thought" that priority should continue to play a part when it came to merging identifiers as well - as per my example above of the two results I am left with, one is my "excellent" match with a high quality cover and lots of good metadata, the other is a lower quality match because it has no cover (it could also have very little metadata). So as a user you would want the hyperlinked IDs of ISBN/B&N/Goodreads or whatever in the book details panel to go to your "best match". Likewise, if you did ctrl+d on it again to get fresh metadata it will now use the ISBN as a lookup, so again you want your "best".

I never found this issue with my Goodreads plugin because I only return one result (well, unless you enable the option to search multiple editions, but as that is slower I turned it off by default and so tested it less).

Perhaps it is just a simple Kovid oversight. It is not often I have the confidence in my understanding of the code to officially call something a bug.
05-09-2011, 11:56 AM   #6
kovidgoyal
creator of calibre
Posts: 26,359
Karma: 5382313
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I'm confused, I cannot look at the code right now, but IIRC, merging only happens for results in the same pool. Either ISBN pool or title/author pool.

In the first case what you describe cannot happen since the results are in separate pools. Is it happening in the second case? If it is, then I'm not sure what can be done about it, since in general the results come from merging metadata from different metadata sources and therefore comparing priorities is meaningless.

The only fix I can see for this is to have a pre-ISBN-merge filter that throws away lower priority results from each source when a result with the same title and author exists that has a higher priority.
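
A filter along those lines might look roughly like the sketch below. It is an illustration only, not the eventual calibre implementation: the result dicts, their keys (`source`, `title`, `authors`, `isbn`), the sample title and author, and the case-insensitive comparison are all assumptions, and the input is assumed to be already sorted best-first.

```python
def drop_lower_priority_duplicates(results):
    """Keep only the best result per (source, title, authors).

    `results` is assumed to be sorted highest priority first, so the
    first result seen for a given key is the one to keep.
    """
    seen = set()
    kept = []
    for r in results:
        key = (
            r["source"],
            r["title"].lower(),
            tuple(a.lower() for a in r["authors"]),
        )
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


# Hypothetical results from a single source, best match first.
results = [
    {"source": "B&N", "title": "Example Title", "authors": ["A. Author"],
     "isbn": "9780451229939"},
    {"source": "B&N", "title": "Example Title", "authors": ["A. Author"],
     "isbn": "9781101135570"},
    {"source": "B&N", "title": "Example Title", "authors": ["A. Author"],
     "isbn": "9780594256113"},
]

kept = drop_lower_priority_duplicates(results)
print([r["isbn"] for r in kept])  # ['9780451229939']
```

Because the filter runs before any merging, the lower priority editions never reach the identifier merge, so they cannot overwrite the preferred ISBN.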
05-09-2011, 12:22 PM   #7
kiwidude
calibre/Sigil Developer
It is a title/author search, with multiple results from the same source (I only have one source enabled atm).

That third result (which ends up being the "chosen result") has no matches in XISBN, so ends up in a "pool of its own" rather than being merged with the other two. Quite why that is I don't know, it is just a hardback edition of the same book, perhaps either B&N have the wrong ISBN or the XISBN database is out of date. Or maybe that is expected.

Here are the search results on B&N.

I confess to not entirely understanding all the voodoo going on underneath or its intentions. I can only tell you what my print statements say is getting merged at various points in the process. And the net result is the "wrong" one in this case imho.
05-09-2011, 12:31 PM   #8
kovidgoyal
creator of calibre
Well, like I said, the only solution is to throw away results with the same title/author and lower priority from the same source, before merging. Open a ticket for it and I will implement it when I return.
All times are GMT -4.