Cover/metadata retrieval when ISBN is un-configured

ldolse · 03-01-2011, 04:18 AM

I just hooked my Overdrive plugin fully into Calibre so I could test its behavior from the GUI. From the CLI everything was working pretty well, but from the GUI I'm seeing things that concern me. It seems to be related to whether or not an ISBN is configured prior to retrieving the cover. Not sure if similar behavior happens for Metadata.

My test book was 'Bitten' by Kelley Armstrong - popular book, translated into multiple languages, many editions/title variations. I added the epub to Calibre as a brand new book and restarted Calibre to make sure all caches/previous references were clear. The epub didn't have any ISBN in its metadata, hence that field was empty.

I had previously disabled all the cover/metadata download plugins except Google Books/ISBNDB (I would have disabled those too, but at some low level Calibre seems to assume one of those will always be enabled, otherwise metadata download fails instantly). The new Overdrive plugin was also enabled.

I could initiate the cover download either using ctrl-D to download all metadata or just clicking the 'download cover' button in edit metadata.

The core function for getting covers works more or less the same as Amazon's get_cover_url. I immediately saw calls to this function for numerous titles/ISBNs for multiple editions of the book. It looked like multiple simultaneous threads were calling this for every book returned by ISBNDB/Google? This all started happening well before the xisbn to overdrive ID mapping would have a chance to occur.

The way the plugin is written, it does a couple of searches against the web server, one per book format, as that was the only way I could find to prioritize ebooks over audio books. It stops on the first successful match.
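A minimal sketch of that format-prioritised search, not the plugin's actual code. The names `find_cover_url`, `FORMAT_PRIORITY` and the `search` callable are hypothetical stand-ins for whatever the plugin uses to query the web server:

```python
from typing import Callable, Optional, Sequence

# Hypothetical priority order: ebook formats are tried before audio books.
FORMAT_PRIORITY = ("ebook", "audiobook")

# search(title, author, fmt) returns a cover URL, or None when nothing matched.
SearchFn = Callable[[str, str, str], Optional[str]]


def find_cover_url(title: str, author: str, search: SearchFn,
                   formats: Sequence[str] = FORMAT_PRIORITY) -> Optional[str]:
    """Issue one search per format, in priority order, stopping on the first hit."""
    for fmt in formats:
        url = search(title, author, fmt)
        if url:
            return url  # first successful match wins; later formats are never queried
    return None
```

With this shape, the worst case is one request per format for each title/author pair, so any multiplication of requests has to come from how often the function itself is called.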

Anyway the net is that metadata download for a single book caused 74 searches against the web server, and only stopped at 74 because I haven't gotten around to cleansing titles and the final variation wasn't considered a string. Over the course of those 74 queries a number of cover URLs were found, but it kept on going. This was even with a number of successful cache lookup matches to xisbn eliminating some queries. There appeared to be some looping going on here that I didn't fully understand, as I saw the same author/title combo going to the server many times.

Anyway I'm thinking if the ISBN isn't configured perhaps only the first closest match ISBN should be used, and I don't think title variations should be attempted unless perhaps the first title didn't return a cover.
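Roughly what I have in mind, as a sketch only. It assumes the candidates arrive ordered from closest match to least close, and `first_cover` and `lookup` are hypothetical names, not anything in Calibre:

```python
from typing import Callable, Iterable, Optional, Tuple

# lookup(title, author) returns a cover URL, or None when no cover was found.
LookupFn = Callable[[str, str], Optional[str]]


def first_cover(candidates: Iterable[Tuple[str, str]],
                lookup: LookupFn) -> Optional[str]:
    """Try candidates in order (closest match first) and stop at the first cover.

    Later title variations are only queried when the earlier ones returned
    nothing, and a repeated title/author pair is never sent to the server twice.
    """
    seen = set()
    for title, author in candidates:
        key = (title.strip().lower(), author.strip().lower())
        if key in seen:
            continue  # same author/title combo already tried
        seen.add(key)
        url = lookup(title, author)
        if url:
            return url
    return None
```

This would not tell us where the 74 queries came from, but it would cap the cost at one lookup per distinct title/author pair and end the search as soon as a cover turns up.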

Willing to dig into tuning some of this myself, but I don't know that much about the core of the metadata download code, so some guidance would be helpful.

When an ISBN was pre-set before downloading the cover everything was quite well behaved, functioning exactly as expected.

ldolse · 03-01-2011, 04:32 AM

Further along this same thread - the ISBN I configured for the book is the ISBN for the epub edition I saw in the Overdrive metadata.

That worked well for covers, but didn't work for other metadata types.

Apparently this ISBN doesn't exist in either Google/ISBNDB, and due to that fact Calibre immediately came back with 'No matches found for this book' instead of attempting to query the other metadata providers that are configured.

kiwidude · 03-01-2011, 04:46 AM

Maybe Kovid has changed things in the last few weeks but one of the limitations I found for my Goodreads covers download is that you can *only* download covers for books with an ISBN. It is hard-baked into a number of places in the Calibre code that if the book has no ISBN then you are out of luck. I reported this as a ticket here and Kovid said it would be addressed as part of the new API.

ldolse · 03-01-2011, 05:27 AM

Quote:

Originally Posted by kiwidude

Maybe Kovid has changed things in the last few weeks but one of the limitations I found for my Goodreads covers download is that you can *only* download covers for books with an ISBN. It is hard-baked into a number of places in the Calibre code that if the book has no ISBN then you are out of luck. I reported this as a ticket here and Kovid said it would be addressed as part of the new API.

Well it definitely didn't seem to want to download covers without an ISBN, though the Overdrive plugin doesn't use ISBN. However it was retrieving ISBN data beforehand from google/isbndb before attempting cover retrieval. The problem was that instead of picking the first/best ISBN from ISBNDB/Google it seemed to be searching for all of the covers at once, and was also over-writing the title in every case.

The difference I'm seeing for covers vs. metadata providers is it's not checking the validity of the ISBN before cover download, but for metadata it will only proceed to get metadata after confirming the ISBN is in Google's or ISBNDB's databases.

Now that I think about it, I have a hunch that this is because three interfaces are sharing the same code - bulk download, the 'download cover' button, and the 'Fetch Metadata' button. The 'Fetch Metadata' button checks each ISBN to see if a cover exists so it can display that info to the user, and I suspect it's that bit that doesn't play well with the Overdrive plugin.

kovidgoyal · 03-01-2011, 09:53 AM

There are three stages to the metadata download process:

1) An identify stage: this uses isbndb and google books to get the book isbn from title/author or get the title/author from the isbn. This also queries the cover download plugins' has_cover method.

2) Social metadata download: This downloads tags/rating/comments/series based on the isbn from step 1

3) Cover download: picks the first cover returned by the cover download plugins based on the metadata discovered so far. All built-in cover download plugins use isbn.
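The three stages can be sketched as a simple pipeline. This is an illustration of the flow described above, not the code in Calibre; `Book`, `download_metadata` and the three callable parameters are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Book:
    title: str
    author: str
    isbn: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    cover: Optional[bytes] = None


def download_metadata(
    book: Book,
    identify: Callable[[Book], Optional[Book]],
    social_sources: Sequence[Callable[[str], List[str]]],
    cover_sources: Sequence[Callable[[Book], Optional[bytes]]],
) -> Optional[Book]:
    # Stage 1: identify. Resolve title/author <-> isbn; no match ends the run.
    match = identify(book)
    if match is None:
        return None
    # Stage 2: social metadata, keyed on the isbn found in stage 1.
    for source in social_sources:
        match.tags.extend(source(match.isbn))
    # Stage 3: cover download. The first source that returns data wins.
    for source in cover_sources:
        data = source(match)
        if data:
            match.cover = data
            break
    return match
```

In this sketch a failed identify stage means stages 2 and 3 never run, which would be consistent with the 'No matches found for this book' result reported earlier for an isbn that neither identify source knows.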

ldolse · 03-01-2011, 10:20 AM

Quote:

Originally Posted by kovidgoyal

There are three stages to the metadata download process:

1) An identify stage: this uses isbndb and google books to get the book isbn from title/author or get the title/author from the isbn. This also queries the cover download plugins' has_cover method.

2) Social metadata download: This downloads tags/rating/comments/series based on the isbn from step 1

3) Cover download: picks the first cover returned by the cover download plugins based on the metadata discovered so far. All built-in cover download plugins use isbn.

Understood - I guess my concern is with stage 1. The behavior makes sense when a user clicks the 'fetch metadata' button, as you want to provide them with a list of choices. However, there are at least two ways to download metadata which don't allow for user interaction - bulk metadata download and the 'download cover' button. Why iterate through every single possible record in those scenarios when the user isn't going to get any choice in the matter anyway? I'm looking at this both from a performance perspective (getting the information downloaded) and in terms of load on the metadata providers, which have varying levels of love for Calibre users.

I can see an argument for bulk fetches iterating through the ISBNs until the first ISBN that can be associated with a cover, but I don't see why it needs to keep going after that. The other issue is that as soon as an ISBN matches, it begins using that record's title in the related cover searches. That title is generally more likely to fail than the original title, since more often than not the variant includes subtitle/series info (though the title cleansing discussed in the other thread could mitigate that somewhat).

kovidgoyal · 03-01-2011, 10:22 AM

Because calibre tries to find the best match.

ldolse · 03-01-2011, 10:35 AM

Best match based on what criteria? When I manually search for 'Bitten' in the example above, ISBNDB has 8 matches where the title matches exactly, Google has one, and the one Calibre chooses is the one that includes series information in the title which wasn't originally there... The one Calibre chose isn't wrong, but I wouldn't call it more right than the ones where the title actually matched exactly.

The version that gets chosen is the one at the top of the list when you manually go to the 'Fetch Metadata' button and have it populate a list of options.
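To make the alternative concrete, here is a sketch of a ranking that puts exact title matches ahead of variants. It is not how Calibre sorts results; `normalize` and `rank_by_title` are hypothetical names, and each result is assumed to be a dict with a "title" key:

```python
from typing import Dict, List


def normalize(title: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(title.lower().split())


def rank_by_title(query_title: str, results: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return results with exact title matches first.

    sorted() is stable, so within each group the provider's original
    ordering is preserved.
    """
    wanted = normalize(query_title)
    return sorted(results, key=lambda r: normalize(r["title"]) != wanted)
```

Under this rule a record whose title has picked up extra series or subtitle text would only rise to the top when no exact match exists.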

kovidgoyal · 03-01-2011, 10:46 AM

See the code in metadata.fetch for exactly how the results are sorted/merged

fenuks · 11-05-2011, 09:43 AM

Hi. Sorry for refreshing this old thread, but it's related to my question.

Quote:

Originally Posted by kovidgoyal

3) cover download. picks the first cover returned by the cover download plugins based on the metadata discovered so far.

Many pages have a few variants of a cover (different editions etc.). The possibility of returning more than one cover would be useful. Of course there should be some limit on plugins to avert cover-flooding. I am a big fan of cover flow, so I find this useful. I hope you are too, and that you'll find this worth considering. Thank you.

kovidgoyal · 11-05-2011, 10:29 AM

Returning multiple covers is slow. Each cover has to be downloaded before it can be displayed. And you cannot typically download covers in parallel from the same site (as you can from different sites) as that would overload the site's servers. Imagine 5 million calibre users all downloading 10 covers per book from some poor site.

For browsing multiple covers, a google image search actually works pretty well. See for example the search the internet calibre plugin. Not as nice as cover flow, obviously, but not too bad either.
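The constraint on parallel downloads (parallel across different sites, one at a time within a site) can be sketched as follows. This is an illustration using the standard library, not calibre code; `download_covers`, `fetch` and `max_sites` are hypothetical names:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlsplit


def download_covers(urls: Iterable[str],
                    fetch: Callable[[str], bytes],
                    max_sites: int = 4) -> Dict[str, bytes]:
    """Download in parallel across hosts, but sequentially within each host."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).netloc].append(url)

    def fetch_site(site_urls: List[str]) -> List[Tuple[str, bytes]]:
        # One worker handles a whole host, so a single site never sees
        # more than one request at a time from this call.
        return [(u, fetch(u)) for u in site_urls]

    results: Dict[str, bytes] = {}
    if not by_host:
        return results
    with ThreadPoolExecutor(max_workers=max_sites) as pool:
        for pairs in pool.map(fetch_site, by_host.values()):
            results.update(pairs)
    return results
```

Even so, every cover from the same site adds a full download to the wait, which is the slowness described above.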

fenuks · 11-05-2011, 01:01 PM

Quote:

Originally Posted by kovidgoyal

you cannot typically download covers in parallel from the same site (as you can from different sites) as that would overload the site's servers.

Typically not, but sometimes it is possible to get links to other covers directly from the same site. I didn't expect that every metadata source plugin would return n covers (n>1), but when conditions are convenient for that, the plugin author should be able to make use of them.

Quote:

Originally Posted by kovidgoyal

Imagine 5 million calibre users all downloading 10 covers per book from some poor site

We must try that some day

kovidgoyal · 11-05-2011, 10:07 PM

Links for the covers are not enough; you also have to download the actual cover data. Still, I have no fundamental objection to allowing plugins to download multiple covers. However, I am not very motivated to implement it either, so patches welcome.
