Glad to see the new sub-forum.
Wanted to see what people thought of an idea for a new cover source, and get any ideas to improve on it if it seems feasible.
I've recently discovered that the highest-quality covers on the 'net are hosted by libraries around the world, and on the Overdrive servers that back those libraries. The resolutions are substantially higher, and the images less compressed, than what comes from Amazon/LT.
I was thinking this could provide a couple potential benefits:
- Increased quality
- Cover downloading could be distributed across many different libraries. All libraries seem to use the same query format, so it would just be a matter of maintaining a list of base URLs. I particularly like this, as it would help keep Calibre from being viewed as a DDoS source by more third parties.
There are a few stumbling blocks:
- Adobe uses its own ID, separate from the ISBN, and it needs to be determined before getting the cover
- Libraries only host covers for things they have in their collection, although Adobe itself could be used as a source of last resort
Knowing the Adobe ID is really the only accurate way to get a cover, but I haven't found any good way to get it aside from searching a library's Overdrive database. This Overdrive ID only matches the ebook edition's ISBN; searching for an alternate ISBN will fail. For some reason advanced search works better with individual libraries than it does directly against Adobe's servers.
Here's some info on how this works and examples of how the same cover could be found across multiple systems with the Adobe identifier. Using 'On the Road' by Jack Kerouac as an example:
To get the Overdrive ID one can just do an advanced search on an individual library's collection (Chicago in this case):
http://overdrive.chipublib.org/82DC6...BANGSearch.dll
The request for all the libraries is always to BANGSearch.dll, though the hostname and URL prefix change with each library. The actual query is in the request content body:
Code:
Title=On+the+Road&Creator=Jack+Kerouac&Keyword=&ISBN=&Format=&Language=&Publisher=&Subject=&Award=&CollDate=&PerPage=10&Sort=SortBy%3Dtitle
These arguments seem to be uniform across all Overdrive based libraries.
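For anyone who wants to experiment, here is a minimal sketch of building that request body in Python. The field names and their order are copied from the captured body above; the function name and its parameters are my own invention, and nothing here actually sends the request.

```python
from urllib.parse import urlencode

# Field names and order copied from the captured request body above.
SEARCH_FIELDS = ["Title", "Creator", "Keyword", "ISBN", "Format", "Language",
                 "Publisher", "Subject", "Award", "CollDate"]

def build_search_body(title="", creator="", isbn="", per_page=10):
    """Return the form-encoded body for an advanced search (hypothetical helper)."""
    values = {name: "" for name in SEARCH_FIELDS}  # unused fields are sent empty
    values["Title"] = title
    values["Creator"] = creator
    values["ISBN"] = isbn
    values["PerPage"] = str(per_page)
    values["Sort"] = "SortBy=title"  # urlencode escapes the '=' as %3D
    return urlencode(values)

body = build_search_body("On the Road", "Jack Kerouac")
print(body)
```

The body would then be POSTed to a given library's BANGSearch.dll URL, which is the only part that differs between libraries.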
Scraping the result contents (unfortunately I don't see any way to do this without scraping) gives three unique Adobe Overdrive IDs:
ContentDetails.htm?ID=6609D1A7-98A4-4B41-B9A2-94BEE83CF861
ContentDetails.htm?ID=7FEC0594-4FAD-4CBC-83B3-7DAEBCC5B900
ContentDetails.htm?ID=D8414773-1865-4C6B-AEB3-F2CB05C84F16
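A rough sketch of the scraping step, assuming the result page contains links of the ContentDetails.htm?ID= form shown above. The sample HTML is made up for illustration (I haven't reproduced the real markup here), and the function name is hypothetical.

```python
import re

# Matches the GUID-style ID that follows "ContentDetails.htm?ID=" in a result link.
ID_PATTERN = re.compile(
    r"ContentDetails\.htm\?ID=\s*"
    r"([0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12})"
)

def extract_overdrive_ids(html):
    """Return the unique IDs found in the page, in order of first appearance."""
    seen = []
    for match in ID_PATTERN.finditer(html):
        item_id = match.group(1).upper()
        if item_id not in seen:  # each title is usually linked more than once
            seen.append(item_id)
    return seen

# Made-up markup, only the link format is taken from the real results.
sample_html = """
<a href="ContentDetails.htm?ID=6609D1A7-98A4-4B41-B9A2-94BEE83CF861">On the Road</a>
<a href="ContentDetails.htm?ID=6609D1A7-98A4-4B41-B9A2-94BEE83CF861"><img></a>
<a href="ContentDetails.htm?ID=7FEC0594-4FAD-4CBC-83B3-7DAEBCC5B900">On the Road</a>
<a href="ContentDetails.htm?ID=D8414773-1865-4C6B-AEB3-F2CB05C84F16">On the Road</a>
"""
ids = extract_overdrive_ids(sample_html)
print(ids)
```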
The third one is the epub book; the first two are audiobooks. Ideally one could prioritize epub/mobi/pdf covers over audiobook covers. The Overdrive advanced search GUI allows formats to be specified, but it doesn't allow 'OR'. I'm not sure whether it allows it under the hood, as I haven't yet tried to construct my own queries.
We'll use the epub's ID, D8414773-1865-4C6B-AEB3-F2CB05C84F16. With this the cover can be retrieved from any library with the book in its collection.
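If the format of each result can be scraped alongside its ID, the prioritizing could be done client-side instead of in the query. A sketch under that assumption; the format labels and the (id, format) pairing are hypothetical, since I haven't checked how the result page labels formats.

```python
# Preferred formats first; anything not listed sorts last.
FORMAT_PRIORITY = ["epub", "mobi", "pdf", "audiobook"]

def pick_preferred(results):
    """Pick the ID whose format ranks best; results is a list of (id, format) pairs."""
    def rank(item):
        fmt = item[1].lower()
        return FORMAT_PRIORITY.index(fmt) if fmt in FORMAT_PRIORITY else len(FORMAT_PRIORITY)
    return min(results, key=rank)[0] if results else None

results = [
    ("6609D1A7-98A4-4B41-B9A2-94BEE83CF861", "audiobook"),
    ("7FEC0594-4FAD-4CBC-83B3-7DAEBCC5B900", "audiobook"),
    ("D8414773-1865-4C6B-AEB3-F2CB05C84F16", "epub"),
]
best = pick_preferred(results)
print(best)
```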
Overdrive's own servers:
Cleveland Public Library:
Chicago Public Library:
In all those cases the rest of the URL stays the same; just change the Overdrive ID and you get the specified book cover.
The next hurdle would be finding libraries with large collections, to increase the likelihood of a result. I haven't found many libraries with more than 20,000 titles. Singapore, Cleveland, and Seattle are all around that mark for epub/mobi/pdf. Chicago, though I used it as an example, isn't great with ~6000.
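To illustrate the distribution idea, here is a sketch that spreads requests across libraries by shuffling a list of per-library URL templates and keeping Overdrive's own servers as the last resort. The templates below are placeholders on example hosts, not the real cover URLs, and the function name is made up.

```python
import random

# Placeholder templates: swap in each library's real cover URL, with {id}
# standing in for the Overdrive ID.
LIBRARY_TEMPLATES = [
    "http://library-one.example/covers/{id}.jpg",
    "http://library-two.example/covers/{id}.jpg",
    "http://library-three.example/covers/{id}.jpg",
]
FALLBACK_TEMPLATE = "http://overdrive.example/covers/{id}.jpg"

def candidate_cover_urls(overdrive_id, rng=random):
    """Return cover URLs to try in order: libraries shuffled, fallback last."""
    templates = list(LIBRARY_TEMPLATES)
    rng.shuffle(templates)  # spread the load instead of always hitting one host
    templates.append(FALLBACK_TEMPLATE)
    return [t.format(id=overdrive_id) for t in templates]

urls = candidate_cover_urls("D8414773-1865-4C6B-AEB3-F2CB05C84F16")
for url in urls:
    print(url)
```

The caller would try each URL in turn and stop at the first one that returns an image, since a library only hosts covers for titles in its own collection.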
Open Library also maintains Overdrive IDs for ebook editions, but I'm not sure how universally they do it, and so far it seems more scraping would be required to get to the desired result, which isn't ideal:
http://openlibrary.org/books/OL24273691M/On_the_Road