MobileRead Forums - View Single Post

kiwidude · 04-17-2011, 05:27 AM

Thanks Kovid. I will try doing the extra hop thing by default and see what happens.

Re the extra covers. I was thinking I would have to have my own mapping cache to store a list of urls per id in the plugin. Is the issue in calibre related to that, or is it further downstream, like the cover results being keyed by plugin name or something? I'm not desperate to have this feature as I can always add it later, but now I'm curious as to where the limitation lies.

I don't know how google and amazon return results but I seem to have a more complex set of permutations to handle retrieval in identify() than the other plugins. I'm using amazon as the base, so a Worker thread class is similarly responsible for parsing the final detail page. However identify() can be called with a variety of different things and the Goodreads site responds differently to each.

- If I have a Goodreads id, then I can immediately construct a URL to give directly to the Worker.
- If I have an isbn, then if Goodreads has a match for it then it will return the details page for it as a response, rather than the search results page.
- If I have an isbn that there is no match for, then the search results page comes back with a no results message.
- If I do a title/author search it will always be the search results page. However sometimes it seems Goodreads would rather give you search results for a "similar" book than say there were no matches.
- If I have search results, as mentioned above they are rolled up for each book and with a link to the editions for each. Note that the editions page is another search results type of page, so I would still need to grab book urls from that to pass to the Worker.
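The cases above can be sketched as two small pure functions: one picks the first request from whatever identify() was given, the other decides which kind of page came back. This is only a rough sketch, not the plugin's actual code. The names `build_query` and `classify_response`, the `goodreads` identifier key, and the URL shapes are all assumptions for illustration and would need checking against the live site.

```python
from urllib.parse import quote_plus

# Assumed URL shapes, for illustration only.
BASE = 'https://www.goodreads.com'


def build_query(identifiers, title=None, authors=None):
    """Return (kind, url) for the first request identify() should make.

    kind is 'book' when the url should already be a detail page and
    'search' when the response still has to be inspected.
    """
    goodreads_id = identifiers.get('goodreads')
    if goodreads_id:
        # Known id: go straight to the detail page.
        return 'book', '%s/book/show/%s' % (BASE, goodreads_id)
    isbn = identifiers.get('isbn')
    if isbn:
        # May come back as a detail page (match) or a results page (no match).
        return 'search', '%s/search?q=%s' % (BASE, quote_plus(isbn))
    terms = ' '.join([title or ''] + list(authors or [])).strip()
    if not terms:
        return None, None
    # Title/author queries always land on the results page.
    return 'search', '%s/search?q=%s' % (BASE, quote_plus(terms))


def classify_response(final_url):
    """Decide what page came back from the URL the request ended up on."""
    return 'book' if '/book/show/' in final_url else 'search'
```

A 'search' result would still need the extra hop through the editions page before any book urls can be handed to the Worker.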

I can tell the isbn responses apart from the 'location' response header. For code simplicity I can just hand the URL off to the Worker thread in the case of a match. Though that means the worker is firing another request at Goodreads for the same URL I just got the response for. So I think I should allow passing the response into the Worker as an alternative parameter, to avoid the extra fetch and bypass some of the stuff the worker does to fetch from a URL in this case?
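A minimal sketch of that idea, assuming a Worker that takes an optional `raw` argument holding the page already in hand. The constructor signature, the injected `fetch` callable and the placeholder `parse` are hypothetical and are not taken from the Amazon plugin.

```python
import queue
import threading


class Worker(threading.Thread):
    """Parses one detail page, fetching it only if it was not supplied."""

    def __init__(self, url, result_queue, fetch, raw=None):
        super().__init__()
        self.daemon = True
        self.url = url
        self.result_queue = result_queue  # queue.Queue shared with identify()
        self.fetch = fetch                # hypothetical callable: url -> page text
        self.raw = raw                    # page text already fetched, or None

    def run(self):
        # Skip the network round trip when the caller already has the page.
        raw = self.raw if self.raw is not None else self.fetch(self.url)
        self.result_queue.put(self.parse(raw))

    def parse(self, raw):
        # Placeholder for the real detail-page parsing.
        return {'url': self.url, 'length': len(raw)}
```

With this shape the isbn match case passes `raw=` and costs no second request, while the search results case passes only the url and lets the Worker fetch as before.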

The title/author vaguely similar results thing is the biggest problem. I handled this in my current plugin by doing a fuzzy type match on the title and author of the search result versus what I was searching for, because in a bulk download running in the background I did not want it to retrieve data for the wrong book just because it was the first result.

Do you have that same issue to handle at all with any of the calibre plugins? I know amazon has the relevance thing, but iirc that is just the order on the search results for which my equivalent would come from the editions page. Am I correct in thinking you assume that any search result will be fine and there are no sanity checks elsewhere in the calibre process?

It may just be a Goodreads thing that they try to be too helpful. For instance they will show a result by an author with the same surname. Now under no circumstances do I want that to be treated as 'good enough' I think. So I will still need to do my own fuzzy sanity checks, right?
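For illustration, a fuzzy sanity check of this kind could look roughly like the sketch below, using difflib from the standard library. It is not the code from my existing plugin. The function names and the 0.8 threshold are assumptions that would need tuning. Authors are compared word by word, since comparing whole names lets "Brian Herbert" score too close to "Frank Herbert" on the shared surname alone.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return ' '.join(re.sub(r'[^\w\s]', ' ', text.lower()).split())


def ratio(a, b):
    return SequenceMatcher(None, a, b).ratio()


def author_matches(wanted, candidate, threshold):
    """Every word of the wanted name must fuzzily match some word of the candidate."""
    cand_words = normalize(candidate).split()
    return all(any(ratio(w, c) >= threshold for c in cand_words)
               for w in normalize(wanted).split())


def is_plausible_match(want_title, want_authors, got_title, got_authors,
                       threshold=0.8):
    """Reject a search result whose title or author is not close enough."""
    if ratio(normalize(want_title), normalize(got_title)) < threshold:
        return False
    if not want_authors:
        return True
    return any(author_matches(w, g, threshold)
               for w in want_authors for g in got_authors)
```

Titles that carry a series suffix or subtitle on the site would fail the whole-title comparison as written, so the normalization would probably need to strip those first.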
