Thanks for the pointers, Kovid. I will take a look at that title/author stuff. I can't directly override identify_results_keygen for this purpose because it is "too late": I want to do the comparison while deciding whether to fetch a result, not when sorting the results after the fact. Hopefully there is some stuff I can steal between that and isbndb, though. It has always been a "weakness" of my Goodreads plugin that, when doing a title/author match, if your title/author was too "different" from what Goodreads had, I would refuse the match. I put it on the todo list for the rewrite, and I can't put it off any more...
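To make the "decide before fetching" idea concrete, here is a minimal sketch of a fuzzy title/author check that could replace an exact comparison. It uses difflib.SequenceMatcher from the Python stdlib; the function names (normalize, similarity, is_plausible_match) and the 0.7 threshold are hypothetical choices of mine, not anything taken from calibre, isbndb or the Goodreads plugin.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Ratio in [0, 1] of how alike two normalized strings are."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_plausible_match(query_title, query_authors, result_title,
                       result_authors, threshold=0.7):
    """Hypothetical check: is this search result close enough to fetch?

    threshold is an arbitrary illustrative value, not from calibre.
    """
    if similarity(query_title, result_title) < threshold:
        return False
    if not query_authors or not result_authors:
        return True  # nothing to compare, so let the title decide
    # Accept if any queried author resembles any author on the result.
    return any(similarity(q, r) >= threshold
               for q in query_authors for r in result_authors)


if __name__ == "__main__":
    print(is_plausible_match("The Hobbit", ["J.R.R. Tolkien"],
                             "the hobbit!", ["J. R. R. Tolkien"]))
```

Character-level similarity is cheap but blunt: it does nothing about subtitles ("The Hobbit, or There and Back Again" scores well under 0.7 against "The Hobbit"), and it can rate different but similar-looking author names as close, so the threshold and any subtitle stripping would need tuning against real results.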
FYI, and maybe you expected this, but I tried using soupparser.fromstring like you did with Amazon and found that in one scenario it trashed the original HTML, making it unusable, so I went back to just using html.fromstring. Specifically, it turned this:
Code:
<div ...><span...><p>Some text</p></span><span...><p>More text</p></span></div>
into
<div ...><span...></span><p>Some text</p><span...></span><p>More text</p></div>
So the closing span tags got moved up next to the opening ones, leaving each <p> outside its <span>. Filth.

Things work properly using plain html.fromstring, though.
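If you want to catch this kind of rewrite without eyeballing the output, a small stdlib-only checker can scan serialized HTML (for example what lxml.html.tostring gives you for the parsed tree) and report, for each <p>, whether an open <span> encloses it. This is a hypothetical helper of my own, not part of lxml or calibre; the snippets are the two from above with the attributes dropped.

```python
from html.parser import HTMLParser


class SpanNestingChecker(HTMLParser):
    """Record, for every <p> start tag, whether a <span> is still open."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open tags, outermost first
        self.p_in_span = []   # one bool per <p> seen, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_in_span.append("span" in self.stack)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break


def p_nesting(markup):
    """Return a list of booleans: is each <p> inside a <span>?"""
    checker = SpanNestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.p_in_span


if __name__ == "__main__":
    original = ("<div><span><p>Some text</p></span>"
                "<span><p>More text</p></span></div>")
    mangled = ("<div><span></span><p>Some text</p>"
               "<span></span><p>More text</p></div>")
    print(p_nesting(original))  # [True, True]
    print(p_nesting(mangled))   # [False, False]
```

Running the same check over the serialized output of each parser makes it easy to see which one preserved the original nesting.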