Some time ago, Kovid changed the paste button for identifiers in the metadata editor to handle several prefixes. At the time, I was looking for a way to paste a URL and generate the identifier from it, based on the installed metadata sources and the defined identifier rules. I wanted this because I frequently have a URL that I want to add as a specific identifier type. This is especially the case for URLs covered by the identifier rules. Also, because the metadata download doesn't always merge results, I sometimes want the identifier from a different result than the one I am choosing.
Doing this is easy for the rules, as they are defined in a way that lets them be parsed in either direction. But it isn't possible for the metadata sources, as none of them has a property or method that can be used for this. To make it work, I needed something added to each metadata source plugin.
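To illustrate what "parsed in either direction" means, here is a minimal sketch. It assumes a rule is a URL template with a single "{id}" placeholder; the rule, the URLs and the function names are made up for the example and are not taken from calibre.

```python
import re

PLACEHOLDER = "{id}"  # assumed placeholder syntax for a rule template


def url_from_id(template, value):
    """Forward direction: fill the identifier into the template."""
    return template.replace(PLACEHOLDER, value)


def id_from_rule(template, url):
    """Reverse direction: recover the identifier from a matching URL."""
    prefix, sep, suffix = template.partition(PLACEHOLDER)
    if not sep:
        return None  # template has no placeholder, nothing to recover
    # Escape the literal parts and capture whatever sits in the placeholder.
    pattern = re.escape(prefix) + r"(?P<id>[^/?#]+)" + re.escape(suffix) + r"$"
    match = re.match(pattern, url)
    return match.group("id") if match else None


# Hypothetical rule, for illustration only.
RULE = "https://example.com/book/{id}"
print(url_from_id(RULE, "12345"))
print(id_from_rule(RULE, "https://example.com/book/12345"))
```

The same template serves both directions, which is why the rules need no extra code to support pasting a URL.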
The addition could be a property that returns a pattern that can be used in the same way as the identifier rules, or a method in the metadata source plugin that parses the URL and returns the identifier. In the end, I decided the latter was probably the way to go, as it allows more flexibility in the parsing. Amazon, with its national URLs and identifiers, probably needs it done this way.
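As a rough sketch of why a method gives more flexibility than a single pattern, consider a source where the identifier type depends on the site's domain. The URL shape, the list of domains and the mapping to identifier types below are assumptions made for the example, not code from the Amazon plugin.

```python
import re

# Assumed URL shape: https://www.amazon.<tld>/<optional path>/dp/<10-char ID>
AMAZON_URL = re.compile(
    r"https?://(?:www\.)?amazon\.(?P<tld>com|co\.uk|de|fr)/"
    r"(?:[^?#]*/)?dp/(?P<asin>[A-Z0-9]{10})"
)

# Assumed mapping from national domain to identifier type.
TLD_TO_TYPE = {
    "com": "amazon",
    "co.uk": "amazon_uk",
    "de": "amazon_de",
    "fr": "amazon_fr",
}


def amazon_id_from_url(url):
    """Return (identifier_type, identifier_value), or None if no match."""
    match = AMAZON_URL.match(url)
    if match is None:
        return None
    return (TLD_TO_TYPE[match.group("tld")], match.group("asin"))


print(amazon_id_from_url("https://www.amazon.co.uk/Some-Title/dp/B00ABCDEFG"))
```

A single pattern property could capture the value, but choosing the identifier type from the domain needs a step like the dictionary lookup here.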
My proposal is to add the following to ebooks/metadata/sources/base.py:
Code:
def id_from_url(self, url):
    '''
    Parse a URL and return a tuple of the form:
    (identifier_type, identifier_value).
    If the URL does not match the pattern for the metadata source,
    return None.
    '''
    return None
Each metadata source plugin would then override this. For example, in the Kobo plugin, I have:
Code:
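# Needs "import re" at the top of the plugin module.
def id_from_url(self, url):
    # Escape BASE_URL and BOOK_PATH so they match literally; this assumes
    # neither attribute is itself written as a regular expression.
    pattern = re.escape(self.BASE_URL) + ".*" + re.escape(self.BOOK_PATH) + "(.+)"
    match = re.match(pattern, url)
    if match:
        return ('kobo', match.group(1))
    return None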
I have just realised that this is already in the latest version of the plugin that I released.
The attached version of gui2/metadata/basic_widgets.py has the code to handle this. It tries the identifier rules first and then the metadata source plugins, stopping at the first identifier found. If none is found, it falls back to the current rules for the paste button.
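The lookup order described above can be sketched roughly as follows. This is not the attached code: the Source stand-in, the example plugin, the rule format with an "{id}" placeholder and all function names other than id_from_url are assumptions for illustration.

```python
import re


class Source:
    """Hypothetical stand-in for the metadata source base class."""

    def id_from_url(self, url):
        return None


class ExampleSource(Source):
    """Hypothetical plugin that recognises one made-up URL shape."""

    def id_from_url(self, url):
        match = re.match(r"https://books\.example\.com/ebook/(.+)", url)
        return ("example", match.group(1)) if match else None


def id_from_rules(url, rules):
    """Try each rule; rules map identifier type to a '{id}' URL template."""
    for id_type, template in rules.items():
        prefix, sep, suffix = template.partition("{id}")
        if sep and url.startswith(prefix) and url.endswith(suffix):
            value = url[len(prefix):len(url) - len(suffix)]
            if value:
                return (id_type, value)
    return None


def identifier_from_url(url, rules, plugins):
    """Rules first, then plugins; stop at the first identifier found."""
    found = id_from_rules(url, rules)
    if found:
        return found
    for plugin in plugins:
        found = plugin.id_from_url(url)
        if found:
            return found
    return None  # caller falls back to the existing paste handling


rules = {"myshop": "https://shop.example.org/item/{id}"}
plugins = [Source(), ExampleSource()]
print(identifier_from_url("https://shop.example.org/item/42", rules, plugins))
print(identifier_from_url("https://books.example.com/ebook/some-title", rules, plugins))
```

Returning None rather than raising keeps the fallback simple: the paste button only needs to check whether anything was found.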
I have been using this for a while and it works well for me. The questions are: does anyone have a problem with this approach, or a suggestion for a better way to do it? Is there a better name for the method than "id_from_url"? And does it need to return multiple identifiers?