MobileRead Forums - View Single Post - Calibre metadata plugin ignores source_relevance attribute

dandman · 05-20-2024, 01:14 PM

So apparently source_relevance is not the only attribute the comparison takes into account when sorting the book results by relevance. It also considers the length of the comments, whether the result has a cover, whether an identifier of the result matches one from the searched book, and so on.

I could not find this in the docs, only in the code comments.

I think it is an important point to mention, so I will quote it here for others.

The function identify_results_keygen returns a function that generates the key the sort uses for each result:

Code:

identify_results_keygen(title, authors, identifiers)

Code:

        Returns a function that is used to generate a key that can sort Metadata
        objects by their relevance given a search query (title, authors,
        identifiers).

        These keys are used to sort the results of a call to :meth:`identify`.

        For details on the default algorithm see
        :class:`InternalMetadataCompareKeyGen`. Re-implement this function in
        your plugin if the default algorithm is not suitable.

and the docstring of the internal implementation, InternalMetadataCompareKeyGen, describes the algorithm:

Code:

    Generate a sort key for comparison of the relevance of Metadata objects,
    given a search query. This is used only to compare results from the same
    metadata source, not across different sources.

    The sort key ensures that an ascending order sort is a sort by order of
    decreasing relevance.

    The algorithm is:

        * Prefer results that have at least one identifier the same as for the query
        * Prefer results with a cached cover URL
        * Prefer results with all available fields filled in
        * Prefer results with the same language as the current user interface language
        * Prefer results that are an exact title match to the query
        * Prefer results with longer comments (greater than 10% longer)
        * Use the relevance of the result as reported by the metadata source's search
           engine
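The first five preferences in that list can be pictured as a tuple compared field by field, so an earlier preference always outweighs a later one. This is only a minimal sketch: the 1/2 encoding and the base_key helper are my own illustration, not copied from calibre.

```python
def base_key(same_identifier, has_cover, all_fields, same_language, exact_title):
    """Hypothetical helper: encode each preference as 1 (preferred) or 2."""
    flags = (same_identifier, has_cover, all_fields, same_language, exact_title)
    return tuple(1 if flag else 2 for flag in flags)


# Result A matches an identifier but has no cover and no exact title match.
a = base_key(True, False, True, True, False)
# Result B has a cover and an exact title match, but no matching identifier.
b = base_key(False, True, True, True, True)

# Tuples compare left to right, so the identifier match decides the order.
ranked = sorted([("B", b), ("A", a)], key=lambda item: item[1])
print([name for name, _ in ranked])
```

With an ascending sort, A comes first even though B wins on two later preferences.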

For me it was the length of the comments that overrode the relevance. Basically, source_relevance (stored as .extra) is taken into account if and only if everything else is equal:

Code:

    def compare_to_other(self, other):
        a = cmp(self.base, other.base)
        if a != 0:
            return a
        cx, cy = self.comments_len, other.comments_len
        if cx and cy:
            t = (cx + cy) / 20
            delta = cy - cx
            if abs(delta) > t:
                return -1 if delta < 0 else 1
        return cmp(self.extra, other.extra)
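To see how the comments length can beat the relevance, here is a small self-contained sketch that runs the same comparison on made-up values. The Key class and the numbers are hypothetical stand-ins, and cmp is re-defined because Python 3 has no builtin cmp. I am assuming a lower .extra means more relevant, which follows from the note that an ascending sort gives decreasing relevance.

```python
def cmp(x, y):
    """Python 3 stand-in for the Python 2 builtin cmp()."""
    return (x > y) - (x < y)


class Key:
    """Hypothetical stand-in holding only the three fields the method reads."""

    def __init__(self, base, comments_len, extra):
        self.base = base
        self.comments_len = comments_len
        self.extra = extra

    def compare_to_other(self, other):
        a = cmp(self.base, other.base)
        if a != 0:
            return a
        cx, cy = self.comments_len, other.comments_len
        if cx and cy:
            t = (cx + cy) / 20
            delta = cy - cx
            if abs(delta) > t:
                return -1 if delta < 0 else 1
        return cmp(self.extra, other.extra)


same_base = (1, 1, 1, 1, 1)
best_match = Key(same_base, comments_len=400, extra=0)   # most relevant per the source
long_blurb = Key(same_base, comments_len=1000, extra=5)  # less relevant, longer comments
similar = Key(same_base, comments_len=420, extra=5)      # comments of similar length

# t = (400 + 1000) / 20 = 70 and delta = 600, so the comments length decides.
print(best_match.compare_to_other(long_blurb))  # 1: best_match sorts after long_blurb
# t = (400 + 420) / 20 = 41 and delta = 20, so .extra decides.
print(best_match.compare_to_other(similar))     # -1: best_match sorts first
```

The threshold t is 10% of the average of the two lengths, which is where the "greater than 10% longer" in the docstring comes from.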

One can override identify_results_keygen in the plugin to re-implement the result comparison. Here is a simple example that compares only by the source_relevance attribute:

Code:

    def identify_results_keygen(self, title=None, authors=None, identifiers={}):
        # Return a key function that sorts the identify results
        # by the source_relevance field of the Metadata object only.
        return lambda x: x.source_relevance

For me this was important because I wanted to handle the case where no perfect match exists (due to a misspelled title or author) and still offer the user close options, ordered by match percentage (how closely each result matches the searched book).
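A rough sketch of the match-percentage idea, in case it helps someone. It uses difflib from the standard library; match_percent and relevance_from_percent are hypothetical helper names, and the titles are made up. The percentage is inverted because the key sorts ascending, so the best match needs the lowest value.

```python
from difflib import SequenceMatcher


def match_percent(query_title, result_title):
    """Hypothetical helper: similarity of two titles as a percentage (0-100)."""
    ratio = SequenceMatcher(None, query_title.lower(), result_title.lower()).ratio()
    return round(ratio * 100)


def relevance_from_percent(percent):
    """Lower values sort first, so invert the percentage."""
    return 100 - percent


query = "The Hobit"  # misspelled on purpose
candidates = ["Dune", "The Habit of Winning", "The Hobbit"]

scored = [(title, match_percent(query, title)) for title in candidates]
# Sort ascending by the inverted score, mirroring a key of x.source_relevance.
scored.sort(key=lambda item: relevance_from_percent(item[1]))
for title, percent in scored:
    print(title, percent)
```

In a plugin, the inverted value would presumably be assigned to source_relevance on each Metadata object before it is put into the result queue, so that the key function above picks it up.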
