08-18-2020, 09:41 AM   #740
capink
Updated post #738 to use a builtin template:

Code:
{authors:re(\(.+\),)}
This is now used in Example 5, instead of a user-defined function, to remove author roles.
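For readers less familiar with the pattern, here is a rough Python equivalent. It assumes the template is applied to each author name on its own (not confirmed in this post), and `strip_role` / `ROLE_PATTERN` are hypothetical names; calibre's own `re()` may differ in details.

```python
import re

# Same pattern as {authors:re(\(.+\),)}: a "(", one or more characters, a ")".
ROLE_PATTERN = re.compile(r"\(.+\)")


def strip_role(author: str) -> str:
    """Remove a parenthesised role such as "(Editor)" from one author name."""
    # .strip() is an extra step (an assumption) that drops the space left behind.
    return ROLE_PATTERN.sub("", author).strip()


authors = ["John Smith (Editor)", "Jane Doe", "Ann Lee (Translator)"]
print([strip_role(a) for a in authors])  # ['John Smith', 'Jane Doe', 'Ann Lee']
```

Because `.+` is greedy, a single string holding two roles, such as `A (Ed.) & B (Trans.)`, loses everything from the first `(` to the last `)`; the non-greedy `\(.+?\)` would remove each role separately.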

Also added the following example:

Example No. 8: Tag management using the similar match algorithm and builtin templates:


In this example we will use metadata variations to de-clutter our tags by getting rid of duplicate tags. We will use the similar match algorithm, which already does a good job of finding duplicates, and enhance it with templates to do an even better job.

Advanced metadata variations is used in exactly the same way as the Find Book Duplicates dialog; the only difference is that we have a single match rule for a single column.

To start this example, we run the tag match using the similar match algorithm, as we have done before. Doing this on my test library reveals some deficiencies that need to be addressed:
  1. The following pairs of tags are not recognized as duplicates:

    Code:
    analytics
    analytic
    Code:
    budget
    Budgeting
    Code:
    Cartooning
    cartoons
    To address this, we add a template that uses the builtin re() function to strip 's', 'es' and 'ing' from the ends of words in tags, so that the pairs above can match. We add the following template to the similar match algorithm (as demonstrated before):

    Code:
    {tags:re((e?s\b|ing\b),)}
  2. The similar match can match hierarchical tags regardless of separator, like the pair below:

    N.B. There is one separator that makes this fail; we will discuss it at the end and see how to correct it.

    Code:
    Fiction.Thrillers.Suspense
    Fiction ::: Thrillers ::: Suspense
    This is really useful, but it still leaves a lot to be desired. For example, the following pairs of tags fail to match:

    Code:
    Crime::Mystery::Thriller
    Thrillers.Crime.Mystery

    Code:
    Crime & mystery
    Mystery & Crime
    The two pairs above have a different word order, which the builtin similar match does not account for. We will correct this by using the template below, which sorts the words of each tag before matching them:

    Code:
    {tags:list_sort(0, )}
    Note: a space is used as the list separator in the above template. We will explain why in the next point.

    Note: we add the above template to the similar match in addition to the template added earlier, which is what lets the plural "thrillers" match the singular "thriller".


  3. Even after adding the previous templates, there is one case in which hierarchical tags fail to match. Of the three tags below, the first two match, while the third fails to match:

    Code:
    Thrillers.Crime.Mystery
    Crime / Mystery / Thriller
    Crime/Mystery/Thriller
    The problem here is not the sort order, which was taken care of in the previous point; the problem is that the similar match algorithm does not process the slash the way it processes other separators. To understand this better, we need to know how the similar algorithm works, which is explained briefly below:

    Quote:
    The similar algorithm does four things:
    1. It removes some special characters.
    2. It replaces some other characters with a space.
    3. It collapses multiple adjacent spaces into a single one.
    4. It converts all characters to ASCII lowercase characters.

    Most separators (like dots and colons) are replaced with a space. The slash, however, is removed without being replaced by a space. So, applying the rules above, the tags evaluate as follows:

    The first tag will evaluate to:

    Code:
    thrillers crime mystery
    The second will evaluate to:

    Code:
    crime mystery thriller
    The third will evaluate to:

    Code:
    crimemysterythriller
    The first two have the following differences:
    • One of them has the plural form "thrillers". The first template we wrote takes care of that.
    • They have a different sort order. The second template takes care of that as well.

    So, the first two will match. The third has no separators left to split it into words, so it matches neither of them.

    If you want the slash to be treated like other separators, you have to add this template so that it runs before the similar match acts on the tags:

    Code:
    {tags:re(/, )}
    The above template replaces any slash with a space.

    Note: the order here is important. You must add this template before the similar match algorithm. If you put it after, it will have no effect, because by then the similar match has already removed the slash.
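
The three points above can be tied together in a small Python sketch. It is not the plugin's code: the character sets in `similar()` are assumptions (the post only says dots and colons become spaces and the slash is removed), ASCII folding is approximated with Unicode NFKD decomposition, and all function names are hypothetical stand-ins for the templates. It runs the slash template before the similar match and the word sort after it, which is the order the explanation above implies.

```python
import re
import unicodedata
from collections import defaultdict

# Assumed character sets. Only "/" (removed) and "." / ":" (turned into a
# space) come from the post; the other members are illustrative guesses.
REMOVED = "/'\"()[]{}"
SPACED = ".:,;&-_"


def similar(value: str) -> str:
    """Sketch of the four steps of the similar match."""
    value = value.translate({ord(c): None for c in REMOVED})  # 1. remove some characters
    value = value.translate({ord(c): " " for c in SPACED})    # 2. replace others with a space
    value = re.sub(r"\s+", " ", value).strip()                # 3. collapse adjacent spaces
    value = unicodedata.normalize("NFKD", value)              # 4. fold to ASCII lowercase
    return value.encode("ascii", "ignore").decode("ascii").lower()


def slash_to_space(value: str) -> str:
    # Stand-in for {tags:re(/, )}
    return value.replace("/", " ")


def strip_suffixes(value: str) -> str:
    # Stand-in for {tags:re((e?s\b|ing\b),)}; case-insensitivity is an assumption.
    return re.sub(r"(e?s\b|ing\b)", "", value, flags=re.IGNORECASE)


def sort_words(value: str) -> str:
    # Stand-in for {tags:list_sort(0, )}: sort the space-separated words.
    return " ".join(sorted(value.split(), key=str.lower))


def match_key(tag: str) -> str:
    tag = slash_to_space(tag)  # must run before similar(), which deletes "/"
    tag = strip_suffixes(tag)
    tag = similar(tag)         # separators become single spaces here...
    return sort_words(tag)     # ...so the space-separated sort works only after it


def find_duplicates(tags):
    """Group tags whose match keys are equal; keep groups with 2+ members."""
    groups = defaultdict(list)
    for tag in tags:
        groups[match_key(tag)].append(tag)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    tags = [
        "analytics", "analytic", "budget", "Budgeting",
        "Cartooning", "cartoons",
        "Crime::Mystery::Thriller", "Thrillers.Crime.Mystery",
        "Crime / Mystery / Thriller", "Crime/Mystery/Thriller",
        "Crime & mystery", "Mystery & Crime",
    ]
    for group in find_duplicates(tags):
        print(group)
```

Run on the tags from this post, the sketch reports five groups, with all four Crime/Mystery/Thriller variants in one of them. Applying `slash_to_space` to the output of `similar()` instead leaves "crimemysterythriller" unchanged, which mirrors the ordering note above. The suffix pattern is aggressive, so unrelated tags such as "News" and "New" would also share a key; reviewing the groups before merging is advisable.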

Last edited by capink; 08-19-2020 at 05:39 AM. Reason: correcting typos