MobileRead Forums

capink · 08-18-2020, 09:41 AM

Updated post 738 so that Example 5 removes author roles with a builtin template instead of a user-defined function:

```
{authors:re(\(.+\),)}
```
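To try the pattern outside calibre, here is a minimal Python sketch that applies the same regular expression with `re.sub`. It assumes the template's `re()` simply replaces every match with the (empty) replacement text; `strip_roles` and the sample names are hypothetical.

```python
import re

# Same pattern as the template: an opening parenthesis, one or more
# characters, and a closing parenthesis, replaced with nothing.
ROLE = re.compile(r"\(.+\)")


def strip_roles(author: str) -> str:
    r"""Hypothetical helper mirroring {authors:re(\(.+\),)}."""
    return ROLE.sub("", author)


for name in ["Jane Doe (Editor)", "John Roe"]:
    print(repr(strip_roles(name)))
```

In this sketch the space before the parenthesis survives (`'Jane Doe '`), and because `.+` is greedy, a name with two bracketed parts loses everything from the first `(` to the last `)`.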

Also added the following example:

Example No. 8: Tag management using the similar match algorithm and builtin templates:


In this example, we will use metadata variations to de-clutter our tags by getting rid of duplicate tags. We will use the similar match algorithm, which already does a good job of finding duplicates, and enhance it with templates to do an even better job.

Advanced metadata variations are used in exactly the same way as the Find Book Duplicates dialog; the only difference is that here we have a single match rule for a single column.

To start this example, we match tags using the similar match, as we have done before. Doing this on my test library reveals some deficiencies that need to be addressed:

The following pairs of tags are not detected as duplicates:
```
analytics
analytic
```
```
budget
Budgeting
```
```
Cartooning
cartoons
```
To address this, we add a template that uses a builtin function to remove 's', 'es' and 'ing' from the end of words in tags, so that the tags above can match. To do this, we add the following template to the similar match algorithm (as demonstrated before):
```
{tags:re((e?s\b|ing\b),)}
```
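The effect of this pattern on the three pairs can be checked with a small Python sketch. It assumes the template's `re()` behaves like Python's `re.sub`, and it uses `str.lower()` as a stand-in for the case folding that the similar match performs; the helper names are hypothetical.

```python
import re

# Same pattern as the template: "s" or "es" at the end of a word,
# or "ing" at the end of a word, replaced with nothing.
SUFFIX = re.compile(r"(e?s\b|ing\b)")


def strip_suffixes(tag: str) -> str:
    r"""Hypothetical helper mirroring {tags:re((e?s\b|ing\b),)}."""
    return SUFFIX.sub("", tag)


def same_tag(a: str, b: str) -> bool:
    # lower() stands in for the case folding done by the similar match
    return strip_suffixes(a).lower() == strip_suffixes(b).lower()


pairs = [("analytics", "analytic"), ("budget", "Budgeting"), ("Cartooning", "cartoons")]
for a, b in pairs:
    print(a, "/", b, "->", strip_suffixes(a), "/", strip_suffixes(b), same_tag(a, b))
```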
The similar match can match hierarchical tags regardless of separator, like the pair below:

N.B. There is one separator that makes this fail; we will discuss it at the end and see how to correct it.
```
Fiction.Thrillers.Suspense
Fiction ::: Thrillers ::: Suspense
```
This is really useful, but it still leaves a lot to be desired. For example, the following pairs of tags fail to match:
```
Crime::Mystery::Thriller
Thrillers.Crime.Mystery
```
```
Crime & mystery
Mystery & Crime
```
In each pair above, the words appear in a different order, which the builtin similar match does not account for. We will correct this by using the template below, which sorts the words in each tag before matching them:
```
{tags:list_sort(0, )}
```
Note: a space is used as the list separator in the above template. We will explain why in the next point.

Note: We add the above template to the similar match, together with the template we added before, so that the plural "thrillers" still matches the singular "thriller".
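Here is a rough Python sketch of why sorting helps. `normalise` is a hypothetical simplification of the similar match (it only turns dots and colons into spaces, collapses repeated spaces and lower-cases), and the sketch assumes the suffix template and then the sort template run after it.

```python
import re


def normalise(tag: str) -> str:
    # Hypothetical stand-in for the similar match: "." and ":" become spaces,
    # runs of spaces collapse to one, and everything is lower-cased
    tag = re.sub(r"[.:]", " ", tag)
    return re.sub(r" {2,}", " ", tag).lower()


def strip_suffixes(tag: str) -> str:
    # Same pattern as the earlier suffix template
    return re.sub(r"(e?s\b|ing\b)", "", tag)


def sort_words(tag: str) -> str:
    # Mirrors {tags:list_sort(0, )}: split on a space, sort ascending
    # (case-insensitively), and join again with a space
    return " ".join(sorted(tag.split(" "), key=str.lower))


def match_key(tag: str) -> str:
    return sort_words(strip_suffixes(normalise(tag)))


for a, b in [("Crime::Mystery::Thriller", "Thrillers.Crime.Mystery"),
             ("Crime & mystery", "Mystery & Crime")]:
    print(match_key(a), "|", match_key(b), match_key(a) == match_key(b))
```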

Even after adding the previous template, there is one case where hierarchical tags fail to match. Of the three tags below, the first two match, while the third does not:

```
Thrillers.Crime.Mystery
Crime / Mystery / Thriller
Crime/Mystery/Thriller
```

The problem here is not the sort order, which was taken care of in the previous step; the problem is that the similar match algorithm does not treat the slash like the other separators. To understand this better, we need to know how the similar algorithm works, which is briefly explained below:

> The similar algorithm does four things:
>
> 1. It removes some special characters.
> 2. It replaces some other characters with a space.
> 3. It merges multiple adjacent spaces into a single one.
> 4. It converts all characters to ASCII lower-case characters.

Most separators (like dots and colons) are replaced with a space. The slash, however, is removed without being replaced by a space. Applying the rules above, the three tags evaluate as follows:

The first tag evaluates to:

```
thrillers crime mystery
```

The second evaluates to:

```
crime mystery thriller
```

The third evaluates to:

```
crimemysterythriller
```
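These evaluations can be reproduced with a rough Python approximation of the four steps. The character sets are assumptions: the post only says that dots and colons become spaces and that the slash is deleted, so the sketch handles just those characters, and it approximates the ASCII conversion with Unicode NFKD decomposition. `similar` is a hypothetical name, not the plugin's own function.

```python
import re
import unicodedata


def similar(tag: str) -> str:
    """Hypothetical approximation of the similar match, limited to the separators discussed here."""
    tag = tag.replace("/", "")          # step 1: remove some characters (here only "/")
    tag = re.sub(r"[.:]", " ", tag)     # step 2: replace others with a space (here "." and ":")
    tag = re.sub(r" {2,}", " ", tag)    # step 3: merge adjacent spaces into one
    # step 4: approximate "ASCII lower case" by dropping accents, then lower-casing
    tag = unicodedata.normalize("NFKD", tag).encode("ascii", "ignore").decode("ascii")
    return tag.lower()


for tag in ["Thrillers.Crime.Mystery", "Crime / Mystery / Thriller", "Crime/Mystery/Thriller"]:
    print(similar(tag))
```

The same approximation also shows why the earlier pair `Fiction.Thrillers.Suspense` and `Fiction ::: Thrillers ::: Suspense` match: both reduce to `fiction thrillers suspense`.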

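The first two have the following differences:

- One of them has the plural form "thrillers". The first template we wrote takes care of that.
- Their words are in a different order. The second template takes care of that.

So the first two will match. The third, however, has lost its separators entirely and become a single word, so neither template can make it match the other two.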

If you want the slash to be treated like the other separators, you have to add this template before the similar match acts on the tags:

```
{tags:re(/, )}
```

The above template replaces any slash with a space.

Note: The order here is important. You must add this template before the similar match algorithm. If you put it after, it will have no effect, because by then the similar match has already deleted the slashes.
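The effect of the order can be seen in a rough Python sketch. `similar` is a hypothetical approximation that only knows about the slash, dots and colons; `slash_to_space` mirrors `{tags:re(/, )}`; and `after_similar` applies the suffix and sort templates, which the sketch assumes run after the similar match. All of these names are illustrative.

```python
import re


def similar(tag: str) -> str:
    # Hypothetical approximation: delete "/", turn "." and ":" into spaces,
    # merge repeated spaces and lower-case
    tag = tag.replace("/", "")
    tag = re.sub(r"[.:]", " ", tag)
    return re.sub(r" {2,}", " ", tag).lower()


def slash_to_space(tag: str) -> str:
    # Mirrors {tags:re(/, )}: every slash becomes a space
    return re.sub(r"/", " ", tag)


def after_similar(tag: str) -> str:
    # The suffix template followed by the word-sorting template
    tag = re.sub(r"(e?s\b|ing\b)", "", tag)
    return " ".join(sorted(tag.split(" ")))


def key_slash_first(tag: str) -> str:
    # Slash template added before the similar match
    return after_similar(similar(slash_to_space(tag)))


def key_slash_last(tag: str) -> str:
    # Slash template added after the similar match: nothing left to replace
    return after_similar(slash_to_space(similar(tag)))


tags = ["Thrillers.Crime.Mystery", "Crime / Mystery / Thriller", "Crime/Mystery/Thriller"]
print([key_slash_first(t) for t in tags])
print([key_slash_last(t) for t in tags])
```

With the slash template first, all three tags reduce to the same key; with it last, the third tag stays `crimemysterythriller`.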