[GUI Plugin] Find Duplicates - Page 50

BetterRed · 06-25-2020, 06:56 AM

@davidfor & dunhill - all done, I've removed the attachments from your posts.

take care

BR

capink · 07-19-2020, 06:11 AM

Edit: This feature (metadata variation custom column support) is now added to the advanced mode. Look at next posts for more details.

capink · 08-09-2020, 11:17 AM

Here is my latest update to this plugin. It adds an Advanced Mode to the plugin. Before detailing the features of the advanced mode, I'd like to underline a couple of things:

The new update retains the old functionality of the plugin without modifying it at all. The advanced mode has separate dialogs, and even inside the plugin zip it is totally separated in its own directory (except for the few lines that create the menu entries).
The whole point of this update is to make the plugin more flexible and extensible. Some of the features I don't use currently, but I tried to make the design as flexible as possible to meet any future needs.

Now, here is a list of the advanced mode features:

The advanced mode, like the original plugin, has three dialogs: Book duplicates dialog, Cross library dialog, Metadata variations dialog.
You can match by any column you want, whether standard or custom.
- For book duplicates and cross library dialogs you can use any custom or standard column you want.
- For Metadata variations you can use any standard or custom column as long as its datatype is text or series.
Composite columns are supported in both book and cross library dialogs.
There is no limit on how many columns you can match by; you are no longer bound by two columns. This is done in the spirit of making the plugin more flexible. However, I have not tested matching by more than three columns.
Support for user-defined algorithms through templates. You can use templates directly in the plugin itself, or indirectly through composite columns. We will show examples of how to do both later in this post. The most notable examples below are:
- matching by alias instead of author
- adding language specific modifications to the matching algorithm.
The ability to use more than one matching algorithm for each column. This way you can supplement the builtin matching algorithms with your tailor made algorithms. We will show examples for this below, starting with the simple ones and moving gradually into more complex examples.
Sort dialog enabling you to control how the books are sorted within the duplicate groups. This is complemented with marking the first and last book of each group. You can read more on this in this post.

Before you read examples on using builtin and user-defined template functions, you should visit this page for general info about templates in Calibre.

You do not have to know any programming language to use Calibre's builtin templates. User-defined template functions, however, are written in Python. You can use template functions written by others, or use any of the template functions in the examples below. They are written in a way that makes them easy to use and modify for a variety of contexts.

Example No. 1: Replicate original plugin behavior in the advanced mode (Similar title, Soundex author)

Spoiler:

Example No. 2: Use a builtin template to substitute the plugin matching algorithms

Spoiler:

Example No. 3: How to use custom (user defined) template functions to add language specific matching.

Spoiler:

In this example we will use a user-defined template function to add language-specific modifications to our match algorithm. Suppose I want to remove French articles to improve matching for French books. Here is a general-purpose template function that can remove selected words, which we will use for this purpose:

Code:

def evaluate(self, formatter, kwargs, mi, locals, val, col_name, transliterate=True):
    import re
    # Optionally transliterate to ASCII so accented and plain spellings compare equal
    asci = lambda x: x
    if transliterate:
        from calibre.utils.filenames import ascii_text as asci
    # Whole words to remove (edit this list to suit your needs)
    ar = ['le', 'la', 'les', 'au', 'aux', 'du', 'des']
    stop_words = set(asci(x) for x in ar)
    # Regular expressions removed from each remaining word (the elided article l’)
    ar_rep = ['^l’']
    r = re.compile(r'({})'.format('|'.join([asci(x) for x in ar_rep])), re.I)
    # Display separator for multi-value fields (e.g. ' & ' for authors), '' otherwise
    SEP = mi.metadata_for_field(col_name)['is_multiple'].get('list_to_ui', '')

    new_val = []
    val_ = val.split(SEP) if SEP else [val]
    for item in val_:
        tokens = [asci(tok.lower()) for tok in item.split()]
        tokens = [r.sub('', tok) for tok in tokens if tok not in stop_words]
        new_val.append(' '.join(tokens))
    return SEP.join(new_val)

Note: I do not know French. I got these words from a Google search; the list of words to remove in this template should be modified to suit your own needs.

First, we have to add this function to Calibre by going to Preferences > Template functions (see attachment 6).

Copy the code above into the Program code box, and make sure the other settings for the function look like this:
Code:
```
Function: remove_tokens
Argument count: 2
```
Now press Create to add this function to Calibre permanently.
For the first match rule, select the title field from the combo box.
Add this template in the algorithms dialog:
Code:
```
{title:remove_tokens(title)}
```
We covered how to enter templates in the algorithms dialog in Example No. 2.
If you want, you can add another algorithm (Soundex, Similar or Fuzzy) to further process the result of the first template.
Repeat the same for authors, but modify the template to be:
Code:
```
{authors:remove_tokens(authors)}
```

NOTE: The template defined above can be used for removal of other unwanted words not just language specific words. You can modify the list of words to suit your own needs. For example, the original plugin's "similar" algorithm removes words like: omnibus, anthology, edition, paperback, hardcover ...etc. If you think of any similar words to remove, you can modify the above template to achieve this.
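To experiment with a word list before pasting it into calibre, the same token-removal idea can be tried in plain Python. This is a standalone sketch, not the plugin's or calibre's code: calibre's `ascii_text` is replaced by a hypothetical `to_ascii` helper built on `unicodedata`, the elided-article prefix is stripped before transliteration (so the apostrophe is still present), and the separator is passed in as an argument instead of being read from the field metadata.

```python
import re
import unicodedata

# Words to drop: the French list from the template above (extend as needed,
# e.g. with 'omnibus' or 'anthology').
STOPWORDS = ['le', 'la', 'les', 'au', 'aux', 'du', 'des']
# Elided article at the start of a word, with a typographic or plain apostrophe.
PREFIX_RE = re.compile(r"^l['’]", re.I)


def to_ascii(text):
    # Hypothetical stand-in for calibre's ascii_text(): strip accents.
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')


def remove_tokens(val, sep=''):
    stop = {to_ascii(w) for w in STOPWORDS}
    items = val.split(sep) if sep else [val]
    cleaned = []
    for item in items:
        # Strip the prefix first, then transliterate and lower-case each word.
        tokens = [to_ascii(PREFIX_RE.sub('', tok.lower())) for tok in item.split()]
        cleaned.append(' '.join(t for t in tokens if t and t not in stop))
    return sep.join(cleaned)
```

For example, `remove_tokens('Les Misérables & La Peste', ' & ')` treats the value like a two-item field and cleans each item separately.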

NOTE: You can further refine the previous template by making it use conditional matching. Say there are certain actions that make sense in one language but are counterproductive in another. Since templates can see the whole book metadata, we can get the book's language from the language field and act based on it. We will see an example of conditional matching later (see examples 6 & 7).
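A minimal sketch of such language-conditional logic, written as plain Python so it can be tested outside calibre. The per-language article lists and the `strip_articles` name are hypothetical; inside a calibre template function the language codes would presumably come from the book's metadata (calibre stores languages as ISO 639 codes such as `fra` and `eng`).

```python
# Hypothetical per-language article lists, keyed by ISO 639-2 code.
ARTICLES = {
    'fra': {'le', 'la', 'les', 'un', 'une', 'des'},
    'eng': {'the', 'a', 'an'},
    'deu': {'der', 'die', 'das', 'ein', 'eine'},
}


def strip_articles(title, languages):
    # Collect the articles for every language the book is tagged with;
    # a book with no (or an unknown) language is left untouched.
    words = set()
    for code in languages:
        words |= ARTICLES.get(code, set())
    if not words:
        return title
    return ' '.join(t for t in title.split() if t.lower() not in words)
```

The point of the condition is that "La" is dropped from a French title but kept in a title tagged as English.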

Example No. 4: How to match by alias instead of author.

Spoiler:

This is one of the most interesting examples of how to use templates to find duplicates. It assumes that you are already using user categories in Calibre to set up author aliases, as this post by @BetterRed suggests (which is further illustrated in this attachment).

After creating your aliases as explained above, we will use the template below, which substitutes the author name with the alias if it finds one in the user categories; otherwise it returns the author name unchanged. We could use this template directly as in previous examples, but this time it makes sense to use it to create a composite column containing aliases, which can be used for other things besides duplicate finding, like searching in Calibre by alias.

Add the code below to Calibre's template functions (we explained how to do this in Example No. 3).

Code:

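def evaluate(self, formatter, kwargs, mi, locals, val, col_name, user_cat_prefix):
    # Only attempt the lookup when the metadata object provides calibre's
    # proxy metadata; otherwise return the value unchanged.
    if not hasattr(mi, '_proxy_metadata'):
        return val
    # Display separator for multi-value fields, e.g. ' & ' for authors
    SEP = mi.metadata_for_field(col_name)['is_multiple'].get('list_to_ui', '')
    val_ = val.split(SEP) if SEP else [val]
    prefix_length = len(user_cat_prefix)
    # Map each item (e.g. an author name) to the alias taken from the name of
    # the user category holding it, e.g. 'Authors.Alias.Mark Twain' -> 'Mark Twain'
    replacements = {}
    for user_cat, items in mi.user_categories.items():
        if not user_cat.startswith(user_cat_prefix):
            continue
        for user_cat_item, src_cat in items:
            if src_cat == col_name:
                replacements[user_cat_item] = user_cat[prefix_length:]
    # Replace every item by its alias (if any); drop repeats, keep the order
    new_val = []
    for item in val_:
        repl = replacements.get(item, item)
        if repl not in new_val:
            new_val.append(repl)
    return SEP.join(new_val)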

The rest of the function settings should look like this:

Code:

Function: replace_with_category
Argument count: 3

Now press the Create button to add the code to Calibre's template functions.

Create a new composite column with these exact settings (see attachment 8):
Code:
```
Lookup name: alias
Column heading: alias
Column type: Column built from other columns, behaves like tags
Template: {authors:replace_with_category(authors,Authors.Alias.)}
```
You will have to restart Calibre for the new column to be effective.

Note: If you don't choose "Column built from other columns, behaves like tags", the duplicate search will not work.

Note: If you decide to use a different structure for your user categories, you have to replace Authors.Alias. with whatever user category hierarchy you are using.

Note: In simple template mode, spaces are significant. Don't add any space after the comma.
Now, to start matching by alias, choose the title field and whatever algorithm you want (I will choose similar match for the title).
Instead of using the authors column, we will use the newly built alias column. As soon as you choose this column, a new "contains names" checkbox appears (see attachment 9); you have to check it for this duplicate search to work.

This setting tells the plugin to split contents of the alias column using "&" as it does with the authors.

Note that we don't need this option when dealing with the authors column or any non-composite custom column, as the plugin can tell from their metadata that they contain names and acts accordingly. Unfortunately, Calibre does not give us an option when creating a composite column to indicate that it contains names, so we have to use this option to tell the plugin what to do.
Now choose an algorithm you want for the alias column (you can choose identical here) and click OK to start the find duplicate search.

Note: The above template can be used for purposes other than aliases. Any user categories you apply for series or authors (like nationality) can be added to a composite column appearing in books.
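The substitution logic can be tried outside calibre with a plain-Python sketch like the one below. It is an illustration under assumptions, not the plugin's code: `replace_with_alias` is a hypothetical name, and `CATS` is a hand-built dictionary shaped like what the template above unpacks (each user category name mapped to pairs of item and source field).

```python
def replace_with_alias(val, col_name, user_categories, prefix, sep=' & '):
    """Replace each item of val with the alias named by a matching user category."""
    replacements = {}
    for cat, members in user_categories.items():
        if cat.startswith(prefix):
            alias = cat[len(prefix):]
            for item, src in members:
                if src == col_name:          # ignore categories built from other fields
                    replacements[item] = alias
    out = []
    for item in val.split(sep):
        new = replacements.get(item, item)
        if new not in out:                   # two names of one author collapse to one
            out.append(new)
    return sep.join(out)


# Hypothetical user categories: one alias category plus an unrelated one.
CATS = {
    'Authors.Alias.Mark Twain': [('Samuel Clemens', 'authors'), ('Mark Twain', 'authors')],
    'Favourites': [('Humor', 'tags')],
}
```

Books credited to "Samuel Clemens" and to "Mark Twain" then produce the same alias value, which is what lets an identical match on the alias column pair them.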

Example No. 5: How to use "match any of the items" option

Spoiler:

As we have noticed before, as soon as we choose a column that contains multiple items, like authors or tags, a "match any of the items" checkbox appears. In most cases this option should be checked, as it makes sure that a book with multiple authors (or items) will match other books containing at least one similar author or item, and not necessarily all of them. (This is the default behavior of the plugin in the normal mode.)

To illustrate this point let us look at these five books:

Code:

title: Brothers Karamazov | authors: Fyodor Dostoyevsky & David Mcduff
title: Brothers Karamazov | authors: David Mcduff & Fyódor Dostoyévsky
title: Brothers Karamazov | authors: Fyodor Dostoyevsky & David Mcduff (trans.)
title: Brothers Karamazov | authors: Fyodor Dostoyevsky & Larissa Volokhonsky
title: Brothers Karamazov | authors: Fyodor Dostoyevsky

If the "match any of the items" option is checked, and we use the similar algorithm, all books will match because they have at least one common author: Fyodor Dostoyevsky. The similar match algorithm will take care of slight differences like diacritics.

If we want to match only the books that share all authors (or items) and not just one of them, we have to uncheck the option "match any of the items". When you run the search with this option unchecked, you will notice the following:

Only the first two books match. They share all the authors, not just one. Note that they list their authors in a different order. This does not matter for the plugin, as it acts on each author separately and then concatenates them after sorting alphabetically.
The last two books predictably failed to match, because they do not share all the authors with any of the other books.
The third book surprisingly failed to match even though it shares all the authors with the first two. This is because the similar algorithm does not remove text in parentheses for authors (it does for the title, though).
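The difference between the two settings can be modelled roughly as follows. This is a simplified sketch, not the plugin's implementation: `norm` is a hypothetical stand-in for the similar algorithm (accents stripped, lower case, spaces collapsed, parentheses kept), and only the authors are compared since all five titles are identical. With the option checked each author yields its own key; with it unchecked, the sorted authors are joined into a single key, as described above.

```python
import unicodedata

BOOKS = [
    'Fyodor Dostoyevsky & David Mcduff',
    'David Mcduff & Fyódor Dostoyévsky',
    'Fyodor Dostoyevsky & David Mcduff (trans.)',
    'Fyodor Dostoyevsky & Larissa Volokhonsky',
    'Fyodor Dostoyevsky',
]


def norm(name):
    # Hypothetical stand-in for "similar": drop accents, lower-case and
    # collapse whitespace; text in parentheses is deliberately kept.
    plain = unicodedata.normalize('NFKD', name).encode('ascii', 'ignore').decode('ascii')
    return ' '.join(plain.lower().split())


def keys(authors, match_any):
    names = [norm(a) for a in authors.split('&')]
    if match_any:
        return set(names)                    # one key per author
    return {' & '.join(sorted(names))}       # one key for the sorted author list


def duplicate_pairs(books, match_any):
    # Two books are duplicates when they share at least one key.
    pairs = set()
    for i in range(len(books)):
        for j in range(i + 1, len(books)):
            if keys(books[i], match_any) & keys(books[j], match_any):
                pairs.add((i, j))
    return pairs
```

With `match_any=True` every pair matches; with `match_any=False` only `(0, 1)` (the first two books, counting from zero) remains.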

Since author roles like translator or editor enclosed in parentheses as part of the author name are a common occurrence, we can get around this problem by adding the following template before the similar match for the author column:

Code:

{authors:re(\(.+\),)}

This uses the builtin template function re to remove anything enclosed in parentheses.
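The substitution can be previewed with Python's `re.sub`. One detail worth checking: `.+` is greedy, so if the template receives the whole authors value as one string (an assumption in this sketch), a value with two parenthesised roles loses everything between the first `(` and the last `)`, including the second author. A non-greedy `\(.+?\)` stops at the first closing parenthesis. `strip_roles` and the sample values are illustration-only.

```python
import re

GREEDY = r'\(.+\)'   # the pattern used in the template above
LAZY = r'\(.+?\)'    # non-greedy: stops at the first closing parenthesis


def strip_roles(value, pattern):
    # Remove the matched text, then tidy the spaces left behind.
    return ' '.join(re.sub(pattern, '', value).split())


one_role = 'Fyodor Dostoyevsky & David Mcduff (trans.)'
two_roles = 'Richard Pevear (trans.) & Larissa Volokhonsky (trans.)'
```

Both patterns give the same result for `one_role`, but for `two_roles` the greedy one leaves only "Richard Pevear". If your authors can carry more than one role, `{authors:re(\(.+?\),)}` is likely the safer choice.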

Example No. 6: Using a user-defined template function for conditional matching

Spoiler:

Example No. 7: How to use the plugin's builtin functions inside our templates

Spoiler:

Example No. 8: Tag management using similar algorithm and builtin templates:

Spoiler:

In this example we will use the metadata variations dialog to de-clutter our tags by getting rid of duplicate tags. We will use the similar match algorithm, which already does a good job of finding duplicates, and enhance it with templates to do an even better job.

The advanced metadata variations dialog is used in exactly the same way as the Find Book Duplicates dialog; the only difference is that we have a single match rule for a single column.

To start this example, we run the tag match using the similar match as we have done before. Doing this on my test library reveals some deficiencies that need to be addressed:

The following pairs of tags do not qualify as duplicates:
Code:
```
analytics
analytic
```
Code:
```
budget
Budgeting
```
Code:
```
Cartooning
cartoons
```
To address this, we will add a template that uses builtin functions to remove 's', 'es' and 'ing' from the end of words in tags, so that the tags above can match. To do this, we add the following template to the similar match algorithm (as we have demonstrated before):
Code:
```
{tags:re((e?s\b|ing\b),)}
```
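The effect of this pattern can be checked with Python's `re.sub`. The sketch lower-cases first (the similar algorithm produces lower-case text anyway); `strip_suffixes` and `PAIRS` are illustration-only names.

```python
import re

SUFFIX_RE = re.compile(r'(e?s\b|ing\b)')   # pattern from {tags:re((e?s\b|ing\b),)}


def strip_suffixes(tag):
    # Lower-case, then drop a trailing "s", "es" or "ing" from every word.
    return SUFFIX_RE.sub('', tag.lower())


PAIRS = [('analytics', 'analytic'), ('budget', 'Budgeting'), ('Cartooning', 'cartoons')]
```

The rule is deliberately crude: "News" and "New" both become "new", and "business" becomes "busines". Since every tag goes through the same rule this is usually harmless, but it can occasionally pair unrelated tags, so it is worth reviewing the groups before renaming anything.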
The similar match can match hierarchical tags regardless of separator, like the pair below:

N.B. There is one separator that makes this fail, we will discuss it at the end and see how to correct it.
Code:
```
Fiction.Thrillers.Suspense
Fiction ::: Thrillers ::: Suspense
```
This is really useful, but it still leaves a lot to be desired. For example the following pairs of tags fail to match:
Code:
```
Crime::Mystery::Thriller
Thrillers.Crime.Mystery
```
Code:
```
Crime & mystery
Mystery & Crime
```
The two pairs above have a different sort order, which the builtin similar match does not account for. We will correct this by using the template below, which sorts the tags before matching them:
Code:
```
{tags:list_sort(0, )}
```
Note: space is used as list separator in the above template. We will explain why in the next point.

Note: We add the above template alongside the similar match and the template we added before to match the plural "thrillers" with the singular "thriller".
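calibre's `list_sort(list, direction, separator)` sorts the items of a list case-insensitively, ascending when direction is 0. The function below is a rough emulation of that behaviour (an assumption, not calibre's code). It shows why a space separator works here: once the similar algorithm has turned dots and colons into spaces, each word of a hierarchical tag becomes one item, so differently ordered tags end up identical.

```python
def list_sort(value, direction, separator):
    # Rough emulation of calibre's list_sort(): split on the separator,
    # sort case-insensitively (ascending when direction is 0), join back.
    items = [v.strip() for v in value.split(separator) if v.strip()]
    return separator.join(sorted(items, key=str.lower, reverse=direction != 0))
```

For example, "thriller crime mystery" and "crime mystery thriller" both sort to "crime mystery thriller".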

Even after adding the previous template, there is one case when hierarchical tags fail to match. Out of the three tags below, the first two match, while the third fails to match:

Code:

Thrillers.Crime.Mystery
Crime / Mystery / Thriller
Crime/Mystery/Thriller

The problem here is not the sort order, which was taken care of in the previous example; the problem is that the similar match algorithm does not treat the slash like the other separators. To understand this better we need to know how the similar algorithm works, which is explained briefly below:

Quote:

The similar algorithm does four things:

It removes some special characters.
It replaces some other characters with a space.
It concatenates multiple adjacent spaces into single one.
It converts all characters to ascii lower case characters.

Most separators (like dots and colons) are replaced with a space. The slash however, is removed without being replaced by a space. So, applying the rules above the tags will evaluate as follows:

The first tag will evaluate to:

Code:

thrillers crime mystery

The second will evaluate to:

Code:

crime mystery thriller

The third will evaluate to:

Code:

crimemysterythriller

The first two have the following differences:

One of them has the plural form "thrillers". The first template we wrote takes care of that.
They have a different sort order. The second template takes care of that as well.

So, the first two will match.

If you want the slash to be treated like the other separators, you will have to add this template before the similar match acts on the tags:

Code:

{tags:re(/, )}

The above template replaces any slash with a space.

Note: The order here is important. You must add this template before the similar match algorithm; if you put it after, it will have no effect.
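Putting the pieces of this example together, the behaviour described above can be modelled in plain Python. This is a rough sketch under stated assumptions: `similar_like` is a hypothetical approximation of the four steps of the similar algorithm, its character sets are invented for illustration (the plugin's real lists are not given here), and `tag_key` applies the templates in the order this sketch assumes: the slash replacement first, then similar, then suffix stripping and sorting.

```python
import re
import unicodedata

# ASSUMPTION: illustrative character sets only, not the plugin's real lists.
REMOVE_CHARS = re.compile(r"[/'\"]")     # deleted outright (the slash included)
SPACE_CHARS = re.compile(r"[.:;,_-]")    # turned into a space


def similar_like(text):
    text = REMOVE_CHARS.sub('', text)              # 1. remove some characters
    text = SPACE_CHARS.sub(' ', text)              # 2. replace others with a space
    text = ' '.join(text.split())                  # 3. collapse runs of spaces
    text = unicodedata.normalize('NFKD', text)     # 4. ASCII lower case
    return text.encode('ascii', 'ignore').decode('ascii').lower()


def tag_key(tag, slash_fix=False):
    if slash_fix:
        tag = tag.replace('/', ' ')                # {tags:re(/, )}, placed first
    text = similar_like(tag)
    text = re.sub(r'(e?s\b|ing\b)', '', text)      # {tags:re((e?s\b|ing\b),)}
    return ' '.join(sorted(text.split()))          # {tags:list_sort(0, )}


TAGS = ['Thrillers.Crime.Mystery', 'Crime / Mystery / Thriller', 'Crime/Mystery/Thriller']
```

`similar_like` reproduces the three evaluations shown above, and only `tag_key(TAGS[2], slash_fix=True)` brings the third tag in line with the other two; replacing the slash after `similar_like` would do nothing, because the slash is already gone by then.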

TIP: Since you are no longer bound only by the mandatory title and author columns, you might have a situation where you exclusively use custom columns for matching. These columns can be empty in a lot of cases. So if you are matching books based on custom_column1 and custom_column2, and one of them has no value for certain books, you are effectively matching those books based on the column that has a value alone.

This situation can be avoided by using a virtual library as follows:

In the search bar type a search like this:

Code:

#custom_column1:true #custom_column2:true

And make a virtual library out of the above search by pressing (Ctrl + Shift + *). Now you can open find duplicates and it will only include the books that have values for both columns.
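Why empty columns weaken the match can be seen in a simplified model where books are grouped by the tuple of their column values. This is an assumption about the general idea, not the plugin's actual hashing, and `duplicate_groups`, `BOOKS` and `COLUMNS` are hypothetical: books 1 and 2 below only agree on the first column, yet they are grouped because both have the second column empty.

```python
from collections import defaultdict


def duplicate_groups(books, columns, require_all=True):
    """Group book ids whose values in every match column are identical."""
    groups = defaultdict(list)
    for book_id, values in books.items():
        key = tuple(values.get(col) or '' for col in columns)
        # Skipping books with an empty column mimics restricting the search to
        # a virtual library built from '#custom_column1:true #custom_column2:true'
        if require_all and not all(key):
            continue
        groups[key].append(book_id)
    return [ids for ids in groups.values() if len(ids) > 1]


BOOKS = {
    1: {'#custom_column1': 'A', '#custom_column2': ''},
    2: {'#custom_column1': 'A', '#custom_column2': ''},
    3: {'#custom_column1': 'B', '#custom_column2': 'x'},
    4: {'#custom_column1': 'B', '#custom_column2': 'x'},
}
COLUMNS = ['#custom_column1', '#custom_column2']
```

With `require_all=False` both pairs are reported; with the default, only books 3 and 4, which agree on both columns, remain.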

Update: Using filters to sort results; see this post for details.

Final Notes:

In the normal mode, the plugin provides an author only algorithm; this was probably done before the metadata variations feature was added. The advanced mode does not support an author only algorithm, as it is better to use the metadata variations dialog for this kind of search. You can use a match rule containing only the author field, but it will use the same algorithm.
Using templates (either builtin or user-defined) slightly affects performance. This is because whenever templates are used, the plugin must fetch the metadata object for every book, since that is how templates work. So the biggest performance hit comes when you add the first template; adding more templates does not affect performance as much.

In my testing on an average laptop, this adds about 1 second per 1000 books, so it should hardly be noticeable on most libraries. However, if you have a huge library (tens of thousands of books), processing duplicates will take longer; even then, it usually finishes in under a minute.
The normal mode has an option to add language to the title. This option is not needed and has been removed from the advanced mode, since you can add a match rule containing the language column.
There are some situations where the order of the algorithms matters (see examples 6 & 7), which is why there are buttons to move the algorithms up and down.
I have no use for the cross library duplicate search, I added it because most of the work was already done in the book duplicate dialog. So this is the least tested part of the new updates.
Whenever you enter a template directly into the plugin, it evaluates the template and tries to catch any error and prevent the user from proceeding if the template is not valid. I tried to cover all possible errors, but I am yet to find a reliable way to make the template either produce a valid result or fail, even using unsafe_format it still produces errors without raising exceptions.

This should not be a big problem; if it ever happens, it will lead to some false positives. However, my real concern is someone using only templates in a cross library match where all templates fail, producing the same error message for every book. We might then end up with tens of thousands of books in one library matching tens of thousands of books in the target library, which might freeze your PC. So it is better to test your templates and make sure they work if you are using them exclusively in a cross library duplicate search.

Acknowledgements

Thanks to Kovid and the rest of the Calibre team for creating the most well designed, flexible piece of software I've come across.
Thanks to kiwidude for creating this, and also for his other awesome plugins. I got much value from them and they made Calibre an even better program. And when I later started to work on his code, I learned more from it than from any other resource.
Thanks to chaley for creating templates and other interesting features in Calibre.
Thanks to davidfor for maintaining this plugin as well as kiwidude's other plugins. The same goes for JimmXinu.
Thanks to BetterRed for his idea of using user categories to add author pen names.

capink · 08-14-2020, 05:56 AM

Version 1.7.5

Fix: Change the way the advanced mode deals with algorithms that generate an additional reverse hash (similar, soundex), so they fit better with multiple algorithms working together when the option "match any of the items" is turned off.
Update: Update Spanish translation. Thanks to @dunhill.
Remove the previously added custom column support in the Find metadata variations dialog as it is now included in the advanced mode.

Also, the previous post has been updated with a tip on matching exclusively by custom columns that may be empty (see the TIP in that post).

capink · 08-18-2020, 09:41 AM

Updated post 738 to use the builtin template

Code:

{authors:re(\(.+\),)}

in Example 5, instead of a user-defined function, to remove author roles.

Also added Example No. 8: Tag management using similar algorithm and builtin templates (now included in that post above).

jony08 · 08-30-2020, 11:05 AM

Please add a function to automatically delete one of the duplicates if it has a certain format compared to the other. For example, I want to delete all PDF files automatically if another format is available.

Tanjamuse · 08-30-2020, 12:28 PM

Or any other columns? Word count is lowest or last-edited date?

capink · 08-30-2020, 07:05 PM

Quote:

Originally Posted by jony08

Please add a function to automatically delete one of the duplicates if it has a certain format compared to the other. For example, I want to delete all PDF files automatically if another format is available.

Quote:

Originally Posted by Tanjamuse

Or any other columns? Word count is lowest or last-edited date?

I think this issue has been addressed more than once by kiwidude. It is outside the scope of this plugin to decide which books to keep and which to delete. He also noted several times that this is better implemented in a separate plugin.

And even if one were to write a separate plugin to handle this, there are a lot of difficulties in implementing it, the most obvious being the question:

How to decide which book(s) to delete and which to keep? Every user has his own set of criteria which makes it difficult to write a plugin that satisfies the need of all users (short of writing a separate plugin or routine for each user).

So the best way is for each user to implement his own routine, by writing python scripts and running them through calibre-debug. This has the obvious problem that most users don't code and cannot go down this path. But even for people who can code, and want to write their own scripts to handle their unique individual needs this can be challenging:

Take, for example, the first request: deleting duplicates that have only a PDF format when another duplicate has other formats. A lot of the time you find that you have two duplicate entries, each containing PDF and EPUB formats. Now you have to decide which one of them to delete. Do I delete one of them at random? Or should I implement another set of criteria for such occurrences?
One solution to this is to keep the last edited, as suggested in the second request. But now I have another problem: a lot of the time they will all have the same modification time, because calibre resets the modification date in a lot of situations (for example, whenever you add a custom column, calibre resets the modification date for all books in the library).

So now I have to decide on some additional criteria to determine which books to delete, which leads me further down the rabbit hole, until I finally realize it is actually easier to manually choose which books to delete from the GUI.

That being said, if someone can and wants to implement this feature, all power to them.

Tanjamuse · 08-31-2020, 12:36 PM

How about just an option for sorting the books?

Example: First by set of duplicate and then by a date column?

Then I would know automatically that the second book would always be the oldest?

Thanks so much in advance.

theducks · 08-31-2020, 01:52 PM

Quote:

Originally Posted by Tanjamuse

How about just an option for sorting the books?

Example: First by set of duplicate and then by a date column?

Then I would know automatically that the second book would always be the oldest?

Thanks so much in advance.

You can already sort the results. Simply right-click: sort-by: date (or any other column. That is a filtered view, so Calibre sorting user operations still applies)
Note: Date is (normally) the date the record was created and not necessarily the format within

capink · 08-31-2020, 02:26 PM

Quote:

Originally Posted by theducks

You can already sort the results. Simply right-click: sort-by: date (or any other column. That is a filtered view, so Calibre sorting user operations still applies)
Note: Date is (normally) the date the record was created and not necessarily the format within

The above will not work, regardless of the method used to display duplicates (whether showing one group at a time, or all at once). This is because the plugin applies its own sort filters and refreshes them between results.

This means that if all duplicates are shown at once, sorting by date will mess up the groups because it overrides the plugin mechanism for showing them next to each other. On the other hand, if the plugin is set to show one group at a time, each time you move to the next group, the plugin will override whatever sort filter you applied in the previous group.

JSWolf · 09-26-2020, 06:19 AM

Can someone please fix Find Duplicates for Calibre 5? Thanks.

capink · 09-26-2020, 06:38 AM

What exactly is not working in calibre 5?

JSWolf · 09-26-2020, 06:47 AM

Quote:

Originally Posted by capink

What exactly is not working in calibre 5?

There's another thread where someone is saying that DeDRM and Find Duplicates is not working with Calibre 5.

DeDRM is a known issue that is easily solved with a 4.23 portable install.

I guess I should have looked at the update history before posting. Sorry.

mbovenka · 09-26-2020, 02:09 PM

Quote:

Originally Posted by JSWolf

There's another thread where someone is saying that DeDRM and Find Duplicates is not working with Calibre 5.

Find Duplicates works fine with Calibre 5.
