I was thinking of 3 new criteria for duplicate file finding. These criteria would be a 'second pass'. So first there is a duplicate check in the normal way. After that, if results are found, a new check is done to match any of these criteria:
1. Has same file type: true / false / no-check
2. Has a max-difference in pages of:
3. Has a max-difference in size of:
If you have this list (found with a test on same author, same title):
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages
Results would be:
1. true:
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages
1. false:
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
1. no-check (all results)
2. 0
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages
2. -1 (all results)
3. 0
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
3. -1 (all results)
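To make the second pass concrete, here is a rough Python sketch of how I picture it. It is not how the tool actually works; all names (Book, Criteria, pair_matches, second_pass) are made up for illustration. It assumes that the first pass groups books on same title and same author, that a book stays in the results if at least one other book in its group matches, and that enabled criteria are combined with AND (they could just as well be combined with OR).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    fmt: str         # file type, e.g. "EPUB" or "PDF"
    size_mb: float
    pages: int


@dataclass(frozen=True)
class Criteria:
    # Hypothetical settings; the defaults disable every check.
    same_type: Optional[bool] = None   # True / False / None (no-check)
    max_page_diff: int = -1            # -1 = no check
    max_size_diff_mb: float = -1.0     # -1 = no check


def pair_matches(a: Book, b: Book, c: Criteria) -> bool:
    """Return True if the pair (a, b) passes every enabled criterion."""
    if c.same_type is not None and (a.fmt == b.fmt) != c.same_type:
        return False
    if c.max_page_diff >= 0 and abs(a.pages - b.pages) > c.max_page_diff:
        return False
    if c.max_size_diff_mb >= 0 and abs(a.size_mb - b.size_mb) > c.max_size_diff_mb:
        return False
    return True


def second_pass(books: List[Book], c: Criteria) -> List[Book]:
    """Keep a book if some other book in its first-pass group matches it."""
    kept = []
    for i, a in enumerate(books):
        for j, b in enumerate(books):
            # Assumed first pass: same title and same author.
            same_group = (a.title, a.author) == (b.title, b.author)
            if i != j and same_group and pair_matches(a, b, c):
                kept.append(a)
                break
    return kept


NONE = ("And Then There Were None", "Agatha Christie")
ANNA = ("Anna Karenina", "Graf Leo Tolstoy")
BOOKS = [
    Book(*NONE, "EPUB", 0.1, 180),
    Book(*NONE, "EPUB", 0.1, 199),
    Book(*NONE, "EPUB", 0.5, 180),
    Book(*NONE, "PDF", 0.1, 180),
    Book(*ANNA, "PDF", 0.1, 180),
    Book(*ANNA, "PDF", 0.2, 180),
]

if __name__ == "__main__":
    # Example: option 3 with a maximum size difference of 0.
    for book in second_pass(BOOKS, Criteria(max_size_diff_mb=0)):
        print(book.title, book.fmt, book.size_mb, book.pages)
```

With the default Criteria() nothing is filtered, so the first-pass results come through unchanged. Under this pairwise reading, a page limit of 0 keeps every book with 180 pages, including the PDF of And Then There Were None.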
The advantage would be that you could filter out, for example, books with a large page or size difference; these books are unlikely to be duplicates.
Books with a difference of just 1 or 2 pages are more likely to be duplicates.
The advantage of option 1 shows when you have books in different file formats: if only books with different formats are in your view, it is easy to perform a merge action on them.
All three criteria should be optional, of course, because your current search should keep working as it does now.