Old 05-30-2011, 05:58 PM   #65
drMerry
I was thinking of 3 new criteria for duplicate file finding. These criteria work as a 'second pass': first the duplicate check runs in the normal way. After that, if results are found, a second check is done to match any of these criteria.

1. Has same file type: true / false / no-check
2. Has a max-difference in pages of:
3. Has a max-difference in size of:

If you have this list (found by a same author, same title test):
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages

Results would be:
Spoiler:
1. true:
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages


1. false:
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages

1. no-check (all results)

2. 0
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.5MB - 180 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.1MB - 180 pages
Anna Karenina - Graf Leo Tolstoy - PDF - 0.2MB - 180 pages

2. -1 (check disabled, all results)

3. 0
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 180 pages
And Then There Were None - Agatha Christie - EPUB - 0.1MB - 199 pages
And Then There Were None - Agatha Christie - PDF - 0.1MB - 180 pages

3. -1 (check disabled, all results)
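
To make the idea concrete, here is a rough Python sketch of how such a second pass could work. It is only an illustration, not how the existing duplicate finder is built: the names Book, SecondPass, pair_matches and second_pass are made up for this example. It assumes a book stays in the results when at least one other book in the same duplicate group matches it on every enabled criterion, and that -1 (or no-check) switches a criterion off. Under this reading the 180-page PDF also passes criterion 2 with a max-difference of 0, because it has the same page count as the 180-page EPUBs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Book:
    title: str
    fmt: str
    size_mb: float
    pages: int


@dataclass(frozen=True)
class SecondPass:
    # Hypothetical option names; every default means "check disabled".
    same_format: Optional[bool] = None  # True, False, or None for no-check
    max_page_diff: int = -1             # -1 disables the page check
    max_size_diff_mb: float = -1.0      # -1 disables the size check


def pair_matches(a: Book, b: Book, opts: SecondPass) -> bool:
    """True if books a and b satisfy every enabled criterion."""
    if opts.same_format is not None and (a.fmt == b.fmt) != opts.same_format:
        return False
    if opts.max_page_diff >= 0 and abs(a.pages - b.pages) > opts.max_page_diff:
        return False
    # Small tolerance so float noise does not turn equal sizes into a mismatch.
    if (opts.max_size_diff_mb >= 0
            and abs(a.size_mb - b.size_mb) > opts.max_size_diff_mb + 1e-9):
        return False
    return True


def second_pass(group: List[Book], opts: SecondPass) -> List[Book]:
    """Keep each book that matches at least one other book in its group."""
    return [
        a for i, a in enumerate(group)
        if any(pair_matches(a, b, opts) for j, b in enumerate(group) if j != i)
    ]


# The two duplicate groups from the list above (result of the first pass).
christie = [
    Book("And Then There Were None", "EPUB", 0.1, 180),
    Book("And Then There Were None", "EPUB", 0.1, 199),
    Book("And Then There Were None", "EPUB", 0.5, 180),
    Book("And Then There Were None", "PDF", 0.1, 180),
]
tolstoy = [
    Book("Anna Karenina", "PDF", 0.1, 180),
    Book("Anna Karenina", "PDF", 0.2, 180),
]

# Criterion 1 set to true: only the three EPUBs remain in the Christie group.
for book in second_pass(christie, SecondPass(same_format=True)):
    print(book.fmt, book.size_mb, book.pages)
```

With the default SecondPass() every check is off, so each group comes back unchanged and the search behaves as it does today.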


The advantage would be that you could filter out, for example, books with a great page or size difference; those books are unlikely to be duplicates.
Books with just 1 or 2 pages difference are more likely to be duplicates.
The advantage of option 1 shows when you have books in different file formats: if your view contains only books with different formats, it is easy to perform a merge action on them.

All options should be optional, because your current search should of course keep working like it does now.