05-31-2011, 08:18 AM   #68
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by kiwidude View Post
You either have some scenario in mind where you think pages/file size would be useful, or you are just proposing some random thoughts. I don't mind random thoughts as sometimes they spark better ones, but in this case I don't see where you are going with this one?
Thank you for the information.
Well, I do have a scenario in mind.
As I said, this function is a second pass, filtering duplicate files.
So it does not unmark files as possible duplicates; it just filters the displayed results one way.

For example, when I run some duplicate checks, I get back a list of 1200 books, all possible duplicates.
Let's say I have these duplicates inside the list:

marked:duplicate_group_0001:
Book A EPUB
Book C PDF

marked:duplicate_group_0002:
Book A EPUB
Book B EPUB (a binary duplicate of A)

If I filtered on differing formats, only group 1 would be shown, giving me the option to merge that group easily. That way I can eliminate some of the dups a lot faster.
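Just to make the idea concrete, here is a rough Python sketch of the kind of filter I mean. The group layout and function name are made up for illustration; it is not the plugin's real API.

Code:
# Rough sketch of the "different formats" filter (made-up data layout,
# not the plugin's real API).
def groups_with_mixed_formats(duplicate_groups):
    """Keep only groups whose members do not all share the same format."""
    filtered = {}
    for group_id, books in duplicate_groups.items():
        formats = {fmt.upper() for _title, fmt in books}
        if len(formats) > 1:  # e.g. EPUB + PDF -> easy merge candidate
            filtered[group_id] = books
    return filtered

groups = {
    "duplicate_group_0001": [("Book A", "EPUB"), ("Book C", "PDF")],
    "duplicate_group_0002": [("Book A", "EPUB"), ("Book B", "EPUB")],
}
print(groups_with_mixed_formats(groups))  # only group 0001 remains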

For book size (of course you can't tell dups by size alone, but since this is a filter applied after the dup test...) it is a little different.

When I have 1200 possible duplicate books, I would be happy to see all books with a small size difference. When I see one book of 0.7 MB and one of 12.3 MB, I can imagine the content is not the same (the same technical information, but presented differently to the user, e.g. BMP vs. JPG images).
But if I could see only the books with, say, less than 1 kB difference, I would have a list of books that are far more likely to be duplicates, for example where one book has downloaded comments and the other has not. I could just open the books, take a quick look and see if they are the same before I remove them.
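Again a rough sketch of what that size filter could look like (sizes in bytes; the data layout is made up for illustration):

Code:
# Keep only groups whose members differ in file size by less than a
# threshold (sizes in bytes; made-up data layout).
def groups_with_similar_sizes(duplicate_groups, max_diff=1024):
    filtered = {}
    for group_id, books in duplicate_groups.items():
        sizes = [size for _title, size in books]
        if max(sizes) - min(sizes) <= max_diff:  # e.g. less than 1 kB apart
            filtered[group_id] = books
    return filtered

groups = {
    "duplicate_group_0001": [("Book A", 734003), ("Book C", 12897484)],
    "duplicate_group_0002": [("Book A", 734003), ("Book B", 734512)],
}
print(groups_with_similar_sizes(groups))  # only group 0002 remains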

The page function could be used with your page-count plugin. If I see a possible duplicate book with the same number of pages (or +/- 1), the chance that it is a duplicate increases. Books of 100 and 326 pages are more likely to be different.
So instead of pages, you could make it a custom-field compare, comparing two integer or floating-point fields.
That would then directly add the option to hide books with the same name but a different series index (a field that could be custom-set by the user). See the sketch below.
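The same idea generalised to any numeric column (pages from the page-count plugin, a series index, a custom column, ...); again just an illustrative sketch under made-up names, not real plugin code:

Code:
# Keep only groups whose members agree on some numeric field (pages,
# series index, ...) within a tolerance. Made-up data layout.
def groups_matching_on_field(duplicate_groups, field, tolerance=1):
    filtered = {}
    for group_id, books in duplicate_groups.items():
        values = [book[field] for book in books]
        if max(values) - min(values) <= tolerance:  # e.g. pages +/- 1
            filtered[group_id] = books
    return filtered

groups = {
    "duplicate_group_0001": [{"title": "Book A", "pages": 100},
                             {"title": "Book C", "pages": 326}],
    "duplicate_group_0002": [{"title": "Book A", "pages": 245},
                             {"title": "Book B", "pages": 246}],
}
print(groups_matching_on_field(groups, "pages"))  # only group 0002 remains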

***EDIT***
One (manually filtered) example is in the screenshot below. As you can see, I added an [other version] suffix to some titles to remove them from the title check.
You can also see the difference in book size / page numbers. They are all non-duplicates; filtering on pages would exclude these books from view.
Attached Thumbnails
duplicate_baantjer.jpg (157.1 KB)

Last edited by drMerry; 05-31-2011 at 08:26 AM.