MobileRead Forums - View Single Post

Duplicate Detection

Philosopher · 01-23-2011, 05:51 AM

I have a library of over 30,000 books, and it's growing. However, I have a large number of duplicate files, both already imported and in folders waiting to be imported.

The problem I find is that the current duplicate process (the pause and the message about adding them) really does very little. It seems to detect duplicates by title.

Yet when I import, a large number of the books come in with a title I have to correct, and so they end up with the same title. So I can't eliminate duplicates that way.

Also, I have many copies of the same book in different formats, different versions, or different quality, so a title check is useless there too.

I wonder whether someone could develop (it's out of my league) a really useful duplicate finder, to use either on import or afterwards.

It should check title, author, file type, ISBN, and (most importantly) file size, to see whether the two really are the same.
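As a rough illustration of that kind of check, here is a minimal Python sketch that groups files only when title, author, file type, ISBN, and file size all match. The `BookFile` record and its field names are hypothetical, not the program's actual data model, and the normalisation (case-folding names, dropping hyphens from ISBNs) is an assumption. Because the file type is part of the key, an EPUB and a MOBI of the same book are not flagged as duplicates of each other.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BookFile:
    # Hypothetical record; the field names are illustrative only.
    path: str
    title: str
    author: str
    file_type: str
    isbn: str
    size: int  # file size in bytes


def candidate_duplicates(books):
    """Group files whose title, author, type, ISBN and size all match."""
    groups = defaultdict(list)
    for book in books:
        key = (
            book.title.strip().casefold(),   # ignore case and stray spaces
            book.author.strip().casefold(),
            book.file_type.lower(),          # different formats stay separate
            book.isbn.replace("-", ""),      # assumed ISBN normalisation
            book.size,
        )
        groups[key].append(book)
    # Only keys shared by two or more files are suspected duplicates.
    return [group for group in groups.values() if len(group) > 1]
```

Matching on size as well as metadata means two different editions that happen to share a corrected title won't be lumped together, which is the problem described above.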

Better yet, couldn't it check the actual files and compare them to see whether the two are identical? I know this is possible, but I have no idea how to program it into the program.
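One common way to compare the actual files is to hash their contents. The sketch below (plain Python standard library, not anything the program itself exposes) only hashes files that share a size, since files of different sizes cannot be identical, and then groups paths with the same SHA-256 digest. A matching SHA-256 is overwhelming evidence that two files are byte-for-byte identical; for absolute certainty a final `filecmp.cmp(a, b, shallow=False)` could be added.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def identical_files(paths):
    """Group paths whose contents hash to the same SHA-256 digest."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have an identical twin
        for p in same_size:
            by_hash[sha256_of(p)].append(p)

    return [group for group in by_hash.values() if len(group) > 1]
```

Checking sizes first keeps this fast on a 30,000-book library, because most files never need to be read in full.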

Then, in my ideal world, it would bring up a list (it could be in the main window) with a check box to choose which of the files you want to remove as duplicates.

Just brainstorming, two other feature possibilities would be:

(a) Alternate the row shading to distinguish each set of books, so all the rows with the same title share one shade and the next set in the list gets the other shade. That would make it easier to spot the dups quickly.

(b) Instead of a list with check boxes, a dialog box that steps through each suspected set of duplicates, presenting only that book's suspected copies with a check box to mark which to remove.
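For option (a), the shading only depends on which set a row belongs to. A minimal sketch, assuming the suspected sets have already been found and are passed in as a list of lists; the row dictionary and its `"remove"` key (standing in for the check box) are hypothetical, not an existing view model.

```python
def shaded_rows(groups):
    """Flatten duplicate sets into display rows, alternating shade per set.

    Every row in the same set gets the same shade (0 or 1); the next set
    gets the other shade. "remove" stands in for the unticked check box.
    """
    rows = []
    for set_index, group in enumerate(groups):
        shade = set_index % 2
        for item in group:
            rows.append(
                {"set": set_index, "shade": shade, "item": item, "remove": False}
            )
    return rows
```

Option (b) would walk the same `groups` one set at a time instead of flattening them into a single list.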

I think checking which copies to remove, rather than which to keep, would be safer: that way you couldn't accidentally remove all of them. Although checking the one to keep, when there are several copies, would be quicker.
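Both approaches can be made safe with a simple guard. In this hypothetical sketch (the function names are illustrative), the "tick to remove" mode refuses to delete every copy in a set, and the "tick to keep" mode removes everything except the chosen copy.

```python
def apply_removals(group, marked_for_removal):
    """'Tick to remove' mode: return the ticked files from one duplicate set.

    Raises ValueError if every copy is ticked, so at least one file survives.
    """
    to_remove = [f for f in group if f in marked_for_removal]
    if len(to_remove) == len(group):
        raise ValueError("every copy is marked for removal; keep at least one")
    return to_remove


def removals_from_keep(group, keep):
    """'Tick to keep' mode: remove every file in the set except `keep`."""
    if keep not in group:
        raise ValueError("the file to keep is not in this duplicate set")
    return [f for f in group if f != keep]
```

With the guard in place, the quicker "keep one" mode becomes about as safe as the "remove" mode, since nothing is deleted unless a surviving copy is identified.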

Just a thought. I think this would be a real enhancement to the program, and I'm not sure anyone out there actually finds the existing duplicate check very useful.
