04-05-2013, 04:54 PM | #1 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
Finding Duplicate files without launching Calibre?
Is it possible to do this? Whilst I enjoy the built-in duplicate finder plugin, I'd really like to be able to do this without having to launch Calibre, perhaps a right-click option for folders.
Why? Because if I were to add every book on my laptop, in every extension, etc., I'd have to add about 100k books to Calibre. I REALLY REALLY don't want to have to do that. So is there a tool or option or something of that nature?
To update, the reason I want to use Calibre's Dupe File Plugin is that *EVERY* duplicate file finding programme I've used is inferior. Why?
Step 1. Created a folder.
Step 2. Inserted the SAME book with the SAME title with DIFFERENT extensions.
Step 3. Ran the scan.
Step 4. No results.
Yeah. See? EVERY other duplicate file finder is inferior.
Last edited by Dullahir; 04-05-2013 at 05:19 PM. |
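The four-step test above (same book, same title, different extensions) amounts to grouping files by name with the extension ignored. A minimal Python sketch of that idea follows. It is not how Calibre's plugin works; the function name `group_by_stem` and the choice to compare names case-insensitively are assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import Path


def group_by_stem(root):
    """Group files under `root` by file name with the extension removed.

    Assumption: names are compared case-insensitively, so "Book.epub"
    and "book.PDF" land in the same group.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[path.stem.lower()].append(path)
    # Keep only the names that occur more than once.
    return {stem: sorted(paths) for stem, paths in groups.items() if len(paths) > 1}
```

Calling `group_by_stem` on a folder returns a dictionary whose keys are the shared names and whose values are the matching files. It only catches files whose names are identical apart from the extension, so "Dead Witch Walking.epub" and "The Hollows 01 - Dead Witch Walking.epub" would still be missed.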
04-05-2013, 06:08 PM | #2 |
Addict
Posts: 374
Karma: 1408579
Join Date: Jul 2012
Location: UK
Device: Kindle Touch, Ipod Touch, Ipad Air
|
I have used Duplicate Cleaner in the past, and one of its options is to find files with the same name. That would do what you are asking, as it ignores the extension.
Edit: I've just tried it on my calibre library folder and it picked up all the duplicates fine. Last edited by alanHd; 04-05-2013 at 06:20 PM. |
04-05-2013, 06:46 PM | #3 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
Hmmm. Are we talking about the same Duplicate Cleaner? Lol. It tells me to specify a selection, and *.* won't work for dupes.
BLAST!!! I found the problem!!!!! Okay, new question: is there a dupe finder that will search by metadata? (The reason the search across extensions won't work is, I think, that the two files don't have exactly the same content.)
Edit: The reason it picked up the dupes for you, I think, is that while the calibre library folder contains files with different extensions, the content is still virtually the same. With mine, nope. For example, one book is fine, while the other has ugly footers about PDFCompression. Therefore no results are found, even if you check 'Ignore Content', which I find strange.
Last edited by Dullahir; 04-05-2013 at 06:54 PM. |
04-05-2013, 07:12 PM | #4 |
Addict
Posts: 374
Karma: 1408579
Join Date: Jul 2012
Location: UK
Device: Kindle Touch, Ipod Touch, Ipad Air
|
My bad, it won't find duplicates with different extensions because, I think, it treats the file extension as part of the name.
The duplicates I found were all the same type. That will teach me to multitask. |
04-05-2013, 07:52 PM | #5 |
null operator (he/him)
Posts: 20,565
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Here is a review of duplicate file finders: https://www.techsupportalert.com/bes...e-detector.htm
I can't recommend one because I don't have one.
Do you have .opf files for your 100,000 books? And what OS are you on: Windows, OS X or Linux?
There may be a specialist product like the one I have for image files, but I wouldn't hold my breath.
If it were me, I'd bite the bullet and load them into Calibre. I would do it in batches. Once calibre has an author and title database, I think you could delete the format files, as I don't think they're needed by Find Duplicates, unless you're planning on doing a binary compare, which on 100,000 books could take quite a long time.
BR
Last edited by BetterRed; 04-05-2013 at 08:14 PM. |
04-10-2013, 01:00 AM | #6 |
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Android
|
I can think of a way to do this under Unix/Linux (or Cygwin), but it involves the command line.
Basically, you do something like:
    find <name of top level folder> -type f -print0 | xargs -0 sum | sort > listofbooks.txt
(The -type f skips directories, and -print0 with -0 keeps file names containing spaces intact.)
Then you search for lines in listofbooks.txt that have the same checksum entry. They are probably, but not definitely, duplicates.
If you want, I can make this more robust and automated. It's what I'm going to have to do myself, but not for a week or so. I can post the script once I do, but it will be a Unix/Linux script. There's probably a way to do it using PowerShell too, but my PowerShell skills aren't that good yet. |
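For readers on Windows without Cygwin, the same checksum idea can be sketched in Python. This is not the script promised above. It swaps the `sum` checksum for SHA-256 from the standard library, and the function names `file_digest` and `find_identical_files` are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(root):
    """Return lists of files under `root` that share the same digest."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests seen more than once indicate byte-identical files.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Like the `find | xargs sum` pipeline, this only detects files whose bytes are identical. An EPUB and a PDF of the same book, or two copies where one has extra footers, will get different digests and will not be reported.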
04-12-2013, 11:45 PM | #7 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
Hmmm. Thanks much for the vital information, you all. When I remove ebooks from Calibre, since it can get literally swamped, I 'Delete everything' after saving to disk, sadly even the .opf files, which I'm guessing are needed for dupe finding.
If I were to delete everything EXCEPT the .opf files, would it clear my Calibre window of books and yet save the library info for the next batch of loaded books so I can find the dupes, or how would that work?
By the way, I have Windows 7 64-bit on both my laptop and my PC. Also, I use Dupe Cleaner, but sometimes it won't work correctly, i.e., it won't find the proper files. I mostly use Anti-Twin, which is a double-edged sword. While good, it has significant issues. When searching for similar files, it gives you a ratio. Compare:
Ratio set to 100%
The Hollows 01 - Dead Witch Walking.epub
Dead Witch Walking.epub
Results: 0 duplicates found.
The Hollows 01 - Dead Witch Walking
The Hollows 11 - Ever After
Ratio set to 90%
Duplicates found: 3
The Hollows 01
Dead Witch Walking
The Hollows 11
Yeah. See my issue? With this method, Series 1 and Series 8 count as similar books, while Dead Witch Walking.epub and The Hollows 1 - Dead Witch Walking.epub do not.
Last edited by Dullahir; 04-12-2013 at 11:52 PM. |
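One way around the series-prefix problem described above is to normalize the names before comparing them, rather than tuning a similarity ratio. The Python sketch below is an illustration only and has nothing to do with how Anti-Twin works. It assumes file names follow a hypothetical "Series NN - Title" pattern, and the names `SERIES_PREFIX`, `title_key` and `group_by_title` are invented here.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: "<series name> <number> - <title>".
SERIES_PREFIX = re.compile(r"^.*?\s\d+\s*-\s*")


def title_key(filename):
    """Reduce a file name to a bare, lower-cased title for comparison."""
    stem = Path(filename).stem                    # drop the extension
    stem = SERIES_PREFIX.sub("", stem, count=1)   # drop "Series NN - " if present
    return re.sub(r"[^a-z0-9]+", " ", stem.lower()).strip()


def group_by_title(filenames):
    """Group file names whose normalized titles are identical."""
    groups = defaultdict(list)
    for name in filenames:
        groups[title_key(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

With this key, "The Hollows 01 - Dead Witch Walking.epub" and "Dead Witch Walking.epub" reduce to the same title, while "The Hollows 11 - Ever After.epub" reduces to a different one. Names that do not follow the assumed pattern, such as "Author - Title", are left as they are and would need their own rule.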
04-12-2013, 11:55 PM | #8 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
ALSO, with this method,
Kim Harrison - XXX
Kim Harrison - YYY
are marked as similar. Why? Kim Harrison. Which COMPLETELY throws me off when it lists EVERY BOOK SHE'S WRITTEN AS A DUPE! |
04-13-2013, 12:37 AM | #9 |
null operator (he/him)
Posts: 20,565
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Dullahir - yes, I see your issue
What you're looking for is a command line tool that will compare embedded ebook metadata and identify possible duplicates. If no one here knows of such a tool, then there probably isn't one.
BR |
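For EPUB files at least, comparing embedded metadata can be sketched with the Python standard library alone, since an EPUB is a zip archive whose META-INF/container.xml points at an OPF file holding the Dublin Core title and creator. This is a rough sketch, not an existing tool. The function names are invented, it handles EPUB only (PDF, MOBI and other formats would need their own parsers), and it assumes the title and author were entered consistently.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def epub_title_author(path):
    """Return (title, author) from an EPUB's OPF metadata, lower-cased."""
    with zipfile.ZipFile(path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    title = opf.findtext(".//dc:title", default="", namespaces=DC_NS)
    author = opf.findtext(".//dc:creator", default="", namespaces=DC_NS)
    return title.strip().lower(), author.strip().lower()


def group_epubs_by_metadata(root):
    """Group .epub files under `root` that share the same title and author."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*.epub"):
        try:
            key = epub_title_author(path)
        except (zipfile.BadZipFile, KeyError, ET.ParseError, AttributeError):
            continue  # unreadable or non-standard file: skip it
        if key[0]:  # ignore books with no embedded title
            groups[key].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Because the comparison uses the embedded title and author rather than the file name, two copies named differently on disk still match as long as their metadata agrees. Files with missing or sloppy metadata will be missed, which is the same limitation any metadata-based duplicate finder would have.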