Old 04-05-2013, 04:54 PM   #1
Dullahir
Zealot
Posts: 145
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
Finding Duplicate files without launching Calibre?

Is it possible to do this? Whilst I enjoy the built-in duplicate finder plugin, I'd really like to be able to do this without having to launch Calibre, perhaps a right-click option for folders.

Why? Because if I were to add every book on my laptop, every extension, etc, I'd have to add about 100k books to Calibre. I REALLY REALLY don't want to have to do that.

So is there a tool or option or something of that nature?

To update, the reason why I want to use Calibre's Dupe File Plugin is because *EVERY* Duplicate file finding programme that I've used is inferior. Why?

Step 1. Created a folder
Step 2. Inserted the SAME book with the SAME title with DIFFERENT extensions.
Step 3. Ran the Scan.
Step 4. No results.

Yeah. See? EVERY other duplicate file finder is inferior.
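The test in the steps above (same title, different extensions) can be sketched as a small script that ignores the extension when comparing names. This is only an illustration, not how Calibre's Find Duplicates plugin works: `group_by_stem` is a hypothetical helper name, and matching on the bare file name is an assumption that only holds when the copies really are named identically.

```python
from collections import defaultdict
from pathlib import Path


def group_by_stem(folder):
    """Map each lower-cased file name (extension dropped) to the files sharing it."""
    groups = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            # "Book.epub" and "Book.mobi" both end up under the key "book".
            groups[path.stem.lower()].append(path)
    # Keep only names that occur more than once.
    return {stem: sorted(paths) for stem, paths in groups.items() if len(paths) > 1}
```

Calling `group_by_stem(r"C:\Books")` (the folder path is just an example) returns a dictionary whose values are the lists of same-named files, whatever their extensions.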

Last edited by Dullahir; 04-05-2013 at 05:19 PM.
Old 04-05-2013, 06:08 PM   #2
alanHd
Addict
Posts: 374
Karma: 1408579
Join Date: Jul 2012
Location: UK
Device: Kindle Touch, Ipod Touch, Ipad Air
I have used Duplicate Cleaner in the past, and one of its options is to find files with the same name. That would do what you are asking; it ignores the extension.

Edit

I've just tried it on my Calibre library folder and it's picked up all the duplicates fine.

Last edited by alanHd; 04-05-2013 at 06:20 PM.
Old 04-05-2013, 06:46 PM   #3
Dullahir
Zealot
Hmmm. Are we talking about the same Duplicate Cleaner? Lol. It tells me to specify a selection, and *.* won't work for dupes.

BLAST!!! I found the problem!!!!!

Okay, new question:

Is there a dupe finder that will search by Metadata? (The reason why the search between extensions won't work is because, I think, that the two files don't have the exact same content.)

Edit: The reason it picked up the dupes for you, I think, is because while the calibre library folder contains files of different extensions, it's still, virtually, the same content.

With me, nope. For example, one book is fine, the other has ugly footers about PDFCompression. Therefore, no results found, even if you check 'Ignore Content', which I find strange.

Last edited by Dullahir; 04-05-2013 at 06:54 PM.
Old 04-05-2013, 07:12 PM   #4
alanHd
Addict
My bad, it won't find duplicates with different extensions because, I think, it treats the file extension as part of the name.

The duplicates I found were all the same type. That will teach me to multitask.
Old 04-05-2013, 07:52 PM   #5
BetterRed
null operator (he/him)
Posts: 20,457
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Dullahir View Post
With me, nope. For example, one book is fine, the other has ugly footers about PDFCompression. Therefore, no results found, even if you check 'Ignore Content', which I find strange.
It's not interpreting the content; it's comparing file sizes and checksums. The size comes from the file system directory, and the checksum is computed over the file's raw bytes. If just one byte in a file is changed, added or removed, then the checksum (and, for additions or removals, the size) will change, and two otherwise identical files will not be regarded as duplicates.
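A minimal sketch of that point, using Python's standard `hashlib`. The sample text and the choice of SHA-256 are assumptions made for illustration; which checksum Duplicate Cleaner actually uses is not stated in this thread.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hex text."""
    return hashlib.sha256(data).hexdigest()


# Two "files" that differ by a single trailing byte, e.g. a footer
# added by a conversion tool (made-up sample content).
original = b"Chapter 1. It was a dark and stormy night."
altered = original + b"."

print(len(original), sha256_hex(original))
print(len(altered), sha256_hex(altered))
# The two checksums printed above differ, so a checksum-based
# finder treats the files as unrelated.
```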

Here is a review of duplicate file finders https://www.techsupportalert.com/bes...e-detector.htm

I can't recommend one because I don't have one...

Do you have OPF files for your 100,000 books? And what OS are you on: Windows, OS X or Linux? There may be a specialist product like the one I have for image files, but I wouldn't hold my breath.

If it were me, I'd bite the bullet and load them into Calibre. I would do it in batches. Once Calibre has an author and title database, I think you could delete the format files, as I don't think they're needed by Find Duplicates... unless you're planning on doing a binary compare, which on 100,000 books could take quite a long time.

BR

Last edited by BetterRed; 04-05-2013 at 08:14 PM.
Old 04-10-2013, 01:00 AM   #6
altonaduck
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2011
Device: Android
I can think of a way to do this under Unix/Linux (or Cygwin), but it involves the command line...

Basically, you do something like:

find <name of top level folder> -type f -print0 | xargs -0 sum | sort > listofbooks.txt

Then you search for lines in listofbooks.txt that have the same checksum entry - they are probably, but not definitely, duplicates. If you want I can make this more robust and automated - it's what I'm going to have to do myself, but not for a week or so. If you want I can post the script once I do - but it will be a Unix/Linux script.
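For anyone without a Unix shell (the original poster is on Windows), the same idea can be sketched in Python. This is not the script promised above, just an illustration: `file_digest` and `find_identical_files` are hypothetical names, and SHA-256 is swapped in for `sum` because it makes accidental checksum collisions far less likely. Like the pipeline, it only finds byte-for-byte identical files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large books are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(top_folder):
    """Return lists of files under top_folder whose contents are identical."""
    by_digest = defaultdict(list)
    for path in Path(top_folder).rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # A digest shared by two or more files marks a group of exact copies.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each list returned by `find_identical_files` is one group of exact copies; the same book in two formats, or with an added footer, will not appear.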

There's probably a way to do it using PowerShell too, but my PowerShell skills aren't that good yet.
Old 04-12-2013, 11:45 PM   #7
Dullahir
Zealot
Hmmm. Thanks very much for the vital information, you all. When I remove ebooks from Calibre, since it can get literally swarmed, I 'Delete everything' after saving to disk. Even the .opf files, sadly, which I'm guessing are needed for dupe finding.

If I were to delete everything EXCEPT the .opf files, will it clear my Calibre window of books and yet save the library info for the next batch of loaded books so I can find the dupes, or how would that work?

By the way, I have Windows 7 64bit on both my laptop and my PC.

Also, I use Dupe Cleaner, but sometimes it won't work correctly, i.e., it won't find the proper files.

I mostly use Anti-Twin, which is a double-edged sword. While good, it has significant issues.

When searching for similar files, it gives you a ratio. Compare:

Ratio set to 100%

The Hollows 01 - Dead Witch Walking.epub
Dead Witch Walking.epub

Results: 0 Duplicates found.

The Hollows 01 - Dead Witch Walking
The Hollows 11 - Ever After

Ratio set to 90%

Duplicates found: 3

The Hollows 01
Dead Witch Walking
The Hollows 11

Yeah. See my issue? With this method, book 1 and book 11 of a series count as similar books, while Dead Witch Walking.epub and The Hollows 01 - Dead Witch Walking.epub do not.
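One way around the issue described above is to compare whole words instead of a character ratio, and to flag two names only when all the words of one appear in the other. This is a rough sketch of that idea, not something Anti-Twin offers as far as this thread shows; `title_tokens` and `one_contains_other` are hypothetical helper names.

```python
import re
from pathlib import Path


def title_tokens(file_name):
    """Lower-cased set of the words and numbers in a file name, extension dropped."""
    stem = Path(file_name).stem.lower()
    return set(re.findall(r"\w+", stem))


def one_contains_other(name_a, name_b):
    """True when every word of one name also appears in the other name."""
    a, b = title_tokens(name_a), title_tokens(name_b)
    return bool(a) and bool(b) and (a <= b or b <= a)


print(one_contains_other("The Hollows 01 - Dead Witch Walking.epub",
                         "Dead Witch Walking.epub"))
print(one_contains_other("The Hollows 01 - Dead Witch Walking.epub",
                         "The Hollows 11 - Ever After.epub"))
```

The first pair is flagged because the short name's words are all in the long one; the second pair only shares the series words, so it is not. A very short title would be contained in many longer ones, so the results would still need checking by eye.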

Last edited by Dullahir; 04-12-2013 at 11:52 PM.
Old 04-12-2013, 11:55 PM   #8
Dullahir
Zealot
ALSO with this method,

Kim Harrison - XXX
Kim Harrison - YYY are marked as similar. Why?

Kim Harrison. Which COMPLETELY throws me off when it lists EVERY BOOK SHE'S WRITTEN AS A DUPE!
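The effect is easy to reproduce with any character-based similarity measure. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in; Anti-Twin's actual algorithm is not described in this thread, so this only shows the general mechanism. Splitting on " - " to drop the author is an assumption about how the files are named.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


name_1 = "Kim Harrison - XXX"
name_2 = "Kim Harrison - YYY"

# The shared author prefix is 15 of the 18 characters in each name,
# so the ratio is high even though the titles have nothing in common.
with_author = similarity(name_1, name_2)

# Comparing only the part after " - " removes the author's influence.
titles_only = similarity(name_1.split(" - ", 1)[1], name_2.split(" - ", 1)[1])

print(round(with_author, 2), titles_only)
```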
Old 04-13-2013, 12:37 AM   #9
BetterRed
null operator (he/him)
@Dullahir - yes, I see your issue

What you're looking for is a command line tool that will compare embedded ebook metadata and identify possible duplicates. If no one here knows of such a tool, then there probably isn't one.
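For EPUB files at least, the embedded metadata is reachable with the Python standard library, because an EPUB is a ZIP archive whose META-INF/container.xml points at an OPF file holding the title and author. The following is a rough sketch under those assumptions, not an existing tool: `epub_title_and_author` and `group_epubs_by_metadata` are hypothetical names, other formats (PDF, MOBI) would need their own readers, and matching on exact lower-cased title and author is a simplification.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def epub_title_and_author(epub_path):
    """Read (title, author) from the OPF file inside an EPUB, lower-cased."""
    with zipfile.ZipFile(epub_path) as archive:
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(archive.read(rootfile.attrib["full-path"]))
    title = opf.findtext(".//dc:title", default="", namespaces=DC_NS)
    author = opf.findtext(".//dc:creator", default="", namespaces=DC_NS)
    return title.strip().lower(), author.strip().lower()


def group_epubs_by_metadata(paths):
    """Group EPUB paths that share the same embedded title and author."""
    groups = defaultdict(list)
    for path in paths:
        try:
            key = epub_title_and_author(path)
        except (zipfile.BadZipFile, KeyError, AttributeError, ET.ParseError):
            continue  # skip files that are not readable EPUBs
        groups[key].append(path)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Feeding it every `.epub` under a folder, for example `group_epubs_by_metadata(Path(folder).rglob("*.epub"))` with `pathlib.Path` imported, would list candidate duplicates regardless of file name, as long as the embedded metadata is filled in consistently.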

BR