10-24-2020, 11:06 PM   #1
kirk8677
Enthusiast
kirk8677 began at the beginning.
 
Posts: 46
Karma: 10
Join Date: May 2020
Device: Kindle
Identifying duplicate books with same content, but different tags

It seems that the duplicate book finder plugin finds books with duplicate tags.

For me, the reverse functionality would be more useful: identifying duplicates that are the same book from their binary content, but MAY have different tags (or could be the same).

The duplicate file finder utilities I have seen DO NOT recognize a duplicate file when the tag changes; otherwise I would run files through de-dupe software BEFORE adding.

Any suggestions?

Thanks!
10-24-2020, 11:28 PM   #2
DNSB
Bibliophagist
DNSB ought to be getting tired of karma fortunes by now.
Posts: 35,311
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by kirk8677
It seems that the duplicate book finder plugin finds books with duplicate tags.

For me, the reverse functionality would be more useful: identifying duplicates that are the same book from their binary content, but MAY have different tags (or could be the same).

The duplicate file finder utilities I have seen DO NOT recognize a duplicate file when the tag changes; otherwise I would run files through de-dupe software BEFORE adding.

Any suggestions?

Thanks!
I've been using the Find Duplicates plugin for calibre with the binary option. Please note that any changes to the file such as using Polish, Modify Epub, etc. will make this option useless.
Attached thumbnail: Find_Duplicates_binary.png

Last edited by DNSB; 10-24-2020 at 11:32 PM. Reason: Added Find Duplicates binary compare image
10-24-2020, 11:34 PM   #3
kirk8677
Thanks, I didn't see that option - do you know if there's any way to run this automatically when adding a book to the library, so that it's not added at all if it binary matches?
10-24-2020, 11:38 PM   #4
DNSB
Quote:
Originally Posted by kirk8677
Thanks, I didn't see that option - do you know if there's any way to run this automatically when adding a book to the library, so that it's not added at all if it binary matches?
Not as far as I know. Though my workflow would make checking on add rather useless. I add to an Intake library and then move to my Main library when I've finished metadata cleanup, covers, editing if needed, etc.

OTOH, I can't think of an ebook supplier that supplies identical files with different metadata where that metadata is not also embedded in the file, and embedded differences would make the binary compare pretty useless.

Which brings up the question as to which source you are using that supplies binary identical ebook files with different title/authors/etc. information.
10-24-2020, 11:39 PM   #5
kirk8677
It actually also doesn't really seem to work for me. The experiment I did was to export a book, change the metadata, then add the changed one back. I did a binary compare and it didn't see the duplicate, for some reason.
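The result of that experiment can be reproduced in a few lines. The sketch below is illustrative only and is not how the Find Duplicates plugin works: it assumes an EPUB-style zip whose metadata lives in an `.opf` member, and `make_epub`, `whole_file_hash`, and `content_hash` are hypothetical helper names. It shows that rewriting the metadata changes the whole-file hash, while a hash computed over the non-metadata members stays the same.

```python
import hashlib
import io
import zipfile


def make_epub(title: str) -> bytes:
    """Build a tiny EPUB-like zip in memory (hypothetical test fixture)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.opf", f"<metadata><title>{title}</title></metadata>")
        zf.writestr("chapter1.xhtml", "<p>It was a dark and stormy night.</p>")
    return buf.getvalue()


def whole_file_hash(data: bytes) -> str:
    """Hash the file exactly as stored on disk, metadata included."""
    return hashlib.sha256(data).hexdigest()


def content_hash(data: bytes) -> str:
    """Hash every zip member except the .opf metadata file (an assumption)."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith(".opf"):
                continue  # skip the metadata member
            digest.update(name.encode("utf-8") + b"\0")
            digest.update(zf.read(name))  # decompressed member bytes
    return digest.hexdigest()


original = make_epub("Some Book")
retagged = make_epub("Some Book (retagged)")

print(whole_file_hash(original) == whole_file_hash(retagged))  # False
print(content_hash(original) == content_hash(retagged))        # True
```

Real metadata edits can also touch the cover or other members, so a content-only fingerprint like this is a starting point rather than a reliable de-dupe test.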
10-24-2020, 11:40 PM   #6
kirk8677
Quote:
Originally Posted by DNSB
Not as far as I know. Though my workflow would make checking on add rather useless. I add to an Intake library and then move to my Main library when I've finished metadata cleanup, covers, editing if needed, etc.

OTOH, I can't think of an ebook supplier that supplies identical files with different metadata where that metadata is not also embedded in the file, and embedded differences would make the binary compare pretty useless.

Which brings up the question as to which source you are using that supplies binary identical ebook files with different title/authors/etc. information.
There are numerous sources of free, poorly organized books on the internet. Shall I point you to a few of them, in case you haven't seen them?
10-24-2020, 11:47 PM   #7
DNSB
Quote:
Originally Posted by kirk8677
It actually also doesn't really seem to work for me. The experiment I did was to export a book, change the metadata, then add the changed one back. I did a binary compare and it didn't see the duplicate, for some reason.
Which matches what I said in the second message in this thread:

Quote:
Please note that any changes to the file such as using Polish, Modify Epub, etc. will make this option useless.
When you modify the metadata in the file, it is no longer binary identical. The binary compare first looks at the file sizes; if they are identical, it generates SHA hashes. If those hashes match, the two files are considered duplicates.
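The size-then-hash check described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the post only says "SHA hashes", so SHA-256 is an assumption here, and `sha256_of` and `find_binary_duplicates` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large books need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_binary_duplicates(paths):
    """Return sorted lists of paths whose bytes are identical."""
    # Step 1: bucket by size; files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for p in map(Path, paths):
        by_size[p.stat().st_size].append(p)

    # Step 2: hash only the files that share a size with another file.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Checking sizes first keeps the expensive hashing step limited to files that could plausibly match, which is why a single changed byte of metadata is enough to take a file out of the duplicate set.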
10-24-2020, 11:50 PM   #8
DNSB
Quote:
Originally Posted by kirk8677
There are numerous sources of free, poorly organized books on the internet. Shall I point you to a few of them, in case you haven't seen them?
I've seen enough of them over the years. For the most part, they're the type of site that should include the Jolly Roger in its logo, since most of those "free, poorly organized books" are also still in copyright.

'Nuff said. Good luck.
10-24-2020, 11:50 PM   #9
kirk8677
Quote:
Originally Posted by DNSB
I've seen enough of them over the years. For the most part, they're the type of site that should include the Jolly Roger in its logo, since most of those "free, poorly organized books" are also still in copyright.

'Nuff said. Good luck.
False. Good luck.
All times are GMT -4.