Old 03-26-2022, 08:42 AM   #961
PPP-Magic
Junior Member
PPP-Magic began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Mar 2022
Device: Various
Doh, not sure how I missed that LOL

Thanks for the help.
Old 04-01-2022, 03:57 PM   #962
Rellwood
Library Breeder (She/Her)
Rellwood ought to be getting tired of karma fortunes by now.
Posts: 1,268
Karma: 1937891
Join Date: Apr 2015
Location: Fullerton, California
Device: Paperwhite 2015 (2), PW 2024 (12 GEN), PW 2023 (11 GEN), Scribe (1st)
Might be a dumb or obvious question/answer, but in the case of tags where there are multiple matches for duplicates and only a couple of those matches in the group are actual matches, how do I specify that I only want to use those specific ones out of the group? It seems like it's an all or none.

Example: 2stars, 2-stars, 2.stars, 3-stars, 3.stars, 5-stars
I match 2-stars for all of them and I want to use 2.stars for the 2 star ones but none of the other ones, which should get their own matches.

I ignore them and end up missing out altogether. Am I missing something?
Old 04-01-2022, 04:54 PM   #963
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 79,792
Karma: 146391129
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by theducks View Post
Read the last line.
3.21 does not meet the PI requirements.

I suspect your distro is typical and WAY BEHIND.

Use the command found on the Calibre Linux Download page and join us with a modern 5.x version.

Your other option is to uninstall the PI and find someone with an old version of it.
Never ever install calibre from a Linux distro repository. It is a waste of time, and you could find that things break.
Old 05-12-2022, 09:20 AM   #964
mklauber
Junior Member
mklauber began at the beginning.
 
Posts: 3
Karma: 10
Join Date: May 2022
Device: none
Stored Binary Hashes?

Hey, I'm a big fan of your Find Duplicates Plugin. I was wondering if it would be feasible to have it store the binary hashes in the database, to reduce the amount of work needed to do a comparison. Specifically, the binary compare has gotten slow as my library has grown beyond the 1TB mark. Is this a feature that can be added, or can I fork the code and attempt to add it myself somewhere?
Old 05-12-2022, 12:19 PM   #965
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
Posts: 31,076
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by mklauber View Post
Hey, I'm a big fan of your Find Duplicates Plugin. I was wondering if it would be feasible to have it store the binary hashes in the database, to reduce the amount of work needed to do a comparison. Specifically, the binary compare has gotten slow as my library has grown beyond the 1TB mark. Is this a feature that can be added, or can I fork the code and attempt to add it myself somewhere?
Patch submissions (put in this thread, WELL labeled) are OK.
MR users should be cautioned about using experimental code.

Now the critique. A hash is only good at the time it was computed.
ANY change to the book (polish, embed, edit) will change that value.
How do you then know the value is stale?

Value storage: a custom column will stay with the library.
Old 05-12-2022, 01:59 PM   #966
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by mklauber View Post
Hey, I'm a big fan of your Find Duplicates Plugin. I was wondering if it would be feasible to have it store the binary hashes in the database, to reduce the amount of work needed to do a comparison. Specifically, the binary compare has gotten slow as my library has grown beyond the 1TB mark. Is this a feature that can be added, or can I fork the code and attempt to add it myself somewhere?
Keeping in mind @theducks's comments ...

Plugins are provided as source code. You are free to change them however you want. Getting your changes released requires consent of the plugin's current maintainer. See Writing your own plugins to extend calibre’s functionality for some guidance on writing & changing plugins.

You can store the hashes in calibre's database using db.cache.add_custom_book_data. Personally I would store both the hash and the format's last-modified date so you can have a clue that the hash is still valid.
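For anyone wanting to try this, here is a minimal sketch of the "hash plus last-modified date" record suggested above. It uses only the Python standard library; the function names and the record layout are my own assumptions, not anything taken from the plugin or from calibre. Handing the record to add_custom_book_data is left out because the thread does not show that call's arguments.

```python
import hashlib
import os


def format_fingerprint(path, chunk_size=1024 * 1024):
    """Hypothetical helper: return a record with the file's SHA-256 and mtime."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large format files are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {"hash": digest.hexdigest(), "mtime": os.stat(path).st_mtime}


def is_stale(record, path):
    """A stored record is treated as stale once the file's mtime differs."""
    return record.get("mtime") != os.stat(path).st_mtime
```

The mtime check only gives "a clue", as noted above: a tool that rewrites a file and restores its timestamp would go unnoticed.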
Old 08-10-2022, 03:09 AM   #967
capink
Wizard
capink ought to be getting tired of karma fortunes by now.
 
Posts: 1,196
Karma: 1995558
Join Date: Aug 2015
Device: Kindle
Version 1.9.7

  • Update: update to calibre6 icon themes. Code borrowed from @JimmXinu.
Old 09-08-2022, 02:17 PM   #968
jluaioyj
Junior Member
jluaioyj began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Sep 2022
Device: Kobo Touch, Like Book Mars, Android
Optimise Binary Compare using hashes/digests

I'd like to suggest that Find Duplicates's Binary Compare be enhanced to support storage and comparison by hash/digest to significantly speed it up, by avoiding most/all redundant full file compares. A full file compare on a hash/digest match could remain an option if hash/digest collision false matches are suspected, as have occasionally been found with MD5.

I'd suggest storing the binary comparison metadata in a custom field containing a JSON map with the hash/digest type and a key for each file format; each key's value would be a map containing a hexadecimal file hash/digest (I'd suggest SHA256), the last file size, and the file's last-modified timestamp, the latter two for validation.

If this custom field is not configured, a warning should be displayed, as the "Last Modified" plugin does, and the current _slow_ full file compare used instead.

During a binary search:
* If any of the field maps are missing, they should be created.
* If the field value is junk or if the hash/digest type is obsolete, the whole field map should be recreated.
* If a format file is missing, it should be removed from the field map.
* If a format file was added, it should be added to the map.
* If the hash/digest value is out-of-date (file size or last modified changed), the type map should be re-built.
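The rules above could be sketched roughly as follows. This is only an illustration of the proposal: the field layout ("type" plus a "formats" map), the function name refresh_field, and the compute_hash callback are all assumptions of mine, not an existing calibre or plugin API.

```python
import json

HASH_TYPE = "sha256"  # assumed current digest type


def refresh_field(raw, formats_on_disk, compute_hash):
    """Apply the proposed rules to one book's stored field.

    raw             -- stored JSON string, or None if the field is empty
    formats_on_disk -- {"EPUB": {"size": int, "mtime": float}, ...}
    compute_hash    -- callable taking a format name, returning a hex digest
    """
    try:
        field = json.loads(raw) if raw else {}
    except (ValueError, TypeError):
        field = {}  # junk value: start over
    if (not isinstance(field, dict) or field.get("type") != HASH_TYPE
            or not isinstance(field.get("formats"), dict)):
        # Missing, junk, or obsolete digest type: recreate the whole map.
        field = {"type": HASH_TYPE, "formats": {}}
    entries = field["formats"]
    # Drop formats whose files no longer exist.
    for fmt in list(entries):
        if fmt not in formats_on_disk:
            del entries[fmt]
    # Add new formats and rebuild entries whose size or mtime changed.
    for fmt, stat in formats_on_disk.items():
        old = entries.get(fmt)
        fresh = (isinstance(old, dict)
                 and old.get("size") == stat["size"]
                 and old.get("mtime") == stat["mtime"])
        if not fresh:
            entries[fmt] = {"hash": compute_hash(fmt),
                            "size": stat["size"], "mtime": stat["mtime"]}
    return json.dumps(field, sort_keys=True)
```

Only entries that fail the size/mtime check are re-hashed, so a second run over an unchanged book does no hashing at all.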

It would be nice if the above rules were applied after a book entry was added, after any formats were added/removed, and after any in-calibre format file changes; obviously, I would not expect this to be able to spot any updates outside of calibre.

I'd suggest that this field, and its value creation, validation, and updating should really be provided by calibre itself.

Last edited by jluaioyj; 09-08-2022 at 02:22 PM.
Old 09-08-2022, 08:20 PM   #969
capink
Wizard
capink ought to be getting tired of karma fortunes by now.
 
Posts: 1,196
Karma: 1995558
Join Date: Aug 2015
Device: Kindle
Quote:
Originally Posted by jluaioyj View Post
I'd like to suggest that Find Duplicates's Binary Compare be enhanced to support storage and comparison by hash/digest to significantly speed it up, by avoiding most/all redundant full file compares. A full file compare on a hash/digest match could remain an option if hash/digest collision false matches are suspected, as have occasionally been found with MD5.

I'd suggest storing the binary comparison metadata in a custom field containing a JSON map with the hash/digest type and a key for each file format; each key's value would be a map containing a hexadecimal file hash/digest (I'd suggest SHA256), the last file size, and the file's last-modified timestamp, the latter two for validation.

If this custom field is not configured, a warning should be displayed, as the "Last Modified" plugin does, and the current _slow_ full file compare used instead.

During a binary search:
* If any of the field maps are missing, they should be created.
* If the field value is junk or if the hash/digest type is obsolete, the whole field map should be recreated.
* If a format file is missing, it should be removed from the field map.
* If a format file was added, it should be added to the map.
* If the hash/digest value is out-of-date (file size or last modified changed), the type map should be re-built.

It would be nice if the above rules were applied after a book entry was added, after any formats were added/removed, and after any in-calibre format file changes; obviously, I would not expect this to be able to spot any updates outside of calibre.

I'd suggest that this field, and its value creation, validation, and updating should really be provided by calibre itself.
I never use the binary comparison. I keep my metadata as clean as possible and this helps me more in discovering duplicates even if the files are not identical. However, a quick look at the code reveals that Find Duplicates already does everything you are asking for, with some caveats:
  • The data for size, hash and mtime are stored using a calibre API method called add_multiple_custom_book_data(), instead of storing them in custom columns. This makes more sense and has the advantage of not burdening the user with unnecessary custom columns.

    N.B. If you are familiar with the Action Chains Plugin, you can use the chain attached below to see the data stored by the plugin (if any). (Action Chains > Add/Modify Chains > Right click the chain table > import)
  • When you run the binary check, before using any stored hash, the plugin first verifies it is not stale. If the hash is stale, it is re-calculated.
  • The plugin does not calculate and store hashes for all books. To be economical, it only calculates hashes for groups of formats that share the same size, which is bound to be a small subset of the formats in the library.

Given the last point, automatic hash calculation on book additions does not make much sense. It can be done but will not be of much use, because only a small subset of these hashes will be needed based on size comparisons. In addition to this, calculating book hashes will slow down adding books, especially if the user is adding a large number of books.
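To illustrate the size-first idea described above, here is a rough standalone sketch. It is not the plugin's actual code; the function names are hypothetical and it works on plain file paths rather than on a calibre library. Files are bucketed by size first, and only buckets with at least two members are ever hashed.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binary_duplicate_groups(paths):
    """Return lists of paths whose contents hash identically."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    by_hash = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # a unique size cannot have a binary duplicate
        for p in group:
            by_hash[(size, sha256_of(p))].append(p)

    return [sorted(g) for g in by_hash.values() if len(g) > 1]
```

Since most files in a library have a unique size, most of them are skipped without being read, which is why hashing every book at add time would mostly be wasted work.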
Attached Files
File Type: zip show_find_duplicates_data.zip (499 Bytes, 230 views)
Old 09-17-2022, 02:30 AM   #970
Rellwood
Library Breeder (She/Her)
Rellwood ought to be getting tired of karma fortunes by now.
Posts: 1,268
Karma: 1937891
Join Date: Apr 2015
Location: Fullerton, California
Device: Paperwhite 2015 (2), PW 2024 (12 GEN), PW 2023 (11 GEN), Scribe (1st)
I was wondering if you could add a feature that makes tags containing a specific character the default rename target. I hate having to keep right-clicking on the tags with periods.
Old 09-25-2022, 09:10 AM   #971
Winnito
Enthusiast
Winnito can tell if an avocado is ripe without touching it.
Posts: 34
Karma: 130226
Join Date: Apr 2020
Device: Kindle Voyage
Not sure if I asked this before or if this is discussed somewhere else.

I'm puzzled that when I run a "similar" search in Metadata duplicates, it can't find what I'd guess should be obvious hits like:

Doe, John
Doe, John L.

I have so many authors in my database with and without their middle names, but can't find most of them as duplicates.

Similar gives nothing, and Fuzzy gives 20x more false positives... not even close to being useful. Am I missing something? Thanks
Old 09-25-2022, 10:49 AM   #972
kiwidude
Calibre Plugins Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Quote:
Originally Posted by Winnito View Post
Not sure if I asked this before or if this is discussed somewhere else.

I'm puzzled that when I run a "similar" search in Metadata duplicates, it can't find what I'd guess should be obvious hits like:

Doe, John
Doe, John L.

I have so many authors in my database with and without their middle names, but can't find most of them as duplicates.

Similar gives nothing, and Fuzzy gives 20x more false positives... not even close to being useful. Am I missing something? Thanks
Works fine for me - both as a normal duplicate check and as a metadata check
[Attached image: Dups.png]
[Attached image: Dups_Metadata.png]
Old 09-26-2022, 07:05 AM   #973
capink
Wizard
capink ought to be getting tired of karma fortunes by now.
 
Posts: 1,196
Karma: 1995558
Join Date: Aug 2015
Device: Kindle
@kiwidude: He does not want to use fuzzy author matching, because for him, it produces too many false positives.

This question was asked by the same user before, and an answer was provided here.
Old 09-26-2022, 07:15 AM   #974
kiwidude
Calibre Plugins Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,730
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@capink - nice one - my bad for not properly reading the post while I was focused on other things...
Old 09-26-2022, 12:55 PM   #975
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
Posts: 31,076
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Then I would suggest the old Eyeball Mk II.
In the Tag Browser: right-click (authors): Manage

Be sure to check (just do a search and validate what you see) before you rename. Some authors use their middle initial (MI) to differentiate themselves from another author with the same first and last name.

And some publishers leave the initial off the cover. Gotta love consistency in the same house.

BTW I have to do this with series. Sometimes I add a 'tie breaker' to the name: Series (Author's Initials). The parentheses do not mess up the sort.
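If eyeballing the whole author list is too much, a small script can at least shortlist the names worth looking at. The sketch below is a hypothetical helper, not part of Find Duplicates or calibre; it assumes names in "Last, First M." form and groups those that differ only by single-letter initials. Every group still needs a manual check before renaming, for the reason given above.

```python
import re
from collections import defaultdict


def initial_free_key(author_sort):
    """Key for a 'Last, First M.' name with single-letter initials removed."""
    last, _, rest = author_sort.partition(",")
    # Drop tokens that are one letter with an optional period, e.g. "L." or "L".
    given = [t for t in rest.split() if not re.fullmatch(r"[A-Za-z]\.?", t)]
    return (last.strip() + ", " + " ".join(given)).casefold()


def candidate_groups(authors):
    """Return groups of names that differ only by initials, for manual review."""
    groups = defaultdict(list)
    for name in authors:
        groups[initial_free_key(name)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(candidate_groups(["Doe, John", "Doe, John L.", "Roe, Jane"]))
# prints [['Doe, John', 'Doe, John L.']]
```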
Tags
cross library duplicates, in library duplicates

