#1006 |
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Do you have a lot of duplicate authors? Feel free to PM me details if you prefer.
I do suspect I know what the issue is, if that is indeed the case. I tried not to change the underlying approach, which is to get a list of authors only and compare their names, whereas the 1.9.7 version iterates across all the books and then compares the authors from those books. Iterating just across the author names would ordinarily be the faster approach. The downside now is that I do not have the details of the books for each author, and instead have to run an additional search to retrieve them whenever an author shows up as a duplicate. For only a handful of duplicate authors I would not expect this to be a problem. If you have hundreds or thousands I can imagine this not scaling well... |
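A rough sketch of the trade-off described above, not the plugin's actual code. The `normalise` rule, the function names, and the sample data are all made up for illustration: duplicates are found from the author names alone, and the books are only fetched afterwards, one follow-up lookup per duplicate name.

```python
from collections import defaultdict


def normalise(name):
    # Hypothetical matching rule: ignore case, punctuation and spaces.
    return ''.join(ch for ch in name.lower() if ch.isalnum())


def duplicate_author_groups(author_names):
    # Group author names that normalise to the same key; touches names only.
    groups = defaultdict(list)
    for name in author_names:
        groups[normalise(name)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def books_for_groups(groups, search_books_by_author):
    # The extra step: one follow-up search per author name in a duplicate group.
    return {name: search_books_by_author(name)
            for group in groups for name in group}


# Made-up library: author name -> book ids.
library = {'J. Smith': [1, 2], 'j smith': [3], 'Ann Lee': [4]}
groups = duplicate_author_groups(library)
books = books_for_groups(groups, library.get)
```

With a handful of duplicate groups the follow-up searches are negligible; with thousands, the number of extra searches grows with the number of duplicate names, which is the scaling concern raised above.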
#1007 |
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
This does have me wondering whether Find Duplicates needs some refactoring at some point to perform its "Find Book Duplicates" and "Find Library Duplicates" searches as background jobs. For larger libraries, searches can take a long time to run, and it is pretty annoying to have the GUI blocked. Another one for the rainy day list...
|
#1008 | |
Well trained by Cats
Posts: 30,897
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
The viewed library (Intake) is on a shared drive. The Target is on this PC. (Yes, I know networked drives can be an issue, but I have never had an issue with a PULL from a remote. OTOH, a PUSH did barf frequently.)
The Intake is fairly tiny (it gets processed frequently). The Target has ~15K+ books. (The libraries are on rotating drives, although the host has an SSD.)
Time is under a minute.
Could the 'slowness' others are seeing simply be from drive thrashing? |
|
#1009 | |
Grand Sorcerer
Posts: 12,335
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
I suggest you try it with the changed code, which does the reversal once then uses the reversed dict. This should be no slower than the old method that ignored VLs. |
|
#1010 | |
Grand Sorcerer
Posts: 12,335
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Then, to get full metadata for a book with book_id, you could use
Code:
new_api.get_proxy_metadata(book_id) |
|
#1011 |
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Thanks chaley (and theDucks) for the feedback. Those tips look promising. Unsurprisingly, calibre's underlying APIs have changed a bit in the last 10 years, adding this new_api stuff, which looks worthy of some investigation.
I did find one other performance bottleneck: set_marked_ids when it is called with 50k rows (taking 11 seconds on my machine). I was progressively adding print statements through the calibre source to find out where the slowdown is, and got as far as refresh_ids() being invoked by marked_changed(). My plugins normally make this pair of calls, for example:
Code:
self.gui.current_db.set_marked_ids(marked_ids)
self.gui.search.set_search_string('marked:library_duplicate') |
#1012 | |
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
The downside now is that, as per my comments above, set_marked_ids() then takes an additional 15 seconds to complete before they see the results. If there isn't a quick win on that one, I may just add a configuration option to the Find Library Duplicates dialog for whether to display the results or not. |
|
#1013 | ||
Grand Sorcerer
Posts: 12,335
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
Question: are you calling set_marked_ids in a loop with the dict growing each time, or once with a complete dict? If the former, then that might explain it, because all the work is redone for each call. If the latter then ![]()
Another test: how long does it take if you select all 50,000 books in the library and then use Mark books to set a text mark? If this takes 11 seconds then we know the culprit. If it doesn't, then there is something in the sequence of calls.
Quote:
If you are willing, could you give me the metadata.db for the 50,000 book library? With that I could look at what is actually happening. It may be that we could add a "refresh=True" parameter to set_marked_ids(), telling it whether or not to do the actual refresh. My theory: if you do the search at some point afterwards, then the refresh_ids will occur there, involving only the books returned by the search. |
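To illustrate the loop-versus-once question above, here is a minimal sketch using a stand-in class rather than calibre itself. `FakeDb` and its cost model (one unit of refresh work per marked id on every call) are assumptions made purely for the illustration; only the method name `set_marked_ids` comes from the thread.

```python
class FakeDb:
    """Stand-in for the real database; counts how many ids get refreshed."""

    def __init__(self):
        self.marked = {}
        self.refreshed = 0

    def set_marked_ids(self, marked_ids):
        # Assumed cost model: every call refreshes every id in the dict.
        self.marked = dict(marked_ids)
        self.refreshed += len(self.marked)


def mark_in_loop(db, book_ids, label):
    # Calls set_marked_ids once per book, with the dict growing each time.
    marked = {}
    for book_id in book_ids:
        marked[book_id] = label
        db.set_marked_ids(marked)


def mark_once(db, book_ids, label):
    # Builds the complete dict first, then makes a single call.
    db.set_marked_ids({book_id: label for book_id in book_ids})


loop_db, once_db = FakeDb(), FakeDb()
mark_in_loop(loop_db, range(100), 'library_duplicate')
mark_once(once_db, range(100), 'library_duplicate')
```

Under that assumed cost model, the looped version does n(n+1)/2 units of refresh work for n books while the single call does n, so both end with the same marks but very different amounts of work.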
||
#1014 |
Enthusiast
Posts: 29
Karma: 10
Join Date: Jan 2012
Device: kindle
|
I know somebody's gonna hate me, but while doing a find metadata variation (with the 'alumoi special version') I ran into this:
calibre 6.5 embedded-python: True
Windows-10-10.0.19044-SP0 Windows ('64bit', 'WindowsPE')
('Windows', '10', '10.0.19044')
Python 3.10.1
Windows: ('10', '10.0.19044', 'SP0', 'Multiprocessor Free')
Interface language: None
Successfully initialized third party plugins: Gather KFX-ZIP (from KFX Input) (1, 49, 0) && DeDRM (7, 2, 1) && Package KFX (from KFX Input) (1, 49, 0) && Barnes & Noble (1, 3, 0) && Clean Metadata (0, 0, 6) && Extract ISBN (1, 5, 1) && Fantastic Fiction (1, 5, 1) && Fantastic Fiction Adults (1, 2, 0) && Find Duplicates (1, 10, 2) && Goodreads (1, 7, 0) && ISFDB (3, 0, 0) && KFX metadata reader (from KFX Input) (1, 49, 0) && KFX Input (1, 49, 0) && Kindle Collections (1, 7, 29) && Obok DeDRM (7, 2, 1) && Quality Check (1, 12, 0) && Smashwords Metadata (1, 0, 2)
Traceback (most recent call last):
  File "calibre_plugins.find_duplicates.dialogs", line 901, in _on_variation_list_item_changed
  File "calibre_plugins.find_duplicates.dialogs", line 837, in _search_in_gui
  File "calibre_plugins.find_duplicates.dialogs", line 905, in _decode_list_item
KeyError: 2926
while changing the author. |
#1015 |
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
|
#1016 | |||
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Quote:
Quote:
Quote:
|
|||
#1017 | |
Member
Posts: 14
Karma: 10
Join Date: Sep 2014
Location: UK
Device: Kobo Aura, Kindle Paperwhite Signature Edition
|
Author Metadata Search bug
Quote:
I concur. Since the last version update of the plugin I have been unable to perform any "find metadata variations" searches for Authors. I am really hoping this gets fixed soon, please.
An example error log is below:
calibre, version 6.7.0
ERROR: Unhandled exception: KeyError:43653
calibre 6.7 embedded-python: True
Windows-10-10.0.22621-SP0 Windows ('64bit', 'WindowsPE')
('Windows', '10', '10.0.22621')
Python 3.10.1
Windows: ('10', '10.0.22621', 'SP0', 'Multiprocessor Free')
Interface language: en_GB
Successfully initialized third party plugins: Gather KFX-ZIP (from KFX Input) (1, 49, 0) && Package KFX (from KFX Input) (1, 49, 0) && Author Book Count (2, 2, 2) && Fantastic Fiction (1, 5, 1) && Find Duplicates (1, 10, 1) && KFX metadata reader (from KFX Input) (1, 49, 0) && KFX Input (1, 49, 0) && Set KFX metadata (from KFX Output) (1, 64, 0) && KFX Output (1, 64, 0) && Kindle Collections (1, 7, 29) && KoboTouchExtended (3, 6, 3) && Modify ePub (1, 7, 3) && Obok DeDRM (6, 5, 4) && Quality Check (1, 12, 0) && Reading List (1, 14, 0)
Traceback (most recent call last):
  File "calibre_plugins.find_duplicates.dialogs", line 901, in _on_variation_list_item_changed
  File "calibre_plugins.find_duplicates.dialogs", line 837, in _search_in_gui
  File "calibre_plugins.find_duplicates.dialogs", line 905, in _decode_list_item
KeyError: 43653 |
|
#1018 |
Calibre Plugins Developer
Posts: 4,720
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Find Duplicates v1.10.2 Released
Release Notes:
https://github.com/kiwidude68/calibr...icates-v1.10.2
Releasing this version now so that those impacted by the Metadata Variations bug (which I never reproduced) can continue. If anyone has exact steps to repro that problem, let me know. |
#1019 | |
Member
Posts: 14
Karma: 10
Join Date: Sep 2014
Location: UK
Device: Kobo Aura, Kindle Paperwhite Signature Edition
|
Quote:
|
|
#1020 | |
Grand Sorcerer
Posts: 12,335
Karma: 8012652
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
I found yet another 'hidden' linear search deep in the guts of the gui's view of the database. The code verified that a book_id exists in the gui display (has a gui row number) by searching for the book in a list. It being a list, the average number of "looks" to make the check is len(books)/2. It did this for every book_id in the list of ids passed to refresh_ids(). For your db with all 50,000 books selected that meant 1.25 billion lookups (!) to refresh the booklist. The fix is to keep a dict {book_id:row} so the check is one dict probe per book id.
I also found a place where, on a mouse click, calibre checked to see if the clicked-on row was in a selection, and if so checked some other things to see if drag-n-drop is being used. By refactoring the check I knocked 3 seconds off the right-click on a book.
Qt takes a long time to return the selected rows. I can't do anything about that.
The bottom line (all times measured on my machine, an Intel Core i7-10710U):
I will submit the changes to Kovid. The fix is a bit exotic so he might prefer an alternate approach. I will post here once we have a resolution in source. |
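For anyone following along, the list-versus-dict point above can be sketched like this. It is an illustration only, not the calibre source: the function names and the sample data are made up, and only the {book_id: row} idea comes from the post.

```python
def rows_by_linear_search(displayed_ids, ids_to_refresh):
    # Old pattern: scan the list for every id, about len(list)/2 looks each.
    rows = []
    for book_id in ids_to_refresh:
        try:
            rows.append(displayed_ids.index(book_id))
        except ValueError:
            pass  # book is not in the current display
    return rows


def rows_by_dict(displayed_ids, ids_to_refresh):
    # Fix: build {book_id: row} once, then one dict probe per id.
    id_to_row = {book_id: row for row, book_id in enumerate(displayed_ids)}
    return [id_to_row[book_id] for book_id in ids_to_refresh
            if book_id in id_to_row]


displayed = [40, 10, 30, 20]   # book ids in gui row order (made-up data)
wanted = [20, 99, 40]          # 99 is not displayed
slow_rows = rows_by_linear_search(displayed, wanted)
fast_rows = rows_by_dict(displayed, wanted)

# Average looks for 50,000 ids against a 50,000-entry list.
average_looks = 50_000 * (50_000 // 2)
```

Both functions return the same rows; the difference is that the dict version does a constant amount of work per id instead of a scan, which is where the 1.25 billion figure drops to 50,000 probes.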
|
Tags |
cross library duplicates, in library duplicates |
|