01-17-2012, 03:05 PM | #1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
[GUI Plugin] Find Similar Stories
This plug-in helps you to find other books within your Calibre library that are similar to your target book. It does this by examining the full text of the books in your library, as opposed to using tags or other metadata.
Main Features:
Version History:
Special Notes:
Installation & Usage:
Last edited by Ian_Stott; 03-08-2012 at 02:45 PM. Reason: V1.0.47, fixed bug leading to incompatibility with find_duplicates plug-in |
01-17-2012, 09:05 PM | #2 |
Member
Posts: 10
Karma: 10
Join Date: Mar 2011
Device: none
|
I tried to install it and I got this message:
calibre, version 0.8.35
ERROR: Unhandled exception: AttributeError: 'module' object has no attribute 'CalculateSimilarityAction'
Traceback (most recent call last):
  File "site-packages\calibre\gui2\preferences\plugins.py", line 299, in add_plugin
  File "site-packages\calibre\gui2\preferences\plugins.py", line 387, in check_for_add_to_toolbars
  File "site-packages\calibre\customize\__init__.py", line 543, in load_actual_plugin
AttributeError: 'module' object has no attribute 'CalculateSimilarityAction'
01-18-2012, 12:31 AM | #3 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
An interesting idea and I couldn't resist playing with it. The first few I tried were some short stories in a series downloaded from the same site. These gave scores between 0.2 and 0.8, which seemed reasonable. But then I thought I would compare Orson Scott Card's "Ender's Game" and "Ender's Shadow". When I went to do this, I realised I had three different epub versions of Ender's Game, so I tried them first. They weren't very similar: one scored 0.333909 and the other 5.43363e-05. I did check the files and they do contain the same text, but the formatting is very different. Does this mean the comparison includes the HTML code as well as the actual text of the book?
And for completeness, the score for "Ender's Shadow" was zero when compared to "Ender's Game". As the two books are the same story from a different viewpoint, I expected something a little closer.
After writing the above, I remembered there was a choice for the algorithm. The above was using "Tanimoto". I tried them again with "Euclid":
Game with Shadow: 0.997362
The three versions of Game: 0.999999 and 0.999498
The short stories: between 0.94 and 0.99
Those scores look better, but I would almost think they are too close (except the versions of Game). Do you have a reference for what the algorithms do?
Added a bit later: OK, I opened the help and found the info on the algorithms. I can see the definitions but I'll have to think about them a bit.
As you mentioned the Harry Dresden series, I did a test comparing them to the first book. The results are similar to the above. With "Euclid", they are all better than 0.9. With "Tanimoto", the closest is book 11 at 0.0295. I'm a little confused by this, but it probably means that I don't understand how to interpret the scores properly.
Last edited by davidfor; 01-18-2012 at 01:05 AM. Reason: A little more experimenting and reading
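For readers wondering why the two measures can disagree so sharply, here is a minimal sketch of the two scores as they are commonly defined for weighted word vectors. It is not taken from the plugin's source: the function names, the dict-based vectors, and in particular the 1/(1+d) mapping from Euclidean distance to a similarity are assumptions made for illustration.

```python
import math

def tanimoto(a, b):
    """Tanimoto coefficient for sparse real-valued vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sum(w * w for w in a.values())
    norm_b = sum(w * w for w in b.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 0.0

def euclid_similarity(a, b):
    """Turn Euclidean distance into a score in (0, 1].

    The 1 / (1 + d) mapping is an assumption; the plugin may use another.
    """
    terms = set(a) | set(b)
    dist = math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))
    return 1.0 / (1.0 + dist)

# Made-up weights: the two "books" share only one of their terms.
game = {"ender": 0.02, "battle": 0.01}
shadow = {"bean": 0.02, "battle": 0.01}
print(tanimoto(game, shadow), euclid_similarity(game, shadow))
```

With these made-up weights the Tanimoto score is about 0.11 while the Euclid score is above 0.97. If the per-word weights are small, the Euclidean distance between any two books is small as well, which may be why all the Euclid scores sit so close to 1.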
01-18-2012, 12:39 AM | #4 |
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
And putting on my application developer hat and being a bit nit-picky:
When I worked out what the plugin did (I didn't actually read the full post before trying it), the menu item "Find Similar Books..." started to bug me. It doesn't actually do this. What it does is "Calculate Similarity Scores". To do what your first paragraph states, and the menu item implies, the plugin would need to do the calculation on every book in the library and display a list of those with scores greater than a defined amount.
A "Reset Scores" option would be a good idea. If you calculate the similarity for a set of books and then calculate it for a different set, the two sets of numbers don't have any relationship with each other. This could be a menu option or a setting to clear all the scores before doing a calculation.
Last edited by davidfor; 01-18-2012 at 01:16 AM. Reason: an extra suggestion
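The two suggestions above could be sketched roughly as follows. This is only an illustration: `similar_books`, `reset_scores`, the plain dict of scores and the default threshold are all hypothetical and do not reflect how the plugin stores its results.

```python
def similar_books(scores, threshold=0.2):
    """Return (title, score) pairs at or above threshold, best first."""
    hits = [(title, s) for title, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

def reset_scores(scores):
    """Clear scores from an earlier run so they are not mixed with new ones."""
    scores.clear()

# Hypothetical scores for three books against one target book.
scores = {"Book A": 0.51, "Book B": 0.05, "Book C": 0.23}
print(similar_books(scores))
reset_scores(scores)
```

Clearing before each run keeps scores computed against different book sets from appearing side by side.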
01-18-2012, 01:08 PM | #5 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
Quote:
More experienced Calibre users may have a more informative view. |
|
01-18-2012, 01:15 PM | #6 | ||
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
Quote:
Quote:
|
||
01-18-2012, 01:38 PM | #7 | ||
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
Quote:
Quote:
However, if you select the books in the Ender series as well as a lot of other sci-fi books (e.g. all of the other books by Orson Scott Card), then you will find that the Ender books score far higher. When I did this with the Orson Scott Card books in my library, using Ender's Game as the target, Ender in Exile came out top with a Tanimoto score of 0.51, followed by Ender's Shadow at 0.23 and Speaker at 0.21. A more satisfactory approach may be to replace the TF-IDF method with some form of word count where the common words have been removed. However, this would require a dictionary that is language based. I have been pointed towards some Python-based text informatics libraries that would help with this, but I didn't want to launch into these for v1.
||
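The word-count alternative mentioned above might look something like this sketch. The stop-word list here is a tiny hand-written English sample and the function name is hypothetical; a real list would be much longer and would be needed per language, which is the dictionary problem described in the post.

```python
import re
from collections import Counter

# Tiny illustrative English stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "he", "she", "it"}

def content_word_counts(text, stop_words=STOP_WORDS):
    """Count lower-cased words, skipping any that appear in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = content_word_counts("The boy and the battle school. The battle was long.")
print(counts.most_common(3))
```

Unlike TF-IDF, these counts depend only on the book itself, so a book's vector would not change with whichever other books happen to be selected.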
01-18-2012, 11:16 PM | #8 | |||
Grand Sorcerer
Posts: 24,905
Karma: 47303822
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
Just now, I took a copy of one of these epubs, changed the name of the file and the title using Sigil, and added it to calibre. The similarity score for this new book was 0.0000354521. But, apart from one extra word in the metadata, the books are the same.
Some of the other tests are OK. Comparing "Speaker for the Dead" and "Xenocide" with "Children of the Mind" gave 0.24976 and 0.335613 respectively. Those scores make sense. But my comparison of Ender's Game with Speaker and Shadow gives zero for both of them.
And now I am a little bit more baffled. I get a different result if I compare two books than if I compare more. Comparing all the above books individually to Game gave a zero score for each. But comparing them at the same time gave scores between 0.001 and 0.094. I thought the comparison was always to the first selected book.
Quote:
|
|||
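One possible explanation for the two-book versus many-book discrepancy is the IDF part of TF-IDF. The sketch below assumes the weights are computed over only the selected books with an unsmoothed idf = log(N / df); that is an assumption, not something confirmed from the plugin's source, and the three one-line "books" are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one TF-IDF dict per document, using idf = log(N / df)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (n / total) * math.log(n_docs / df[t]) for t, n in c.items()}
        )
    return vectors

def tanimoto(a, b):
    """Tanimoto coefficient for sparse real-valued vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom else 0.0

game = "ender battle school"
shadow = "bean battle school"
other = "hobbit ring mountain"

pair = tfidf_vectors([game, shadow])
trio = tfidf_vectors([game, shadow, other])
print(tanimoto(pair[0], pair[1]))  # shared words have idf log(2/2) = 0
print(tanimoto(trio[0], trio[1]))  # shared words now carry some weight
```

Under this assumption, any word found in both books of a two-book comparison gets zero weight, so the shared vocabulary cannot contribute and the score collapses to zero. Adding more books to the selection gives the shared words a non-zero weight again. The same effect might also contribute to the very low score for two near-identical copies.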
01-19-2012, 04:02 PM | #9 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
Quote:
Let me know if this helps with determining if your 3 copies of Ender's Game are the same.
|
01-20-2012, 02:50 AM | #10 |
Member
Posts: 10
Karma: 10
Join Date: Mar 2011
Device: none
|
I couldn't install it: when I tried, I got this message, and it said that it would uninstall the plugin. However, after I reinstalled Windows (I changed to Windows 7), I had no problems installing the plugin.
|
01-23-2012, 05:54 PM | #11 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
I am pleased to learn that you have now managed to install the plugin. Could you let me know what operating system you were initially using, in case there is something about the plugin that is specific to Windows 7?
|
03-02-2012, 10:23 AM | #12 |
Connoisseur
Posts: 88
Karma: 200
Join Date: Nov 2010
Location: Dortmund, Germany
Device: Kindle Paperwhite (10. Generation)
|
For me this plugin creates a slight conflict with find_duplicates. When both are installed, find_duplicates displays "ERROR: Restart required: You must restart Calibre before using this plugin!".
I did some research, and it seems that your plugin manages to overwrite some variables that find_duplicates uses, leading to find_duplicates searching for some of its images in the fss zip archive. It seems you are using his common_utils and this causes the conflict... maybe you could just talk with kiwidude about avoiding the conflicts.
03-02-2012, 10:35 AM | #13 |
Calibre Plugins Developer
Posts: 4,693
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@silentguy/Ian - I just downloaded your Similar Stories plugin to have a look at the source. The cause is undoubtedly just a copy/paste error in action.py. It should be referencing your *own* common_utils file, not the one in the find duplicates plugin.
i.e. Change this:
from calibre_plugins.find_duplicates.common_utils import set_plugin_icon_resources, get_icon, \
to this:
from calibre_plugins.similar_stories.common_utils import set_plugin_icon_resources, get_icon, \
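A copy/paste slip like this can be caught with a quick scan of the plugin's source before release. The following is a rough sketch, not part of either plugin: the helper name is hypothetical, and the second import line in the sample (`config`, `prefs`) is invented purely for the demonstration.

```python
import re

# Matches "from calibre_plugins.<name>..." and "import calibre_plugins.<name>..."
IMPORT_RE = re.compile(
    r"^\s*(?:from|import)\s+calibre_plugins\.(\w+)", re.MULTILINE
)

def foreign_plugin_imports(source, own_name):
    """Return the names of other plugins' namespaces imported by source."""
    return sorted({name for name in IMPORT_RE.findall(source) if name != own_name})

sample = (
    "from calibre_plugins.find_duplicates.common_utils import get_icon\n"
    "from calibre_plugins.similar_stories.config import prefs\n"
)
print(foreign_plugin_imports(sample, "similar_stories"))
```

Any name it reports is an import that reaches into another plugin's namespace and is worth a second look.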
03-08-2012, 02:32 PM | #14 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
Hi Folks,
Thanks both for letting me know of the issue and for identifying the cause. I'll fix it tonight and repost.
03-08-2012, 02:46 PM | #15 |
Junior Member
Posts: 8
Karma: 10
Join Date: Jan 2012
Device: Kindle
|
I have now released a new version with the fix identified by kiwidude. Hopefully this will solve the issue.
|