|  01-17-2012, 02:05 PM | #1 | 
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | 
				
				[GUI Plugin] Find Similar Stories
			 
			
			This plug-in helps you to find other books within your Calibre library that are similar to your target book. It does this by examining the full text of the books in your library, as opposed to using tags or other metadata. Main Features: 
 Version History: 
 Special Notes: 
 Installation & Usage: 
 Last edited by Ian_Stott; 03-08-2012 at 01:45 PM. Reason: V1.0.47, fixed bug leading to incompatibility with find_duplicates plug-in | 
|   |   | 
|  01-17-2012, 08:05 PM | #2 | 
| Member  Posts: 10 Karma: 10 Join Date: Mar 2011 Device: none | 
			
			I tried to install it and got this message:
calibre, version 0.8.35
ERROR: Unhandled exception: AttributeError:'module' object has no attribute 'CalculateSimilarityAction'
Traceback (most recent call last):
  File "site-packages\calibre\gui2\preferences\plugins.py", line 299, in add_plugin
  File "site-packages\calibre\gui2\preferences\plugins.py", line 387, in check_for_add_to_toolbars
  File "site-packages\calibre\customize\__init__.py", line 543, in load_actual_plugin
AttributeError: 'module' object has no attribute 'CalculateSimilarityAction' | 
|   |   | 
|  01-17-2012, 11:31 PM | #3 | 
| Grand Sorcerer            Posts: 24,905 Karma: 47303824 Join Date: Jul 2011 Location: Sydney, Australia Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos | 
			
			An interesting idea and I couldn't resist playing with it. The first few I tried were some short stories in a series downloaded from the same site. These gave scores between .2 and .8, which seemed reasonable.

But then I thought I would compare Orson Scott Card's "Ender's Game" and "Ender's Shadow". When I went to do this, I realised I had three different epub versions of Ender's Game, so I tried them first. They weren't very similar. One scored 0.333909 and the other "5.43363e-05". I did check the files and they do contain the same text, but the formatting is very different. Does this mean the comparison includes the HTML code as well as the actual text of the book? And for completeness, the score for "Ender's Shadow" was zero when compared to "Ender's Game". As the two books are the same story from a different viewpoint, I expected something a little closer.

After writing the above, I remembered there was a choice of algorithm. The above was using "Tanimoto". I tried them again with "Euclid":
Game with Shadow: 0.997362
The three versions of Game: 0.999999 and 0.999498
The short stories: between 0.94 and 0.99
Those scores look better, but I would almost think they are too close (except the versions of Game). Do you have a reference for what the algorithms do?

Added a bit later: OK, I opened the help and found the info on the algorithms. I can see the definitions but I'll have to think about them a bit. As you mentioned the Harry Dresden series, I did a test comparing them to the first book. The results are similar to the above. With "Euclid", they are all better than 0.9. With "Tanimoto" the closest is book 11 at 0.0295. I'm a little confused by this, but it probably means that I don't understand how to interpret the scores properly.
Last edited by davidfor; 01-18-2012 at 12:05 AM. Reason: A little more experimenting and reading | 
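For anyone puzzling over the same question: the exact formulas are in the plugin's help and aren't reproduced in this thread. As a rough guide, here is a sketch of two common textbook definitions, assumed rather than confirmed to match the plugin: the Tanimoto (extended Jaccard) coefficient, and a Euclidean-distance similarity of 1 / (1 + d). It shows how, when word weights are small, two books that share no words at all can still get a Euclid score near 1 while their Tanimoto score is 0. The word weights are made up for illustration.

```python
import math

def tanimoto(a, b):
    """Tanimoto (extended Jaccard) coefficient of two word-weight vectors:
    a.b / (|a|^2 + |b|^2 - a.b). 1.0 = identical, 0.0 = no shared words."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom else 0.0

def euclid_similarity(a, b):
    """Turn the Euclidean distance d between two vectors into 1 / (1 + d)."""
    words = set(a) | set(b)
    d = math.sqrt(sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in words))
    return 1.0 / (1.0 + d)

# Made-up, small TF-IDF-style weights; the two "books" share no words.
game_vec = {"ender": 0.01, "battle": 0.02}
shadow_vec = {"bean": 0.01, "rotterdam": 0.02}

print(tanimoto(game_vec, shadow_vec))           # 0.0
print(euclid_similarity(game_vec, shadow_vec))  # about 0.97, despite no overlap
```

Under these assumed definitions, Tanimoto only rewards shared words, while 1 / (1 + d) mostly reflects how small the weights are, so its scores bunch up near 1.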
|   |   | 
|  01-17-2012, 11:39 PM | #4 | 
| Grand Sorcerer            Posts: 24,905 Karma: 47303824 Join Date: Jul 2011 Location: Sydney, Australia Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos | 
			
			And putting on my application developer hat and being a bit nit-picky: when I worked out what the plugin did (I didn't actually read the full post before trying it), the menu item "Find Similar Books..." started to bug me. It doesn't actually do this. What it does is "Calculate Similarity Scores". To do what your first paragraph states, and what the menu item implies, the plugin would need to do the calculation on every book in the library and display a list of the books scoring above a defined threshold.

A "Reset Scores" option would be a good idea. If you calculate the similarity for one set of books and then for a different set, the two sets of numbers have no relationship to each other. This could be a menu option or a setting to clear all the scores before doing a calculation.
Last edited by davidfor; 01-18-2012 at 12:16 AM. Reason: an extra suggestion | 
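The "find similar books" behaviour described above could look roughly like this: score every book in the library against the target and keep those above a threshold. This is a hypothetical sketch, not the plugin's code; `find_similar`, the toy word-weight vectors and the 0.3 threshold are illustrative assumptions, and Tanimoto is used only as an example score.

```python
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]

def tanimoto(a: Vector, b: Vector) -> float:
    """Example score: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom else 0.0

def find_similar(target: str, library: Dict[str, Vector], threshold: float,
                 score: Callable[[Vector, Vector], float] = tanimoto
                 ) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank every other book against the target.

    All scores are recomputed on each call, so results from an earlier
    selection can never mix with new ones.
    """
    target_vec = library[target]
    hits = []
    for book_id, vec in library.items():
        if book_id == target:
            continue
        s = score(target_vec, vec)
        if s >= threshold:
            hits.append((book_id, s))
    # Highest score first.
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits

# Toy library with made-up word weights.
library = {
    "game": {"ender": 1.0, "battle": 1.0},
    "shadow": {"ender": 1.0, "bean": 1.0},
    "exile": {"ender": 1.0, "battle": 1.0, "valentine": 1.0},
    "other": {"dragon": 1.0},
}
print(find_similar("game", library, threshold=0.3))
```

Recomputing everything per call is the simplest way to get the effect a "Reset Scores" option would provide; caching per-book vectors would be the obvious optimisation for a large library.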
|   |   | 
|  01-18-2012, 12:08 PM | #5 | |
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | Quote: 
 More experienced Calibre users may have a more informative view. | |
|   |   | 
|  01-18-2012, 12:15 PM | #6 | ||
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | Quote: 
  A Reset Scores option is a good idea. I'll look into implementing it for the next release. Quote: 
 | ||
|   |   | 
|  01-18-2012, 12:38 PM | #7 | ||
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | Quote: 
 Quote: 
  One implication of using the TF-IDF method to describe the text of a document is that it concentrates its weight on the unusual words in a set of documents. The good side of this is that common words, e.g. for, of, him, her, etc., become irrelevant, as they occur in all (English) documents. The downside is that if you are comparing only 2 documents, only the words that differ between them will count, so the similarity score is likely to be low (especially the Tanimoto score). However, if you select the books in the Ender series along with a lot of other sci-fi books (e.g. all of the other books by Orson Scott Card), you will find that the Ender books score far higher. When I did this with the Orson Scott Card books in my library, using Ender's Game as the target, Ender in Exile came out top with a Tanimoto score of 0.51, Ender's Shadow at 0.23 and Speaker at 0.21.

A more satisfactory approach may be to replace the TF-IDF method with some form of word count where the common words have been removed. However, this would require a language-specific dictionary. I have been pointed towards some Python-based text informatics libraries that would help with this, but I didn't want to launch into these for v1. | ||
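The two-document effect can be reproduced with a textbook TF-IDF (tf = count / document length, idf = log(N / df)). The plugin's exact weighting isn't given in this thread, so treat this as an assumed approximation, and the "books" are tiny made-up word lists. With N = 2, every word the two books share gets idf = log(2/2) = 0, leaving only the words unique to each book, so the two vectors have no non-zero terms in common and the Tanimoto score is exactly 0. Adding unrelated books makes the shared words rare again.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Textbook TF-IDF: tf = count / doc length, idf = log(N / df)."""
    n = len(docs)
    df = Counter()
    for words in docs.values():
        df.update(set(words))  # count each word once per document
    return {
        name: {w: (c / len(words)) * math.log(n / df[w])
               for w, c in Counter(words).items()}
        for name, words in docs.items()
    }

def tanimoto(a, b):
    """a.b / (|a|^2 + |b|^2 - a.b) over word-weight dicts."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    denom = sum(w * w for w in a.values()) + sum(w * w for w in b.values()) - dot
    return dot / denom if denom else 0.0

game = "ender battle school the of him".split()
shadow = "ender bean battle school the of her".split()
others = {
    "dragons": "dragon sword the of him her".split(),
    "starships": "ship star the of him".split(),
}

# Only two books: every shared word gets idf = log(2/2) = 0.
pair = tfidf_vectors({"game": game, "shadow": shadow})
print(tanimoto(pair["game"], pair["shadow"]))  # 0.0

# The same two books inside a larger selection: "ender", "battle" and
# "school" are now rare across the set, so they carry weight.
corpus = tfidf_vectors({"game": game, "shadow": shadow, **others})
print(tanimoto(corpus["game"], corpus["shadow"]) > 0)  # True
print(corpus["game"]["the"])  # 0.0: a word found in every book never counts
```

If the plugin weights words this way, it would also explain why the same pair of books can score differently depending on how many other books are selected: the idf values are computed over the selection.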
|   |   | 
|  01-18-2012, 10:16 PM | #8 | |||
| Grand Sorcerer            Posts: 24,905 Karma: 47303824 Join Date: Jul 2011 Location: Sydney, Australia Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos | Quote: 
 Quote: 
 Just now, I took a copy of one of these epubs, changed the name of the file and the title using Sigil, and added it to calibre. The similarity score for this new book was 0.0000354521. But, apart from one extra word in the metadata, the books are the same. Some of the other tests are OK. Comparing "Speaker for the Dead" and "Xenocide" with "Children of the Mind" gave 0.24976 and 0.335613 respectively. Those scores make sense. But my comparison of Ender's Game with Speaker and Shadow gives zero for both of them. And now I am a little more baffled: I get different results if I compare two books than if I compare more. Comparing each of the above books individually to Game gave a zero score for each. But comparing them all at the same time gave scores between 0.001 and 0.094. I thought the comparison was always to the first selected book. Quote: 
 | |||
|   |   | 
|  01-19-2012, 03:02 PM | #9 | |
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | Quote: 
 
 Let me know if this helps with determining whether your 3 copies of Ender's Game are the same. | |
|   |   | 
|  01-20-2012, 01:50 AM | #10 | 
| Member  Posts: 10 Karma: 10 Join Date: Mar 2011 Device: none | 
			
			I couldn't install it; when I tried, I got this message and it said that it would uninstall the plugin. However, after I reinstalled Windows (I changed to Windows 7), I had no problems installing the plugin.
		 | 
|   |   | 
|  01-23-2012, 04:54 PM | #11 | 
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | 
			
			I am pleased to learn that you have now managed to install the plugin. Could you let me know which operating system you were initially using, in case there is something about the plugin that is specific to Windows 7?
		 | 
|   |   | 
|  03-02-2012, 09:23 AM | #12 | 
| Connoisseur    Posts: 88 Karma: 200 Join Date: Nov 2010 Location: Dortmund, Germany Device: Kindle Paperwhite (10. Generation) | 
			
			For me this plugin creates a slight conflict with find_duplicates. When both are installed, find_duplicates displays "ERROR: Restart required: You must restart Calibre before using this plugin!". I did some research, and it seems that your plugin manages to overwrite some variables that find_duplicates uses, leading find_duplicates to search for some of its images in the Similar Stories zip archive. It seems you are using kiwidude's common_utils and this causes the conflict... maybe you could just talk with kiwidude about avoiding the conflicts. | 
|   |   | 
|  03-02-2012, 09:35 AM | #13 | 
| Calibre Plugins Developer            Posts: 4,735 Karma: 2197770 Join Date: Oct 2010 Location: Australia Device: Kindle Oasis | 
			
			@silentguy/Ian - I just downloaded your Similar Stories plugin to have a look at the source. The cause is undoubtedly just a copy/paste error in action.py. It should be referencing your *own* common_utils file, not the one in the find duplicates plugin, i.e. change this:
from calibre_plugins.find_duplicates.common_utils import set_plugin_icon_resources, get_icon, \
to this:
from calibre_plugins.similar_stories.common_utils import set_plugin_icon_resources, get_icon, \ | 
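Why a single wrong import causes this: Python loads a module once and caches it in `sys.modules`, so every `from calibre_plugins.find_duplicates.common_utils import ...` receives the same module object, and any module-level state in it is shared by both plugins. The sketch below simulates that with a simplified, hypothetical `common_utils` stand-in (not kiwidude's actual code); the icon paths and the assumption that Similar Stories registers after Find Duplicates are illustrative.

```python
import types

def make_common_utils() -> types.ModuleType:
    """Build a simplified, hypothetical stand-in for a plugin's common_utils.

    It keeps the registering plugin's name and icon resources in
    module-level globals.
    """
    mod = types.ModuleType("common_utils")
    mod.plugin_name = None
    mod.plugin_icon_resources = {}

    def set_plugin_icon_resources(name, resources):
        # Overwrites whatever the previous caller registered.
        mod.plugin_name = name
        mod.plugin_icon_resources = resources

    def get_icon(icon_name):
        # Looks the icon up in the resources of the *last* registered plugin.
        return mod.plugin_name, mod.plugin_icon_resources.get(icon_name)

    mod.set_plugin_icon_resources = set_plugin_icon_resources
    mod.get_icon = get_icon
    return mod

def load_plugins(share_module: bool):
    """Register both plugins, then ask Find Duplicates for one of its icons."""
    fd_utils = make_common_utils()
    # share_module=True mimics the copy/paste import: Similar Stories gets
    # the very same module object that Find Duplicates uses.
    fss_utils = fd_utils if share_module else make_common_utils()
    fd_utils.set_plugin_icon_resources("Find Duplicates", {"images/fd.png": b"fd"})
    fss_utils.set_plugin_icon_resources("Similar Stories", {"images/fss.png": b"fss"})
    return fd_utils.get_icon("images/fd.png")

shared_result = load_plugins(share_module=True)     # ('Similar Stories', None)
separate_result = load_plugins(share_module=False)  # ('Find Duplicates', b'fd')
print(shared_result, separate_result)
```

With its own copy of the module, each plugin keeps its own name and resources, which is what the corrected import achieves.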
|   |   | 
|  03-08-2012, 01:32 PM | #14 | 
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | 
			
			Hi Folks, Thanks both for letting me know about the issue and for identifying its cause. I'll fix it tonight and repost. | 
|   |   | 
|  03-08-2012, 01:46 PM | #15 | 
| Junior Member  Posts: 8 Karma: 10 Join Date: Jan 2012 Device: Kindle | 
			
			I have now released a new version with the fix identified by kiwidude. Hopefully this will solve the issue.
		 | 
|   |   | 
|  Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post | 
| [GUI Plugin] Find Duplicates | kiwidude | Plugins | 1124 | 04-18-2025 09:19 AM | 
| [GUI Plugin] Open With | kiwidude | Plugins | 404 | 02-21-2025 05:42 AM | 
| [GUI Plugin] FanFictionDownLoader | JimmXinu | Plugins | 3985 | 05-08-2015 11:18 PM | 
| How to find, in library, books similar to one on device? | capnm | Library Management | 1 | 11-23-2011 06:24 PM | 
| [GUI Plugin] Plugin Updater **Deprecated** | kiwidude | Plugins | 159 | 06-19-2011 12:27 PM |