07-09-2012, 02:16 PM   #351
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Beta for v1.6.0

Here is a fairly significantly changed version of the plugin that I would appreciate some feedback on. As requested above, it now includes the option to generate readability statistics: any of Flesch Reading Ease, Flesch-Kincaid Grade Level or Gunning Fog Index.
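
For anyone curious, all three are standard formulas over word, sentence and syllable counts. Below is a minimal Python sketch assuming those counts have already been gathered; the function name and arguments are hypothetical, not the plugin's code, and "complex words" is simplified to words of three or more syllables (the original Gunning Fog definition has a few extra exclusions).

```python
def readability_scores(words: int, sentences: int, syllables: int,
                       complex_words: int) -> dict:
    """Compute the three statistics from pre-computed counts (hypothetical helper).

    complex_words: number of words with three or more syllables (Gunning Fog).
    """
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return {
        # Higher score = easier to read (roughly 0-100 for normal prose).
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence
                               - 84.6 * syllables_per_word,
        # Approximate US school grade level.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
                                + 11.8 * syllables_per_word - 15.59,
        # Years of formal education needed to follow the text on first reading.
        "gunning_fog": 0.4 * (words_per_sentence + 100.0 * complex_words / words),
    }
```

For example, readability_scores(words=100, sentences=5, syllables=140, complex_words=10) gives roughly 68.1, 8.7 and 12.0 respectively.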

Note that these statistics are calculated on the assumption that the book is in English; I have no interest in adding support for other languages. The calculation uses a heavily modified, tiny subset of the NLTK library (Natural Language Toolkit).
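
To illustrate why the English assumption matters: syllable counting relies on English spelling rules. The sketch below is a deliberately crude vowel-group heuristic of my own for illustration only; it is not NLTK's method nor what the plugin actually does, and the function name is hypothetical.

```python
import re

_VOWEL_GROUP = re.compile(r"[aeiouy]+")

def count_syllables(word: str) -> int:
    """Crude English syllable estimate (hypothetical; not NLTK's or the plugin's method)."""
    # Keep letters only, so "The," is treated as "the".
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    # Each run of vowels is roughly one spoken syllable in English spelling.
    count = len(_VOWEL_GROUP.findall(word))
    # A final consonant + "e" is usually silent ("make", "smile"), except
    # for consonant + "le" endings, which are voiced ("table", "little").
    if re.search(r"[^aeiouy]e$", word) and not re.search(r"[^aeiouy]le$", word):
        count -= 1
    return max(count, 1)
```

Rules like the silent final "e" simply don't carry over to other languages, which is one reason support is limited to English.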

So you can now add up to three further custom columns to display these additional statistics if you choose, as per the screenshot below. Create columns of type floating point number to store the values, and when configuring each custom column use a format specifier such as {0:.1f} to display the value to one decimal place.
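
The {0:.1f} specifier uses Python's str.format syntax. As a plain-Python illustration (not calibre code) of what it does to some sample statistic values:

```python
# Same specifier as suggested for the custom column: one decimal place.
fmt = "{0:.1f}"
samples = [68.095, 8.73, 12.0]
formatted = [fmt.format(v) for v in samples]
print(formatted)  # ['68.1', '8.7', '12.0']
```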

Currently the readability statistics are calculated across the entire book (but only if you have at least one readability column enabled). Computing two or three statistics is just as fast as computing one, since they all rely on the same word, sentence and syllable counts, so don't expect that enabling only one of them will make it faster! I haven't yet decided whether I should instead sample a subset of the book, and if so in what form, e.g. the first 5,000 characters, a percentage of the book, or something else. Feel free to make a suggestion if you have an opinion; if you don't mind the speed, since it runs in the background anyway, that is fine too.
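
For discussion purposes, here is roughly what the two sampling options might look like. This is a hypothetical sketch, not something in the beta: it assumes the book has already been reduced to plain text, and it trims back to the last complete sentence so a fragment doesn't skew the words-per-sentence figure.

```python
from typing import Optional

def sample_text(text: str, max_chars: Optional[int] = None,
                percent: Optional[float] = None) -> str:
    """Return a leading sample of the book's plain text (hypothetical helper)."""
    limit = len(text)
    if max_chars is not None:
        limit = min(limit, max_chars)
    if percent is not None:
        limit = min(limit, int(len(text) * percent / 100.0))
    if limit >= len(text):
        return text  # no sampling requested: analyse the whole book
    sample = text[:limit]
    # Prefer to stop after the last complete sentence so the sentence count
    # (and therefore words per sentence) is not skewed by a fragment.
    cut = max(sample.rfind(mark) for mark in ".!?")
    if cut > 0:
        return sample[:cut + 1]
    # Otherwise at least avoid cutting a word in half.
    cut = max(sample.rfind(ws) for ws in " \n\t")
    return sample[:cut] if cut > 0 else sample
```

Usage would be e.g. sample_text(book_text, max_chars=5000) for the "first 5,000 characters" option, or sample_text(book_text, percent=10) for a percentage-based one.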

You may also find it mildly interesting to look at the log, which shows the more detailed statistics gathered. If you normally calculate word counts, you may notice a small difference between the existing calibre algorithm and the NLTK one, as they take different approaches. Given its low importance this is not something I am inclined to do anything about, but I mention it in case anyone notices. I have also made a slight change to how the calibre document is stripped of its HTML tags before counting: previously the end of one sentence could get joined to the start of the next, so word counts will now be slightly higher. Don't be surprised to see a slightly different word count from any you compute currently.
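
To show why the stripping change bumps the counts up, here is a simplified illustration. It is not the plugin's or calibre's actual stripping code; the regex approach and function names are hypothetical, and a regex is too crude for real HTML (comments, scripts and so on).

```python
import re

html = "<p>The first sentence ends here.</p><p>Another begins.</p>"

def strip_tags_joined(markup: str) -> str:
    # Remove tags outright: text either side of a tag is glued together.
    return re.sub(r"<[^>]+>", "", markup)

def strip_tags_spaced(markup: str) -> str:
    # Replace each tag with a space so adjacent paragraphs stay separate words.
    return re.sub(r"<[^>]+>", " ", markup)

def word_count(text: str) -> int:
    # Naive whitespace split, just for the comparison.
    return len(text.split())

print(word_count(strip_tags_joined(html)))  # 6 ("here.Another" counts as one word)
print(word_count(strip_tags_spaced(html)))  # 7
```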

Let me know if you find any issues; if there are no problems I will release it in a few days...
Attached Thumbnails
Name:	Screenshot_2_Configuration.png

Last edited by kiwidude; 07-14-2012 at 12:48 PM. Reason: Removing attachment as released