[GUI Plugin] Count Pages - Page 58

red_dragon · 01-11-2016, 06:05 AM

Btw, for German the formula to calculate the Flesch Reading Ease is different. I created my own copy of the plugin, as I never figured out how to use the book language to switch to the new algorithm automatically.

As the plugin is still actively developed, maybe the following change could be integrated:

# German Flesch Reading Ease
score = 180 - text_analysis['averageWordsPerSentence'] - (58.5 * (text_analysis['syllableCount'] / text_analysis['wordCount']))
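For reference, here is the German adaptation above (usually attributed to Amstad) next to the original English Flesch Reading Ease. ASL is the average number of words per sentence and ASW the average number of syllables per word (syllableCount / wordCount in the code). The last line is a small worked example with made-up values:

```latex
\begin{aligned}
\text{FRE}_{\text{en}} &= 206.835 - 1.015\,\text{ASL} - 84.6\,\text{ASW} \\
\text{FRE}_{\text{de}} &= 180 - \text{ASL} - 58.5\,\text{ASW} \\
\text{ASL} = 10,\ \text{ASW} = 1.5:\quad \text{FRE}_{\text{de}} &= 180 - 10 - 58.5 \times 1.5 = 82.25
\end{aligned}
```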

Namenlos · 01-12-2016, 12:06 PM

There is another metric for German text (well, four of them): the "Wiener Sachtextformel".

Translation of the variables in the formula:

MS: percentage of words with three or more syllables
SL: average number of words per sentence
IW: percentage of words with six or more letters
ES: percentage of words with one syllable
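The four inputs above can be derived from tokenized text roughly as sketched below. This is a minimal sketch, assuming the words, their syllable counts and the sentence count have already been produced elsewhere (the plugin's own tokenizing is not shown). The function and class names are hypothetical, and the first-variant coefficients are the ones commonly cited for the formula, not taken from the plugin.

```python
from dataclasses import dataclass


@dataclass
class WstfInputs:
    ms: float  # MS: % of words with three or more syllables
    sl: float  # SL: average words per sentence
    iw: float  # IW: % of words with six or more letters
    es: float  # ES: % of words with one syllable


def wstf_inputs(words, syllables, sentence_count):
    """Derive MS, SL, IW and ES.

    words: list of word strings; syllables: syllable count per word, same order;
    sentence_count: number of sentences in the text.
    """
    n = len(words)
    if n == 0 or sentence_count <= 0 or len(syllables) != n:
        raise ValueError("need matching, non-empty word/syllable lists and >= 1 sentence")
    ms = 100.0 * sum(1 for s in syllables if s >= 3) / n
    sl = n / sentence_count
    iw = 100.0 * sum(1 for w in words if len(w) >= 6) / n
    es = 100.0 * sum(1 for s in syllables if s == 1) / n
    return WstfInputs(ms=ms, sl=sl, iw=iw, es=es)


def wstf_first(v):
    """First Wiener Sachtextformel variant (coefficients as commonly cited)."""
    return 0.1935 * v.ms + 0.1672 * v.sl + 0.1297 * v.iw - 0.0327 * v.es - 0.875
```

For example, `wstf_first(wstf_inputs(["Der", "Hund", "bellt", "ausdauernd"], [1, 1, 1, 3], 1))` scores a one-sentence, four-word text.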

There are also other metrics for English that can be found in an NLTK-based implementation on GitHub.

davidfor · 01-12-2016, 05:52 PM

Adding a language-specific version of the statistics is easy. I just need to add the calculations and decide which one to use based on the language of the book.

But adding more statistics is a lot harder. The plugin has a cut-down version of the NLTK library. From the notes in the plugin, it probably only has the English statistics, so that would have to be replaced with the full version. Then the configuration would have to be changed. And are these extra stats to be calculated in addition, or alternatives to use in place of the three English stats? At the moment, I'm not interested in going through this. If someone is, I'll happily help.

red_dragon · 01-15-2016, 02:45 AM

Hello davidfor,

I'd appreciate it if you could send me the code showing how to add a language-specific version, or just post it here. Thanks!

In my private version I tried the "german.pickle" from the NLTK package (modified to work with the plugin), but the difference was less than 1%. I don't care much about higher accuracy, e.g. whether the reading ease is 75.5 or 76.7. If it's easy to select the correct pickle file on the fly, then it makes sense to use that one.

Regarding the "Wiener Sachtextformel", I am using the 4th variant which is calculated like this:

score = (0.2656 * text_analysis['averageWordsPerSentence']) + (0.2744 * (text_analysis['complexwordCount'] * 100 / text_analysis['wordCount'])) - 1.693

It can replace the "Gunning-Fog-Index" (as "years of education"), which doesn't work for German books anyway.
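The one-liner above can be wrapped as a standalone function, sketched below. It assumes, as the formula implies, that `complexwordCount` counts words with three or more syllables, so the second term is MS in percent. The function name and the zero-word guard are hypothetical additions, not part of the posted code.

```python
def wiener_sachtextformel_4(text_analysis):
    """Fourth Wiener Sachtextformel variant, using the plugin's text_analysis keys.

    The result is on a school-grade ("years of education") scale.
    """
    word_count = text_analysis['wordCount']
    if not word_count:
        return 0.0  # hypothetical choice: avoid dividing by zero for empty text
    sl = text_analysis['averageWordsPerSentence']  # SL: average words per sentence
    # MS: percentage of "complex" words (assumed: three or more syllables)
    ms = text_analysis['complexwordCount'] * 100.0 / word_count
    return 0.2656 * sl + 0.2744 * ms - 1.693
```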

davidfor · 01-15-2016, 08:51 AM

OK, here is a beta that has the German version of the Flesch Reading Ease statistic. The code for getting the language is in jobs.py at line 167.

Code:

from calibre.utils.localization import get_lang

# Prefer the language declared in the book's OPF; fall back to the calibre interface language
lang = iterator.opf.language or get_lang()

In this, lang is set to a three-character language code such as "eng" or "deu". "get_lang()" gets the current interface language. This is then passed to the statistic method, which chooses the appropriate calculation. The pickle file is loaded and passed to the job, so choosing it at the book level is a problem. There is a comment in the pickle-loading code about why this is done. Choosing based on the interface language would be simple.
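A minimal sketch of how a statistic method might switch formulas on the language code. The function names and the set of German codes are assumptions, not the plugin's actual code. Locale-style or two-letter codes such as "de_DE" are normalized as well, in case the fallback language arrives in that form.

```python
# Codes treated as German: ISO 639-2/T, ISO 639-2/B and ISO 639-1 (assumed set)
GERMAN_CODES = {'deu', 'ger', 'de'}


def base_language(lang):
    """Reduce 'de_DE', 'de-AT' or 'DEU' to a lower-case base code ('' if missing)."""
    return (lang or '').lower().replace('-', '_').split('_')[0]


def flesch_reading_ease(text_analysis, lang):
    """Flesch Reading Ease, using the German adaptation for German books."""
    words_per_sentence = text_analysis['averageWordsPerSentence']
    syllables_per_word = text_analysis['syllableCount'] / text_analysis['wordCount']
    if base_language(lang) in GERMAN_CODES:
        # German adaptation: 180 - ASL - 58.5 * ASW
        return 180 - words_per_sentence - 58.5 * syllables_per_word
    # Everything else: the original English formula
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Keeping one statistic with a language switch, rather than separate German and English statistics, means only one column is needed per statistic.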

And, thinking about the extra statistics, I have come up with a way that might handle this.

At the moment, there are five statistics: words, pages, Flesch Reading Ease, Flesch-Kincaid Grade and Gunning Fog Index. These are fixed, and the only options around them are where to store the results. My thought is to add the extra statistics but make them selectable from a list. The word and page counts would be kept as they are. For the others, there would be pairs of drop-down lists: the first of each pair lists the statistics, the second the column to store the result in. With that, exactly which statistics from the full set are used would be up to the user. I would probably limit this to three stats, but with a little thought it could be extended to as many as needed.
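The drop-down idea could be backed by a simple registry: each statistic is a named function of the text analysis, and the configuration is a list of (statistic, column) pairs. This is a sketch under assumptions: the registry, the limit constant and the column lookup names are hypothetical, and the formulas are the standard published ones rather than code taken from the plugin.

```python
def _asw(ta):
    # Average syllables per word
    return ta['syllableCount'] / ta['wordCount']


def _pct_complex(ta):
    # Percentage of words with three or more syllables
    return 100.0 * ta['complexwordCount'] / ta['wordCount']


# Statistic name (shown in the first drop-down) -> calculation
STATISTICS = {
    'Flesch Reading Ease': lambda ta: 206.835 - 1.015 * ta['averageWordsPerSentence'] - 84.6 * _asw(ta),
    'Flesch-Kincaid Grade': lambda ta: 0.39 * ta['averageWordsPerSentence'] + 11.8 * _asw(ta) - 15.59,
    'Gunning Fog Index': lambda ta: 0.4 * (ta['averageWordsPerSentence'] + _pct_complex(ta)),
    'Flesch Reading Ease (German)': lambda ta: 180 - ta['averageWordsPerSentence'] - 58.5 * _asw(ta),
    'Wiener Sachtextformel 4': lambda ta: 0.2656 * ta['averageWordsPerSentence'] + 0.2744 * _pct_complex(ta) - 1.693,
}

MAX_SELECTED = 3  # the suggested limit of three configurable stats


def compute_selected(ta, selections):
    """selections: (statistic name, column lookup name) pairs, one per drop-down pair.

    Returns {column: value}; pairs with an empty statistic or column are skipped.
    """
    results = {}
    for stat_name, column in selections[:MAX_SELECTED]:
        if not stat_name or not column:
            continue  # drop-down pair left unset
        results[column] = STATISTICS[stat_name](ta)
    return results
```

Raising the limit later would only mean changing `MAX_SELECTED` and adding more drop-down pairs to the configuration dialog.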

I haven't looked enough at the NLTK code to see how easy it would be to replace the version in the plugin with a more complete version. For the simpler statistics that use calculations similar to those already in place, adding them in this way should be practical.

red_dragon · 01-20-2016, 05:55 AM

I am not sure if the interface language helps much. My library is mixed with books in English, German and some French.

How can the language be retrieved from a book? That would be the preferred way to do it.

BetterRed · 01-20-2016, 06:30 AM

Quote:

Originally Posted by red_dragon

I am not sure if the interface language helps much. My library is mixed with books in English, German and some French.

How can the language be retrieved from a book? That would be the preferred way to do it.

Language is a standard metadata column in calibre; it's also a Dublin Core element. I suggest you show the column in the book list (it's normally hidden, I think). If you have them then . . .

BR

davidfor · 01-20-2016, 06:31 AM

Quote:

Originally Posted by red_dragon

I am not sure if the interface language helps much. My library is mixed with books in English, German and some French.

How can the language be retrieved from a book? That would be the preferred way to do it.

The code I am using attempts to get the language from the book. If it can't, then it uses the interface language. If we extend the statistics, there is a problem, as an extra language-specific file is used. At the moment, this is loaded early, before the individual book languages are known, and that could be a problem.
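One way around loading the language-specific file before the book languages are known is to load it lazily, per language, the first time a book in that language needs it, and keep it cached for the rest of the job. A minimal sketch, assuming Punkt-style file names in a single directory; the mapping, class name and fallback rule are hypothetical, not how the plugin currently works.

```python
import os
import pickle

# Hypothetical mapping from language code to a Punkt-style pickle file name
PICKLE_FOR_LANG = {'eng': 'english.pickle', 'deu': 'german.pickle', 'fra': 'french.pickle'}
DEFAULT_PICKLE = 'english.pickle'


class TokenizerCache:
    """Load each language's pickle on first use instead of once, up front."""

    def __init__(self, directory):
        self.directory = directory
        self._loaded = {}  # file name -> unpickled object

    def get(self, lang):
        name = PICKLE_FOR_LANG.get(lang, DEFAULT_PICKLE)
        if name not in self._loaded:
            path = os.path.join(self.directory, name)
            if not os.path.exists(path):
                # Language file not shipped: fall back to the default file
                name = DEFAULT_PICKLE
                path = os.path.join(self.directory, name)
            if name not in self._loaded:
                # Only unpickle trusted files bundled with the plugin
                with open(path, 'rb') as f:
                    self._loaded[name] = pickle.load(f)
        return self._loaded[name]
```

Each file is read at most once per job, so a mixed-language library pays only for the languages it actually contains.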

chaley · 01-20-2016, 11:55 AM

Quote:

Originally Posted by davidfor

The code I am using attempts to get the language from the book. If it can't, then it uses the interface language. If we extend the statistics, there is a problem, as an extra language-specific file is used. At the moment, this is loaded early, before the individual book languages are known, and that could be a problem.

It appears that "from the book" means from the OPF inside the book. If my assumption is right and if you have access to metadata.db then you might also check the language stored for the book in the db before falling back to the interface language.

rpgmaker · 01-20-2016, 01:02 PM

Quote:

Originally Posted by davidfor

The code I am using attempts to get the language from the book. If it can't, then it uses the interface language. If we extend the statistics, there is a problem, as an extra language-specific file is used. At the moment, this is loaded early, before the individual book languages are known, and that could be a problem.

Does every language have a custom implementation of this word counting method? If not, I think this should be a deal breaker for the update that is being entertained.

Divingduck · 01-20-2016, 01:11 PM

Quote:

Originally Posted by chaley

It appears that "from the book" means from the OPF inside the book. If my assumption is right and if you have access to metadata.db then you might also check the language stored for the book in the db before falling back to the interface language.

+1
I like this too

davidfor · 01-20-2016, 06:07 PM

Quote:

Originally Posted by chaley

It appears that "from the book" means from the OPF inside the book. If my assumption is right and if you have access to metadata.db then you might also check the language stored for the book in the db before falling back to the interface language.

Yes, the OPF in the book. Getting it from the database was in the back of my mind when I was writing the post, but this is happening inside a job, so I didn't think I had access to the database. It could be part of the data collected before starting the job. That is probably a good idea anyway, as there is no guarantee that the copy of the book in the library has been updated with the latest metadata.
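The fallback order suggested here (library database, then the OPF inside the book, then the interface language) can be expressed as a small pure function, so the job itself never needs database access. A sketch under assumptions: the function name is hypothetical, and the database value would presumably be collected in the GUI before the job is queued (for example via calibre's new database API, `db.new_api.field_for('languages', book_id)`, which should be checked against the calibre source).

```python
def resolve_book_language(db_languages, opf_language, interface_language):
    """Pick the language for one book: library database, then OPF, then UI language.

    db_languages: the book's languages as stored in the calibre library
    (a sequence, possibly empty or None), gathered before the job starts.
    opf_language: language read from the OPF inside the book file, or None.
    interface_language: calibre's current interface language.
    """
    for lang in db_languages or ():
        if lang:
            return lang  # first language recorded in the library wins
    if opf_language:
        return opf_language
    return interface_language
```

Preferring the database value also covers the case mentioned above, where the book file has not been updated with the latest metadata.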

davidfor · 01-20-2016, 07:35 PM

Quote:

Originally Posted by rpgmaker

Does every language have a custom implementation of this word counting method? If not, I think this should be a deal breaker for the update that is being entertained.

The change to the word count is to use a different algorithm already implemented in calibre. This algorithm, and I assume most of the code, come from elsewhere. This accepts the language as a parameter. I have not looked closely enough at this to know if there is an implementation for every language, but I doubt it. What is likely is that the major languages have implementations and the rest fall back to a common method.

The method currently used for the word count is the same for all languages. It is a fairly simple method but is not too inaccurate, at least for English. I don't know about other languages, which is why I asked. If it weren't for the other languages, I wouldn't bother with this, as for nearly all uses the count we have is close enough. But I think the other languages should be treated properly.
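The "language-specific where available, common method otherwise" pattern described above might look like the sketch below. It is only an illustration of the dispatch, not the calibre algorithm being discussed: the regular expressions, the registry and the English contraction rule are hypothetical.

```python
import re

# Common method: every run of word characters counts as one word
_GENERIC = re.compile(r"\w+")
# Illustrative English rule: keep contractions such as "don't" as one word
_ENGLISH = re.compile(r"\w+(?:['\u2019]\w+)*")

# Language code -> specialised counter; languages not listed use the common method
LANGUAGE_COUNTERS = {
    'eng': lambda text: len(_ENGLISH.findall(text)),
}


def count_words(text, lang):
    """Use a language-specific counter when one exists, otherwise the generic one."""
    counter = LANGUAGE_COUNTERS.get(lang)
    if counter is None:
        return len(_GENERIC.findall(text))
    return counter(text)
```

With this shape, adding a better counter for another language is a one-line registry change, and unknown languages still get a reasonable count.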

The other thing that has been discussed is the other statistics. These are English-specific stats. The first mention was because someone had the German calculation for one of them. Having that used automatically was easy and sensible. But the rest of the stats are more of a problem, and there are other stats that make more sense for other languages. Adding a way to calculate them is a lot more complex.

My comments about the other stats are really just me thinking out loud. At this point, I have no plan to implement them. There are other things I would prefer to do. Maybe in the future, I might get bored and return to it. Or maybe someone will see my comments and decide to do it. If someone does, I'll be very happy to help with suggestions, testing and other help.

My plan for Count Pages is to release the changes as is (different word count algorithm, German version of one of the other stats) plus one other change. The other change is something someone else has done and is about making the plugin work better when called from other plugins.

red_dragon · 01-21-2016, 04:32 AM

Quote:

Originally Posted by davidfor

The code I am using attempts to get the language from the book.

Somehow I missed this. The beta works fine! Having one column for the reading ease instead of two is much better.

ratanparai · 01-24-2016, 04:42 PM

Is there any way to generate an APNX file from this plugin without the send-to-Kindle method? I'm trying to create an APNX file for my Android device. I used the APNX generator plugin, but the page numbers are not accurate, so I'm looking for a way to create an APNX file from the Goodreads page number.

I don't have any Kindle devices. I searched for a way to make my Android device pretend to be a Kindle so that I can use the column number to generate the APNX file, but with no luck.

I don't know why there is no way to generate an APNX file for the Kindle app on an Android device when I choose "Send to device", or in the Android device interface plugin settings.
