Old 01-11-2016, 06:05 AM   #856
red_dragon
Daywalker
red_dragon is on a distinguished road
 
Posts: 29
Karma: 52
Join Date: Jul 2008
Device: Kindle Paperwhite
Btw, for the German language the formula to calculate the Flesch Reading Ease is different. I have created my own copy of the plugin, as I have never figured out how to use the book language to switch to the German formula automatically.

As the plugin is still developed actively, maybe the following change can be integrated:

# German Flesch Reading Ease
# float() guards against integer division under Python 2 if both counts are ints
score = 180 - text_analysis['averageWordsPerSentence'] - (58.5 * (float(text_analysis['syllableCount']) / text_analysis['wordCount']))
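
To show how the two formulas could sit side by side, here is a minimal sketch that selects the calculation from a language code. The function name and the accepted code spellings are assumptions made for illustration; they are not taken from the plugin. The German branch is the formula above, and the English branch is the standard Flesch Reading Ease formula.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word, lang="eng"):
    """Return the Flesch Reading Ease score for a language code.

    Hypothetical helper, not the plugin's API. words_per_sentence matches
    text_analysis['averageWordsPerSentence']; syllables_per_word is
    syllableCount / wordCount.
    """
    if lang.lower() in ("deu", "ger", "de"):
        # German variant, as posted above
        return 180.0 - words_per_sentence - 58.5 * syllables_per_word
    # Standard English formula, used for every other language in this sketch
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


german = flesch_reading_ease(10, 1.5, lang="deu")
english = flesch_reading_ease(10, 1.5, lang="eng")
```

In this sketch, any language without its own formula silently gets the English one, which may or may not be the behaviour you want.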
Old 01-12-2016, 12:06 PM   #857
Namenlos
Enthusiast
Namenlos began at the beginning.
 
Posts: 37
Karma: 10
Join Date: Jul 2014
Device: Kobo Mini
There is another metric (well, four …) for German text, the "Wiener Sachtextformel".

Translation for the formula:

MS percentage of words with three or more syllables
SL average words in a sentence
IW percentage of words with six or more letters
ES percentage of words with one syllable
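
As a rough illustration of how these four inputs combine, the sketch below computes the four variants. The coefficients are the commonly published ones for the Wiener Sachtextformel and are not taken from this post, so treat them as an assumption and check them against a reference before relying on them. The function name is hypothetical.

```python
def wiener_sachtextformel(ms, sl, iw=0.0, es=0.0, variant=1):
    """Return the Wiener Sachtextformel score (roughly a school grade level).

    ms: percentage of words with three or more syllables
    sl: average number of words per sentence
    iw: percentage of long words (only used by variants 1 and 2)
    es: percentage of one-syllable words (only used by variant 1)
    Coefficients are the commonly cited ones (an assumption, not from the post).
    """
    if variant == 1:
        return 0.1935 * ms + 0.1672 * sl + 0.1297 * iw - 0.0327 * es - 0.875
    if variant == 2:
        return 0.2007 * ms + 0.1682 * sl + 0.1373 * iw - 2.779
    if variant == 3:
        return 0.2963 * ms + 0.1905 * sl - 1.1144
    if variant == 4:
        return 0.2744 * ms + 0.2656 * sl - 1.693
    raise ValueError("variant must be 1, 2, 3 or 4")


score = wiener_sachtextformel(ms=10, sl=15, iw=30, es=50, variant=1)
```

Variants 3 and 4 need only MS and SL, which makes them the easiest to compute from counts a plugin already collects.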

There are also other metrics for English that can be found in an NLTK-based implementation on GitHub.
Old 01-12-2016, 05:52 PM   #858
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Adding a language specific version of the statistics is easy. I just needed to add the calculations and decide which to use based on the language in the book.

But adding more statistics is a lot harder. The plugin has a cut-down version of the NLTK library. From the notes in the plugin, it probably only has the English statistics, so that would have to change to the full version. Then the configuration would have to be changed. And are these extra stats to calculate, or alternatives to use in place of the three English stats? At the moment, I'm not interested in going through this. If someone is, then I'll happily help.
Old 01-15-2016, 02:45 AM   #859
red_dragon
Daywalker
red_dragon is on a distinguished road
 
Posts: 29
Karma: 52
Join Date: Jul 2008
Device: Kindle Paperwhite
Hello davidfor,

I'd appreciate it if you could send me the code for adding a language-specific version, or just post it here. Thanks!

In my private version I have tried out the "german.pickle" from the nltk package (modified to work with the plugin) but the difference was <1%. I don't care much about a higher accuracy, e.g. whether the reading ease is 75.5 or 76.7. If it's easy to select the correct pickle file on the fly, well, then it makes sense to use that one.

Regarding the "Wiener Sachtextformel", I am using the 4th variant which is calculated like this:

score = (0.2656 * text_analysis['averageWordsPerSentence']) + (0.2744 * (text_analysis['complexwordCount'] * 100.0 / text_analysis['wordCount'])) - 1.693

It can replace the "Gunning-Fog-Index" (as "years of education"), which doesn't work for German books anyway.
Old 01-15-2016, 08:51 AM   #860
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
OK, here is a beta that has the German version of the Flesch Reading Ease statistic. The code for getting the language is in jobs.py at line 167.

Code:
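from calibre.utils.localization import get_lang
# "iterator" is defined by the surrounding code in jobs.py (not shown here)
lang = iterator.opf.language
# Fall back to the interface language when the book's OPF has none
lang = get_lang() if not lang else lang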
In this, lang is set to a three-character language code such as "eng" or "deu". "get_lang()" gets the current interface language. This is then passed to the statistic method, which chooses the appropriate calculation. The pickle file is loaded and passed to the job, so choosing it at the book level is a problem. There is a comment in the pickle load code about why this is done. Choosing based on the interface language would be simple.

And thinking about the extra statistics, I have thought of a way that might work to handle this.

At the moment, there are five statistics: words, pages, Flesch Reading Ease, Flesch-Kincaid Grade and Gunning Fog Index. These are fixed, and the options around them are where to store the results. My thought is to add the extra statistics, but make them selectable from a list. The word and page count would be kept as they are. For the others, have pairs of drop-down lists. The first of each pair lists the statistics. The second lists the column to store it in. With that, exactly which statistics are used from the full set would be up to the user. I would probably limit this to three stats, but, with a little thought, it could be extended to as many as needed.
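
One way the "pairs of drop-down lists" idea might be represented in code is sketched below: a registry maps each statistic's display name to a function, and the user's choices are a list of (statistic, column) pairs. Every name here (STATISTICS, compute_selected, the "#..." column names) is hypothetical and is not the plugin's configuration format. The dictionary keys follow the text_analysis keys quoted earlier in the thread, and the formulas are the standard English ones.

```python
MAX_OPTIONAL_STATS = 3  # the limit suggested in the post; easy to raise

# Hypothetical registry: display name -> function of a text_analysis dict
STATISTICS = {
    "Flesch Reading Ease": lambda a: (
        206.835
        - 1.015 * a["averageWordsPerSentence"]
        - 84.6 * a["syllableCount"] / a["wordCount"]
    ),
    "Flesch-Kincaid Grade": lambda a: (
        0.39 * a["averageWordsPerSentence"]
        + 11.8 * a["syllableCount"] / a["wordCount"]
        - 15.59
    ),
    "Gunning Fog Index": lambda a: 0.4 * (
        a["averageWordsPerSentence"]
        + 100.0 * a["complexwordCount"] / a["wordCount"]
    ),
}


def compute_selected(analysis, selections):
    """Compute each chosen statistic and key the result by its target column.

    selections is a list of (statistic name, column name) pairs, one pair
    per row of drop-down lists.
    """
    if len(selections) > MAX_OPTIONAL_STATS:
        raise ValueError("too many statistics selected")
    return {column: STATISTICS[name](analysis) for name, column in selections}


analysis = {
    "averageWordsPerSentence": 10.0,
    "syllableCount": 150,
    "wordCount": 100,
    "complexwordCount": 10,
}
results = compute_selected(
    analysis,
    [("Gunning Fog Index", "#fog"), ("Flesch-Kincaid Grade", "#fk_grade")],
)
```

Adding a new statistic then only means adding one entry to the registry; the configuration dialog would build its drop-down from the registry keys.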

I haven't looked enough at the NLTK code to see how easy it would be to replace the version in the plugin with a more complete version. For the simpler statistics that use calculations similar to those already in place, adding them in this way should be practical.
Attached Files
File Type: zip Count Pages-beta.zip (245.0 KB, 157 views)

Last edited by davidfor; 01-15-2016 at 08:52 AM. Reason: Yet again, I forgot to attach the file.
Old 01-20-2016, 05:55 AM   #861
red_dragon
Daywalker
red_dragon is on a distinguished road
 
Posts: 29
Karma: 52
Join Date: Jul 2008
Device: Kindle Paperwhite
I am not sure if the interface language helps much. My library is mixed with books in English, German and some French.

How can the language be retrieved from a book? That would be the preferred way to do it.
Old 01-20-2016, 06:30 AM   #862
BetterRed
null operator (he/him)
BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 20,572
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by red_dragon View Post
I am not sure if the interface language helps much. My library is mixed with books in English, German and some French.

How can the language be retrieved from a book? That would be the preferred way to do it.
Language is a standard metadata column in calibre; it's also a Dublin Core element. I suggest you show the column in the book list - it's normally hidden, I think. If you have them then . . .

BR
Old 01-20-2016, 06:31 AM   #863
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by red_dragon View Post
I am not sure if the interface language helps much. My library is mixed with books in English, German and some French.

How can the language be retrieved from a book? That would be the preferred way to do it.
The code I am using attempts to get the language from the book. If it can't, then it uses the interface language. If we extend the statistics, there is a problem, as an extra language-specific file is used. At the moment, this is loaded early, before the individual book languages are known, and that could be a problem.
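
A possible way around the load-order problem is to collect the book languages first and then load one language-specific file per distinct language before the job starts. The sketch below assumes that approach; the function names, the loader callable and the idea of passing a dictionary to the job are all hypothetical and do not describe how the plugin works today.

```python
def load_language_data(book_languages, loader, default="eng"):
    """Load one data blob per distinct book language, plus the default.

    book_languages maps book id -> language code. loader is any callable
    that returns the data for a language code and raises KeyError when
    there is none (hypothetical; it stands in for the pickle loading).
    """
    data = {}
    for lang in set(book_languages.values()) | {default}:
        try:
            data[lang] = loader(lang)
        except KeyError:
            # No data for this language; such books use the default below
            pass
    return data


def data_for_book(book_id, book_languages, data, default="eng"):
    """Pick the data for one book, falling back to the default language."""
    lang = book_languages.get(book_id, default)
    return data.get(lang, data[default])


# Stand-in for the real files: only English and German data exist here
available = {"eng": "english data", "deu": "german data"}
languages = {1: "eng", 2: "deu", 3: "fra"}
loaded = load_language_data(languages, available.__getitem__)
```

The job would then receive the whole dictionary and call something like data_for_book for each book, so nothing language-specific has to be loaded inside the job itself.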
Old 01-20-2016, 11:55 AM   #864
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by davidfor View Post
The code I am using attempts to get the language from the book. If it can't, then it uses the interface language. If we extend the statistics, there is a problem, as an extra language-specific file is used. At the moment, this is loaded early, before the individual book languages are known, and that could be a problem.
It appears that "from the book" means from the OPF inside the book. If my assumption is right and if you have access to metadata.db then you might also check the language stored for the book in the db before falling back to the interface language.
Old 01-20-2016, 01:02 PM   #865
rpgmaker
Connoisseur
rpgmaker began at the beginning.
 
Posts: 85
Karma: 10
Join Date: Oct 2014
Device: Kindle Paperwhite 2
Quote:
Originally Posted by davidfor View Post
The code I am using attempts to get the language from the book. If it can't, then it uses the interface language. If we extend the statistics, there is a problem, as an extra language-specific file is used. At the moment, this is loaded early, before the individual book languages are known, and that could be a problem.
Does every language have a custom implementation of this word counting method? If not, I think this should be a deal breaker for the update that is being considered.
Old 01-20-2016, 01:11 PM   #866
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,161
Karma: 1404241
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Quote:
Originally Posted by chaley View Post
It appears that "from the book" means from the OPF inside the book. If my assumption is right and if you have access to metadata.db then you might also check the language stored for the book in the db before falling back to the interface language.
+1
I like this too
Old 01-20-2016, 06:07 PM   #867
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by chaley View Post
It appears that "from the book" means from the OPF inside the book. If my assumption is right and if you have access to metadata.db then you might also check the language stored for the book in the db before falling back to the interface language.
Yes, the OPF in the book. Getting it from the database was in the back of my mind when I was writing the post, but this is happening inside a job, so I didn't think I had access to the database. It could be part of the data collected before starting the job. And that is probably a good idea, as there is no guarantee that the copy of the book in the library has been updated with the latest metadata.
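
The fallback chain discussed here could be sketched as a small helper that returns the first usable language from an ordered list of candidates. The function name is hypothetical, and the order of the candidates (OPF first or database first) is left to the caller, since the thread does not settle it. Treating "und" (the ISO 639 code for an undetermined language) as missing is an assumption of this sketch.

```python
def resolve_language(*candidates):
    """Return the first usable language code, lower-cased, or None.

    Candidates are tried in the order given; None, empty strings and the
    "undetermined" code "und" are skipped.
    """
    for candidate in candidates:
        if not candidate:
            continue
        code = candidate.strip().lower()
        if code and code != "und":
            return code
    return None


# Example order: language from the book's OPF, then the library database,
# then the interface language
lang = resolve_language(None, "deu", "eng")
```

If the database value is collected before the job starts, only the resulting strings need to be passed in, so the job itself never touches the database.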
Old 01-20-2016, 07:35 PM   #868
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by rpgmaker View Post
Does every language have a custom implementation of this word counting method? If not, I think this should be a deal breaker for the update that is being considered.
The change to the word count is to use a different algorithm already implemented in calibre. This algorithm, and I assume most of the code, comes from elsewhere. It accepts the language as a parameter. I have not looked closely enough at this to know if there is an implementation for every language, but I doubt it. What is likely is that the major languages have implementations and the rest fall back to a common method.

The method currently used for the word count is the same for all languages. It is a fairly simple method but is not too inaccurate, at least for English. I don't know about the other languages, which is why I asked. If it wasn't for the other languages, I wouldn't bother about this, as for nearly all uses the count we have is close enough. But I think the other languages should be treated properly.

The other thing that has been discussed is the other statistics. These are English specific stats. The first mention was because someone had the German calculation for one of them. Having that used automatically was easy and sensible. But the rest of the stats are more of a problem. And there are other stats that make more sense for other languages. Adding a way to calculate them is a lot more complex.

My comments about the other stats are really just me thinking out loud. At this point, I have no plan to implement them. There are other things I would prefer to do. Maybe in the future, I might get bored and return to it. Or maybe someone will see my comments and decide to do it. If someone does, I'll be very happy to help with suggestions, testing and other help.

My plan for Count Pages is to release the changes as is (different word count algorithm, German version of one of the other stats) plus one other change. The other change is something someone else has done and is about making the plugin work better when called from other plugins.
Old 01-21-2016, 04:32 AM   #869
red_dragon
Daywalker
red_dragon is on a distinguished road
 
Posts: 29
Karma: 52
Join Date: Jul 2008
Device: Kindle Paperwhite
Quote:
Originally Posted by davidfor View Post
The code I am using attempts to get the language from the book.
Somehow I missed this. The beta works fine! Having one column for the reading ease instead of two is much better.
Old 01-24-2016, 04:42 PM   #870
ratanparai
Junior Member
ratanparai began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jan 2016
Device: Android device with Kindle
Is there any way to generate an APNX file from this plugin without the send-to-Kindle method? I'm trying to create an APNX file for my Android device. I used the APNX generator plugin, but the page numbers are not accurate. So I'm looking for a way to create an APNX file from the Goodreads page count.

I don't have any Kindle devices. I searched for a way to make my Android device appear as a Kindle so that I could use the number in the column to generate the APNX file, but with no luck.

I don't know why there is no way to generate an APNX file for the Kindle app on Android, either when I choose send to device or in the Android device interface plugin settings.