Old 07-09-2016, 06:27 AM   #20
Doitsu
Grand Sorcerer
Posts: 5,746
Karma: 24032915
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by Tex2002ans View Post
It felt quite rough:
It is indeed a bit rough, but it was the best I could do with my very limited Python skills.
BTW, I found a Windows bug related to the ngram spellcheck feature that required a minor update. If you want to experiment with ngrams, you'll need to install the latest version.

As for your questions:

Quote:
Originally Posted by Tex2002ans View Post
Is there any possible way for it to highlight the exact position in the text?
Only if I hard-coded some kind of highlight style, which you'd then have to remove from the many false positives.

This feature might be easier to implement in Calibre, because it's based on Python.
Maybe Kovid Goyal will implement it, if you ask him nicely.

I'll also ask KevinH whether he could add some kind of Python-accessible highlight function, but since that would probably require a lot of work and not many people are interested in this plugin, it's not very likely to happen.

Quote:
Originally Posted by Tex2002ans View Post
Is there a way to split the messages into more columns?
Unfortunately, the software module used for validation messages doesn't support multi-line text.

Quote:
Originally Posted by Tex2002ans View Post
Is there any possible way for it to run on the entire EPUB at once? Or am I just crazy? (Or didn't read your instructions properly).
Actually, my instructions were a bit unclear on that. By default, the plugin only checks the currently selected file. If you want to check all files, either select all files or none (e.g., select the Text folder). You can also force the plugin to always check all files by changing the following value in LanguageTool.json:

Code:
"allFiles": true
(If it's not the last entry, you'll also need to add a comma at the end.)
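If you'd rather not edit the file by hand, here is a minimal Python sketch that sets the value for you. It assumes LanguageTool.json is plain JSON and that the plugin doesn't care how the file is formatted; force_all_files is a made-up helper name, and you have to pass in the path to your own LanguageTool.json.

```python
import json
from pathlib import Path


def force_all_files(config_path):
    """Set "allFiles": true in a LanguageTool.json-style file and return the settings."""
    path = Path(config_path)
    # Start from the existing settings if the file is already there.
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings["allFiles"] = True  # written out as the JSON literal true
    # json.dumps puts the commas between entries itself.
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return settings
```

Since the json module writes the separators, you don't have to worry about the comma.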

Quote:
Originally Posted by Tex2002ans View Post
Any stats/thoughts on adding the n-gram data?
It really slows LanguageTool down, but it did find some problems. It all depends on the texts that you want to check.

Quote:
Originally Posted by Tex2002ans View Post
How many more false positives might I have to sift through, or does it do a pretty good job?
It reports fewer false positives than the regular grammar check. I usually use it after the regular grammar check with a special LanguageTool.json file:

Code:
{
  "enabledOnly": true,
  "enabledRules": "CONFUSION_RULE",
  "ngramIndexDir": "C:/ngrams",
  "ltPath": "C:/Program Files/LanguageTool-3.3/languagetool-commandline.jar",
  "allFiles": true
}
With these settings, LanguageTool will only run the ngram spellcheck. It's still rather slow.
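If you switch between the regular check and the ngram-only check a lot, you could derive the special settings from your regular ones instead of keeping two files in sync by hand. This is only a rough sketch: ngram_only_settings is a made-up helper, and it assumes that the keys shown above are the only ones that need to change.

```python
import json


def ngram_only_settings(base):
    """Return a copy of the regular settings restricted to the ngram check."""
    settings = dict(base)  # leave the regular settings untouched
    settings.update({
        "enabledOnly": True,               # run only the rules listed below
        "enabledRules": "CONFUSION_RULE",  # the ngram-based rule from the post
        "allFiles": True,
    })
    return settings


regular = {
    "ngramIndexDir": "C:/ngrams",
    "ltPath": "C:/Program Files/LanguageTool-3.3/languagetool-commandline.jar",
}
print(json.dumps(ngram_only_settings(regular), indent=2))
```

The printed text can then be saved as the special LanguageTool.json.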

If you want to experiment with the ngram spellcheck feature, you'll need to create a folder with an en subfolder in it and extract the ngram data files into that en folder. For example, on my machine the ngram files are in C:\ngrams\en (e.g., C:\ngrams\en\1grams).
As far as LanguageTool is concerned, C:\ngrams (not the en subfolder) is the ngram folder that you'll need to specify via ngramIndexDir.
Note also that in LanguageTool.json you'll need to replace backslashes in folder names with forward slashes or write each backslash twice.
For example:

Code:
  "ngramIndexDir": "C:/ngrams",
or

Code:
    "ngramIndexDir": "C:\\ngrams",
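If you're not sure how a longer Windows path has to be written, you can let Python produce both spellings. This is just an illustration using the standard library; ngram_dir_entries is a made-up helper name.

```python
import json
from pathlib import PureWindowsPath


def ngram_dir_entries(folder):
    """Return the forward-slash spelling and the JSON-escaped spelling of a Windows path."""
    path = PureWindowsPath(folder)
    forward = path.as_posix()        # backslashes replaced with forward slashes
    escaped = json.dumps(str(path))  # quoted, with every backslash doubled
    return forward, escaped


forward, escaped = ngram_dir_entries(r"C:\ngrams")
print('"ngramIndexDir": "%s",' % forward)
print('"ngramIndexDir": %s,' % escaped)
```

Both printed lines mean the same folder to a JSON parser, so use whichever one you find easier to read.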
BTW, the ngram spellcheck didn't flag "it original usefulness", but this could easily be added as a custom rule.

Last edited by Doitsu; 07-12-2016 at 07:07 AM. Reason: New version attached