[LanguageTool]: Grammar check
Updated: August 8, 2023
Current Version: "0.4.6"
This plugin is a very simple LanguageTool wrapper that lets you check the grammar of the currently loaded epub. It's a validation plugin that flags paragraphs with grammar errors.
It does not come with a GUI like the LibreOffice version.
For example, if you check the following sentence:
I Can Has Cheezburger?
you'll get the following validation panel message:
Quote:
...I Can >>Has<< Cheezburger? GRAMMAR:DID_BASEFORM: Grammatical problem: The verb 'Can' requires the base form of the verb: 'have' Suggestion(s): Have
(Note that DID_BASEFORM is a LanguageTool rule and GRAMMAR is a LanguageTool category. You can use either value to enable or disable specific LanguageTool rules or categories.)
System requirements
Since LanguageTool is a Java tool, you'll need to install Java.
The plugin also requires Sigil 0.9.x. (Linux users will also need to install the bs4 and lxml Python libraries.)
You'll also need to download the latest LanguageTool desktop version and unzip it. Remember the location of the folder that contains languagetool-commandline.jar, because you'll need to select that file later.
Optionally, if you want to use n-gram data sets to detect commonly confused words, you'll need to download the huge n-gram data files (EN, DE, ES, FR, NL, HE, IT, RU, ZH) and define their path via ngramIndexDir. (Using this option will significantly slow down LanguageTool!)
Installation
1. Select Manage Plugins from the Plugins menu. In the Manage Plugins dialog box, select Use Bundled Python, if it isn't already selected.
2. Click Add Plugin and select LanguageTool_v0.4.6.zip. This will install the plugin, which you can select via Plugins > Validation > LanguageTool.
By default, the plugin will only check the currently selected file. To check all files, click the Text folder or change "allFiles": false to "allFiles": true in the LanguageTool.json preference file.
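If you'd rather change the allFiles setting with a script than by hand, a rough sketch like the following should do it. The path is passed in by the caller because the location of LanguageTool.json depends on your Sigil installation, and set_all_files is a made-up helper name, not part of the plugin.

```python
import json
from pathlib import Path


def set_all_files(prefs_path, enabled):
    """Set "allFiles" in a LanguageTool.json-style file and return the new prefs.

    prefs_path is supplied by the caller; where Sigil stores the file
    is not assumed here.
    """
    path = Path(prefs_path)
    prefs = json.loads(path.read_text(encoding="utf-8"))
    prefs["allFiles"] = bool(enabled)
    # Rewrite the whole file; json.dumps takes care of the commas.
    path.write_text(json.dumps(prefs, indent=4), encoding="utf-8")
    return prefs
```

Call it as set_all_files("path/to/LanguageTool.json", True), with the path replaced by wherever the file lives on your machine.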
Preferences
You can change the following settings in the plugin's preference file, LanguageTool.json. (Every setting line must end with a comma, except the last one in the .json file.)
A typical LanguageTool.json file looks like this:
Code:
{
"clipboard_copy": true,
"allFiles": false,
"update_check": false,
"ltPath": "C:/Program Files/LanguageTool-4.5/languagetool-commandline.jar",
"disabledRules": "MORFOLOGIK_RULE_EN_US,ENGLISH_WORD_REPEAT_BEGINNING_RULE,WHITESPACE_RULE,COMMA_PARENTHESIS_WHITESPACE",
"disabledCategories": "REDUNDANCY"
}
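The comma rule is ordinary JSON syntax, so an edited file can be tested before running the plugin. Here is a minimal sketch using Python's standard json module; check_prefs is a hypothetical helper and the two sample strings are made up for illustration.

```python
import json


def check_prefs(text):
    """Return (prefs, None) for valid JSON, or (None, message) for a syntax error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # lineno/colno point at the place where parsing failed.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"


good = '{\n"allFiles": false,\n"update_check": false\n}'
bad = '{\n"allFiles": false,\n"update_check": false,\n}'  # comma after the last entry

for sample in (good, bad):
    prefs, error = check_prefs(sample)
    print("OK" if error is None else f"Invalid JSON, {error}")
```

The second sample fails because of the trailing comma after the last setting.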
a) General preferences
clipboard_copy: if enabled, messages will be copied to the clipboard; default: false
allFiles: if enabled, all files will be checked; default: false
update_check: if enabled, the plugin will check the LT website for updates; default: true
ltPath: path to languagetool-commandline.jar
ngramIndexDir: path to the n-gram directory
b) LanguageTool preferences (for more information on these parameters, see this website)
enabledRules
disabledRules
enabledCategories
disabledCategories
enabledOnly
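These names appear to mirror options of the LanguageTool command-line tool. How the plugin actually calls LanguageTool isn't described here, so the sketch below only illustrates how such preferences could be turned into a languagetool-commandline.jar call. build_lt_command is hypothetical, the flag names are taken from the LanguageTool command-line help, and enabledOnly is assumed to be a true/false value.

```python
def build_lt_command(prefs, language, text_file):
    """Build an argument list for languagetool-commandline.jar from a prefs dict.

    Illustration only: this is not the plugin's own code.
    """
    cmd = ["java", "-jar", prefs["ltPath"], "--language", language]
    # Preference name -> command-line option that takes a value.
    option_flags = [
        ("enabledRules", "--enable"),
        ("disabledRules", "--disable"),
        ("enabledCategories", "--enablecategories"),
        ("disabledCategories", "--disablecategories"),
        ("ngramIndexDir", "--languagemodel"),
    ]
    for key, flag in option_flags:
        value = prefs.get(key)
        if value:
            cmd += [flag, value]
    # Assumption: enabledOnly is stored as a boolean.
    if prefs.get("enabledOnly"):
        cmd.append("--enabledonly")
    cmd.append(text_file)
    return cmd


example_prefs = {
    "ltPath": "C:/Program Files/LanguageTool-4.5/languagetool-commandline.jar",
    "disabledRules": "WHITESPACE_RULE",
    "disabledCategories": "REDUNDANCY",
}
print(build_lt_command(example_prefs, "en-US", "chapter1.txt"))
```

Returning a list rather than one long string means it can be handed to subprocess.run without worrying about spaces in paths such as "Program Files".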
In addition to these settings, the plugin will also look for a user-rules.xml file in the plugin folder. (You can test this feature by renaming en_user-rules.xml to user-rules.xml; LanguageTool will then flag all split infinitives in English epubs.)
For example, if you check the following sentence:
She used to secretly admire him.
LT will display the following message:
Quote:
...She used >>to secretly admire<< him... GRAMMAR:SPLIT_INFINITIVE Split infinitive: Don't split infinitives. Suggestion(s): to admire secretly
Special ngram check preference settings:
If you want to use ngrams for grammar checking, you'll need to add the following entry at the beginning of LanguageTool.json:
Code:
"ngramIndexDir": "C:/ngrams",
Note that ngramIndexDir is the location of the parent folder of the en ngrams folder. For example, the folder structure on my machine is:
C:/ngrams/en/1grams
C:/ngrams/en/2grams
C:/ngrams/en/3grams
Obviously, the paths shown in these examples need to be changed to match your actual installation folders. Windows folder names in the .json file must also be written with slashes (/) or double backslashes (\\).
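To double-check the folder layout and the slash direction before switching on n-gram checking, a small sketch like this may help. Both function names are made up for illustration, and the three sub-folder names are simply the ones from the listing above.

```python
from pathlib import Path


def to_json_path(windows_path):
    """Turn backslashes into forward slashes so the path is safe inside JSON."""
    return windows_path.replace("\\", "/")


def missing_ngram_folders(ngram_index_dir, lang="en"):
    """List the expected sub-folders that are missing under <dir>/<lang>."""
    lang_dir = Path(ngram_index_dir) / lang
    expected = ("1grams", "2grams", "3grams")
    return [name for name in expected if not (lang_dir / name).is_dir()]


print(to_json_path(r"C:\ngrams"))  # prints C:/ngrams
print(missing_ngram_folders("C:/ngrams"))  # an empty list means the layout looks right
```

If the second call lists any folder names, ngramIndexDir probably points one level too deep or too shallow.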
Note that n-gram checking is very slow. However, it can detect some errors that rules can't detect. For example, if you check the following sentence:
Don't go their.
it'll report:
Quote:
...Don't go >>their<< ... TYPOS:CONFUSION_RULE Statistics suggests that 'there' (as in 'Is there an answer?') might be the correct word here, not 'their' (as in 'It’s not their fault.'). Please check. Suggestion(s): there
Troubleshooting:
Depending on the book type, you might get lots of false positives. You can filter them out via the disabledRules and disabledCategories settings.
Note that the plugin will use the language defined in the epub metadata section for all files in the epub, regardless of lang or xml:lang attributes. If you define an unsupported language code, LanguageTool will fail.
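To see which language code an epub declares, you can read the dc:language element from its .opf file. The following is a minimal sketch with Python's standard library, assuming a conventional OPF layout; epub_language and SAMPLE_OPF are hypothetical names, and this is not necessarily how the plugin reads the value.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used for epub metadata elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Made-up, stripped-down OPF file for illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample</dc:title>
    <dc:language>en-US</dc:language>
  </metadata>
</package>"""


def epub_language(opf_xml):
    """Return the first dc:language value in an OPF document, or None."""
    root = ET.fromstring(opf_xml)
    node = root.find(f".//{{{DC_NS}}}language")
    if node is None or not node.text:
        return None
    return node.text.strip()


print(epub_language(SAMPLE_OPF))  # prints en-US
```

Only the first dc:language entry is returned here; an epub with several language entries would need extra handling.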
Supported language codes are:
By default, LanguageTool will also do a spellcheck. However, since Sigil has a much more advanced spellchecker, this option should be disabled.
For US English, it's already disabled by default via the MORFOLOGIK_RULE_EN_US rule. For other languages, different rules will need to be added to the disabledRules setting.
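Since disabledRules is a single comma-separated string, adding a rule means appending to that string without creating duplicates. Below is a minimal sketch; add_disabled_rule is a hypothetical helper, and MORFOLOGIK_RULE_EN_GB is only an assumed example of a spelling rule id for British English, so check LanguageTool's own rule list for the id that matches your language.

```python
def add_disabled_rule(prefs, rule_id):
    """Append rule_id to the comma-separated "disabledRules" entry if it's missing."""
    raw = prefs.get("disabledRules", "")
    rules = [rule.strip() for rule in raw.split(",") if rule.strip()]
    if rule_id not in rules:
        rules.append(rule_id)
    # LanguageTool expects the ids without spaces between them.
    prefs["disabledRules"] = ",".join(rules)
    return prefs


prefs = {"disabledRules": "MORFOLOGIK_RULE_EN_US,WHITESPACE_RULE"}
add_disabled_rule(prefs, "MORFOLOGIK_RULE_EN_GB")  # assumed rule id, see note above
print(prefs["disabledRules"])
```

The same helper works for disabledCategories if you change the key name.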
Update Check setting
By default, the plugin will connect to the LanguageTool GitHub website to check for LanguageTool updates. If you don't want the plugin to connect to the Internet, change the following LanguageTool.json setting to false.
Code:
"update_check": true
License: GNU General Public License v3 (GPL-3)