Old 07-07-2016, 12:02 PM   #1
Doitsu
[LanguageTool]: Grammar check

Updated: August 8, 2023
Current Version: "0.4.6"

This plugin is a very simple LanguageTool wrapper, which allows you to check the grammar of the currently loaded epub. It's a validation plugin that'll flag paragraphs with grammar errors.

It does not come with a GUI like the LibreOffice version.

For example, if you check the following sentence:

I Can Has Cheezburger?

you'll get the following validation panel message:

Quote:
...I Can >>Has<< Cheezburger? GRAMMAR:DID_BASEFORM: Grammatical problem: The verb 'Can' requires the base form of the verb: 'have' Suggestion(s): Have
(Note that DID_BASEFORM is a LanguageTool rule and GRAMMAR a LanguageTool category. You can use either value to enable/disable specific LanguageTool rules or categories.)

System requirements

Since LanguageTool is a Java tool, you'll need to install Java.
It also requires Sigil 0.9.x. (Linux users will also need to install the bs4 and lxml Python libraries.)
You'll also need to download the latest LanguageTool desktop version and unzip it. Remember the location of the folder that contains languagetool-commandline.jar, because you'll later need to select that file.

Optionally, if you want to use n-gram data sets for spellchecking, you'll need to download the huge n-gram data files (EN, DE, ES, FR, NL, HE, IT, RU, ZH) and define the path via ngramIndexDir. (Using this option will significantly slow down LanguageTool!)

Installation

1. Select Manage Plugins from the Plugins menu. In the Manage Plugins dialog box, select Use Bundled Python, if it isn't already selected.
2. Click Add Plugin and select LanguageTool_v0.4.6.zip. This will install the plugin, which you can select via Plugins > Validation > LanguageTool.

By default, the plugin will only check the currently selected file. To check all files, click the Text folder or change "allFiles": false to "allFiles": true in the LanguageTool.json preference file.
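If you'd rather not edit the file by hand, the allFiles change can be scripted. This is only a minimal sketch: set_preference is a hypothetical helper (not part of the plugin), and you have to supply the actual location of LanguageTool.json yourself, since the post doesn't give it.

```python
import json
from pathlib import Path


def set_preference(prefs_path, key, value):
    """Load a JSON preference file, set one key, and write the file back.

    Hypothetical helper for illustration; prefs_path must point to your
    own LanguageTool.json.
    """
    path = Path(prefs_path)
    prefs = json.loads(path.read_text(encoding="utf-8"))
    prefs[key] = value
    # json.dumps writes the commas between entries for you.
    path.write_text(json.dumps(prefs, indent=2) + "\n", encoding="utf-8")
    return prefs


# Example call (the path is a placeholder):
# set_preference("path/to/LanguageTool.json", "allFiles", True)
```

The same helper would work for any other key in the file, for example update_check.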

Preferences

You can change the following settings via the plugin preference settings. (Each setting must end with a comma unless it is the last entry in the .json file.)

A typical LanguageTool.json file looks like this:

Code:
{
  "clipboard_copy": true,
  "allFiles": false,
  "update_check": false,
  "ltPath": "C:/Program Files/LanguageTool-4.5/languagetool-commandline.jar", 
  "disabledRules": "MORFOLOGIK_RULE_EN_US,ENGLISH_WORD_REPEAT_BEGINNING_RULE,WHITESPACE_RULE,COMMA_PARENTHESIS_WHITESPACE", 
  "disabledCategories": "REDUNDANCY" 
}
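A missing or extra comma makes the whole file invalid JSON. Here's a small sketch for checking the file contents before running the plugin. check_prefs_text is a hypothetical helper, and it only tests that the text parses as JSON, not that the keys are ones the plugin knows.

```python
import json


def check_prefs_text(text):
    """Return None if text is valid JSON, otherwise a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Example call (the path is a placeholder):
# with open("path/to/LanguageTool.json", encoding="utf-8") as fh:
#     print(check_prefs_text(fh.read()) or "OK")
```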
a) General preferences

clipboard_copy if enabled, messages will be copied to the clipboard; default: false
allFiles if enabled, all files will be checked; default: false
update_check if enabled, the plugin will check the LT website for updates; default: true
ltPath path to languagetool-commandline.jar
ngramIndexDir path to the n-gram directory

b) LanguageTool preferences (for more information on these parameters see this website)

enabledRules
disabledRules
enabledCategories
disabledCategories
enabledOnly
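To give an idea of what these settings do, here's a rough sketch of how they could be turned into a languagetool-commandline.jar call. This is an assumption on my part, not the plugin's actual code: build_lt_command is a hypothetical helper, input_file stands for a plain text file, enabledOnly is assumed to be a true/false value, and the flags are the ones I believe the LanguageTool command line documents, so check them against the --help output of your LanguageTool version.

```python
def build_lt_command(prefs, language, input_file):
    """Build an argument list for languagetool-commandline.jar from a prefs dict.

    Illustration only; the mapping of preference keys to flags is assumed.
    """
    cmd = ["java", "-jar", prefs["ltPath"], "--language", language]
    option_flags = {
        "enabledRules": "--enable",
        "disabledRules": "--disable",
        "enabledCategories": "--enablecategories",
        "disabledCategories": "--disablecategories",
    }
    for key, flag in option_flags.items():
        if prefs.get(key):
            # Rule and category lists are passed as one comma-separated string.
            cmd += [flag, prefs[key]]
    if prefs.get("enabledOnly"):
        cmd.append("--enabledonly")
    if prefs.get("ngramIndexDir"):
        cmd += ["--languagemodel", prefs["ngramIndexDir"]]
    cmd.append(input_file)
    return cmd
```

The list can be handed to subprocess.run() as is, which avoids quoting problems with paths like "C:/Program Files/...".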

In addition to these settings, the plugin will also look for a user-rules.xml file in the plugin folder. (You can test this feature by renaming en_user-rules.xml to user-rules.xml. LanguageTool will then flag all split infinitives in English epubs.)

For example, if you check the following sentence:

She used to secretly admire him.

LT will display the following message:

Quote:
...She used >>to secretly admire<< him... GRAMMAR:SPLIT_INFINITIVE Split infinitive: Don't split infinitives. Suggestion(s): to admire secretly
Special ngram check preference settings:

If you want to use ngrams for grammar checking, you'll need to add the following entry at the beginning of LanguageTool.json:

Code:
  "ngramIndexDir": "C:/ngrams",
Note that ngramIndexDir is the location of the parent folder of the en ngrams folder. For example, the folder structure on my machine is:

C:/ngrams/en/1grams
C:/ngrams/en/2grams
C:/ngrams/en/3grams

Obviously, the paths shown in these examples (the ltPath and ngramIndexDir values) need to be changed to match your actual installation folders. Windows folder names must also be written with forward slashes (/) or double backslashes (\\).
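If LanguageTool doesn't seem to pick up the n-gram data, it may help to check that ngramIndexDir really points at the parent folder. A minimal sketch, assuming the 1grams/2grams/3grams layout shown above; missing_ngram_dirs is a hypothetical helper and not part of the plugin.

```python
from pathlib import Path


def missing_ngram_dirs(ngram_index_dir, language="en"):
    """Return the expected n-gram subfolders that do not exist.

    Assumes the layout <ngramIndexDir>/<language>/1grams, 2grams, 3grams.
    """
    base = Path(ngram_index_dir) / language
    expected = ("1grams", "2grams", "3grams")
    return [str(base / name) for name in expected if not (base / name).is_dir()]


# Example call (the path is a placeholder for your ngramIndexDir value):
# print(missing_ngram_dirs("C:/ngrams"))
```

An empty list means all three folders were found.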

Note that n-gram checking is very slow. However, it can detect some errors that rules can't detect. For example, if you check the following sentence:

Don't go their.

it'll report:

Quote:
...Don't go >>their<< ... TYPOS:CONFUSION_RULE Statistics suggests that 'there' (as in 'Is there an answer?') might be the correct word here, not 'their' (as in 'It’s not their fault.'). Please check. Suggestion(s): there
Troubleshooting:

Depending on the book type, you might get lots of false positives. You can filter them out via the disabledRules and disabledCategories settings.

Note that the plugin will use the language defined in the epub metadata section for all files in the epub, regardless of lang or xml:lang attributes. If you define an unsupported language code, LanguageTool will fail.

Supported language codes are:
Spoiler:
ast-ES Asturian
be-BY Belarusian
br-FR Breton
ca-ES Catalan
ca-ES-valencia Catalan (Valencian)
da-DK Danish
de German
de-AT German (Austria)
de-CH German (Swiss)
de-DE German (Germany)
de-DE-x-simple-language Simple German
el-GR Greek
en English
en-AU English (Australian)
en-CA English (Canadian)
en-GB English (GB)
en-NZ English (New Zealand)
en-US English (US)
en-ZA English (South African)
eo Esperanto
es Spanish
fa Persian
fr French
gl-ES Galician
is-IS Icelandic
it Italian
ja-JP Japanese
km-KH Khmer
lt-LT Lithuanian
ml-IN Malayalam
nl Dutch
pl-PL Polish
pt Portuguese
pt-BR Portuguese (Brazil)
pt-PT Portuguese (Portugal)
ro-RO Romanian
ru-RU Russian
sk-SK Slovak
sl-SI Slovenian
sv Swedish
ta-IN Tamil
tl-PH Tagalog
uk-UA Ukrainian
zh-CN Chinese
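Since an unsupported language code makes LanguageTool fail, you may want to check the epub's metadata language first. The following is a sketch under a few assumptions: it reads the first dc:language element from the text of the .opf file, compares it exactly (case-sensitive) against the list above, and both function names are hypothetical. Newer LanguageTool versions may support additional codes.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Language codes copied from the list in this post.
SUPPORTED = set("""
ast-ES be-BY br-FR ca-ES ca-ES-valencia da-DK de de-AT de-CH de-DE
de-DE-x-simple-language el-GR en en-AU en-CA en-GB en-NZ en-US en-ZA
eo es fa fr gl-ES is-IS it ja-JP km-KH lt-LT ml-IN nl pl-PL pt pt-BR
pt-PT ro-RO ru-RU sk-SK sl-SI sv ta-IN tl-PH uk-UA zh-CN
""".split())


def epub_language(opf_text):
    """Return the text of the first dc:language element, or None if absent."""
    root = ET.fromstring(opf_text)
    elem = root.find(f".//{{{DC_NS}}}language")
    if elem is None or not elem.text:
        return None
    return elem.text.strip()


def is_supported(code):
    """Exact, case-sensitive match against the list above."""
    return code in SUPPORTED
```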


By default, LanguageTool will also do a spellcheck. However, since Sigil has a much more advanced spellchecker, this option should be disabled.
For US English it's already disabled by default via the MORFOLOGIK_RULE_EN_US rule. For other languages different rules will need to be added to the disabledRules setting.
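Adding a rule to disabledRules just means appending it to the comma-separated string. A minimal sketch with a hypothetical helper, add_disabled_rules. The rule ID MORFOLOGIK_RULE_EN_GB is only used as an example here; take the actual ID for your language from the messages in the validation panel.

```python
def add_disabled_rules(prefs, *rule_ids):
    """Return a copy of prefs with rule_ids appended to disabledRules.

    Existing entries are kept in order and duplicates are skipped.
    """
    current = [r.strip() for r in prefs.get("disabledRules", "").split(",") if r.strip()]
    for rule_id in rule_ids:
        if rule_id not in current:
            current.append(rule_id)
    updated = dict(prefs)
    updated["disabledRules"] = ",".join(current)
    return updated


# Example call (rule ID is an example, verify it for your language):
# prefs = add_disabled_rules(prefs, "MORFOLOGIK_RULE_EN_GB")
```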

Update Check setting

By default, the plugin will connect to the LanguageTool GitHub website to check for LanguageTool updates. If you don't want the plugin to connect to the Internet, change the following LanguageTool.json setting to false:

Code:
  "update_check": true
License: GNU General Public License v3 (GPL-3)
Attached Files
File Type: zip LanguageTool_v0.4.6.zip (14.4 KB, 271 views)

Last edited by Doitsu; 08-05-2023 at 03:58 AM. Reason: Updated for Qt 6.5.2 and Python 3.11.3