Okay, just as a proof of concept, I have taken Doitsu's foreign word plugin (thank you, Doitsu!), written an HTMLLangTextParser class (htmllangtextparser.py, based on my quickparser; thank you, varlog!), and created a text word parser (textwordparser.py) that tokenizes text into words using the current language dictionary's WORDCHARS, just like Sigil does now. Together these make up a SpellMLDemo validation plugin.
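For anyone who wants the idea without opening the attachment, here is a rough sketch of WORDCHARS-style tokenizing. It is not the code from textwordparser.py: the function name split_words and the rule "letters and digits plus whatever extra characters the dictionary lists" are assumptions made for illustration only.

```python
def split_words(text, wordchars=""):
    """Yield (offset, word) pairs from text.

    Assumption for this sketch: a word is a run of alphanumeric
    characters plus any extra characters listed in `wordchars`
    (for example the value of a dictionary's WORDCHARS setting).
    """
    extra = set(wordchars)
    start = None
    for i, ch in enumerate(text):
        if ch.isalnum() or ch in extra:
            if start is None:
                start = i          # a new word begins here
        elif start is not None:
            yield start, text[start:i]
            start = None
    if start is not None:          # flush a word that runs to the end
        yield start, text[start:]


if __name__ == "__main__":
    # With the apostrophe listed as a word character, "don't" stays whole.
    print(list(split_words("don't stop", "'")))
    # Without it, the same text splits at the apostrophe.
    print(list(split_words("don't stop")))
```

The point is that the same text splits differently depending on which dictionary is current, which is why the tokenizer has to know the language before it can cut words.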
NOTE: THIS PLUGIN ONLY WORKS FOR BUILDS FROM SIGIL MASTER AS OF TODAY
If this seems to work, I will use it as a model for some of the C++ code inside Sigil itself, but replace the HTMLLangTextParser with a real HTML parser built on GumboParser's node tree (i.e. a DOM), which will be more robust to parsing errors and well-formedness errors. I will also extract the text parsing code that uses the dictionary wordchars into its own class, and change the current SpellCheck.cpp class so that it can have multiple dictionaries open at the same time.
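Whichever parser ends up doing the work, the job is the same: find the effective language of each run of text so it can go to the matching dictionary. A minimal sketch of that lang inheritance follows, using Python's standard html.parser rather than Gumbo or the attached htmllangtextparser.py. The names LangTextCollector and lang_runs, and the default language "en", are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LangTextCollector(HTMLParser):
    """Hypothetical helper: collect (lang, text) runs, inheriting lang
    from the nearest ancestor that sets xml:lang or lang.
    This sketch does not skip script or style content."""

    def __init__(self, default_lang="en"):
        super().__init__(convert_charrefs=True)
        self.lang_stack = [default_lang]   # assumed document default
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        # An element without its own lang inherits the current one.
        lang = attrs.get("xml:lang") or attrs.get("lang") or self.lang_stack[-1]
        self.lang_stack.append(lang)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if len(self.lang_stack) > 1:       # never pop the default
            self.lang_stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.lang_stack[-1], data))


def lang_runs(html, default_lang="en"):
    parser = LangTextCollector(default_lang)
    parser.feed(html)
    parser.close()                         # flush any buffered text
    return parser.runs


if __name__ == "__main__":
    sample = '<p lang="en">Hello <span lang="de">Guten Tag</span> world</p>'
    print(lang_runs(sample))
```

A stack like this trusts the tags to be balanced, which is exactly where a tree built by a real HTML parser should hold up better on broken markup.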
I have attached it in case anyone is interested in testing it or simply looking at the code.
KevinH
PS: I added a new plugin hunspell interface that handles multiple dictionaries in a much smarter way. See the attached pluginhunspellml.py if you are interested in plugin-based spellchecking.
Note: pluginhunspellml.py will be an additional plugin interface for hunspell. It will augment, not replace, the current pluginhunspell.py, so full backwards compatibility is maintained for plugins that use the now-older plugin interface.
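To show what "multiple dictionaries open at the same time" means in practice, here is a toy sketch. It does not reflect the real pluginhunspellml.py API: the class MultiDictChecker and its methods are hypothetical, plain Python sets stand in for loaded hunspell dictionaries, and the fallback from a regional code such as "de-AT" to "de" is an assumption.

```python
class MultiDictChecker:
    """Hypothetical stand-in: keeps one word set per language code,
    where a real interface would keep one hunspell handle each."""

    def __init__(self):
        self.dicts = {}

    def load(self, lang, words):
        # Store words case-insensitively for this sketch.
        self.dicts[lang] = {w.lower() for w in words}

    def check(self, word, lang):
        """Return True or False, or None if no dictionary covers lang."""
        words = self.dicts.get(lang)
        if words is None:
            # Assumed fallback: try the base language, e.g. "de-AT" -> "de".
            words = self.dicts.get(lang.split("-")[0])
        if words is None:
            return None
        return word.lower() in words


if __name__ == "__main__":
    checker = MultiDictChecker()
    checker.load("en", ["hello", "world"])
    checker.load("de", ["hallo", "welt"])
    for word, lang in [("hello", "en"), ("hallo", "en"),
                       ("hallo", "de-AT"), ("bonjour", "fr")]:
        print(word, lang, checker.check(word, lang))
```

Returning None for an uncovered language keeps "misspelled" separate from "no dictionary available", so a caller can decide whether to skip those words or flag them.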
Last edited by KevinH; 08-05-2020 at 03:26 PM.
Reason: added a new plugin hunspell interface that handles multiple languages in a much better fashion