@AnselmD: According to my tests with dialog-heavy German books, contractions made up well below 0.005% of the words on average. I.e., they're pretty much negligible and are best handled with a custom word list.
BTW, the Sigil Python plugin interface has native Hunspell support. If you're a perfectionist, you could write an edit plugin that does the following:
- Get the text from all HTML files with bs4/sigil_bs4.
- Use a regex to find all words that contain straight or curly apostrophes.
- Split each match at the apostrophe and check the first part against the dictionary. If the first part is found, add the full match to a custom word list.
- Write the custom word list to the user_dictionaries folder.
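The regex and dictionary-check steps could be sketched roughly like this. It's only a minimal sketch: `collect_contractions` and the `is_known` callback are made-up names, and the callback stands in for whatever Hunspell lookup the plugin ends up using, so the Sigil wiring is deliberately left out.

```python
import re

# A run of letters, a straight (') or curly (’) apostrophe, then another run of letters.
APOSTROPHE_WORD = re.compile(r"\b([^\W\d_]+)(['\u2019])([^\W\d_]+)\b")


def collect_contractions(text, is_known):
    """Return sorted, unique apostrophe words whose first part passes is_known.

    is_known is a hypothetical callback (word -> bool); in a real plugin it
    would wrap the spell checker's lookup.
    """
    found = set()
    for match in APOSTROPHE_WORD.finditer(text):
        first_part = match.group(1)
        if is_known(first_part):
            # Keep the full match, apostrophe included, for the word list.
            found.add(match.group(0))
    return sorted(found)


if __name__ == "__main__":
    known_words = {"geht", "gibt"}  # stand-in for a real dictionary
    sample = "Wie geht's? Das gibt\u2019s nicht."
    print(collect_contractions(sample, known_words.__contains__))
```

Words that begin or end with an apostrophe (e.g. a trailing elision) are not matched by this pattern; they would need a separate rule.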
For more information, see the Sigil framework doc and the official Sigil test plugin.
In case you're wondering how to get the local user_dictionaries folder, you can find it with the following Python code:
Code:
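#!/usr/bin/env python
import os

def run(bk):
    # bk._w is a private attribute of the plugin container; this code
    # expects user_dictionaries to sit next to the plugin folder.
    user_dictionary_path = os.path.join(os.path.dirname(bk._w.plugin_dir), 'user_dictionaries')
    print(user_dictionary_path)
    return 0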
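Once you have that path, writing the list could look something like the sketch below. It assumes a user dictionary is a plain UTF-8 text file with one word per line, which you should verify against your Sigil version. `write_word_list` and the file name `contractions` are made up for illustration.

```python
import os


def write_word_list(folder, filename, words):
    """Merge words into a one-word-per-line UTF-8 file and return its path.

    Assumption: the target format is one word per line; existing entries
    are kept so that repeated runs do not discard earlier words.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    existing = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            existing = {line.strip() for line in handle if line.strip()}
    merged = sorted(existing | set(words))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(merged) + "\n")
    return path
```

Inside `run(bk)` you would call it as `write_word_list(user_dictionary_path, 'contractions', words)`.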