01-25-2017, 08:14 AM   #83
Doitsu
@AnselmD: In my tests with dialog-heavy German books, the share of contractions was, on average, well below 0.005%. I.e., they're pretty much negligible and best handled with a custom word list.

BTW, the Sigil Python plugin interface has native Hunspell support. If you're a perfectionist, you could write an edit plugin that does the following:
  1. Get the text from all HTML files with bs4/sigil_bs4.
  2. Use a regex to find all words that contain straight or curly apostrophes.
  3. Split each match at the apostrophe into two words and check the first word against the dictionary. If it's found, add the whole regex match to a custom word list.
  4. Write the custom word list to the user_dictionaries folder.
For more information see the Sigil framework doc and the official Sigil test plugin.
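
Here's a rough sketch of steps 2 and 3 in plain Python, just to illustrate the idea. It doesn't touch the Sigil API at all: `collect_contractions` and `is_known` are names I made up, and `is_known` is a hypothetical stand-in for whatever Hunspell check you wire up via the plugin interface. The regex only catches words with letters on both sides of the apostrophe, which is an assumption on my part.

```python
import re

# One or more letters, a straight (') or curly (\u2019) apostrophe, then more letters.
# [^\W\d_] means "word character that is neither a digit nor an underscore",
# so umlauts and sharp s are matched as well.
APOSTROPHE_WORD = re.compile(r"[^\W\d_]+['\u2019][^\W\d_]+")


def collect_contractions(text, is_known):
    """Return a sorted list of apostrophe words whose first part is a known word.

    is_known is a hypothetical callable (word -> bool) that stands in for
    the real dictionary lookup.
    """
    found = set()
    for match in APOSTROPHE_WORD.findall(text):
        # Step 3: split at the first apostrophe and check only the first part.
        first = re.split(r"['\u2019]", match, maxsplit=1)[0]
        if is_known(first):
            found.add(match)
    return sorted(found)


if __name__ == "__main__":
    # Toy dictionary instead of Hunspell, for demonstration only.
    known_words = {"geht", "gibt"}
    sample = "Wie geht's? Das gibt\u2019s doch nicht!"
    print(collect_contractions(sample, lambda w: w in known_words))
```

In a real plugin you'd feed it the text extracted in step 1 and pass the dictionary check in place of the toy lambda.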

In case you're wondering how to get the local user_dictionaries folder, you can find it with the following Python code:

Code:
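#!/usr/bin/env python
import os

def run(bk):
    # bk._w.plugin_dir points to the Sigil plugins folder;
    # the user_dictionaries folder sits next to it in the same parent folder.
    # Note: _w is a private attribute, so this may break in other Sigil versions.
    prefs_dir = os.path.dirname(bk._w.plugin_dir)
    user_dictionary_path = os.path.join(prefs_dir, 'user_dictionaries')
    print(user_dictionary_path)

    # Return 0 to signal success.
    return 0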
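
For step 4, something along these lines might do. It's only a sketch: `write_word_list` and the default file name `contractions` are made up, and I'm assuming the user dictionary is a plain UTF-8 text file with one word per line, so check an existing file in your user_dictionaries folder before relying on it.

```python
import os


def write_word_list(words, folder, name="contractions"):
    """Merge words into folder/name and return the file path.

    Assumption: one word per line, UTF-8 encoded. The file name is hypothetical.
    """
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name)

    # Keep entries that are already in the file so nothing gets lost.
    existing = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            existing = {line.strip() for line in f if line.strip()}

    merged = sorted(existing | set(words))
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(word + "\n" for word in merged)
    return path
```

You'd call it with the path printed by the code above, e.g. `write_word_list(found_words, user_dictionary_path)`.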