Quote:
Originally Posted by Tex2002ans
Yep, tab-delimited is usually my favorite. Commas are just too common, and make manually reading the file in a text editor a chore.
Whenever exporting CSVs into LibreOffice Calc, a nice window pops up giving you lots of import options.
You can also use the Spellcheck Lists in non-standard ways. Like in this thread, I explained how to use it to find a list of "foreign-language" words:
https://www.mobileread.com/forums/sh...59#post3812859
and go marking them up with xml:lang.
I've also done something similar when trying to normalize a collection of various articles between American/British spellings. You could:
- Mark ebook as English (US).
- Export CSV of "misspelled words".
- Mark ebook as English (UK).
- Export CSV of "misspelled words".
Compare both CSVs together, look at differences, and you can see:
- Words that appear in only one list are almost all the differently spelled (US/UK) words.
- Words that appear in both lists are almost all the actual misspelled/foreign words.
|
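The comparison described in the quote could be sketched in Python roughly like this. It is only a sketch: I'm assuming the exported CSV has the word in the first column and no header row, and the function and key names are made up for illustration.

```python
import csv
import io


def read_words(csv_text, delimiter=","):
    """Return the set of words found in the first column of CSV text.

    Assumption: first column holds the word and there is no header row.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return {row[0].strip() for row in reader if row and row[0].strip()}


def compare_lists(us_csv, uk_csv, delimiter=","):
    """Split two 'misspelled words' exports into three groups."""
    us = read_words(us_csv, delimiter)  # flagged while marked English (US)
    uk = read_words(uk_csv, delimiter)  # flagged while marked English (UK)
    return {
        # Flagged only under the US dictionary: mostly British spellings.
        "only_in_us_list": sorted(us - uk),
        # Flagged only under the UK dictionary: mostly American spellings.
        "only_in_uk_list": sorted(uk - us),
        # Flagged under both: mostly real typos or foreign words.
        "in_both_lists": sorted(us & uk),
    }


# Tiny inline example instead of real export files.
result = compare_lists("colour,3\nteh,1\n", "color,2\nteh,1\n")
for group, words in result.items():
    print(group, words)
```

For real exports you would read each file's text and pass it in, using `delimiter="\t"` if you exported tab-delimited.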
Yes, you can do a lot with it. I want to use it to upgrade dictionaries with missing words, and not only Hunspell ones but StarDict/MOBI dictionaries too. I'll create a Hunspell dictionary from a StarDict one, then find missing words in different books to improve the main dictionary: its headwords, definitions, and inflected forms.
Probably the best choice will be to create a script that checks all the words from an EPUB book against a Hunspell dictionary and exports the missing words, but to begin with, the manual method can work.
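A first rough version of such a script might look like the sketch below. It uses only the Python standard library and makes some big simplifications that are my own assumptions: it treats the `.dic` file as a plain word list and ignores the affix flags after `/` (so inflected forms that Hunspell itself would accept can still be reported as missing), it lowercases everything, and it only looks at `.xhtml`/`.html`/`.htm` files inside the EPUB. All function names are hypothetical.

```python
import re
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def load_dic_words(dic_text):
    """Read the stems from .dic text; affix flags are dropped, not applied."""
    lines = dic_text.splitlines()
    if lines and lines[0].strip().isdigit():
        lines = lines[1:]  # the first line is normally the entry count
    words = set()
    for line in lines:
        word = line.split("/", 1)[0].split("\t", 1)[0].strip()
        if word:
            words.add(word.lower())
    return words


# Runs of letters, optionally joined by an apostrophe (don't, l'eau).
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def epub_words(epub_file):
    """Return the lowercased set of words in the EPUB's HTML files."""
    words = set()
    with zipfile.ZipFile(epub_file) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                text = " ".join(parser.chunks)
                words.update(w.lower() for w in WORD_RE.findall(text))
    return words


def missing_words(epub_file, dic_text):
    """Words that occur in the book but not in the dictionary's stem list."""
    return sorted(epub_words(epub_file) - load_dic_words(dic_text))
```

`epub_file` can be a path or an open binary file object, and the result can be written out one word per line. For accurate results the lookup step should go through Hunspell itself, so that the `.aff` rules are applied, rather than through the plain set used here.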