MobileRead Forums - View Single Post

elchamaco · 07-09-2019, 12:23 PM

Quote:

Originally Posted by Tex2002ans

Yep, tab-delimited is usually my favorite. Commas are just too common, and make manually reading the file in a text editor a chore.

Whenever you import CSVs into LibreOffice Calc, a nice window pops up giving you lots of import options.

You can also use the Spellcheck Lists in non-standard ways. Like in this thread, I explained how to use it to find a list of "foreign-language" words:

https://www.mobileread.com/forums/sh...59#post3812859

and go marking them up with xml:lang.

I've also done something similar when trying to normalize a collection of various articles between American/British spellings. You could:

Mark ebook as English (US).
Export CSV of "misspelled words".
Mark ebook as English (UK).
Export CSV of "misspelled words".

Compare both CSVs together, look at differences, and you can see:

Words that appear in only one list are almost all the differently spelled words.
- color <-> colour
Words that appear in both lists are almost all the actual misspelled/foreign words.
- "forign" + "sofritos"
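
The comparison described above comes down to set differences and an intersection. Here is a minimal Python sketch of it, not anything built into the editor. The function names `load_words` and `compare_lists` are made up for illustration, and the code assumes each exported CSV has the flagged word in its first column. If your export has a header row or a different layout, adjust `load_words` accordingly.

```python
import csv


def load_words(path, delimiter=","):
    """Read the first column of a word-list CSV into a set.

    Assumption: the flagged word is in column 0. Pass delimiter="\t"
    for a tab-delimited export.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return {
            row[0].strip()
            for row in csv.reader(f, delimiter=delimiter)
            if row and row[0].strip()
        }


def compare_lists(flagged_as_us, flagged_as_uk):
    """Split two 'misspelled words' lists into the three groups of interest."""
    us, uk = set(flagged_as_us), set(flagged_as_uk)
    return {
        # Flagged only with the US dictionary: mostly British spellings.
        "only_flagged_as_us": sorted(us - uk),
        # Flagged only with the UK dictionary: mostly American spellings.
        "only_flagged_as_uk": sorted(uk - us),
        # Flagged under both dictionaries: real typos and foreign words.
        "flagged_in_both": sorted(us & uk),
    }
```

Usage would be something like `compare_lists(load_words("us.csv"), load_words("uk.csv"))`, where both file names are placeholders for the two exports.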

Yes, you can do a lot with it. I want to use it to upgrade dictionaries with missing words, and not only hunspell ones but stardict/mobi dictionaries too. I'll create a hunspell dictionary from a stardict one, then find missing words in different books to improve the main dictionary: its main definitions and inflected forms.

Probably the best choice will be to create a script that checks all the words from an epub book against a hunspell dictionary and exports the missing words, but to begin with, the manual method can work.
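
A rough idea of what such a script could look like, using only the Python standard library. This is a simplified sketch under loud assumptions: it only compares words against the stems listed in the .dic file and does NOT apply the affix rules from the .aff file, so inflected forms the real hunspell would accept will show up as "missing" here. For real use you would swap in hunspell itself for the lookup. The names `epub_words`, `load_dic_stems` and `missing_words` are made up for illustration.

```python
import re
import zipfile
from html.parser import HTMLParser

# Runs of letters, optionally joined by apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def epub_words(epub):
    """Return the set of words found in the (X)HTML files of an epub.

    An epub is a zip archive, so `epub` can be a path or a file-like object.
    """
    words = set()
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                # Joining with spaces keeps adjacent paragraphs apart, at the
                # cost of splitting words that contain inline tags.
                words.update(WORD_RE.findall(" ".join(parser.parts)))
    return words


def load_dic_stems(lines):
    """Collect stems from the lines of a hunspell .dic file.

    Simplification: the leading entry count is skipped, and flags after
    "/" plus any trailing morphological fields are dropped.
    """
    stems = set()
    for i, line in enumerate(lines):
        entry = line.strip()
        if not entry or (i == 0 and entry.isdigit()):
            continue
        stems.add(entry.split("/", 1)[0].split()[0])
    return stems


def missing_words(words, stems):
    """Words not listed as stems, also trying the lowercase form."""
    return sorted(w for w in words if w not in stems and w.lower() not in stems)
```

With a real book you would open the .dic file with the encoding declared in its .aff file, pass the file object to `load_dic_stems`, and write the result of `missing_words` out as a list to review by hand.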