Quote:
Originally Posted by 1v4n0
I often edit ebooks which feature many foreign words, whose (the words') language is not marked in the code [...] As things stand now, when correcting a long book (typically university textbooks on humanistic subjects), I find myself scrolling through a list of thousands of words, many of which are in some language other than the one the text is actually written in, and 99% of which are false positives.
Back in 2019, I wrote a rough breakdown of the method I currently use:
Post #11 in "Export list of words in spellcheck"
which also points to how I use (Calibre's) Spellcheck Lists + Regex:
Post #29 in "Is there a way to use the selection in a Saved Search?"
I've used that method successfully on journal articles + text from game files (millions of words).
For one game, I even hackishly assigned each character a different lang, then used Calibre to give me a breakdown of all words spoken per character. This allowed me to normalize the translation. (For example, one character always said "dinnae" instead of "didn't". The word list method made sure I caught any strays.)
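To illustrate the per-character idea outside of Calibre, here is a minimal Python sketch. It is not how Calibre does it internally. It just walks the markup, tracks the nearest lang (or xml:lang) attribute, and tallies the words under each one. The tags "x-angus" and "x-mira" and the sample line are made up for the example; use whatever tags you assigned in your own files.

```python
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Letters only, allowing inner apostrophes so "didn't" stays one word.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


class LangWordCounter(HTMLParser):
    """Tally words under the closest enclosing lang / xml:lang attribute."""

    # Void elements never get a closing tag, so they must not touch the stack.
    VOID = {"br", "hr", "img", "meta", "link", "input"}

    def __init__(self, default_lang="und"):
        super().__init__(convert_charrefs=True)
        self.stack = [default_lang]          # inherited language per open tag
        self.counts = defaultdict(Counter)   # lang -> Counter of words

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        attrs = dict(attrs)
        # A tag without its own lang inherits the parent's.
        lang = attrs.get("xml:lang") or attrs.get("lang") or self.stack[-1]
        self.stack.append(lang)

    def handle_endtag(self, tag):
        if tag not in self.VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        for word in WORD_RE.findall(data):
            self.counts[self.stack[-1]][word.lower()] += 1


def words_by_lang(markup):
    parser = LangWordCounter()
    parser.feed(markup)
    parser.close()
    return parser.counts


# Hypothetical sample: each character gets a private-use language tag.
sample = (
    '<p lang="x-angus">I dinnae ken. '
    '<span lang="x-mira">I didn\'t know.</span> Dinnae ask.</p>'
)
counts = words_by_lang(sample)
for lang, words in counts.items():
    print(lang, dict(words))
```

With a list like this per character, a stray "didn't" under the "dinnae" character (or the other way around) stands out right away. Note the sketch assumes reasonably well-formed markup; badly nested tags will throw off the inheritance.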
For games, it also let me easily catch any made-up fantasy words, since they didn't appear in either the US or UK dictionary.
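The "not in either dictionary" check boils down to a set difference. Here is a rough Python sketch of the idea, assuming you already have a word count and two plain lowercase word lists. The function name, the tiny word lists and the fantasy words are all invented for illustration. Real spellcheck dictionaries also apply affix rules, so a plain lookup like this is only an approximation of what a spellchecker reports.

```python
from collections import Counter


def unknown_words(counts, *dictionaries):
    """Return words found in none of the given word lists, most frequent first.

    Assumes each dictionary is a set of lowercase words.
    """
    known = set().union(*dictionaries)
    unknown = (word for word in counts if word.lower() not in known)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(unknown, key=lambda word: (-counts[word], word))


# Hypothetical data: stand-ins for a US list, a UK list and a book's word count.
us_words = {"color", "the", "sword"}
uk_words = {"colour", "the", "sword"}
book_counts = Counter({"colour": 3, "color": 2, "mithrandel": 5, "zug": 1, "sword": 4})

strays = unknown_words(book_counts, us_words, uk_words)
print(strays)
```

Words that pass in only one of the two lists (like "colour" vs "color") are left alone here, which matches the goal of flagging only the words neither dictionary knows.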