Quote:
Originally Posted by Divingduck
Is it so simple?
What will you do with a word like "3D printer", which in German is written as a single word, "3-D-Drucker"? Counting it as three words is definitely wrong for that language, and exceptions of this kind are common. I guess the same is true in other languages. To cover this you would need a dictionary for each language, and I am quite sure you still would not catch every exception. German, for example, has no rule that prevents constructions with a "-" between words; this is often used to make long compound words easier to read.
|
It isn't obvious from JSWolf's post, but the last one is actually an "en-dash", not a hyphen. I have no idea whether that should be considered a word delimiter or word joiner.
The word-counting method Kovid mentioned accepts a locale. That should sort out the differences between languages.
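
To make the delimiter-or-joiner question concrete, here is a rough Python sketch. It is not the locale-aware method Kovid mentioned, just a toy counter where you choose which dashes join words. The function name count_words and its joiners parameter are made up for illustration.

```python
HYPHEN = "-"         # U+002D, the plain hyphen-minus
EN_DASH = "\u2013"   # U+2013, the en dash

def count_words(text, joiners=HYPHEN):
    """Count whitespace-separated tokens in text.

    Hypothetical helper: a dash listed in `joiners` keeps the parts
    around it together as one word; any other dash is replaced by a
    space and so acts as a word delimiter.
    """
    for dash in (HYPHEN, EN_DASH):
        if dash not in joiners:
            text = text.replace(dash, " ")
    # Only tokens containing at least one letter or digit count as words,
    # so a free-standing dash is not counted.
    return sum(1 for token in text.split()
               if any(ch.isalnum() for ch in token))

print(count_words("ein 3-D-Drucker"))                 # 2 (hyphen joins)
print(count_words("ein 3-D-Drucker", joiners=""))     # 4 (hyphen splits)
print(count_words("pages 10\u201312"))                # 3 (en dash splits)
print(count_words("pages 10\u201312", joiners="-\u2013"))  # 2 (en dash joins)
```

The same text gives different counts depending on the policy, which is why a per-language (locale) setting is needed rather than one fixed rule.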