Quote:
Originally Posted by xxyzz
WorldLex's *CDPc values drop sharply after the first few rows and most of them are below one, so they look like percentages. I'm not sure what the *Freq and *FreqPm columns mean.
Google's Ngram data has more words and was released more recently, but the frequencies need to be computed from the "1-grams" files and the "Total counts for 1-grams" file. According to the Ngram Viewer, the frequencies in Google's data are also mostly below one.
Wiktionary also has many word frequency lists, though they don't include frequency data: https://en.wiktionary.org/wiki/Categ...ts_by_language
https://es.wiktionary.org/wiki/Wikci...de_frecuencias
I'm not sure which data source is better than the others. I'm planning to release a new version, so this feature will probably be added in a future release.
|
Yes, it is quite difficult to guess the best way to approach it. I can only support your idea and contribute when possible.
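In case it helps, here is a rough sketch of how a per-word frequency could be computed from the Ngram "1-grams" files and the "Total counts for 1-grams" file mentioned in the quote. It is only an assumption about the file layout, not something I have checked against every release: I assume each 1-gram line is `ngram<TAB>year<TAB>match_count<TAB>volume_count` and that the total counts file is a single line of tab-separated `year,match_count,page_count,volume_count` records. The function names and the year range are made up for illustration.

```python
from collections import defaultdict


def parse_total_counts(text):
    """Return {year: total match_count} from the total counts text.

    Assumed layout: tab-separated records, each of the form
    'year,match_count,page_count,volume_count'.
    """
    totals = {}
    for record in text.split("\t"):
        record = record.strip()
        if not record:
            continue
        year, match_count, _pages, _volumes = record.split(",")
        totals[int(year)] = int(match_count)
    return totals


def relative_frequencies(onegram_lines, totals, first_year, last_year):
    """Return {word: share of all 1-gram matches} for the given year range.

    Assumed layout of each line: 'ngram\\tyear\\tmatch_count\\tvolume_count'.
    """
    word_counts = defaultdict(int)
    for line in onegram_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ngram, year, match_count, _volume_count = line.split("\t")
        if first_year <= int(year) <= last_year:
            word_counts[ngram] += int(match_count)

    # Denominator: every 1-gram match in the same year range.
    total = sum(
        count for year, count in totals.items()
        if first_year <= year <= last_year
    )
    if total == 0:
        return {}
    return {word: count / total for word, count in word_counts.items()}


if __name__ == "__main__":
    # Tiny invented sample, not real Ngram data.
    totals = parse_total_counts(" 2000,1000,10,5\t2001,3000,20,8\t")
    lines = ["the\t2000\t50\t5", "the\t2001\t150\t8", "cat\t2001\t2\t1"]
    for word, freq in relative_frequencies(lines, totals, 2000, 2001).items():
        print(word, freq, f"{freq * 100}%")
```

Multiplying the result by 100 gives a percentage, which would be consistent with the values being mostly below one as noted in the quote.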
Thanks