Quote:
Originally Posted by Shark69
I'm interested in the project. I would like to help you to the best of my ability.
I think we should use existing data, since many researchers are already working on this topic. I found some useful data:
Maybe we can calculate word occurrence frequencies from Google's data for the languages that weren't filtered with a spellchecker in Wordlex, and only enable words whose frequency is lower than a threshold.
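To make the threshold idea concrete, here is a rough Python sketch. It assumes we already have raw occurrence counts per word; the sample counts, the function names, and the 0.05 cutoff are all made up for illustration and don't come from any of the real datasets.

```python
from collections import Counter


def relative_frequencies(counts):
    """Convert raw occurrence counts into relative frequencies (0..1)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


def enabled_words(counts, threshold):
    """Return the words whose relative frequency is below the threshold.

    Frequent (presumably easy) words are dropped; rare ones stay enabled.
    """
    freqs = relative_frequencies(counts)
    return sorted(word for word, freq in freqs.items() if freq < threshold)


# Hypothetical counts, only to show the mechanics.
sample_counts = Counter({"the": 900, "house": 80, "zephyr": 15, "quixotic": 5})

# Hypothetical cutoff; a real one would need tuning per language.
rare = enabled_words(sample_counts, threshold=0.05)
print(rare)
```

With these toy counts, only the two rarest words stay enabled. A per-language threshold would probably be needed, since corpus sizes differ a lot between languages.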
Which datasets do you think are more suitable for disabling easy words in Wiktionary? If you find any better datasets, please let me know, because I think word frequency is not very accurate compared to other metrics.