MobileRead Forums - View Single Post

xxyzz · 08-15-2022, 10:45 PM

Quote:

Originally Posted by Shark69

I'm interested in the project. I would like to help you to the best of my ability.

I think we should use currently available data, since many researchers are already working on this topic. I found some useful data:

For languages that weren't filtered with a spellchecker in Wordlex, maybe we can calculate the word occurrence frequency from Google's data and only enable words whose frequency is lower than a threshold.
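
Here is a rough sketch of the threshold idea, just to show what I mean. The function names, the sample counts, and the threshold value are all made up for illustration; the real input format depends on which dataset we pick.

```python
def relative_frequencies(counts):
    """Convert raw occurrence counts into relative frequencies (count / total)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def enabled_words(counts, threshold):
    """Return the words whose relative frequency is below the threshold.

    Words at or above the threshold are treated as "easy" and stay disabled.
    """
    freqs = relative_frequencies(counts)
    return {word for word, freq in freqs.items() if freq < threshold}


# Made-up counts for illustration only, not real corpus data.
sample_counts = {"the": 900, "house": 90, "ephemeral": 9, "obsequious": 1}

# Hypothetical threshold; a real value would need tuning per language.
hard_words = enabled_words(sample_counts, threshold=0.05)
```

With these sample counts only "ephemeral" and "obsequious" fall below the threshold, so they would be the only words left enabled.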

Which dataset do you think is more suitable for disabling easy words in Wiktionary? If you find any better datasets, please let me know, because I think word frequency is not very accurate compared to other metrics.