08-15-2022, 10:45 PM   #438
xxyzz
Evangelist
Quote:
Originally Posted by Shark69
I'm interested in the project. I would like to help you to the best of my ability.
I think we should use existing data, since many researchers are already working on this topic. I found some useful data:
Maybe we can calculate word occurrence frequency from Google's data for the languages that weren't filtered with a spellchecker in Wordlex, and only enable words whose frequency is lower than a threshold.
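
A rough sketch of the threshold idea, just to make it concrete. The function names, the sample counts and the threshold value are all made up for illustration; real counts would come from whichever dataset we pick, and the cutoff would need tuning per language.

```python
def relative_frequencies(counts):
    """Convert raw occurrence counts into relative frequencies (0..1)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def words_to_enable(counts, threshold):
    """Return the words whose relative frequency is below the threshold.

    Rare words are treated as "hard" and stay enabled; common words
    at or above the threshold are treated as "easy" and left out.
    """
    freqs = relative_frequencies(counts)
    return {word for word, freq in freqs.items() if freq < threshold}


# Hypothetical counts, not taken from any real dataset.
sample_counts = {
    "the": 9000,
    "house": 900,
    "ephemeral": 9,
    "sesquipedalian": 1,
}

# Hypothetical cutoff: keep words rarer than 1 occurrence per 1000 tokens.
enabled = words_to_enable(sample_counts, threshold=0.001)
print(sorted(enabled))
```

With these sample numbers only "ephemeral" and "sesquipedalian" fall under the cutoff, so the common words would be disabled.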

Which dataset do you think is more suitable for disabling easy words in Wiktionary? If you find better datasets, please let me know, because I think word frequency is not very accurate compared to other metrics.

Last edited by xxyzz; 08-15-2022 at 10:49 PM.