View Single Post
Old 08-15-2022, 11:29 AM   #437
Shark69
Zealot
Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.Shark69 ought to be getting tired of karma fortunes by now.
 
Shark69's Avatar
 
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
Quote:
Originally Posted by xxyzz View Post
Lately I was thinking maybe we can use the same method in that prevalence paper(http://btjohns.com/pubs/JDJ_QJEP_2020.pdf) to calculate data for other languages. I know for sure there are some ebook hoarders in the forum, so maybe gathering data from tens of thousands books is possible.

I already know how to get text from ebook files, then I need to figure out how to calculate the SD-AP value, and finally convert the result data to difficulty value(this step is also already completed). The plugin can use this data to only enable hard words in Wiktionary.

Clearly I need help for this project, especially for reproducing that paper and data gathering. Maybe we can create a command line tool to gather data from book files and save the data to a sqlite file so anyone can contribute with their books.

I don't know if anyone is interested in this, any suggestions?

Or even better, find existing similar word prevalence data for non-English languages...

I already find some word frequency data for non-English languages, using these data is way more easier.
I'm interested in the project. I'd would like to help you I am interested in the project. I would like to help you to the best of my ability.
Shark69 is offline   Reply With Quote