MobileRead Forums - View Single Post

xxyzz · 09-17-2022, 09:41 AM

I have created two pull requests to enable wiktextract to parse non-English Wiktionary dump files:

https://github.com/tatuylonen/wikitextprocessor/pull/13
https://github.com/tatuylonen/wiktextract/pull/158

The maintainers haven't given any response yet... With these changes, the Chinese Wiktionary dump file can be parsed. For other languages, some data files need to be created first (the JSON files in the "data" folder of each project).

All tests pass for both projects, but a lot of work is still needed to fully support each new language.

Post #493. Last edited by xxyzz; 09-17-2022 at 09:47 AM.