MobileRead Forums - Could calibre convert support zh-hant variable?

halts · 11-03-2024, 06:20 AM

Following up on the previous discussion "[When Calibre convert, the input language is zh-tw, but the output language become zh]"

I've noticed an ongoing issue with how Calibre's conversion process handles Traditional Chinese (zh-hant) language tags. Calibre can process Traditional Chinese content, but it does not preserve the script or region part of the language tag during conversion.

Current Behavior:

Input files with a language tag such as zh-tw, zh-hk, zh-hant... are converted
code: <dc:language>zh-tw</dc:language>
After conversion, the language tag is reduced to just zh
code: <dc:language>zh</dc:language>

This affects metadata consistency and could impact text processing, because a bare zh tag no longer says whether the text is Traditional or Simplified Chinese.
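
For anyone who wants to check their own files, here is a rough sketch of how the tag could be read from an OPF before and after conversion. It uses only the Python standard library, not calibre's own API. The helper name read_languages and the two sample OPF strings are made up for illustration; in practice you would load the content.opf from the source and the converted book.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by <dc:language> in OPF metadata.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative OPF skeleton (an assumption, not a real book's metadata).
OPF_TEMPLATE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>{lang}</dc:language>
  </metadata>
</package>"""


def read_languages(opf_xml: str) -> list[str]:
    """Return every <dc:language> value found in an OPF document."""
    root = ET.fromstring(opf_xml)
    return [
        el.text.strip()
        for el in root.iter(f"{{{DC_NS}}}language")
        if el.text
    ]


before = read_languages(OPF_TEMPLATE.format(lang="zh-tw"))
after = read_languages(OPF_TEMPLATE.format(lang="zh"))

print(before)  # ['zh-tw']
print(after)   # ['zh']
if before != after:
    print("language tag changed during conversion")
```

Comparing the two lists makes the loss visible without opening the files by hand.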

Why this matters:

Traditional Chinese (zh-hant) and Simplified Chinese (zh-hans) use different character forms, so dropping the distinction loses real information.

Requested Enhancement:
Could we add support for preserving the zh-hant language tag throughout the conversion process? This would:

Better align with international language standards
Improve accuracy in language identification
Help maintain consistent metadata
Support users working with Traditional Chinese content

Technical Note:
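Under BCP 47, zh-hant (conventionally written zh-Hant) is the tag for Chinese written in the Traditional script. It names the script directly, whereas the region-specific zh-tw is also a valid tag but only implies the script through the region.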
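
To make the idea concrete, here is a minimal sketch of how Chinese tags could be collapsed to a script-level tag instead of a bare zh. This is not calibre's code. The function name chinese_script_tag and the region table are my own assumptions, based on the usual region-to-script associations (Taiwan, Hong Kong and Macau with Traditional; mainland China and Singapore with Simplified).

```python
# Hypothetical mapping from region subtag to the script usually used there.
REGION_TO_SCRIPT = {
    "tw": "Hant",
    "hk": "Hant",
    "mo": "Hant",
    "cn": "Hans",
    "sg": "Hans",
}


def chinese_script_tag(tag: str) -> str:
    """Reduce a Chinese language tag to zh-Hant / zh-Hans where possible.

    Non-Chinese tags are returned unchanged. A bare "zh" stays "zh"
    because the script cannot be inferred from it.
    """
    parts = tag.replace("_", "-").lower().split("-")
    if parts[0] != "zh":
        return tag
    # An explicit script subtag wins over anything inferred from a region.
    for sub in parts[1:]:
        if sub in ("hant", "hans"):
            return "zh-" + sub.capitalize()
    # Otherwise fall back to the script associated with the region.
    for sub in parts[1:]:
        if sub in REGION_TO_SCRIPT:
            return "zh-" + REGION_TO_SCRIPT[sub]
    return "zh"


for sample in ("zh-tw", "zh-hk", "zh-hant", "zh-Hans-CN", "zh"):
    print(sample, "->", chinese_script_tag(sample))
```

With something along these lines, zh-tw, zh-hk and zh-hant would all end up as zh-Hant rather than zh, which is the behavior I am asking about.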
Has anyone else encountered this issue? Any thoughts on implementing this enhancement?
Thank you for considering this request.
