Following up on the previous discussion, "When Calibre converts, the input language is zh-tw, but the output language becomes zh":
I've noticed an ongoing issue with how Calibre's conversion process handles Traditional Chinese (zh-hant) language tags. Calibre processes the Traditional Chinese content itself without problems, but the language tag does not seem to be carried through: the region or script information in it is dropped.
Current Behavior:
Input files with a language tag such as zh-tw, zh-hk, or zh-hant are converted, for example:
code: <dc:language>zh-tw</dc:language>
After the input stage, the language tag is simplified to just zh:
code: <dc:language>zh</dc:language>
This loses the Traditional/Simplified distinction in the metadata and could affect any downstream processing that relies on the language tag.
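To check which tag a book actually carries before and after conversion, one can read the dc:language element from its OPF package document. A minimal sketch using only the Python standard library, assuming the OPF has already been extracted from the EPUB (an EPUB is a ZIP archive); read_languages is a hypothetical helper, not part of Calibre.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used for <dc:language> in OPF metadata.
DC_NS = "http://purl.org/dc/elements/1.1/"

def read_languages(opf_xml: str) -> list[str]:
    """Return every <dc:language> value in an OPF document (hypothetical helper)."""
    root = ET.fromstring(opf_xml)
    # iter() walks the whole tree, so the element is found wherever it sits.
    return [el.text.strip() for el in root.iter(f"{{{DC_NS}}}language") if el.text]

before = """<package xmlns="http://www.idpf.org/2007/opf">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:language>zh-tw</dc:language>
  </metadata>
</package>"""

print(read_languages(before))  # ['zh-tw']
```

For an OPF file on disk, ET.parse(path).getroot() gives the same root element; running the function on the input and output OPF makes the zh-tw to zh change easy to confirm.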
Why this matters:
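Traditional Chinese (zh-hant) and Simplified Chinese (zh-hans) use different forms for many common characters, so a bare zh tag does not tell a reading system or tool which script the book is written in.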
Requested Enhancement:
Could Calibre preserve the zh-hant language tag throughout the conversion process? This would:
- Better align with international language standards
- Improve accuracy in language identification
- Help maintain consistent metadata
- Support users working with Traditional Chinese content
Technical Note:
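The zh-hant tag is the standard BCP 47 tag for Traditional Chinese: it combines the language subtag zh with the script subtag Hant, so it names the writing system directly, whereas zh-tw and zh-hk are region-specific (Taiwan, Hong Kong) and only imply the script. BCP 47 tags are case-insensitive; the conventional forms are zh-Hant and zh-TW.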
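One way to keep the script information is to make it explicit before any normalization, instead of truncating to zh. The sketch below is a hypothetical helper, not Calibre's code: it infers the script subtag from the region using the CLDR likely-subtags defaults (TW, HK, MO map to Hant; CN, SG map to Hans) and keeps only the language, script, and region subtags, ignoring variants and extensions.

```python
# Hypothetical helper, not part of Calibre: make the script subtag of a
# Chinese BCP 47 tag explicit so Traditional/Simplified survives normalization.

# Region -> script defaults, following CLDR likely-subtags data.
REGION_TO_SCRIPT = {"TW": "Hant", "HK": "Hant", "MO": "Hant", "CN": "Hans", "SG": "Hans"}

def add_script_subtag(tag: str) -> str:
    """Return e.g. zh-tw -> zh-Hant-TW; non-Chinese tags are returned unchanged."""
    parts = tag.replace("_", "-").split("-")
    lang = parts[0].lower()
    if lang != "zh":
        return tag
    script = None
    region = None
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            script = sub.title()   # script subtag, e.g. hant -> Hant
        elif len(sub) == 2 and sub.isalpha():
            region = sub.upper()   # region subtag, e.g. tw -> TW
    # Only infer the script when the tag does not already state one.
    if script is None and region is not None:
        script = REGION_TO_SCRIPT.get(region)
    return "-".join(p for p in (lang, script, region) if p)

for t in ["zh-tw", "zh-hk", "zh-hant", "zh-cn", "zh"]:
    print(t, "->", add_script_subtag(t))
# zh-tw -> zh-Hant-TW
# zh-hk -> zh-Hant-HK
# zh-hant -> zh-Hant
# zh-cn -> zh-Hans-CN
# zh -> zh
```

Keeping the region alongside the script (zh-Hant-TW) preserves both pieces of information; if only the script tag is wanted, as in the request above, the region can simply be left out of the joined result.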
Has anyone else encountered this issue? Any thoughts on implementing this enhancement?
Thank you for considering this request.