Hello Calibre developers,
This topic is a continuation of the older post:
When Calibre convert, the input language is zh-tw, but the output language become zh. I was unable to reply to that thread, so I am opening a new one. If that is bad etiquette here, please let me know.
1. I am willing to help write code and debug
I understand that Calibre is an open source project and that no developer is obliged to solve any particular problem, so I am willing to help write code and debug. However, I currently know nothing about the internal AZW3 format or the architecture of Calibre, so I am posting here to gather information and ask for help.
2. The reading difference between zh-cn and zh-tw
The main difference is that the Kindle operating system provides different fonts for them.
For zh-cn, they are: 宋体, 黑体, 楷体, 圆体.
For zh-tw, they are: 宋體, 黑體, 楷體, 圓體.
(Notice the slight difference in the names)
It seems that the Kindle does not yet support zh-{hk,mo,sg,my}. However, zh-{hk,mo} is similar to zh-tw, and zh-{sg,my} is similar to zh-cn.
3. Why does the font matter?
To save code space, the Unicode Consortium unified CJKV characters from different countries and territories into the same code points. As a result, the reader must choose the correct font; otherwise character shapes from different countries or territories will appear mixed within a paragraph. Most shared characters have similar shapes, so the reader can guess, but for fewer than roughly 1% of the characters the shapes differ enough to be unintelligible.
You can learn about the Unicode same-codepoint-different-shape problem from this picture on Wikipedia.
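To make this concrete, here is a small Python sketch. The character 骨 (U+9AA8) is my own choice of example; it is one of the characters commonly cited as being drawn differently in mainland China and in Taiwan. The text carries only the code point, so nothing in the string itself tells the renderer which regional shape to draw.

```python
import unicodedata

# Example character chosen for illustration (an assumption, not from any Kindle data).
ch = "\u9aa8"  # 骨

# The same code point is used for every regional shape of this character.
code_point = f"U+{ord(ch):04X}"
name = unicodedata.name(ch)
utf8_bytes = ch.encode("utf-8")

print(code_point, name, utf8_bytes.hex())
```

Because the bytes are identical in a zh-cn book and a zh-tw book, the language metadata is the only hint the device has for picking the font.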
4. Possible values for zh-cn and zh-tw
From previous posts, I gather that it is not clear which XML values the Kindle recognizes as zh-cn and zh-tw. I think they might be one of the following pairs:
Code:
zho-cn / zho-tw
zho-hans / zho-hant
zho-sim / zho-trad (or maybe zho-tra)
zho_CN / zho_TW
...
This narrows down the search, so the amount of work should be smaller ... probably.
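As a rough sketch of how the trial could be organized, the Python below turns the candidates listed above into one metadata element per value, so that each could be tried in a test book. I am assuming that the "XML value" in question is the dc:language element of the OPF metadata; the function name language_elements is made up for this post, and the list is only as complete as my guesses above.

```python
from xml.sax.saxutils import escape

# Candidate (simplified, traditional) pairs copied from the list above.
# The list is a guess and is not exhaustive.
CANDIDATE_PAIRS = [
    ("zho-cn", "zho-tw"),
    ("zho-hans", "zho-hant"),
    ("zho-sim", "zho-trad"),
    ("zho-sim", "zho-tra"),
    ("zho_CN", "zho_TW"),
]


def language_elements(pairs):
    """Return one <dc:language> element string per distinct candidate value."""
    seen = []
    for pair in pairs:
        for value in pair:
            if value not in seen:  # "zho-sim" appears in two pairs
                seen.append(value)
    return [f"<dc:language>{escape(v)}</dc:language>" for v in seen]


elements = language_elements(CANDIDATE_PAIRS)
for element in elements:
    print(element)
```

Each printed line would go into a separate test book, which is then converted and checked on the device to see which font set the Kindle offers.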
5. Possible fallback method?
If none of them work, it might be possible to add a lang attribute such as
Code:
<html lang="zh-tw">
to force the Kindle to use the correct font, provided the Kindle uses an HTML renderer that understands this attribute.
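For what it is worth, here is a minimal Python sketch of that fallback, using only the standard library. It assumes the book's content files are well-formed XHTML; set_html_lang is a hypothetical helper name, not an existing Calibre function, and I have not checked whether the Kindle honors the attribute.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def set_html_lang(xhtml: str, lang: str = "zh-tw") -> str:
    """Return the document with lang set on its root <html> element."""
    # Keep XHTML as the default namespace so tags are not prefixed on output.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    root.set("lang", lang)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>\u9aa8</p></body></html>"
)
result = set_html_lang(sample)
print(result)
```

A real implementation would presumably live somewhere in Calibre's conversion pipeline and take the value from the book's metadata rather than a hard-coded default.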
Thank you.