12-07-2019, 12:31 PM | #1
Junior Member
Join Date: Dec 2019
Device: Kindle Oasis
More info regarding the zh-cn / zh-tw differences for AZW3 output
Hello Calibre developer,
This topic is a continuation of the older thread "When Calibre convert, the input language is zh-tw, but the output language become zh". I was not able to reply to that thread, so I have opened a new one. If this is not good etiquette here, please let me know.

1. I am willing to help write code and debug
I understand Calibre is an open source project, and no developer is obliged to solve any particular problem. Therefore, I am willing to help write code and debug. However, I currently know nothing about the AZW3 internal format or the architecture of Calibre, so I am posting here to gather information and ask for help.

2. The reading difference between zh-cn and zh-tw
The main difference is that the Kindle operating system provides different fonts for them. For zh-cn, they are: 宋体, 黑体, 楷体, 圆体. For zh-tw, they are: 宋體, 黑體, 楷體, 圓體. (Notice the slight difference in the names.) It seems that Kindle does not yet support zh-{hk,mo,sg,my}. However, zh-{hk,mo} is similar to zh-tw, and zh-{sg,my} is similar to zh-cn.

3. Why does the font matter?
To save code space, the Unicode Consortium merges CJKV characters from different countries and territories into a single code point. As a result, the reader must choose the correct font; otherwise, character shapes from different countries or territories will be mixed in the middle of a paragraph. Most shared characters have similar shapes, so the reader can guess, but fewer than roughly 1% of the characters are unintelligible because their shapes differ too much. The picture on Wikipedia illustrating this same-code-point, different-shape problem explains it well.

4. Possible values for zh-cn and zh-tw
From previous posts, I know it is not clear which XML value Kindle recognizes as zh-cn and zh-tw. I think it might be one of the following:
Code:
zho-cn / zho-tw
zho-hans / zho-hant
zho-sim / zho-trad (or maybe zho-tra)
zho_CN / zho_TW
...

5. Possible fallback method?
In case none of them work, maybe it would be possible to add a lang attribute to the html element:
Code:
<html lang="zh-tw">

Thank you.
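A minimal Python sketch of the two ideas above (Calibre itself is written in Python). It assumes, without testing on a device, that any of the candidate spellings from section 4 can be normalized to one of the two font sets Kindle ships, using the region groupings from section 2, and that the section 5 fallback means stamping that tag onto the root `<html>` element. The helper names `kindle_chinese_variant` and `stamp_html_lang` are hypothetical and are not part of Calibre; whether Kindle actually chooses fonts from `<html lang>` in AZW3 output is exactly the open question in this thread.

```python
import xml.etree.ElementTree as ET

# Assumed groupings, taken from section 2: hk/mo behave like zh-tw,
# sg/my behave like zh-cn. Script subtags include the guessed spellings.
TRADITIONAL = {"tw", "hk", "mo", "hant", "trad", "tra"}
SIMPLIFIED = {"cn", "sg", "my", "hans", "sim"}

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def kindle_chinese_variant(tag):
    """Hypothetical helper: map a loosely formatted Chinese tag to 'zh-TW'
    or 'zh-CN'. Returns None for non-Chinese tags or a bare 'zh', since
    the variant cannot be inferred."""
    # Accept both '-' and '_' separators and any letter case.
    parts = tag.replace("_", "-").lower().split("-")
    if parts[0] not in ("zh", "zho", "chi"):
        return None
    # Subtags are checked in order, so a script subtag (which precedes
    # the region in BCP 47) wins over a region subtag.
    for sub in parts[1:]:
        if sub in TRADITIONAL:
            return "zh-TW"
        if sub in SIMPLIFIED:
            return "zh-CN"
    return None


def stamp_html_lang(xhtml, lang):
    """Hypothetical helper: set both lang and xml:lang on the root element
    of an XHTML document and return the serialized markup."""
    # Keep XHTML as the default namespace instead of an 'ns0:' prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml)
    root.set("lang", lang)
    root.set(XML_LANG, lang)
    return ET.tostring(root, encoding="unicode")
```

For example, `kindle_chinese_variant("zho_TW")` returns `"zh-TW"`, and `kindle_chinese_variant("zho-hans")` returns `"zh-CN"`. Returning `None` for a bare `zh` is a deliberate choice: guessing Simplified would silently reproduce the mixed-glyph problem for Traditional books.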
12-07-2019, 05:22 PM | #2
null operator (he/him)
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
It's a warning to deter piggyback posts to old threads, but it shouldn't prevent new posts, especially from the original poster. Let me know if you want this thread to be merged with the old one.
BR
12-07-2019, 07:58 PM | #3
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
If you wish to contribute code, feel free to do so. The azw3 output plugin is in the writer8 folder; search for lang in that folder. As far as I know, the azw3 format has no support for anything other than ISO 639-1 language codes. However, if you have an azw3 file that does specify a country code, you will have to use a hex editor to check how the country code is stored in the header, and then implement that in the azw3 output plugin. There is a description of the header fields of MOBI/AZW3 files in the MobileRead wiki.
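As a starting point for the hex-editor check, here is a small Python sketch that reads the Locale field from the MOBI header of an existing AZW3 file. The offsets are assumptions based on my reading of the MobileRead wiki's MOBI header table: the PDB record list starts at byte 78, each entry's first 4 bytes give a record's file offset, and record 0 has the `MOBI` magic at byte 16 and a 4-byte big-endian Locale at byte 92. The Chinese dialect values assume the Windows LCID convention (main language byte 0x04; dialect byte 0x04 for Taiwan, 0x08 for the PRC). Please verify both assumptions against the wiki and a real file before relying on them. The function names are hypothetical, not Calibre APIs.

```python
import struct

# Assumed offsets (verify against the MobileRead wiki's MOBI header table).
PDB_RECORD_LIST = 78     # first record-list entry in the PDB header
MOBI_MAGIC_OFFSET = 16   # 'MOBI' follows the 16-byte PalmDOC header in record 0
LOCALE_OFFSET = 92       # 4-byte Locale field, relative to the start of record 0

# Assumed Windows-LCID-style dialect bytes for Chinese (main language 0x04).
CHINESE_DIALECTS = {
    0x04: "zh-TW",
    0x08: "zh-CN",
    0x0C: "zh-HK",
    0x10: "zh-SG",
    0x14: "zh-MO",
}


def read_mobi_locale(data):
    """Return the raw Locale value stored in record 0 of a MOBI/AZW3 file."""
    # First 4 bytes of the first record-list entry: absolute offset of record 0.
    rec0 = struct.unpack_from(">I", data, PDB_RECORD_LIST)[0]
    magic = data[rec0 + MOBI_MAGIC_OFFSET:rec0 + MOBI_MAGIC_OFFSET + 4]
    if magic != b"MOBI":
        raise ValueError("record 0 does not contain a MOBI header")
    return struct.unpack_from(">I", data, rec0 + LOCALE_OFFSET)[0]


def describe_locale(value):
    """Split the Locale value into main-language and dialect bytes."""
    main = value & 0xFF            # low byte: main language
    dialect = (value >> 8) & 0xFF  # next byte: dialect / country
    if main == 0x04:
        return CHINESE_DIALECTS.get(dialect, "zh (dialect 0x%02X)" % dialect)
    return "main 0x%02X, dialect 0x%02X" % (main, dialect)
```

Usage: `describe_locale(read_mobi_locale(open("book.azw3", "rb").read()))`, where `book.azw3` is a placeholder for a Kindle-produced file you suspect carries a country code. If such a file reports `zh-TW` while Calibre's output reports a bare `zh`, that points to where the writer8 code would need to change.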