Should Chinese Fonts be Embedded in Ebooks?
8 Attachment(s)
Since I don't read/write Chinese, I was wondering if anyone on MR could help.
I know that many CJK Unicode characters can render differently depending on which language they're in (Chinese/Korean/Japanese). (See "Han unification" on Wikipedia.)
The Fonts/Sentences
The documents I'm converting used these 4 fonts in the original DOCs: SimSun, MS Gothic, PMingLiU, and MS Mincho.
Here's an example sentence of each: Spoiler:
(There are ~80 in total.)
I converted all to use lang="zh" + xml:lang="zh":
Code:
(<i>Shujing</i>, “The Great Declaration I”, <span class="chinese" lang="zh" xml:lang="zh">泰誓上</span>)
1. Is "zh" the proper lang to use in this case? (I used Google Translate and it seems like all the characters are in Chinese, but I'm not sure if it's Simplified/Traditional [zh-Hans or zh-Hant].)
2. When working with these characters, would it be best to embed a Chinese/language-specific font? If so, which one? (Free/Open font preferable.)
3. Is there any better way of handling conversion to ebook? Or should I just trust that the source document had them typed in correctly and that ereaders will render them okay?
I visually inspected some, and they seem to render similarly to the source documents, but I'm not sure how they'll appear on actual ereaders.
The examples all look the same except for some small differences in #1 (SimSun + whatever font Sigil is rendering these in):
SimSun: Attachment 179540, Attachment 179539
MS Gothic: Attachment 179542, Attachment 179541
PMingLiU: Attachment 179544, Attachment 179543
MS Mincho: Attachment 179546, Attachment 179545
Side Note: For some more CJK Unicode goodness, also see:
https://meta.stackexchange.com/quest...port-han-chara
https://modelviewculture.com/pieces/...-write-my-name
Seems like even many sites don't handle certain cases properly... so I can't imagine the ebook side of things. :P |
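On question 1 above (zh vs. zh-Hans vs. zh-Hant), a rough first pass can be automated. The sketch below is only a heuristic, not an authoritative classifier: it assumes that a character found in the legacy GB2312 charset but not in Big5 is Simplified-only, and the reverse is Traditional-only. The function names are made up for illustration. Kanji-only Japanese words will be misjudged (they contain no kana), so the source font is still the better hint for those.

```python
import unicodedata


def is_kana(ch: str) -> bool:
    """True for Hiragana/Katakana, which only Japanese text uses."""
    name = unicodedata.name(ch, "")
    return name.startswith(("HIRAGANA", "KATAKANA"))


def encodable(ch: str, codec: str) -> bool:
    """True if the character exists in the given legacy charset."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def guess_lang(text: str) -> str:
    """Heuristic guess at a lang tag for a short CJK string (assumption-laden)."""
    if any(is_kana(ch) for ch in text):
        return "ja"
    hans_only = hant_only = 0
    for ch in text:
        if not ("\u4e00" <= ch <= "\u9fff"):
            continue  # skip punctuation, Latin, etc.
        in_gb = encodable(ch, "gb2312")   # Simplified-era charset
        in_big5 = encodable(ch, "big5")   # Traditional-era charset
        if in_gb and not in_big5:
            hans_only += 1
        elif in_big5 and not in_gb:
            hant_only += 1
    if hant_only and not hans_only:
        return "zh-Hant"
    if hans_only and not hant_only:
        return "zh-Hans"
    return "zh"  # ambiguous or mixed: fall back to the generic tag


for sample in ("泰誓上", "學", "学"):
    print(sample, guess_lang(sample))
```

A string made only of characters shared by both charsets comes back as plain "zh", which is the honest answer when the text alone cannot settle it.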
Are these books being produced for sale?
Do you have specific ecosystems in mind for these books? I don't read or speak Chinese but I know that Kindles have fonts for Chinese books and have different handling for simplified vs. traditional Chinese. |
With Chinese, I previously ran across only ~2-3 characters in an entire book. In those cases, I either didn't bother (2 characters likely wouldn't be missed if the reader didn't display them), or I subset a font (like Droid Sans Fallback) just for those.
In this specific case, it's 2 articles (out of ~230) that have dozens of Chinese words inside... and now that I've since learned about the language-dependent glyphs, I want this done right. :)
Side Note: Just now I ran across this: https://en.wikipedia.org/wiki/List_of_CJK_fonts which lists:
None are open-source (so definitely not embeddable). And I may be dealing with different languages than I thought...
I also wonder if Droid Sans Fallback is substitutable for all those, and will morph depending on lang... has anyone tested this across different ereaders?
Side Note #2: Here are the 2 actual PDFs if anyone wants to take a closer look:
http://libertarianpapers.org/wp-cont...3/lp-5-1-5.pdf
http://libertarianpapers.org/wp-cont...6/lp-8-1-6.pdf
Everything is CC 3.0. |
Do the PDFs embed the required fonts? Otherwise you don't know what it should look like :(
|
But there are two parallel issues here:
1. Fonts: Since I can't use any of those 4 proprietary fonts, I'm going to have to rely on different fonts in the ebook. On the proofing side of things, it's hard to tell whether I'm seeing simple font differences (like the difference between Serif/Sans-Serif fonts)... or whether stripping those fonts makes the displayed text wrong.
Side Note: It looks like "Source Han Sans" may be another potential font candidate.
2. HTML Language: There are actual language variations (different swashes and swooshes). For example, this single character, 返 (U+8FD4), has at least 5 different representations depending on the language:
https://en.wikipedia.org/wiki/File:S...Difference.svg
In ebooks, this would require proper lang markup:
Code:
<span lang="zh-Hans">返</span> (Simplified Chinese)
I mean, to me, the few sample images I posted in #1 look similar, but I don't know, because it all looks Chinese to me :rofl:.
Side Note: My best guess currently is that I can change anything that was in:
PMingLiU -> lang="zh-Hant" (Traditional Chinese)
SimSun -> lang="zh-Hans" (Simplified Chinese)
MS Gothic + MS Mincho -> lang="ja" (Japanese)
then substitute in a thoroughly vetted Asian font (like Source Han Sans).
But then comes actual device support... has anyone meticulously tested this stuff across devices? |
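The font-to-language guess above is mechanical enough to script during conversion. This is a minimal sketch that assumes the guess is right (nobody in the thread has confirmed it) and that the original font name is still available per run of text. The `FONT_TO_LANG` table and the `lang_span` helper are hypothetical names; the `chinese` class is the one already used in the markup earlier in the thread.

```python
from html import escape

# Best-guess mapping from the original Word font to an HTML lang tag.
# This mirrors the guess in the post above; it is an assumption, not a rule.
FONT_TO_LANG = {
    "PMingLiU": "zh-Hant",   # Traditional Chinese
    "SimSun": "zh-Hans",     # Simplified Chinese
    "MS Gothic": "ja",       # Japanese
    "MS Mincho": "ja",       # Japanese
}


def lang_span(text: str, source_font: str, fallback: str = "zh") -> str:
    """Wrap text in a span tagged with the language implied by its source font."""
    lang = FONT_TO_LANG.get(source_font, fallback)
    return (
        f'<span class="chinese" lang="{lang}" xml:lang="{lang}">'
        f"{escape(text)}</span>"
    )


print(lang_span("返", "PMingLiU"))
print(lang_span("返", "MS Mincho"))
```

Unknown fonts fall back to the generic "zh" tag here so nothing gets silently mislabeled as a specific script.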
I get it now. The book is primarily in English with Chinese characters here and there.
As this relates to Kindle, there are language-specific fonts for Simplified and Traditional Chinese, but those won't come into play, since they are enabled based on the primary language of the book. The regular fonts probably won't have the characters you want, and I believe that the fallback is the Code2000 font. I doubt that has any handling of language-specific character variants. So it does appear that you would need to embed a font with the correct language variant. Using images instead would be more foolproof. |
I gave up and used an image (screen captured and reduced from source!) at first occurrence with transliteration and then just transliteration. Which may or may not have been correct. It was a few years ago and I tended to get [][][][][] on the actual ebook, but I didn't know much about Calibre or Font Embedding or CSS for language support then.
Also, even if you had someone Chinese to check it, would they be the "right" Chinese person? Though the various written scripts are simple compared with the bewildering variety of spoken "Chinese" languages. |
This is an English book with the occasional Chinese/Japanese character (~80 foreign words).
Side Note: Do you know which fonts Kindles have for Simplified/Traditional Chinese?
Symbola is also a "fallback font" I embed whenever I'm dealing with very obscure Unicode characters (like Wingdings/Webdings, which I wrote about in 2016).
Side Note: On many Asian font bugs and poor support across all types of programs... I recommend checking out some of these talks:
That's where I first learned about many of these Asian-specific issues. |
|
It's a shame that these issues were largely solved at the OS level before anyone made any eink reader and that the early Kindles are so poor. What I do now isn't the same as even four years ago. |
Doing a bit more research into "Source Han Sans":
https://github.com/adobe-fonts/source-han-sans They offer it as:
You can read more about why in the readme, or in this helpful explanation post: Adobe's CJK Type Blog: "Source Han Sans: OTF, OTC, Super OTC, or Subset OTF?"
Turns out, OTC is an "OpenType/CFF Collection", the CFF-flavored counterpart of a TrueType Collection (TTC): several fonts packed into one file. (All technical details can be read in Microsoft: "The OpenType Font File".) I doubt this works in ebooks.
So, the best bet would probably be to download the OTFs as needed, then embed. That would:
See Microsoft's The Old New Thing: "What happened to the Arial Unicode MS font?" and Wikipedia: "Arial Unicode MS". |
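On the OTF vs. OTC question in the post above: if you are not sure which kind of file you downloaded, the first four bytes give it away. A small sketch, relying on the tags defined in the OpenType file format ('ttcf' for collections, 'OTTO' for CFF outlines, 0x00010000 or 'true' for TrueType outlines). The function names are made up for illustration, and the check does not validate the rest of the file.

```python
def font_container(header: bytes) -> str:
    """Classify a font file by the 4-byte tag at the start of the file."""
    tag = header[:4]
    if tag == b"ttcf":
        # Collection header: covers both .ttc and .otc files.
        return "collection"
    if tag == b"OTTO":
        return "single font (CFF outlines)"
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "single font (TrueType outlines)"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just the first 4 bytes of a font file and classify it."""
    with open(path, "rb") as fh:
        return font_container(fh.read(4))
```

Anything reported as "collection" is the packaging doubted above for ebook use, so a per-language OTF would be the one to embed instead.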
1 Attachment(s)
@Tex2002ans You also might want to check out Noto CJK. |
How are you going to handle the Chinese characters in Mobi eBooks?
|
1 Attachment(s)
Yeah, a lot of the Android fonts are also good, since they're (usually) open source + have to work across the entire world for billions of users at all different DPIs.
Here are all of the Asian characters being used (Sigil's "Characters in HTML" Report):
Code:
「」えとるアジ丈三上世之京仁佐保倉儒公六凱利到剛劉勢化南口古史司合君周命和商啟嘲四報墨夢大天太好子存学學專小岡崖州帝平年弼從德惠戰揚教文料斯景書末朱李束東林格業樹殘毅民氣江法泰津派浦淮清湖為無熹營爭片物狐獨玉王理瑞產用申發盜目研祖禮秀私程究紂紓經編老臣自蒙虎術袁覚言記詩誓說譜谷資造連遊道遠遺鉄録鏢鐵長開陰陳雲青非革韓頤魯鴉鶚黃黄齊
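A character list like the one above is also what you need for subsetting a font down to just the glyphs used. Here is a rough sketch that pulls the unique CJK characters out of a chunk of HTML and formats them as code points. The block ranges are a simplified assumption (they do not cover every CJK block), the tag stripping is a naive regex rather than a real parser, and the function names are hypothetical.

```python
import re

# Simplified set of CJK-related code point ranges (assumption: enough for
# ordinary Chinese/Japanese text, not an exhaustive list of Unicode blocks).
CJK_RANGES = [
    (0x3001, 0x303F),    # CJK punctuation such as 「」
    (0x3040, 0x30FF),    # Hiragana + Katakana
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2FFFF),  # Supplementary Ideographic Plane
]


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def cjk_chars(html: str) -> list:
    """Sorted unique CJK characters in the text content of an HTML string."""
    text = re.sub(r"<[^>]*>", "", html)  # naive tag stripping
    return sorted({ch for ch in text if is_cjk(ch)})


def unicodes_arg(chars) -> str:
    """Format characters as a comma-separated U+XXXX list."""
    return ",".join(f"U+{ord(ch):04X}" for ch in chars)


sample = '<p>See <span class="chinese" lang="zh">返</span> and 上, then 返 again.</p>'
print(unicodes_arg(cjk_chars(sample)))
```

The resulting string is in the form that fontTools' `pyftsubset` takes for its `--unicodes=` option, if that is the subsetter you use; other tools may want a different format.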
Note: I attached the 2 articles in EPUB if anyone wants to do testing. These are WIP files as of today, and I currently have no idea if I marked the languages up properly, but you can search for:
Code:
<span class="chinese"
And in the CSS file:
Code:
span.chinese
Original PDFs are in Post #3. If anyone wants the HTML straight from Word, let me know and I can attach that too (since it has the original font markup). But let me warn you, it's disgusting, and the characters are wrongly marked as... "French". :rofl:
Side Note: Also, Chapter 18 "East Asia" of the Unicode Standard: http://www.unicode.org/versions/Unicode13.0.0/ covers a ton of stuff (like half-width/full-width characters). I guess I have some more reading to do.
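For anyone testing the attached EPUBs, searching for the span class="chinese" markup by hand gets tedious with ~80 entries. A small sketch using only Python's standard library that lists each such span with its lang value, so mis-tagged entries stand out. The class name comes from the thread; the `LangSpanAudit` parser and the `audit` helper are hypothetical names, and spans without a lang attribute are reported as "(none)".

```python
from html.parser import HTMLParser


class LangSpanAudit(HTMLParser):
    """Collect (lang, text) pairs for every <span class="chinese"> element."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._lang = None   # lang of the span being collected, if any
        self._depth = 0     # nesting depth of <span> inside the target span
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if self._lang is None:
            classes = (a.get("class") or "").split()
            if tag == "span" and "chinese" in classes:
                self._lang = a.get("lang") or a.get("xml:lang") or "(none)"
                self._depth = 1
                self._buf = []
        elif tag == "span":
            self._depth += 1

    def handle_endtag(self, tag):
        if self._lang is not None and tag == "span":
            self._depth -= 1
            if self._depth == 0:
                self.found.append((self._lang, "".join(self._buf)))
                self._lang = None

    def handle_data(self, data):
        if self._lang is not None:
            self._buf.append(data)


def audit(html: str) -> list:
    parser = LangSpanAudit()
    parser.feed(html)
    parser.close()
    return parser.found


sample = ('(<i>Shujing</i>, “The Great Declaration I”, '
          '<span class="chinese" lang="zh" xml:lang="zh">泰誓上</span>)')
for lang, text in audit(sample):
    print(lang, text)
```

Running it over each XHTML file extracted from the EPUB gives a quick table to compare against the original font markup.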
|
|
MobileRead.com is a privately owned, operated and funded community.