Support of Special Unicode Characters?

03-26-2013, 03:52 AM
First, please accept my apologies, because this is a bit off-topic: I am not a Sigil user (and won't become one, because I prefer to use XML+XSLT to create EPUBs).

But the Sigil forum was recommended to me as a first reaction to my original post, because a lot of EPUB creators are participating here.
In the (EPUB) e-books I create, I need support for ligatures (and for the opposite, the "non-joining" of glyphs). However, I am quite desperate, because my impression is that:
-- either I am doing something completely wrong, or...
-- the support of the related OpenType features is simply non-existent, when it comes to recent mobile e-book devices.

In particular:
I am wondering whether/how it is possible to have Unicode characters like ZWNJ (U+200C) correctly "displayed" by e-book devices/software when they appear in EPUB documents. (Just for completeness: "my" EPUBs look fine in PC software such as the Calibre preview!)

I am currently doing some tests on my Pocketbook Touch and have also tried a Sony reader, and I get the impression that all EPUB software there simply ignores this character. (That means: all ligatures are built automatically, just as defined in the embedded OTF font, but they are also built in places where they must not appear, even though the glyphs are separated by a ZWNJ.)

Anyone else here in this forum interested in "Typography and EPUB"?
Can someone...
--- confirm or deny this typography problem?
--- tell me [if my impression is correct] whether this is an unchangeable fact that I simply have to accept (for the time being), or is there anything I can do?

The only trick that comes somewhat close to the desired result is to use U+200A (Hair Space) instead, because it prevents a ligature from being built from the glyphs to its left and right. The downside is that a Hair Space allows a line break, which no one wants within a word, of course. I haven't found the time yet to test characters like U+2060 ("Word Joiner") or similar. But it's probably pointless anyway to check all the endless ranges of Unicode characters, because ZWNJ is the officially designated character for this purpose.
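For anyone who wants to compare these candidates side by side, here is a small Python sketch (purely an illustration, not taken from any reader's documentation) that looks up the official Unicode name and general category of each character. It only uses the standard `unicodedata` module; the `CANDIDATES` list and the `describe` helper are names made up for this example.

```python
import unicodedata

# Code points mentioned in this post (the list itself is just for illustration).
CANDIDATES = ["\u200c", "\u200a", "\u2060"]


def describe(ch):
    """Return (code point label, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(*describe(ch))
```

The category is a useful hint: `Zs` marks a real space separator (which fits the unwanted line break after a Hair Space), while `Cf` marks invisible format characters such as ZWNJ and Word Joiner. Whether a given reader honors them is a separate question that this lookup cannot answer.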

Maybe some other ideas?

Thanks for your efforts!

03-26-2013, 04:17 AM
Almost all knowledgeable people linger in multiple groups, so you really should have posted this in the ePUB forum instead of the Sigil forum.
Now, with regard to your questions: support for ligatures depends on the reader device. Older devices will not be able to handle them, since the ligature characters are not part of the readers' built-in fonts. The only way to get ligatures on old devices is to embed a font that contains them.
That being said, the readers that support ligatures also tend to create ligatures whenever the characters are next to each other. I believe the Sony does that, the reasoning being that it gives a better reading experience. I think that on some readers you can turn it off.

Can you give cases where a ligature should not appear? If I recall correctly, it is only for certain languages like German. I think that is where the issue lies. The readers (and their software) are generally built for the English market (I know there are localized versions, but those are just translations). I do agree that the ZWNJ should be honored, but I am not too surprised it gets ignored, actually... Have you tried different embedded fonts like Charis SIL? There is a good chance ZWNJ is part of that font.
If the ZWNJ character is not part of the font (sounds silly since it is empty, but it needs to be defined), it will be ignored. Since on most readers the internal fonts are crippled/not complete, I would not be surprised if the ZWNJ is not part of it.

03-26-2013, 11:20 AM
Moved to EPUB

03-26-2013, 01:43 PM
You might try a zero-width space alongside each zero-width non-joiner. Worth a try, anyway. Both break ligatures and are treated as potential wrapping points, so I'm really not clear on the difference between the two, other than ostensibly some theoretical semantic difference between "not joining" and "unjoining". Anybody?

03-26-2013, 04:29 PM
Some of us are interested in "typography and EPUB", although it's usually a rather disappointing subject... Regarding the spaces, you can see here that some special ones are at least supported in some readers, and in particular ZWNJ (U+200C) worked fine in my test.

03-27-2013, 06:59 AM
"Can you give cases where a ligature should not appear? If I recall correctly, it is only for certain languages like German."
Yes, this is probably a field that is interesting only for some particular languages (German, in my case).

(Just as general info for non-Germans: in good German typography, a ligature must not span the boundary between the parts of a compound word. In a word like auffinden, which is made from the two morphemes "auf" + "finden", one may join the second "f" with the "i", but one must not join the two "f"s. A similar rule might also exist for English, but because English rarely combines words into a new single word this way, it is probably not relevant here.)
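To make the rule concrete, here is a minimal Python sketch of how the separator could be placed when generating the text. It assumes the morpheme split is already known (from a hand-made list or a hyphenation dictionary); the code does not detect compound boundaries itself, and `join_morphemes` is just a name invented for this illustration.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def join_morphemes(parts, separator=ZWNJ):
    """Join morphemes, placing `separator` at every boundary between them."""
    return separator.join(parts)


word = join_morphemes(["auf", "finden"])
print(ascii(word))  # 'auf\u200cfinden'
```

The "fi" inside "finden" stays untouched, so a font may still form that ligature, while the two "f" glyphs are kept apart (on software that honors ZWNJ).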

@theducks: Thanks for moving to the right place! :)

@Jellby: Thanks for pointing me to the Test EPUB! So, obviously at least some readers are able to handle these things correctly. My own device ("Pocketbook Touch"), however, fails completely even at the "Ligatures" chapter of your Test EPUB(s). (Unfortunately this raises more questions than it answers, because this device succeeds in building ligatures for the fonts that I use for my EPUBs, whereas it fails completely at building the ligatures in your Test EPUB. So there must still be some [font-internal!?] differences that I have not understood so far...)

@dgatwood: Thanks for the suggestion! The good news is: It works (at least with my EPUBs on the Pocketbook Touch), i.e. the combination U+200C U+2060 breaks ligatures (without generating unwanted spaces or line-breaks).
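Since both characters are invisible in an editor, it can help to write them into the XHTML as numeric character references. The following Python sketch shows one way to do that; the `INVISIBLE` table and `to_char_refs` are hypothetical names for this example, and it assumes the text has already been XML-escaped (it does not touch `&` or `<`).

```python
# Invisible characters mapped to XML numeric character references.
INVISIBLE = {
    "\u200c": "&#x200C;",  # ZERO WIDTH NON-JOINER
    "\u2060": "&#x2060;",  # WORD JOINER
}


def to_char_refs(text):
    """Replace the invisible characters above with numeric character references."""
    return "".join(INVISIBLE.get(ch, ch) for ch in text)


separated = "auf" + "\u200c\u2060" + "finden"
print(to_char_refs(separated))  # auf&#x200C;&#x2060;finden
```

An XML parser turns the references back into the same two characters, so the rendered result should be identical to typing them literally; the references just make the source easier to check.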
Concerning what the difference between these two might be, I can only guess: maybe it is related to full-text search. Maybe U+2060 breaks the word into two different parts so that the search does not find it, whereas U+200C should probably not prevent a full-text search. (But I am not sure about that, and I can even observe in a normal Mozilla Firefox on a PC that the full-text search finds neither a U+200C-separated word nor a U+2060-separated word.)
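I cannot tell what any particular reader or browser does internally, but the following Python sketch illustrates why a naive substring search misses such words, and how a search could ignore the invisible characters. The choice of which characters to ignore is an assumption made for this example, and `strip_invisible` is a made-up helper name.

```python
# Characters to ignore when searching (an assumption for this sketch).
# str.translate deletes every code point that maps to None.
IGNORED = dict.fromkeys(map(ord, "\u200c\u2060"))


def strip_invisible(text):
    """Return `text` without ZWNJ and Word Joiner characters."""
    return text.translate(IGNORED)


text = "Man kann es auf\u200c\u2060finden."
print("auffinden" in text)                   # False
print("auffinden" in strip_invisible(text))  # True
```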

03-27-2013, 03:41 PM
Like I said, the internal fonts probably don't have the ZWNJ character built in. That would explain why it works with some embedded fonts.