View Full Version : De-Ligature-ification


brewt
06-11-2010, 02:04 PM
Ok, so I'm back at the font embedding thing.
One of the "good" reasons to use font embedding is so you can use all those really cool ligatures that are in font sets.
But, with a little (finally) successful font embedding in Calibre, I'm finding that some of them are getting stripped (like 'st'). And the calibre reader apparently doesn't know what to do with some of them, so it leaves a blank hole.

Attached is the pic, showing IE (which displays them correctly), Calibre and ADE.

I know, I know, don't use ligatures.

But I wanna!

-bjc

Jellby
06-11-2010, 03:19 PM
The proper way to deal with ligatures is not to use them in your text, but to rely on the rendering software to pick them when available. In other words, don't write anything special for "Th", "ffl" or "st", just write the usual individual letters. If you use a font that has ligatures for these combinations but the reader is not displaying them, complain to those responsible for the rendering software; the book itself is right.
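As a rough illustration of "just write the usual individual letters", here is a minimal Python sketch that turns any precomposed Latin ligature characters (U+FB00 to U+FB06) back into plain letters. The function name `expand_ligatures` is made up for this example, and this is not necessarily what calibre does internally.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Replace precomposed ligature characters (U+FB00..U+FB06) with plain letters."""
    out = []
    for ch in text:
        if "\ufb00" <= ch <= "\ufb06":
            # NFKC applies the compatibility decomposition, e.g. U+FB06 -> "st"
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


print(expand_ligatures("\ufb06aring at the \ufb01nish"))  # staring at the finish
```

With plain letters in the text, whether a ligature glyph is drawn is left entirely to the font and the renderer.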

By the way, note that in your sample, calibre is not using the same font (that's why some of the glyphs are not available); look at the shape of the W. As for splitting some ligatures, I think there is a checkbox to disable this.

charleski
06-11-2010, 03:35 PM
If you really want to get the alternate character forms in ADE, open the font in fontforge and find the actual code point of the glyph (e.g. the code for the st ligature in Minion is FB06), then insert that as a hexadecimal character reference ( &#xFB06; ).
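For anyone who wants to automate that, a small Python sketch follows. It rewrites literal ligature characters in a chunk of markup as hexadecimal character references of the form `&#xHHHH;`. The helper name `escape_ligatures` is hypothetical, and the range U+FB00 to U+FB06 is an assumption that covers only the standard Latin ligature code points, not glyphs a font keeps elsewhere.

```python
def escape_ligatures(markup: str) -> str:
    """Rewrite literal ligature characters as hexadecimal character references."""
    return "".join(
        f"&#x{ord(ch):04X};" if "\ufb00" <= ch <= "\ufb06" else ch
        for ch in markup
    )


print(escape_ligatures("<p>staring \ufb06aring</p>"))
# <p>staring &#xFB06;aring</p>
```

Because the result is pure ASCII, it should survive being passed between editors that disagree about the file's encoding.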

Jellby is right, though.

frabjous
06-11-2010, 03:42 PM
But there's a remaining question of why it is that Calibre isn't using the right font. Calibre's viewer does support embedded fonts (though it's a bit buggy with them, in the same way that WebKit is generally, especially with fonts sharing the same family name—still, that's not the issue here). You've got to figure out why calibre isn't supporting the embedded fonts. Are you using some kind of obfuscation on the fonts?

brewt
06-11-2010, 06:57 PM
No obfustication. And my encrypterator is on the fritz (hence the pic and not the epub).

It seems that the Calibre Viewer is overriding my embedment with its defaults, i.e., Times New Roman. Bleah. I don't see a switch to turn that off in the Calibre viewer - do I just not know where to look?

Or does Font Embedding only work with the calibre viewer if the fonts are in a certain place in the epub? Like in OEBPS, or Fonts, or something else? The epub shown had them in the root....

No, the Character for the [ st ] glyph was hand-replaced in the right version of the word 'staring' with Unicode character FB06 - it shows up correctly in the Internet Explorer rendition of the source xhtml. It seems that the conversion out of Calibre changed it from
<p class="MsoBodyTextFirstIndent">staring ﬆaring</p>
to
<p class="MsoBodyTextFirstIndent">staring staring</p>

I'm assuming that's by design - it can't have just 'accidentally' done that.

And, why, yes, there is a "Keep ligatures" checky-boxy thingy in calibre's Look & Feel section, but it doesn't seem to work. Same results on or off. At least, not the way I'm pushing it....

And I can't seem to put my hand to a rendering display software solution (for epubs, anyway) that inserts ligatures when available. And don't get me started on kerning, or hyphenation, or or or....


-bjc

frabjous
06-11-2010, 07:17 PM
It seems that the Calibre Viewer is overriding my embedment with its defaults, i.e., Times New Roman. Bleah. I don't see a switch to turn that off in the Calibre viewer - do I just not know where to look?

Or does Font Embedding only work with the calibre viewer if the fonts are in a certain place in the epub? Like in OEBPS, or Fonts, or something else? The epub shown had them in the root....


As you probably know there's a way to choose your default fonts by clicking on the hammer button, but Calibre overrides those with embedded fonts for me.

They should be in the proper location relative to the CSS file that calls them. E.g., if the CSS file has

src: url("fontfile.otf") format("opentype");

then the font file should be in the same directory as the CSS file. If it says:

src: url("fonts/fontfile.otf") format("opentype");

then it should be in the fonts subfolder of the folder where the CSS file is, and so on.
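To double-check this for a given ePub, a Python sketch along these lines may help. It resolves each `url(...)` in a stylesheet relative to the stylesheet's own path inside the archive and reports which targets are missing. The function names and the example path `OEBPS/styles.css` are assumptions for illustration only.

```python
import posixpath
import re
from typing import Iterable, List

# Matches url(...) with or without quotes around the path
URL_RE = re.compile(r"url\(\s*[\"']?([^\"')]+)[\"']?\s*\)")


def resolve_font_urls(css_path: str, css_text: str) -> List[str]:
    """Resolve every url() in the CSS relative to the CSS file's folder."""
    base = posixpath.dirname(css_path)
    return [
        posixpath.normpath(posixpath.join(base, target.strip()))
        for target in URL_RE.findall(css_text)
    ]


def missing_fonts(css_path: str, css_text: str, archive_names: Iterable[str]) -> List[str]:
    """Return resolved font paths that are not present in the archive listing."""
    names = set(archive_names)
    return [p for p in resolve_font_urls(css_path, css_text) if p not in names]


css = 'src: url("fonts/fontfile.otf") format("opentype");'
print(resolve_font_urls("OEBPS/styles.css", css))  # ['OEBPS/fonts/fontfile.otf']
```

The archive listing can come from `zipfile.ZipFile(path).namelist()`, since an ePub is a zip file.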

Not sure how calibre handles this, but I do think it would be best practice to have these under OEBPS or a subfolder of OEBPS.

It might be worth posting the relevant portion of the CSS file so we can see if there might be issues.

No, the Character for the [ st ] glyph was hand-replaced in the right version of the word 'staring' with Unicode character FB06 - it shows up correctly in the Internet Explorer rendition of the source xhtml. It seems that the conversion out of Calibre changed it from
<p class="MsoBodyTextFirstIndent">staring ﬆaring</p>
to
<p class="MsoBodyTextFirstIndent">staring staring</p>

I'm assuming that's by design - it can't have just 'accidentally' done that.

And, why, yes, there is a "Keep ligatures" checky-boxy thingy in calibre's Look & Feel section, but it doesn't seem to work. Same results on or off. At least, not the way I'm pushing it....


I'd ask about that in the calibre forum, or else open a ticket on the calibre bug tracker.

And I can't seem to put my hand to a rendering display software solution (for epubs, anyway) that inserts ligatures when available. And don't get me started on kerning, or hyphenation, or or or....

Newer versions of Firefox do automatic ligature substitution for OpenType fonts. (Well, the common ones anyway--probably not the historical ones like st and ct.) So ePub software built on Firefox like EPUBReader (https://addons.mozilla.org/en-US/firefox/addon/45281/) will do it. Of course, I'm not aware of any mobile devices to which that applies, save netbooks and the like.

With hyphenation and kerning you're probably out of luck. You could use Jellby's script (http://www.mobileread.com/forums/showthread.php?t=62939) to convert the ePub to suitably-formatted PDF via PrinceXML, however, with kerning and hyphenation.

brewt
06-11-2010, 07:38 PM
If I really wanted to do the h&j bit, I'd just build pdfs out of Indesign. That is something it's actually good for, unlike epubbing. But thanks - epubs/mobi solves more problems for me at the moment that hard-coded page sizes don't.

I think my css calls are ok - the fonting works in ADE and B&N, which are fussier than calibre, most of the time.

No, my calibre viewer is always set to override all fonts every time no matter what the epub source - by my hand, indesign, the best of the boards here. I must be doing something wrong. I'll try to come up with a sensible demo when I get my encrypterator spun up again.

But thanks.

-bjc

charleski
06-11-2010, 09:32 PM
If your source is already xhtml, then I'd really recommend using Sigil instead of calibre. I've spent too much time cleaning up conversions that were done in calibre.

frabjous
06-11-2010, 11:19 PM
Out of curiosity, what version of calibre are you using? Mine (0.6.51) definitely does embedded fonts.



You might ask about it in the calibre forum.

brewt
06-12-2010, 01:21 AM
Out of curiosity, what version of calibre are you using? Mine (0.6.51) definitely does embedded fonts.


Today's: 7.2

And yes, I remember earlier versions having embedded fonts override what I could pick (in the viewer).

And I still prefer calibre over sigil - multi-format output is easier there when the css is different for mobi than epub (yes, it's true, I do things the hard way when possible), and I just got the font embedding plugin to work, and I can encrypt again, and all is weller with the world for my needs in calibre.

-b

Jellby
06-12-2010, 06:22 AM
You are not using a custom css in calibre viewer that has "!important" for fonts, are you?

Maybe the font is in a format calibre does not support. I think I tried it with TTF and it worked, but I've never tried OTF. Or maybe the font is not correctly embedded, and ADE and Explorer are using the version of the font installed in your system, not the embedded one, while calibre just doesn't use the embedded one and picks a default instead.

Why don't you post a sample epub, so we can test?

Valloric
06-12-2010, 01:06 PM
The reason why calibre is having problems displaying embedded fonts is because QtWebKit has problems displaying embedded fonts.

See this bug (https://bugs.webkit.org/show_bug.cgi?id=29433).

That bug is the #1 reason why Sigil doesn't have explicit font embedding support.

brewt
06-12-2010, 01:41 PM
My font call in the css has no !Important!'s on it. the font family reads like this:

font-family:"Minion Pro","Dutch801 Rm BT","Times New Roman","Goudy Old Style","Baskerville Old Face","Georgia",serif;

The font being used by Calibre viewer is indeed Times New Roman, despite me having the sony Dutch Roman properly installed on my system; if css rules are being followed, shouldn't they go in order if the first one isn't usable/available?
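On the question of order: a CSS font stack is meant to be tried left to right. The Python sketch below is a deliberately simplified model of that fallback, assuming a plain set of available family names. Real engines also consider @font-face rules and per-character glyph coverage, and the name `pick_font` is invented for this example.

```python
GENERIC = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def pick_font(font_family, available):
    """Return the first family in the stack that is available (simplified model)."""
    have = {name.lower() for name in available}
    for raw in font_family.rstrip(";").split(","):
        name = raw.strip().strip("\"'")
        # Generic families always resolve to some default on the system
        if name.lower() in GENERIC or name.lower() in have:
            return name
    return None


stack = ('"Minion Pro","Dutch801 Rm BT","Times New Roman",'
         '"Goudy Old Style","Baskerville Old Face","Georgia",serif')
print(pick_font(stack, {"Dutch801 Rm BT", "Times New Roman"}))  # Dutch801 Rm BT
```

Under this model, landing on Times New Roman would mean the two names before it were not recognised as available, which may point to a naming mismatch rather than a missing font.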

Clearly (at least to me), Calibre viewer 7.2 is overriding the embedment with its defaults. And yes, I can set those defaults to Minion and have it work 'better'. I've had minion work out before as embedded (ade picks up the embedded font, even if I remove it from my system fonts directory) as an otf - hain't got a ttf version.

I just got my encrypter off its knees last night; I'll come up with an intelligible demo epub to post.

Seems odd to me that as font embedding is part of the spec that the implementation of it in the viewers is so spotty. I mis-spake earlier; my B&N PC Viewer doesn't seem to utilize embedded fonts - anybody have luck seeing embeddings on a real nook or the viewers?

-bjc

Jellby
06-12-2010, 02:19 PM
My font call in the css has no !Important!'s on it. the font family reads like this:

font-family:"Minion Pro","Dutch801 Rm BT","Times New Roman","Goudy Old Style","Baskerville Old Face","Georgia",serif;

The font being used by Calibre viewer is indeed Times New Roman, despite me having the sony Dutch Roman properly installed on my system; if css rules are being followed, shouldn't they go in order if the first one isn't usable/available?

Is that the css inside the book or the "user css" of calibre viewer? The second is what I was asking about.

Then, one should make sure the font names specified are the correct ones: in the case of embedded fonts they should match the names given in the @font-face rule; for system-installed fonts, it could be anything.
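A quick way to check that the names line up is sketched below in Python. It uses regular expressions rather than a real CSS parser, so treat it as a rough check only; the function names are made up for this example.

```python
import re

FACE_RE = re.compile(r"@font-face\s*\{(.*?)\}", re.S)
FAMILY_RE = re.compile(r"font-family\s*:\s*([^;}]+)")


def _names(value):
    """Split a font-family value into bare family names."""
    return [n.strip().strip("\"'") for n in value.split(",") if n.strip()]


def declared_families(css):
    """Family names defined by @font-face rules (first name of each rule)."""
    found = set()
    for body in FACE_RE.findall(css):
        match = FAMILY_RE.search(body)
        if match:
            found.add(_names(match.group(1))[0])
    return found


def used_families(css):
    """Family names referenced by font-family outside @font-face rules."""
    rest = FACE_RE.sub("", css)
    used = set()
    for value in FAMILY_RE.findall(rest):
        used.update(_names(value))
    return used


css = """
@font-face { font-family: 'Minion Pro'; src: url(MinionPE.otf); }
p { font-family: "MinionPro", serif; }
"""
# Embedded families that no rule actually asks for by that exact name
print(declared_families(css) - used_families(css))  # {'Minion Pro'}
```

Any name printed here is an embedded font that the stylesheet never requests under exactly that name.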

But I can tell you calibre (0.7.0) can work with embedded fonts, as the attached screenshot shows. The title and drop-cap fonts are embedded in the ePUB, and they are not installed in this computer. If you are worried about embedding an un-obfuscated version of a commercial font, try with a free font instead and post a sample file, please.

frabjous
06-12-2010, 03:02 PM
The reason why calibre is having problems displaying embedded fonts is because QtWebKit has problems displaying embedded fonts.

See this bug (https://bugs.webkit.org/show_bug.cgi?id=29433).

That bug is the #1 reason why Sigil doesn't have explicit font embedding support.

That bug is indeed very annoying, but it is possible to get around it by modifying the embedded fonts so that they all contain distinct font family names. I did that for the ePub available here (http://people.umass.edu/phil335-klement-2/tlp/tlp.epub), and now it supports all the font variants in calibre's viewer.

But I don't think this is the issue here. Brewt isn't getting any of the fonts in the family to show.

brewt
06-12-2010, 04:33 PM
Usually I'm the one who can shout in a loud and confident voice "Man, am I ever doing this wrong!" So, here goes. WinXPsp3, calibre uninstalled and reinstalled at 7.1, upgraded yesterday to 7.2.

Attached: Lots of things:

MinionMyriad-source.zip - actual source before calibre 7.2, created in Word 2007, cleaned up in Dreamweaver cs4 and converted to xhtml. Word quickstyles match (close enough, anyway) to the styles.css. Xhtml and css are close to validating - couple lines here and there about colors in hyperlinks and it's there.

Clipboard01.jpg - a picture in firefox showing the top of that source with the ligatures properly in the correct fonts, Minion Pro and MyriadPro-BlackSemiExt.

Basic Set - Brewt Himself.zip - the source zip file created by calibre.

MyriadMinion3_plugin.zip - calibre plugin that embeds fonts in epubs, based on source by Paul Tomashevskyi posted in the sticky "How to Embed Fonts After Calibre" (http://www.mobileread.com/forums/showthread.php?t=61587) on this forum. Fonts sourced out of C:\Program Files\Calibre2\resources\fonts\MinionMyriad

fontencrypt.zip - font encryption python script written by 'Paul Durrant' posted in the thread "fontencrypt.py - Add Adobe encryption to fonts in ePub (http://www.mobileread.com/forums/showthread.php?t=57034&highlight=fontencrypt.py)" in this forum. I am running it separately; haven't succeeded in making it a plugin yet.

BasicSet.epub - epub created by calibre with embedded fonts embedded by above plugin and encrypted by above python script. [Keep ligatures] checked in conversion. Results below are the same before and after encryption.

So. In ADE, I can see the embedded font with ligatures in the body text (Minion).
In the ADE headlines I don't see the ligatures, but it does look like the headline font.

In the Calibre viewer, I can see the ligatures in the body text (Minion), but only if I set the default font to Minion.
In the calibre headlines I see neither the ligatures nor the correct font (Myriad), even if I default the sans font to Myriad.
No overriding css in viewer.

I see neither the ligatures nor the fonts in B&N/pc.

So. Obviously, I'm doing things, lots of things, wrong because it almost works, but only sometimes. Have at me.

-bjc

charleski
06-12-2010, 07:51 PM
As I thought, you've run into encoding issues. 'BasicSet-12.htm' shows no ligatures at all in Firefox, and looking at it with a hex editor shows that your ligatures have been transformed into garbled 3-byte UCS-2 surrogates.

I can only recommend that you follow the advice I gave yesterday and use explicit escape codes rather than trying to paste in characters and hope they remain intact across all those hops.

brewt
06-12-2010, 08:30 PM
I'll believe you, but what am I seeing here?

Original-original:
<p class="MsoBodyTextFirstIndent">st lost / st lost</p>
<p class="MsoBodyTextFirstIndent">ff gruff /  gru</p>
<p class="MsoBodyTextFirstIndent">fi finish /  nish</p>
<p class="MsoBodyTextFirstIndent">fl flattery / * *attery</p>
<p class="MsoBodyTextFirstIndent">ft swift /  swi</p>
<p class="BodyText4">ffl affluence /  auence</p>
<p class="MsoBodyText">Ligatures in Headline Font: (left plain, right hand
changed)</p>
<h1>ff gruff / ff gruff</h1>
<h1>fi finish / fi finish</h1>
<h1>fl flattery / fl flattery</h1>
<h1>ffl affluence / ffl affluence</h1>

Caliber-ified source that rendering is based on:
<p class="MsoBodyTextFirstIndent">st lost / st lost</p>
<p class="MsoBodyTextFirstIndent">ff gruff /  gru</p>
<p class="MsoBodyTextFirstIndent">fi finish /  nish</p>
<p class="MsoBodyTextFirstIndent">fl flattery / * *attery</p>
<p class="MsoBodyTextFirstIndent">ft swift /  swi</p>
<p class="BodyText4">ffl affluence /  auence</p>
<p class="MsoBodyText">Ligatures in Headline Font: (left plain, right hand
changed)</p>
<h1>ff gruff / ff gruff</h1>
<h1>fi finish / fi finish</h1>
<h1>fl flattery / fl flattery</h1>
<h1>ffl affluence / ffl affluence</h1>

-bjc

JSWolf
06-12-2010, 08:42 PM
My font call in the css has no !Important!'s on it. the font family reads like this:

font-family:"Minion Pro","Dutch801 Rm BT","Times New Roman","Goudy Old Style","Baskerville Old Face","Georgia",serif;

The font being used by Calibre viewer is indeed Times New Roman, despite me having the sony Dutch Roman properly installed on my system; if css rules are being followed, shouldn't they go in order if the first one isn't usable/available?

Clearly (at least to me), Calibre viewer 7.2 is overriding the embedment with its defaults. And yes, I can set those defaults to Minion and have it work 'better'. I've had minion work out before as embedded (ade picks up the embedded font, even if I remove it from my system fonts directory) as an otf - hain't got a ttf version.

I just got my encrypter off its knees last night; I'll come up with an intelligible demo epub to post.

Seems odd to me that as font embedding is part of the spec that the implementation of it in the viewers is so spotty. I mis-spake earlier; my B&N PC Viewer doesn't seem to utilize embedded fonts - anybody have luck seeing embeddings on a real nook or the viewers?

-bjc

Are these fonts part of the ePub or are they just installed in your system?

charleski
06-12-2010, 09:30 PM

is an example of the mangled 3-byte code (EFAC86) that has resulted from something in your chain mis-interpreting the 2-byte FB06 code for the st ligature and attempting to convert it. Hard to say if that's Word or Dreamweaver, but Word seems to produce html with the correct escape sequence for a private-use character. It seems calibre has somehow managed to separate this code back into the letters s and t, but can't do the same for the other mangled codes.
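One way to sanity-check a byte sequence like this is to see how Python encodes the character itself. The sketch below shows that U+FB06 comes out as EF AC 86 in UTF-8 and FB 06 in big-endian UTF-16, so the three bytes may simply be the UTF-8 form of the ligature being read under a different encoding. Whether that is what happened in this particular file is an assumption, as is the choice of Windows-1252 for the mis-reading.

```python
ligature = "\ufb06"  # LATIN SMALL LIGATURE ST

utf8 = ligature.encode("utf-8")
utf16 = ligature.encode("utf-16-be")

print(utf8.hex().upper())   # EFAC86
print(utf16.hex().upper())  # FB06

# The same three UTF-8 bytes, mis-read as Windows-1252 (an assumed code page),
# show up as three unrelated characters instead of one ligature.
print(utf8.decode("cp1252"))  # ï¬†
```

If a hex editor shows EF AC 86, declaring the file as UTF-8 (or using a numeric character reference instead) would be the first thing to try.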

Ligatures (like swash caps, text figures and other typographic variants) are not part of the UTF spec* and you can't rely on programs to recognise such font-specific alternative characters. If you want to use them, make sure they're embedded as explicit escape sequences from the start.

*[Edit] Unlike useful stuff like Linear B (which died out around 1100 B.C.) and 38 different types of arrow... The lack of UTF codes for text figures is especially annoying.

brewt
06-12-2010, 09:51 PM
Ligatures (like swash caps, text figures and other typographic variants)
are not part of the UTF spec and you can't rely on programs to recognise such font-specific alternative characters.
If you want to use them, make sure they're embedded as explicit escape sequences from the start.

Ah. Makes sense - finally - thanks. So much for that idea.



font-family:"Minion Pro","Dutch801 Rm BT","Times New Roman","Goudy Old Style","Baskerville Old Face","Georgia",serif;
Are these fonts part of the ePub or are they just installed in your system?

That's the css I'm using in the epub, which "should" represent a preferred font sequence. I've got some weirdo ideas about compensating for double quotes that the indent numbers only work with certain fonts on, and without embedding, I've got to depend on the idea that someone else has the 'right' font(s) installed on their system.

If I make the call
font-family: serif;
then I am increasing the chance they won't have a font defaulted to on their system that works with my metric design on their system, having the overall look get slaughtered, and run the risk of further complaints and ridicule.

Oh, like I'm not gonna get that anyway for having such a lame idea.

-bjc

Jellby
06-13-2010, 06:21 AM
BasicSet.epub - epub created by calibre with embedded fonts embedded by above plugin and encrypted by above python script. [Keep ligatures] checked in conversion. Results below are the same before and after encryption.

As far as I know, calibre does not support font encryption/obfuscation.

Looking at the differences between your css and the one I use (which works with calibre):

Yours:
@font-face {
font-style: normal;
font-family: 'Minion Pro', serif;
font-weight: normal;
src: url(MinionPE.otf);
}

Mine:
@font-face {
font-family: "Carolus FG";
src: url("../fonts/CAROF___.TTF") format("truetype");
}

You could try something like:

@font-face {
font-family: 'Minion Pro';
src: url(MinionPE.otf) format('opentype');
}

and see if that helps.

Valloric
06-13-2010, 08:23 AM
But I don't think this is the issue here. Brewt isn't getting any of the fonts in the family to show.

That's another wonderful feature of QtWebKit. I've seen epub books that show the main embedded font variant in Sigil on Windows and Mac just fine, but display with a default font on Linux. I've seen other books that display their embedded fonts + variants on Mac just fine, and fail on Windows and Linux. Also those that work on Linux, but fail on the other two platforms.

Still haven't found any logic to it. The QtWebKit font loading problems are more extensive than just not showing font variants.

This was all with the same version of Sigil and QtWebkit.

So the font issues are directly related to the platform on which QtWebKit is running... Macs seem to have the least problems (but still have quite a few), probably because QtWebKit defers some operations to the system Webkit on Macs (but that's an educated guess).

WillAdams
06-13-2010, 10:18 AM
Unicode is set up for scholarship and backwards compatibility.

Variant forms which don't have linguistic significance (and are not needed for backwards compatibility like fi and fl) do not rate and will not get code points (and should not be assigned points in the PUA if one wants the underlying text to be seen as text).

brewt
06-13-2010, 12:47 PM
As far as I know, calibre does not support font encryption/obfuscation.

Really really? 'cause if that's true, the attached shouldn't work.

And tho I didn't take a lot of time on it (quick & dirty, lots of other problems), the ligatures show up just fine in ade and calibre viewer.

Made with Indesign (gak!)

-bjc

charleski
06-13-2010, 12:51 PM
Unicode is set up for scholarship and backwards compatibility.

Luckily, we'll be able to read .txt files from an Ancient Cretan thumb drive.

Variant forms which don't have linguistic significance (and are not needed for backwards compatibility like fi and fl) do not rate and will not get code points (and should not be assigned points in the PUA if one wants the underlying text to be seen as text)

And this is a failure on the part of the Unicode consortium, simple as that.

Ligatures certainly can have linguistic significance, just look at the ampersand sign, and Unicode honours this for many other languages. I wouldn't be so bothered if it weren't for the fact that the Unicode range contains such a large amount of rubbish.

Jellby
06-13-2010, 01:14 PM
Really really? 'cause if that's true, the attached shouldn't work.

OK, it works, so calibre supports obfuscation, that's good.

But what is your problem then? Is it only that you don't get it to work when you generate the ePUB with calibre?

Note that your file BasicSet.epub does not pass epubcheck.

brewt
06-13-2010, 03:26 PM
But what is your problem then? Is it only that you don't get it to work when you generate the ePUB with calibre?

Note that your file BasicSet.epub does not pass epubcheck.

Maybe I don't know how to do it, but none of my creations out of calibre can pass epubcheck, even if I go to the trouble to fix the last bits of the xhtml to completely validate (along with valid css) and nothing fancy happens in the epub. Same problem with indesign output. And I'm still not finding a deep benefit to validation, given all that I want to do.

And my problem? Where do I start? Lazy, dull, dimwitted, lacking in understanding, frustrated, tired, poor, old, etc., etc. None of which the good folk here can do anything about.

But I do appreciate your considerations, even if I am left with a generalized sense of a lack of direction. Every method leaves large numbers of things left out, making no choice the long term right choice. Ah well, Damn the decisions, full steam ahead. Guess I just want too much to be easier. Or maybe I just want to do more than is allowed, given the current state of the technologies.

No hard feelings?

-bjc

WillAdams
06-13-2010, 09:46 PM
charleski, more importantly, scholars are able to unambiguously set Cretan scripts and discuss them unambiguously.

If one wants ligatures it's simple enough to set a run of text and turn them on --- no need for unique code points which aren't practical in any case --- there are simply too many possibilities. I had to do quite a bit of fiddling to manage to encode just the options in Zapfino:

http://www.tug.org/TUGboat/Articles/tb24-2/tb77adams.pdf

Assigning codepoints for ligatures would be nightmarish and would quickly exhaust them.

Jellby
06-14-2010, 09:54 AM
No hard feelings?

None at all.

But at this moment I think it would be beneficial to re-state what you want to attain. At first I thought you just wanted an ePUB with embedded fonts and ligatures that works, but you have already shown a sample that works in ADE and calibre viewer... So, what else do you want? Do you want to understand how to do it by hand? Do you want a workflow to get it from your source files?

Also, when creating sample documents, try to make them simple. To test ligatures and embedded fonts, a single font is often enough, and including several families of different fonts only makes the file larger and more cumbersome to analyse.

As a general comment on ligatures, I think the "right way" would be to have something like an OTF flag or feature that can be enabled/disabled, either through CSS or through software preferences; the same holds for old-style numbers and other features. Prince (http://www.princexml.com/bb/viewtopic.php?f=3&t=1588&start=0) works along these lines.

charleski
07-29-2011, 05:54 AM
The good news is that the ADE 1.8 preview will automatically substitute a ligature if the font supports it.

Using the epub Brewt uploaded a few posts back:
ADE 1.7.2:
http://www.mobileread.com/forums/attachment.php?attachmentid=74903&stc=1&d=1311929336

ADE 1.8 preview:
http://www.mobileread.com/forums/attachment.php?attachmentid=74904&stc=1&d=1311929336

Encoding unique font code points was never a satisfactory solution, but it looks like we no longer need to be held hostage by the scholars of Ancient Crete.

Jellby
07-30-2011, 05:22 AM
And the Cybook Orizon now uses ligatures too. I'll have to check whether it applies font-specified kerning as well.

Jellby
07-31-2011, 05:45 AM
Apparently, kerning is applied too, but ligatures and kerning are linked to hyphenation: disable hyphenation and you disable ligatures and kerning.