MobileRead Forums - View Single Post - FontShrinker

Jellby · 01-24-2013, 01:20 PM

I'll try to explain. A text typically has no explicit ligatures. It could have some, but it should not, and I've only seen them in some very old text files. What a text has is just normal Unicode characters. Let's say a text consists of the single word "office": that's only 5 different letters: c, e, f, i, o.
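As a tiny illustration of that count (a sketch only, the variable names are made up for this example), the set of characters a subsetter would need for this text can be collected like this:

```python
# The example text from the post.
text = "office"

# A set drops the repeated "f", leaving the distinct characters.
needed = sorted(set(text))

print(needed)       # ['c', 'e', 'f', 'i', 'o']
print(len(needed))  # 5
```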

Now, a font could have ligatures defined, and reading software may use them (although many programs do not, I'm afraid). Let's say that the font we are dealing with has the ligatures "fi", "ffi" and "fj" defined. Defining a ligature means that the font has a glyph (a character shape) for the combination "fi" and some instructions saying that whenever an f is followed by an i in the text, they should be rendered as the "fi" ligature and not as the separate characters (ditto for "ffi" and "fj").
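To make the idea concrete, here is a toy model of those instructions in Python. It is only a sketch: the glyph names ("f_i", "f_f_i", "f_j"), the `LIGATURES` table and the `shape` function are invented for this example, and trying the longest sequence first is a simplification of what a real renderer does with the font's substitution tables.

```python
# Toy model: each ligature rule maps a character sequence to a glyph name.
# The glyph names are hypothetical, chosen only for this example.
LIGATURES = {
    "fi": "f_i",
    "ffi": "f_f_i",
    "fj": "f_j",
}

def shape(text, ligatures=LIGATURES, use_ligatures=True):
    """Return the list of glyph names used to draw `text`."""
    # Try longer sequences first so that "ffi" wins over "fi".
    sequences = sorted(ligatures, key=len, reverse=True) if use_ligatures else []
    glyphs = []
    i = 0
    while i < len(text):
        for seq in sequences:
            if text.startswith(seq, i):
                glyphs.append(ligatures[seq])
                i += len(seq)
                break
        else:
            # No ligature starts here: fall back to the single character.
            glyphs.append(text[i])
            i += 1
    return glyphs

print(shape("office"))                       # ['o', 'f_f_i', 'c', 'e']
print(shape("office", use_ligatures=False))  # ['o', 'f', 'f', 'i', 'c', 'e']
```

The second call mimics a renderer that ignores ligatures: it still needs the plain "f" and "i" glyphs.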

OK, then our text will ideally be displayed as 4 glyphs: "o", "ffi", "c", "e". There are different things a font subsetter could do:

1) Remove everything but "o", "f", "i", "c", "e", including ligatures and their definition. This is not ideal, but it's probably the simplest.

2) Same as 1, but do not remove ligatures or their definition. That's much better, but it leaves unused glyphs, such as "fi" or "fj".

3) Detect ligatures, find out that "i" and "f" are never used alone, and remove everything but "o", "ffi", "c", "e". This is not a good idea, as renderers that do not support ligatures will not be able to display "f" and "i".

4) Remove all unused single characters, and related ligatures. This would remove "fj", since "j" is not in the source text, but leave "fi" since both "f" and "i" are, although the "fi" ligature is never used (because we have "ffi" already). I think this is the best compromise: it subsets well without being too demanding to implement.

5) Remove some or all ligatures (the glyphs), but do not remove their definitions. This is not a good idea either, and I think this was the bug in Calibre. It means a renderer supporting ligatures would believe there is a ligature to use for "ffi", but it wouldn't find it.
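Options 4 and 5 can be sketched with the same kind of toy model. Everything here is an assumption made for illustration (the glyph names, the dict of rules, the function names `subset_option4` and `dangling_rules`); a real subsetter works on the font's actual tables, not on a Python dict.

```python
def subset_option4(text, ligatures):
    """Option 4: keep the single characters used in `text`, plus every
    ligature whose components are all among those characters."""
    chars = set(text)
    kept_rules = {seq: glyph for seq, glyph in ligatures.items()
                  if set(seq) <= chars}
    kept_glyphs = chars | set(kept_rules.values())
    return kept_glyphs, kept_rules

def dangling_rules(glyphs, rules):
    """Rules that point at a glyph missing from the font (the option 5 problem)."""
    return {seq: glyph for seq, glyph in rules.items() if glyph not in glyphs}

# Hypothetical ligature table: character sequence -> glyph name.
ligatures = {"fi": "f_i", "ffi": "f_f_i", "fj": "f_j"}

glyphs, rules = subset_option4("office", ligatures)
print(sorted(rules))                  # ['ffi', 'fi']  ("fj" is gone, "j" is unused)
print(dangling_rules(glyphs, rules))  # {}  (every kept rule still has its glyph)

# Option 5 for comparison: drop the ligature glyphs but keep all definitions.
broken_glyphs = set("office")
print(sorted(dangling_rules(broken_glyphs, ligatures)))  # ['ffi', 'fi', 'fj']
```

A check like `dangling_rules` is cheap to run after subsetting, whichever option is chosen.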

So, if you can, go for #4. But things may be significantly harder. A font (particularly an OTF one) may contain other alternate shapes for glyphs (final forms, swash forms, older variants, small-caps, etc.). Those are currently unused by practically all renderers, but there's still hope that some day we'll be able to enjoy some more advanced typesetting options...
