ePub Font Subsetting


Cygfrydd
04-25-2010, 05:23 PM
I'm working on an entirely Python-based ePub build toolchain (I use Subversion for source management), and I have it working quite nicely, including font embedding and obfuscation. However, the resulting ePubs suffer from bloat, since I'm using fonts with fairly extensive collections of glyphs, so I need to implement some sort of subsetting.

This turned out to be far more complicated than I initially realised. epub-tools (http://code.google.com/p/epub-tools/) has been mentioned several times as supporting both obfuscation and subsetting; however, it's implemented in Java, and doesn't appear to be able to take an already-compiled ePub and modify it. Subsetting requires, it seems, two rather complex tasks:

1) Parsing the content of the ePub's component files for all elements that aren't set to display: none (and possibly alt-text for images), parsing the embedded and inline styles to generate a computed style for each element, resolving each computed style to an embedded font, and then collecting the glyphs used from that font to decide what needs to be kept in the subset.

2) Subsetting the font[s] appropriately, which, as I've discovered, isn't as simple as deleting every glyph from the font that isn't needed (besides .notdef); apparently modifying only the TrueType 'glyf' table is insufficient.
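A rough sketch of the simplest possible version of task 1, using only the Python standard library. It assumes the content files are UTF-8 and end in .xhtml, .html or .htm, and it deliberately ignores CSS: it does not honour display: none and does not work out which font each character is rendered in, so it over-collects. The names TextCollector and used_characters are made up for illustration.

```python
import zipfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every character that could be rendered from one content file."""

    SKIPPED = {"script", "style"}  # element content that is never shown as text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chars = set()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        # alt text may be displayed in place of an image, so count it too
        for name, value in attrs:
            if name == "alt" and value:
                self.chars.update(value)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars.update(data)


def used_characters(epub_file):
    """Return the set of characters found in all content files of an ePub.

    epub_file is a path or a file-like object. Assumption: UTF-8 content.
    """
    chars = set()
    with zipfile.ZipFile(epub_file) as epub:
        for name in epub.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextCollector()
                parser.feed(epub.read(name).decode("utf-8"))
                parser.close()  # flush any buffered text
                chars |= parser.chars
    return chars
```

Calling used_characters("book.epub") gives one set for the whole book; splitting that set per font is the part that still needs computed styles.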

I have an extremely ugly solution partially working: I use the Java tool css2xslfo (http://www.re.be/css2xslfo/index.xhtml) to convert my content into XSL-FO, parse the results to get font information and glyph coverage (drastically easier than trying to parse XHTML+CSS and work out computed styles), and then pass the list of glyphs to a Perl tool, font-optimizer (http://bitbucket.org/philip/font-optimizer/src/), to do the actual subsetting.

This is ugly, and certainly doesn't meet my goal of doing everything in Python.

Does anyone have any suggestions? I can probably manage to cobble together workable font-subsetting using fonttools (http://sourceforge.net/projects/fonttools/), which has a truly lovely roundtripping TTF-to-XML conversion, but the actual parsing of XHTML and associated stylesheets seems to be beyond me (though I find it difficult to believe someone hasn't already implemented this, beyond the basic stuff that cssutils (http://cthedot.de/cssutils/) does).
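For the font side, here is a minimal sketch under one big assumption: that a current fontTools release is installed, since those ship a fontTools.subset module (nothing in this thread relies on it). The subsetter prunes the tables that depend on 'glyf' as well, which is the part that is awkward to do by hand. The helper names codepoints and subset_font are hypothetical.

```python
def codepoints(text):
    """Return the sorted, de-duplicated Unicode code points used in text."""
    return sorted({ord(ch) for ch in text})


def subset_font(src_path, dst_path, text):
    """Write a copy of the font at src_path covering only the characters in text."""
    # Imported here so that codepoints() works even without fontTools installed.
    from fontTools.ttLib import TTFont
    from fontTools.subset import Options, Subsetter

    font = TTFont(src_path)
    subsetter = Subsetter(options=Options())  # default options
    subsetter.populate(unicodes=codepoints(text))  # characters to keep
    subsetter.subset(font)  # drops unused glyphs and updates dependent tables
    font.save(dst_path)
```

Something like subset_font("Full.ttf", "Subset.ttf", "all text set in this font") would then produce the reduced file; the file names are placeholders.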

So... anyone have any ideas?

-Cyg

kovidgoyal
04-25-2010, 06:08 PM
calibre resolves all CSS into simple classes of computed values as part of the conversion pipeline. This is then used for things like font size rescaling. Finding embedded fonts for subsetting should be trivial.
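To illustrate just the "finding embedded fonts" step, here is a small regex-based sketch that lists the @font-face declarations in a stylesheet. This is not calibre's code or API; the function name embedded_fonts is invented, and a regex is only an approximation of real CSS parsing (it will be confused by comments or braces inside strings).

```python
import re

# One @font-face block, capturing everything between the braces.
FONT_FACE = re.compile(r"@font-face\s*\{([^}]*)\}", re.IGNORECASE)
# The declared family name, up to the next semicolon.
FAMILY = re.compile(r"font-family\s*:\s*([^;]+)", re.IGNORECASE)
# Each url(...) in the block, with or without quotes.
SRC_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)", re.IGNORECASE)


def embedded_fonts(css_text):
    """Map each @font-face family name to the list of font URLs it declares."""
    fonts = {}
    for block in FONT_FACE.findall(css_text):
        family = FAMILY.search(block)
        if not family:
            continue  # a block without font-family cannot be referenced
        name = family.group(1).strip().strip("'\"")
        fonts.setdefault(name, []).extend(SRC_URL.findall(block))
    return fonts
```

The URLs come back exactly as written in the CSS, so they still have to be resolved relative to the stylesheet's location inside the ePub.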

billingd
08-17-2010, 09:53 AM
sorry for the noise.