|04-25-2010, 04:23 PM||#1|
Join Date: Jan 2007
Location: St. Petersburg, Florida, US
Device: Sony Reader PRS-505
ePub Font Subsetting
I'm working on an entirely Python-based ePub build toolchain (I use Subversion for source management), and I have it working quite nicely, including font embedding and obfuscation. However, the resulting ePubs are bloated, because the fonts I'm using have fairly extensive glyph collections, so I need to implement some sort of subsetting.
This turned out to be far more complicated than I initially realised. epub-tools has been mentioned several times as supporting both obfuscation and subsetting; however, it's implemented in Java, and doesn't appear to be able to take an already-compiled ePub and modify it. Subsetting, it seems, requires two rather complex tasks: 1) parsing the content of the ePub's component files for all elements that aren't set to display: none (and possibly alt text for images), parsing the embedded and inline styles to generate a computed style for each element, resolving each computed style to an embedded font, and then collecting the glyphs used from that font to decide what to keep in the subset; and 2) subsetting the font[s] appropriately, which, as I've discovered, isn't as simple as deleting every unneeded glyph (other than .notdef) from the font; apparently just modifying the TrueType 'glyf' table is insufficient.
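A minimal sketch of the character-collection half of task 1, using only the Python standard library. It is an illustration under stated assumptions, not a full solution: the names `VisibleCharCollector`, `visible_chars` and `chars_used_in_epub` are hypothetical, only an inline style="display: none" is honoured (external and embedded stylesheets are not resolved), the content files are assumed to be UTF-8, and no attempt is made to map characters to a particular embedded font.

```python
import zipfile
from html.parser import HTMLParser

# Elements that never have a closing tag of their own.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# Elements whose text content is never rendered.
UNRENDERED = {"script", "style"}


class VisibleCharCollector(HTMLParser):
    """Collect characters from text nodes outside hidden subtrees."""

    def __init__(self):
        super().__init__()
        self.chars = set()
        self._hidden_depth = 0  # > 0 while inside a hidden subtree

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and self._hidden_depth == 0:
            # Alt text may be rendered if the image is unavailable.
            self.chars.update(attrs.get("alt") or "")
        if tag in VOID:
            return
        # Normalise "display : none" to "display:none" before matching.
        style = "".join((attrs.get("style") or "").split()).lower()
        hidden = tag in UNRENDERED or "display:none" in style
        if self._hidden_depth or hidden:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0:
            self.chars.update(data)


def visible_chars(markup):
    """Return the set of characters in visible text of one document."""
    parser = VisibleCharCollector()
    parser.feed(markup)
    parser.close()
    return parser.chars


def chars_used_in_epub(path):
    """Union of visible characters across the (X)HTML files in an ePub."""
    used = set()
    with zipfile.ZipFile(path) as epub:
        for name in epub.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                # Assumption: content documents are UTF-8 encoded.
                used |= visible_chars(epub.read(name).decode("utf-8"))
    return used
```

This yields one character set for the whole book. Splitting it per font would still need the computed-style step described above.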
I have an extremely ugly solution partially working: I use the Java tool css2xslfo to convert my content into XSL-FO, parse the results to get font information and glyph coverage (drastically easier than trying to parse XHTML+CSS and work out computed styles), and then pass the list of glyphs to a Perl tool, font-optimizer, which does the actual subsetting.
This is ugly, and certainly doesn't meet my goal of doing everything in Python.
Does anyone have any suggestions? I can probably manage to cobble together workable font-subsetting using fonttools, which has a truly lovely roundtripping TTF-to-XML conversion, but the actual parsing of XHTML and associated stylesheets seems to be beyond me (though I find it difficult to believe someone hasn't already implemented this, beyond the basic stuff that cssutils does).
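For the font side, a hedged sketch of what a Python-only subsetting step might look like. It assumes a present-day fontTools release that ships the `fontTools.subset` module (the post itself only mentions the TTF-to-XML round-tripping). The function names `codepoints` and `subset_font` are hypothetical, and the fonts must be subset before obfuscation is applied, since an obfuscated font file can't be parsed.

```python
def codepoints(chars):
    """Sorted, de-duplicated Unicode code points for an iterable of characters."""
    return sorted({ord(c) for c in chars})


def subset_font(src_path, dst_path, chars):
    """Write a copy of the font at src_path that covers only `chars`."""
    # Imported lazily so codepoints() is usable without fontTools installed.
    from fontTools import subset

    options = subset.Options()  # defaults keep the .notdef glyph
    font = subset.load_font(src_path, options)
    subsetter = subset.Subsetter(options=options)
    subsetter.populate(unicodes=codepoints(chars))
    subsetter.subset(font)  # prunes glyf and the tables that depend on it
    subset.save_font(font, dst_path, options)
```

Usage would be along the lines of `subset_font("Body.ttf", "Body-subset.ttf", used_chars)`, where both file names are placeholders and `used_chars` is whatever character set the content-parsing step produced.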
So... anyone have any ideas?
|04-25-2010, 05:08 PM||#2|
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
calibre resolves all CSS into simple classes of computed values as part of the conversion pipeline. This is then used for things like font size rescaling. Finding embedded fonts for subsetting should be trivial.