Someone raised the question of ligatures and alternate characters. As I said, these are very good questions, since the creator of the ePub has no control over which glyphs the display software chooses.
In PDFs, the subsetting task is a lot simpler, as all the glyphs used (not just characters) are fixed in the PDF.
For ePubs and KF8, I think we must take this into account in any solution. But this doesn't need to be part of the font subsetting code, which should work from a passed list of glyphs that should be included. (And should return an error if any are missing from the font.)
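To make that contract concrete, here is a minimal sketch of the check the subsetter could run on the list it is passed. The names `check_glyph_list` and `MissingGlyphsError` are made up for illustration, and I'm assuming glyphs are identified by glyph name and that the font's glyph order is available as a plain list.

```python
class MissingGlyphsError(ValueError):
    """Raised when requested glyphs are not present in the font."""

    def __init__(self, missing):
        self.missing = sorted(missing)
        super().__init__("glyphs missing from font: " + ", ".join(self.missing))


def check_glyph_list(requested, available):
    """Return the requested glyph names, de-duplicated, in font glyph order.

    requested: iterable of glyph names the caller wants kept (hypothetical input).
    available: the font's glyph order, as a list of glyph names.
    Raises MissingGlyphsError if any requested glyph is not in the font.
    """
    wanted = set(requested)
    missing = wanted - set(available)
    if missing:
        raise MissingGlyphsError(missing)
    # Keep the font's own ordering so the output is stable between runs.
    return [name for name in available if name in wanted]
```

The subsetting code itself would then only ever see a validated list, and whatever builds that list (ligature handling included) stays a separate concern.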
For ligatures we might need to generate not only a list of all characters in a file, but also of all character pairs. But, of course, there are also three-character ligatures (ffi in English, for example), and I suppose some languages might have more.
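Collecting pairs and triples is cheap to do in one pass. A rough sketch, assuming the text has already been extracted from the XHTML as a plain string; `char_sequences` and its `max_len` parameter are names I've invented here:

```python
def char_sequences(text, max_len=3):
    """Return the set of all runs of 1..max_len consecutive characters in text."""
    seqs = set()
    for n in range(1, max_len + 1):
        # Slide a window of width n across the text.
        for i in range(len(text) - n + 1):
            seqs.add(text[i:i + n])
    return seqs
```

For "office" with `max_len=3` this yields the single characters plus runs such as "ff" and "ffi". Raising `max_len` would cover longer ligatures, at the cost of a larger set.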
Hmmm... Perhaps we just need to include all ligatures for which the source file includes all the characters in the ligature.
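That simpler rule might look something like the sketch below. It assumes we already have a table mapping each ligature glyph name to the characters it stands for; both that table and the function name `ligatures_to_include` are hypothetical.

```python
def ligatures_to_include(text, ligatures):
    """Pick ligatures whose component characters all occur somewhere in text.

    ligatures: dict mapping a ligature glyph name to the string of characters
    it replaces, e.g. {"f_f_i": "ffi"} (an assumed input format).
    """
    chars = set(text)
    # A ligature qualifies if every one of its component characters is used.
    return sorted(name for name, comps in ligatures.items() if set(comps) <= chars)
```

Note that this over-includes: a file containing "f" and "i" anywhere gets the ffi ligature even if "ffi" never occurs as a run. That costs a few extra glyphs but can't miss a ligature the display software decides to use.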
Or perhaps we also need a script to get information on ligatures present in a font, so that that information can be used when parsing the XHTML.
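As a starting point for such a script, here is a sketch using the fontTools library, which is my choice of tool rather than anything settled. It only reads plain ligature substitution lookups (GSUB type 4, including those wrapped in extension lookups) and ignores contextual rules. The function names are made up.

```python
def ligatures_in_gsub(gsub):
    """Map each ligature glyph name to the tuple of glyph names it replaces.

    gsub is expected to be a fontTools GSUB table object, i.e. what
    TTFont(path)["GSUB"].table returns. If several sequences produce the
    same ligature glyph, only the last one seen is kept.
    """
    found = {}
    lookup_list = getattr(gsub, "LookupList", None)
    if lookup_list is None:
        return found
    for lookup in lookup_list.Lookup:
        for subtable in lookup.SubTable:
            lookup_type = lookup.LookupType
            if lookup_type == 7:  # extension lookup wraps the real subtable
                lookup_type = subtable.ExtensionLookupType
                subtable = subtable.ExtSubTable
            if lookup_type != 4:  # 4 = ligature substitution
                continue
            # Keys are the first component; each entry lists the remaining ones.
            for first, ligs in subtable.ligatures.items():
                for lig in ligs:
                    found[lig.LigGlyph] = (first, *lig.Component)
    return found


def ligatures_in_font(path):
    """Load a font file and list its ligatures (requires fontTools)."""
    from fontTools.ttLib import TTFont  # imported lazily so the rest works without it

    font = TTFont(path)
    if "GSUB" not in font:
        return {}
    return ligatures_in_gsub(font["GSUB"].table)
```

The result is in terms of glyph names, not characters, so it would still need mapping back through the font's cmap before it can be compared against the text in the XHTML.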
Or should we start off with a very basic solution, and elaborate once that's working?