Quote:
Originally Posted by eriĉjo
I am hand-converting a PDF book to ePub and have run into a problem. I don't want to use the PDF's fonts because they are under a proprietary license, so I'm using the closest open-source equivalents I can find. The trouble is that I am embedding each entire font in the ePub file, which makes it much larger than it needs to be. What I'd like to do is strip all unnecessary characters from the font so that it is as small as possible. I know how to do this with FontForge, but I don't know how to determine exactly which Unicode characters are used in a given work. The author likes to use various characters here and there beyond the normal Esperanto ones (most of the English alphabet, plus ĉĈĝĜĥĤĵĴŝŜŭŬ). If I guessed, I would risk missing some characters, and the only way to be sure would be to examine the whole document by hand. Is it possible to trick Acrobat Pro into doing it for me (by converting it to PDF and then having Acrobat do the work)? Are there scripts for detecting which characters are used in a Unicode text file? Any ideas for solving this problem are appreciated.
The following link might help you get started:
http://scripts.sil.org/cms/scripts/p...CharacterCount
If you're handy with Perl, you might be able to use the character counts to write a FontForge script that automatically generates the stripped font.
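If you'd rather not depend on the linked script, the counting step is small enough to sketch yourself. What follows is only a rough illustration in Python rather than Perl, and it assumes the book's text has already been extracted to one or more UTF-8 files; the function names `count_characters` and `format_report` are made up for this example and are not part of any existing tool.

```python
import sys
import unicodedata
from collections import Counter


def count_characters(text):
    """Return a Counter mapping each non-whitespace character to its frequency."""
    # Normalize to NFC so that a base letter followed by a combining accent
    # is counted as the single precomposed code point (e.g. U+0109 for ĉ).
    normalized = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in normalized if not ch.isspace())


def format_report(counts):
    """Return one tab-separated line per character, sorted by code point."""
    lines = []
    for ch, n in sorted(counts.items()):
        # Some code points have no Unicode name; use a placeholder for those.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X}\t{ch}\t{n}\t{name}")
    return lines


if __name__ == "__main__":
    # Assumption: every command-line argument is a UTF-8 encoded text file.
    combined = Counter()
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            combined.update(count_characters(handle.read()))
    print("\n".join(format_report(combined)))
```

Saved as, say, `charcount.py` (a hypothetical name), it could be run as `python charcount.py chapter1.txt chapter2.txt`. Whitespace is skipped in the report, so remember to keep the space glyph when you strip the font. The first column gives the code points you would need to keep in FontForge.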
Regards,
Joop