10-21-2017, 06:33 AM | #1 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
To create a subset
Hi
One of the drawbacks of embedded fonts is their size. This is a particular problem when you are dealing with a large Chinese font (15 MB) and cannot even create a proper subset with the usual tools. This is why I am evaluating this technique, following the advice of a friend. Let me describe how to do it.
First, if you do not have it yet, install fonttools:
Code:
sudo pip install fonttools
You also need:
- the ttf font exported from the EPUB, say STSong.ttf
- a UTF-8 txt file named china.txt containing the Chinese characters you wish to include in the subset. These characters can easily be copied, for example, from the last line of the Characters tab of the Calibre Editor reports tool.
To create the subset, just use this command:
Code:
pyftsubset STSong.ttf --text-file="china.txt"
This way of creating a subset seems quite useful, for example if you have a book containing some words or expressions in an exotic language. You can also use it for standard books. In this way I subsetted the regular font of Linux Libertine (800k), embedded in an EPUB that uses 120 different characters, down to a 58k subset. I could have gone on to create subsets of the italic and bold fonts using the same 120-character txt file. I just wonder how to reduce this figure to the real number of italic or bold characters used in the book. Does anybody have an idea?
Last edited by roger64; 10-21-2017 at 06:39 AM. |
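For anyone who would rather not copy the characters by hand, the character file described above can also be generated. This is only a minimal sketch, assuming the book's text has already been extracted into a plain Python string; the function names `unique_chars` and `write_char_file` are made up for illustration and are not part of fonttools or Calibre.

```python
import unicodedata
from pathlib import Path


def unique_chars(text):
    """Return every distinct character of `text` once, sorted by code point.

    Control and format characters (Unicode categories starting with "C",
    which includes line breaks and tabs) are dropped; ordinary spaces are
    kept. This filtering rule is an assumption of the sketch.
    """
    kept = {ch for ch in text if not unicodedata.category(ch).startswith("C")}
    return "".join(sorted(kept))


def write_char_file(text, path):
    """Write the unique characters of `text` to a UTF-8 file; return their count."""
    chars = unique_chars(text)
    Path(path).write_text(chars, encoding="utf-8")
    return len(chars)
```

Calling `write_char_file(book_text, "china.txt")` would produce a file that can be passed to the `--text-file` option used in the post above.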
10-21-2017, 12:26 PM | #2 |
Resident Curmudgeon
Posts: 73,661
Karma: 127838198
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
You are subsetting in a rather complicated way. Create your ePub with the embedded fonts, load it into the Calibre editor, and let Calibre do the subsetting. Don't forget to turn off metadata updating on load in the editor.
Last edited by JSWolf; 10-21-2017 at 12:30 PM. |
10-21-2017, 01:51 PM | #3 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Roger, what you describe is more or less what my Font subsetter tool already does...
|
10-21-2017, 04:02 PM | #4 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Tell me, what do you do when the Calibre editor does not subset the Chinese font?
Last edited by roger64; 10-21-2017 at 04:07 PM. |
10-21-2017, 04:05 PM | #5 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
|
10-21-2017, 04:20 PM | #6 |
Resident Curmudgeon
Posts: 73,661
Karma: 127838198
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
10-21-2017, 05:24 PM | #7 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Sure, I can tell you how I do it... I just go through all the text and determine, based on the styling, whether it is regular, bold, italic or bolditalic. From the resulting text I pick the unique characters and, very importantly, add the ligatures. Otherwise you might face issues when a reader automatically uses ligatures wherever the character sequence occurs.
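The approach described in this post can be sketched roughly as follows. This is not the actual Font subsetter tool, only an illustration under loud assumptions: styling is detected from the tags `b`/`strong` and `i`/`em` alone (CSS classes and stylesheets are ignored), and ligatures are represented by the Unicode presentation forms U+FB00 to U+FB04. Fonts that implement ligatures only as OpenType substitutions would need a subsetter that follows layout features instead. All names here (`StyleCollector`, `chars_by_style`, `LIGATURES`) are hypothetical.

```python
from html.parser import HTMLParser

BOLD_TAGS = {"b", "strong"}
ITALIC_TAGS = {"i", "em"}
# Assumed list: Latin sequences and their precomposed ligature characters.
LIGATURES = {
    "ff": "\ufb00", "fi": "\ufb01", "fl": "\ufb02",
    "ffi": "\ufb03", "ffl": "\ufb04",
}


class StyleCollector(HTMLParser):
    """Sort text chunks into regular / bold / italic / bolditalic buckets."""

    def __init__(self):
        super().__init__()
        self.bold = 0      # nesting depth of bold tags
        self.italic = 0    # nesting depth of italic tags
        self.text = {"regular": [], "bold": [], "italic": [], "bolditalic": []}

    def handle_starttag(self, tag, attrs):
        if tag in BOLD_TAGS:
            self.bold += 1
        elif tag in ITALIC_TAGS:
            self.italic += 1

    def handle_endtag(self, tag):
        if tag in BOLD_TAGS:
            self.bold = max(0, self.bold - 1)
        elif tag in ITALIC_TAGS:
            self.italic = max(0, self.italic - 1)

    def handle_data(self, data):
        if self.bold and self.italic:
            key = "bolditalic"
        elif self.bold:
            key = "bold"
        elif self.italic:
            key = "italic"
        else:
            key = "regular"
        self.text[key].append(data)


def chars_by_style(xhtml):
    """Return {style: sorted unique characters}, with ligature characters added."""
    parser = StyleCollector()
    parser.feed(xhtml)
    parser.close()
    result = {}
    for style, chunks in parser.text.items():
        joined = "".join(chunks)
        chars = {ch for ch in joined if ch not in "\r\n\t"}
        # Add a ligature character whenever its letter sequence occurs.
        chars.update(lig for seq, lig in LIGATURES.items() if seq in joined)
        result[style] = "".join(sorted(chars))
    return result
```

Each of the four strings returned could then be saved as its own character file, one per font style.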
|
10-22-2017, 12:25 AM | #8 |
A Hairy Wizard
Posts: 3,070
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Just curious. Aren't ligatures dependent on the language being used? Do you have a database of all ligatures for all languages??
|
10-22-2017, 02:59 AM | #9 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Hi
As a user, and not a programmer, I use only one tool, named pyftsubset, and with it, only one command line. For those who question its use, I advise them to try the tool. As mentioned above, I also make use of the list of Unicode characters produced by the Calibre editor reports tool.
pyftsubset is part of a bigger project named fonttools. I could not find out who its five authors and its maintainer are. Anyway, thanks to them. I use it on Arch Linux, but it can work on many other operating systems as long as you have a recent version of FontForge and Python (and python-pip for Arch).
Here is some more information about its use. If I understand its help correctly, it preserves ligatures by default. As you can read, it's quite an advanced tool. See its help file in the attachment below.
From my very limited experience, it gave me good results in the two tests I did: one with a huge Chinese ttf font, the other with a standard ttf font. In both cases, I was able to produce an Epubcheck-compliant EPUB and to read it without any problem with KOReader.
@Toxaris Going through the text by hand, finger on the page, looking for italic or bold characters is not an option for me. I had hoped there might be some kind of automatic tool to do it. Anyway, the regular font came to 58k for its 120 characters, and I presume the italic font would come to about the same if I start from the same 120 characters. That still compares favourably with the 166k and 129k I obtain by other means.
Last edited by roger64; 10-22-2017 at 05:07 AM. Reason: attachment |
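If a separate character file exists for each style, the single command line mentioned in this post can simply be repeated once per font file. The sketch below only builds the command strings and does not run them. The font file names, the `<style>.txt` naming of the character files and the `.subset` output naming are assumptions made for illustration; `--text-file` and `--output-file` are the only pyftsubset options used.

```python
import shlex
from pathlib import PurePath


def subset_command(font, text_file):
    """Build the pyftsubset argument list for one font and one character file."""
    p = PurePath(font)
    # Hypothetical output name: "Name.ttf" becomes "Name.subset.ttf".
    output = str(p.with_name(f"{p.stem}.subset{p.suffix}"))
    return [
        "pyftsubset",
        font,
        f"--text-file={text_file}",
        f"--output-file={output}",
    ]


def subset_commands(fonts):
    """Map {style: font file} to shell command strings, one per style.

    Assumes the characters of each style were saved as "<style>.txt".
    """
    return [
        shlex.join(subset_command(font, f"{style}.txt"))
        for style, font in fonts.items()
    ]
```

For example, `subset_commands({"regular": "Libertine-Regular.ttf", "italic": "Libertine-Italic.ttf"})` returns two command lines that can be pasted into a terminal, so the italic subset contains only the characters actually set in italics.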
10-22-2017, 03:26 AM | #10 | ||
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Quote:
Quote:
|
||