Old 10-21-2017, 06:33 AM   #1
roger64
Wizard
Posts: 2,239
Karma: 2334301
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
To create a subset

Hi

One of the drawbacks of embedded fonts is their size. Sometimes, for example when you are dealing with a large Chinese font (15 MB), you can't even create a proper subset. This is why I am evaluating this technique, following the advice of a friend.

Let me describe how to do it:

First, install fonttools if you do not have it yet:
Code:
sudo pip install fonttools
Then prepare a work folder containing:
- the ttf font exported from the EPUB, say STSong.ttf
- a UTF-8 txt file named china.txt containing the Chinese characters you wish to include in the subset. These characters can easily be copied, for example, from the last line of the Characters tab of the Calibre Editor reports tool.
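
If the Calibre report is not at hand, a character list like china.txt can also be derived from any plain-text copy of the book. What follows is only a minimal Python sketch under that assumption; the helper names `unique_characters` and `write_character_file` are hypothetical and not part of fonttools or Calibre.

```python
import unicodedata
from pathlib import Path


def unique_characters(text):
    """Return each distinct character of `text` once, sorted by code point.

    Spaces are kept (the font still needs a space glyph); control and
    format characters such as newlines and tabs are dropped.
    """
    keep = set()
    for ch in text:
        # Unicode categories starting with "C" are control/format/unassigned.
        if unicodedata.category(ch).startswith("C"):
            continue
        keep.add(ch)
    return "".join(sorted(keep))


def write_character_file(text, path="china.txt"):
    """Write the unique characters as UTF-8, the encoding the post asks for."""
    Path(path).write_text(unique_characters(text), encoding="utf-8")
```

Calling `write_character_file(book_text)` would then produce the txt file used in the next step.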

To create the subset, just use this command:

Code:
pyftsubset STSong.ttf --text-file="china.txt"
It will create a subsetted font, "STSong.subset.ttf" (12.9 KB), with which you can replace the original one.
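
For anyone who prefers to script this step, the same command can be assembled and launched from Python. This is a hedged sketch, not part of fonttools: the helper names are hypothetical, and it assumes `pyftsubset` is on the PATH. `--text-file` is the option used above and `--output-file` is the pyftsubset option for choosing the result name.

```python
import subprocess
from pathlib import Path


def build_subset_command(font, text_file, output=None):
    """Assemble the pyftsubset command line used in the post."""
    command = ["pyftsubset", font, f"--text-file={text_file}"]
    if output is not None:
        # Optional: choose the output name instead of the default one.
        command.append(f"--output-file={output}")
    return command


def default_output_name(font):
    """Name of the result when no output is given, e.g. STSong.subset.ttf."""
    path = Path(font)
    return str(path.with_name(f"{path.stem}.subset{path.suffix}"))


def run_subset(font, text_file, output=None):
    """Run pyftsubset and return the name of the subsetted font."""
    subprocess.run(build_subset_command(font, text_file, output), check=True)
    return output or default_output_name(font)
```

`run_subset("STSong.ttf", "china.txt")` would run exactly the command shown above; calling it once per font file is one way to handle several fonts with the same character list.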

This way of creating a subset seems quite useful, for example if you have a book containing some words or expressions in some exotic language.

You can also use it for standard books. This way I subsetted the regular font of Linux Libertine (800k), embedded in an EPUB that uses 120 different characters, down to a 58k subset.

I could have gone on to create subsets for the italic and bold fonts using the same 120-character txt file. I just wonder how to reduce this figure to the italic or bold characters actually used in the book. Does anybody have an idea?

Last edited by roger64; 10-21-2017 at 06:39 AM.
Old 10-21-2017, 12:26 PM   #2
JSWolf
Resident Curmudgeon
Posts: 52,137
Karma: 45956325
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Aura H2O, Sony PRS-650, Sony PRS-T1, nook STR, iPad 4, iPhone 5
You are using a rather complicated way of subsetting. Create your ePub with the embedded fonts, load it into the Calibre editor, and let Calibre do the subsetting. Don't forget to turn off metadata updating on load in the editor.

Last edited by JSWolf; 10-21-2017 at 12:30 PM.
Old 10-21-2017, 01:51 PM   #3
Toxaris
Wizard
Posts: 4,437
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
Roger, what you describe is more or less what my Font subsetter tool already does...
Old 10-21-2017, 04:02 PM   #4
roger64
Quote:
Originally Posted by JSWolf
You are using a rather complicated way of subsetting. Create your ePub with the embedded fonts, load it into the Calibre editor, and let Calibre do the subsetting. Don't forget to turn off metadata updating on load in the editor.
Tell me, what do you do when the Calibre editor does not subset the Chinese font?

Last edited by roger64; 10-21-2017 at 04:07 PM.
Old 10-21-2017, 04:05 PM   #5
roger64
Quote:
Originally Posted by Toxaris
Roger, what you describe is more or less what my Font subsetter tool already does...
Happy to learn it. It gives me confidence.

So maybe you can teach me how to create a list of the italic and bold characters used in a book?
Old 10-21-2017, 04:20 PM   #6
JSWolf
Quote:
Originally Posted by roger64
Tell me, what do you do when the Calibre editor does not subset the Chinese font?
Go to the Calibre forum and post a bug report.
Old 10-21-2017, 05:24 PM   #7
Toxaris
Quote:
Originally Posted by roger64
Happy to learn it. It gives me confidence.

So maybe you can teach me how to create a list of the italic and bold characters used in a book?
Sure, I can tell you how I do it... I just go through all the text and determine, based on the styling, whether it is regular, bold, italic or bolditalic. From the resulting text I pick the unique characters and, very importantly, add the ligatures. Otherwise you might face issues when a reader automatically uses ligatures wherever the character sequence occurs.
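
The approach described here can be sketched in Python with the standard library only. This is not the Font subsetter tool's code, just a minimal illustration under a loud assumption: styling is detected from the tags `<i>`, `<em>`, `<b>` and `<strong>` alone, and CSS rules are ignored, so a real book styled through classes would need more work. All names below are hypothetical.

```python
import unicodedata
import zipfile
from html.parser import HTMLParser

# Assumption: styling is read from these tags only; CSS is ignored.
ITALIC_TAGS = {"i", "em"}
BOLD_TAGS = {"b", "strong"}
SKIP_TAGS = {"head", "style", "script"}  # content that is not body text
STYLES = ("regular", "italic", "bold", "bolditalic")


class StyleCharCollector(HTMLParser):
    """Collect the distinct characters seen under each of the four styles."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = {"italic": 0, "bold": 0, "skip": 0}
        self.chars = {style: set() for style in STYLES}

    @staticmethod
    def _kind(tag):
        if tag in ITALIC_TAGS:
            return "italic"
        if tag in BOLD_TAGS:
            return "bold"
        if tag in SKIP_TAGS:
            return "skip"
        return None

    def handle_starttag(self, tag, attrs):
        kind = self._kind(tag)
        if kind:
            self.depth[kind] += 1

    def handle_endtag(self, tag):
        kind = self._kind(tag)
        if kind and self.depth[kind] > 0:
            self.depth[kind] -= 1

    def handle_data(self, data):
        if self.depth["skip"]:
            return
        italic = self.depth["italic"] > 0
        bold = self.depth["bold"] > 0
        if italic and bold:
            style = "bolditalic"
        elif italic:
            style = "italic"
        elif bold:
            style = "bold"
        else:
            style = "regular"
        # Keep spaces, drop control and format characters such as newlines.
        self.chars[style].update(
            ch for ch in data if not unicodedata.category(ch).startswith("C")
        )


def characters_by_style(xhtml):
    """Return {style: sorted unique characters} for one XHTML document."""
    collector = StyleCharCollector()
    collector.feed(xhtml)
    collector.close()
    return {style: "".join(sorted(chars)) for style, chars in collector.chars.items()}


def characters_by_style_in_epub(epub):
    """Merge the per-style character sets of every (X)HTML file in an EPUB."""
    totals = {style: set() for style in STYLES}
    with zipfile.ZipFile(epub) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                text = book.read(name).decode("utf-8", errors="replace")
                for style, chars in characters_by_style(text).items():
                    totals[style].update(chars)
    return {style: "".join(sorted(chars)) for style, chars in totals.items()}
```

Each of the four resulting strings could be saved to its own UTF-8 txt file and passed to pyftsubset together with the matching font file.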
Old 10-22-2017, 12:25 AM   #8
Turtle91
A Hairy Wizard
Posts: 1,590
Karma: 11700000
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 6/5/iPad 1,2 & Air/Surface Pro/Kindle PW
Just curious. Aren't ligatures dependent on the language being used? Do you have a database of all ligatures for all languages??
Old 10-22-2017, 02:59 AM   #9
roger64
Hi

As a user, not a programmer, I use only one tool, named pyftsubset, and with it only one command line. I advise those who question its use to try the tool. As mentioned above, I also use the list of Unicode characters produced by the Calibre editor reports tool.

pyftsubset is part of a bigger project named fonttools. I could not find out who its five authors and its maintainer are. Anyway, thanks to them.

I use it with Arch Linux, but it can work on many other operating systems as long as you have a recent version of FontForge and Python (and python-pip for Arch).

Here is some more information about its use. If I understand its help text correctly, it preserves ligatures by default. As you can read, it's quite an advanced tool.

See its help file in the attachment below.

In my very limited experience, it gave good results in the two tests I did: one with a huge Chinese ttf font, the other with a standard ttf font. In both cases I ended up with an Epubcheck-compliant EPUB, and I could read both without any problem in KOReader.

@Toxaris
Going through the text with my finger, looking for italic or bold characters, is not an option for me. I had hoped some kind of automatic tool existed to do it. Anyway, the 120 characters of the regular font give a 58k subset, and I presume the italic font would come out about the same size from the same 120 characters. That still compares favourably with the 166k and 129k I obtain by other means.
Attached Files
File Type: txt pyftsubset.txt (14.0 KB, 53 views)

Last edited by roger64; 10-22-2017 at 05:07 AM. Reason: attachment
Old 10-22-2017, 03:26 AM   #10
Toxaris
Quote:
Originally Posted by Turtle91
Just curious. Aren't ligatures dependent on the language being used? Do you have a database of all ligatures for all languages??
Some, but a lot of western languages use the same ligatures. If you add the ligatures to the selection and they do not exist in the source, they of course will not be added. There is only a limited set of ligatures anyway, so it is not that big a list. Adding them will, however, help prevent question marks in the text.
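
The limited list mentioned above can be written down explicitly for Latin text. The following Python sketch is only an illustration, with a hypothetical helper name `with_ligatures`; it covers the common ligatures of the Unicode Alphabetic Presentation Forms block and assumes the font encodes them at those code points. Many OpenType fonts instead keep ligatures as unencoded glyphs reached through layout features, in which case the subsetter's own feature handling matters more than this list.

```python
# Common Latin ligature code points and the letter sequences they replace.
LATIN_LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb06": "st",
}


def with_ligatures(characters, text=None):
    """Return `characters` plus ligature code points, sorted by code point.

    With `text` given, only ligatures whose letter sequence occurs in the
    text are added; without it, the whole list is added.
    """
    selected = set(characters)
    for ligature, sequence in LATIN_LIGATURES.items():
        if text is None or sequence in text:
            selected.add(ligature)
    return "".join(sorted(selected))
```

For example, `with_ligatures("fi", text="office")` adds the ff, fi and ffi ligatures because those sequences occur in "office".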

Quote:
Originally Posted by roger64
Hi

@Toxaris
Going through the text with my finger, looking for italic or bold characters, is not an option for me. I had hoped some kind of automatic tool existed to do it. Anyway, the 120 characters of the regular font give a 58k subset, and I presume the italic font would come out about the same size from the same 120 characters. That still compares favourably with the 166k and 129k I obtain by other means.
True, but of course I don't do it with my finger; I do it by analyzing the text. It is usually quite fast, depending on the size of the source. To be honest, I am thinking about removing this part, as it usually does not save much space: the usual suspects are almost always there.