02-16-2013, 08:49 AM   #1
AlPe
Digital Amanuensis
Posts: 727
Karma: 1446357
Join Date: Dec 2011
Location: Turin, Italy
Device: Several eReaders and tablets
glyphIgo: font minimizer for EPUB ebooks

Hi,

I would like to receive feedback and comments on a funny/crazy/useful idea I am working on in my spare time these days.

For my business needs I developed a small, simple but extremely useful Python 2.7 script to do the following things:

Code:
$ python glyphIgo.py [ARGUMENTS]

Arguments:
 -h, --help          : print this usage message and exit
 -f, --font <font>   : font file, in TTF/OTF/WOFF format
 -g, --glyphs <list> : use this list of glyphs instead of opening a font file
 -e, --ebook <ebook> : ebook in EPUB format
 -p, --plain <ebook> : ebook file, in plain text UTF-8 format
 -m, --minimize      : retain only the glyphs of <font> that appear in <ebook>
 -o, --output <name> : use <name> for the font to be created
 -s, --sort          : sort output by character count instead of character codepoint
 -q, --quiet         : quiet output
 -v, --verbose       : verbose output of Unicode codepoints

Exit codes:

 0 = no error / no missing glyphs in the font file
 1 = invalid argument(s) error
 2 = missing glyphs in the font file to correctly display the given file/ebook
 4 = minimization/conversion failed

Examples:

 1. Print this usage message
    $ python glyphIgo.py -h

 2. Print the list of glyphs in font.ttf
    $ python glyphIgo.py -f font.ttf

 3. Print the list of glyphs in ebook.epub
    $ python glyphIgo.py -e ebook.epub

 4. Print the list of glyphs in page.xhtml
    $ python glyphIgo.py -p page.xhtml

 5. Check whether all the glyphs in ebook.epub are available in font.ttf
    $ python glyphIgo.py -f font.ttf -e ebook.epub

 6. As above, but use font_glyph_list.txt containing a list of decimal codepoints for the font glyphs
    $ python glyphIgo.py -g font_glyph_list.txt -e ebook.epub

 7. As in Example 5, but sort missing glyphs (if any) by character count (in ebook.epub) instead of by Unicode codepoint
    $ python glyphIgo.py -f font.ttf -e ebook.epub -s

 8. Create new.font.otf containing only the glyphs of font.ttf that also appear in ebook.epub
    $ python glyphIgo.py -m -f font.ttf -e ebook.epub -o new.font.otf

 9. Convert font.ttf (TTF) in font.otf (OTF)
    $ python glyphIgo.py -f font.ttf -o font.otf
I am thinking of creating a web site that exposes the script's functionality to users. The idea is to find out, before you start reading an eBook that might contain non-Latin glyphs (say, the "Upanishads" or the "Haft Peykar"), whether your preferred font can handle all the Unicode characters contained in the eBook.
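To make the idea concrete, here is a minimal sketch of the coverage check in Python 3. It is not the glyphIgo source: the function names are made up for illustration, and I assume the glyph list is plain text with one decimal codepoint per token, as in the -g example above.

```python
from collections import Counter

def parse_codepoint_list(text):
    """Parse whitespace-separated decimal codepoints (assumed file layout)."""
    return {int(token) for token in text.split()}

def missing_characters(book_text, font_codepoints):
    """Count the characters of book_text that the font cannot display."""
    counts = Counter(ch for ch in book_text if not ch.isspace())
    return Counter({ch: n for ch, n in counts.items()
                    if ord(ch) not in font_codepoints})

# Hypothetical font that only covers 'a', 'b' and 'c'.
font = parse_codepoint_list("97\n98\n99")
missing = missing_characters("abc ab\u0101", font)

# Report missing characters sorted by codepoint.
for ch, n in sorted(missing.items(), key=lambda item: ord(item[0])):
    print("U+%04X %s x%d" % (ord(ch), ch, n))
```

Sorting by `item[1]` instead would correspond to the "sort by character count" option.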

A "video" of the prototype is here: http://www.albertopettarin.it/glyphIgo/glyphIgo.mov

Currently, I am considering 1) releasing the Python script; 2) setting up a web site to offer the service for free. However, my sysadmin/PHP skills are minimal and I do not have funds for the server (I need PHP exec and the ability to run my Python script, which means a colo server).

I would like to hear your thoughts, especially if they are amazingly crazy or funny.

Last edited by AlPe; 02-16-2013 at 08:52 AM. Reason: typo

02-16-2013, 02:36 PM   #2
dgatwood
Curmudgeon
Posts: 629
Karma: 1623086
Join Date: Jan 2012
Device: iPad, iPhone, Nook Simple Touch
I'd love to see this evolve into a proper EPUB font validator, although that would require a slightly more involved approach to working with the HTML than just checking to see whether all the glyphs in a particular book are present in the font.

An ideal implementation would start with the parsed HTML document and CSS declarations, then apply the CSS to determine which styles apply to which words, then check each glyph to make sure it appears in the font that is actually being used to display it. In other words, look at the font family declaration that is active at that point in the DOM tree and check each font sequentially, skipping any font name that isn't embedded.

In addition to printing errors when a glyph would be missing, it should also keep two totals of missing glyphs per font—one in which it includes every error that could potentially occur and one in which it only includes errors that do not result from falling back from another embedded font—and should present those in a summary report at the end, along with a count of the number of unused glyphs in each font.
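A rough sketch of that bookkeeping, assuming the CSS cascade has already been resolved into runs of text, each with its font-family list. The data layout and the name `audit_runs` are hypothetical, and "primary" here means the first embedded font in the chain, which is my reading of the second total.

```python
from collections import defaultdict

def audit_runs(runs, embedded):
    """runs: iterable of (font_family_list, text).
    embedded: dict mapping embedded font name -> set of codepoints."""
    all_misses = defaultdict(int)      # every miss, including those hit during fallback
    primary_misses = defaultdict(int)  # misses where the font was first in the chain
    used = defaultdict(set)            # codepoints each font actually displayed
    for families, text in runs:
        # Skip any font name that is not embedded in the book.
        chain = [name for name in families if name in embedded]
        for ch in text:
            if ch.isspace():
                continue
            cp = ord(ch)
            for position, name in enumerate(chain):
                if cp in embedded[name]:
                    used[name].add(cp)
                    break
                all_misses[name] += 1
                if position == 0:
                    primary_misses[name] += 1
    unused = {name: len(cps - used[name]) for name, cps in embedded.items()}
    return dict(all_misses), dict(primary_misses), unused

embedded = {
    "Body": {ord(c) for c in "abc"},
    "Greek": {ord("\u03b1"), ord("\u03b2")},   # alpha, beta
}
runs = [
    (["Body", "Greek", "serif"], "ab\u03b1"),  # alpha falls back to Greek
    (["Greek", "Body"], "a\u03b3"),            # gamma is in neither font
]
all_m, primary_m, unused = audit_runs(runs, embedded)
print(all_m, primary_m, unused)
```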

Oh, and it could also print errors if you specify a font in your CSS and provide it in the bundle but fail to provide a proper @font-face declaration.
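Something like the following could flag that case. It is only a sketch: it uses regular expressions rather than a real CSS parser, treats family names as case-sensitive, and assumes the caller already knows the family names of the bundled font files (the `bundled_families` argument is hypothetical).

```python
import re

FONT_FACE_RE = re.compile(r"@font-face\s*\{([^}]*)\}", re.IGNORECASE)
FAMILY_RE = re.compile(r"font-family\s*:\s*([^;}]+)", re.IGNORECASE)

def _names(value):
    """Split a font-family value into bare family names."""
    return [part.strip().strip("'\"") for part in value.split(",") if part.strip()]

def undeclared_fonts(css, bundled_families):
    """Families used in the CSS and bundled, but lacking an @font-face rule."""
    declared = set()
    for body in FONT_FACE_RE.findall(css):
        for value in FAMILY_RE.findall(body):
            declared.update(_names(value))
    # Look for font-family usage outside the @font-face blocks.
    rest = FONT_FACE_RE.sub("", css)
    used = set()
    for value in FAMILY_RE.findall(rest):
        used.update(_names(value))
    return sorted((used & set(bundled_families)) - declared)

css = """
@font-face { font-family: "Body"; src: url(fonts/body.otf); }
p  { font-family: "Body", serif; }
h1 { font-family: 'Title', sans-serif }
"""
print(undeclared_fonts(css, {"Body", "Title"}))
```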

Start here:

https://github.com/rennat/pynliner

02-18-2013, 02:28 PM   #3
AlPe
Hi, thanks for your suggestion.

I see this project more from the reader's side than from the eBook production side (which should not need such a validator... but we all know how things are done in this business...).

If I have time and resources, I will try to include the features you suggested.

02-21-2013, 11:00 AM   #4
AlPe
I released glyphIgo through Google Code:

http://code.google.com/p/glyphigo/

Comments are welcome.

02-22-2013, 11:15 AM   #5
AlPe
glyphIgo v. 1.13 released (bug fix, better verbose output), plus I wrote better Wiki documentation.

http://code.google.com/p/glyphigo/
http://code.google.com/p/glyphigo/wiki/UsageExamples
http://code.google.com/p/glyphigo/downloads/list

02-24-2013, 04:27 AM   #6
AlPe
glyphIgo v. 1.14 released (better handling of X(HT)ML tags).

http://code.google.com/p/glyphigo/
http://code.google.com/p/glyphigo/wiki/UsageExamples
http://code.google.com/p/glyphigo/downloads/list

02-24-2013, 04:44 AM   #7
Turtle91
A Hairy Wizard
Posts: 3,095
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
Thanks AlPe

03-16-2013, 04:18 PM   #8
AlPe
You are welcome.

glyphIgo v. 1.16 released.

Now you can export the lists of Unicode characters as EPUB files, for a quick check on your eReader. See usage in the wiki page linked below for a longer explanation.

http://code.google.com/p/glyphigo/
http://code.google.com/p/glyphigo/wiki/UsageExamples
http://code.google.com/p/glyphigo/downloads/list

03-23-2013, 11:59 AM   #9
AlPe
glyphIgo v. 1.17 released.

Added a function to retrieve information about a given Unicode character. For example:
Code:
$ python glyphIgo.py -l a
[INFO] Lookup results for query 'a'
[INFO] Matched Unicode character 'a'
Name          LATIN SMALL LETTER A
Character     a
Dec Codepoint 97
Hex Codepoint 0x61
Lowercase     a
Uppercase     A
Category      Ll
Bidirectional L
Mirrored      False
NFC           a
NFD           a
[INFO] === === === === === ===

$ python glyphIgo.py -l "LATIN CAPITAL LETTER A WITH MACRON"
[INFO] Lookup results for query 'LATIN CAPITAL LETTER A WITH MACRON'
[INFO] Matched Unicode character 'Ā'
Name          LATIN CAPITAL LETTER A WITH MACRON
Character     Ā
Dec Codepoint 256
Hex Codepoint 0x100
Lowercase     ā
Uppercase     Ā
Category      Lu
Bidirectional L
Mirrored      False
NFC           Ā
NFD           Ā
[INFO] === === === === === ===

$ python glyphIgo.py -l d97
[INFO] Lookup results for query 'a'
[INFO] Matched Unicode character 'a'
Name          LATIN SMALL LETTER A
Character     a
Dec Codepoint 97
Hex Codepoint 0x61
Lowercase     a
Uppercase     A
Category      Ll
Bidirectional L
Mirrored      False
NFC           a
NFD           a
[INFO] === === === === === ===
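For those curious, most of these fields are available from Python's standard `unicodedata` module. The sketch below (Python 3) is an illustration and not the glyphIgo source: the function names are made up, and reading the `d97` query as "decimal codepoint 97" is inferred from the output above.

```python
import unicodedata

def describe(ch):
    """Collect basic Unicode properties of a single character."""
    return {
        "Name": unicodedata.name(ch, "<unnamed>"),
        "Character": ch,
        "Dec Codepoint": ord(ch),
        "Hex Codepoint": hex(ord(ch)),
        "Lowercase": ch.lower(),
        "Uppercase": ch.upper(),
        "Category": unicodedata.category(ch),
        "Bidirectional": unicodedata.bidirectional(ch),
        "Mirrored": bool(unicodedata.mirrored(ch)),
        "NFC": unicodedata.normalize("NFC", ch),
        "NFD": unicodedata.normalize("NFD", ch),
    }

def lookup(query):
    """Resolve a non-empty query: one character, 'd<decimal>', or a Unicode name."""
    if len(query) == 1:
        return describe(query)
    if query[0] == "d" and query[1:].isdigit():
        return describe(chr(int(query[1:])))
    # unicodedata.lookup raises KeyError for an unknown character name.
    return describe(unicodedata.lookup(query))

for key, value in lookup("LATIN CAPITAL LETTER A WITH MACRON").items():
    print("%-13s %s" % (key, value))
```

Note that the NFD form of this character is two codepoints (U+0041 followed by the combining macron U+0304), even though it looks the same on screen.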
http://code.google.com/p/glyphigo/
http://code.google.com/p/glyphigo/wiki/UsageExamples
http://code.google.com/p/glyphigo/downloads/list

03-24-2013, 01:53 PM   #10
AlPe
glyphIgo v. 1.18 released.

Added a function to count the number of characters (displayable, not counting XHTML tags) in an EPUB eBook:

Code:
$ python glyphIgo.py -e ebook.epub -c
[INFO] Reading characters appearing in 'ebook.epub'...
[INFO] Reading characters appearing in 'ebook.epub'... Done
[INFO] Number of characters appearing in 'ebook.epub'...
1310564
[INFO] Number of characters appearing in 'ebook.epub'... Done

$ python glyphIgo.py -e ebook.epub -c -q
1310564
Please observe that the counting is somewhat "approximate", as explained in the Technical Notes on the project web page:
http://code.google.com/p/glyphigo/
http://code.google.com/p/glyphigo/wiki/UsageExamples
http://code.google.com/p/glyphigo/downloads/list
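To give an idea of why the count is only approximate, here is a simplified sketch in Python 3 (not the glyphIgo source). It assumes the content documents are UTF-8 files with an .xhtml/.html/.htm extension inside the EPUB zip container, strips tags with a naive regular expression, skips whitespace, and does not decode entities such as `&amp;`. The `preserve_tags` parameter is a hypothetical name.

```python
import io
import re
import zipfile

TAG_RE = re.compile(r"<[^>]*>")

def count_displayable_characters(epub_file, preserve_tags=False):
    """Count non-whitespace characters in the (X)HTML files of an EPUB."""
    total = 0
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            text = archive.read(name).decode("utf-8")
            if not preserve_tags:
                text = TAG_RE.sub("", text)  # naive tag stripping
            total += sum(1 for ch in text if not ch.isspace())
    return total

# Build a tiny EPUB-like archive in memory for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("OEBPS/ch1.xhtml",
                     "<html><body><p>Hello <b>world</b></p></body></html>")
buffer.seek(0)
print(count_displayable_characters(buffer))  # prints 10 ("Hello" + "world")
```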

03-24-2013, 07:11 PM   #11
JSWolf
Resident Curmudgeon
Posts: 74,015
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Will this tool analyze an ePub and figure out how to subset based on the font's usage like Calibre does? Will it tell us what embedded fonts are not being used? Does it ignore the ePub code in subsetting and just deal with text?

03-24-2013, 07:34 PM   #12
AlPe
What glyphIgo does is explained in the Technical Notes section, at https://code.google.com/p/glyphigo/

However, I will try to answer:
1) I do not know what Calibre does w.r.t. subsetting, so I cannot tell.
2) No, the idea of glyphIgo is to check that an external font (like those shipped with eReaders) can display all the characters contained in a given EPUB (the assumption here is that there are no "embedded" fonts in the eBook).
3) Yes, that is the default behavior: tags are stripped away, so (roughly) only the displayed text is retained. You can consider the entire source code instead by invoking glyphIgo with the --preserve switch, which turns off tag stripping.

Last edited by AlPe; 03-24-2013 at 07:39 PM. Reason: Spelling

03-28-2013, 10:36 AM   #13
JSWolf
What Calibre does is check what each font is going to be displaying. So when it subsets a given font, it only keeps what that font is going to display. For example, if you have ABCDEF (ABC in regular, DEF in bold), the regular font will contain ABC, the bold font will contain DEF, and the italic and bold italic fonts will be removed because they have nothing to display.
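As a toy illustration of that idea (this is not Calibre's code; the function name and data layout are made up), the per-variant bookkeeping could look like this in Python:

```python
def glyphs_per_variant(runs, variants=("regular", "bold", "italic", "bold italic")):
    """runs: iterable of (variant, text). Returns (needed, removable)."""
    needed = {variant: set() for variant in variants}
    for variant, text in runs:
        needed.setdefault(variant, set()).update(
            ch for ch in text if not ch.isspace())
    # Variants with nothing to display can be dropped from the book.
    removable = sorted(v for v, chars in needed.items() if not chars)
    return needed, removable

needed, removable = glyphs_per_variant([("regular", "ABC"), ("bold", "DEF")])
print(sorted(needed["regular"]), sorted(needed["bold"]), removable)
```

Each font file would then be subset to its own `needed` set, and the files listed in `removable` would be deleted.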

Also, does glyphIgo handle ligatures? Some versions of ADE automatically use ligatures when you have the two characters together that make up the ligature such as fi or ff.

03-28-2013, 12:51 PM   #14
AlPe
Oh I see.

No, glyphIgo does not perform such an in-depth analysis, because it was not designed to do that. Although it can be used to subset title fonts, its main goal is to quickly check that a font (external to the eBook, for example a font shipped with an eReader) correctly displays the text, irrespective of embedded fonts, font style, etc.

(The typical use case, as I wrote in the first post: you have an EPUB of the Haft Paykar. Are all those "strange Unicode characters" going to be properly rendered by your favorite font X or are they going to appear as those nasty empty rectangles? Run glyphIgo with -e ebook.epub and -f font.ttf and you will know.)
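If you want to script this check, a small wrapper can interpret the exit codes listed in the usage message of the first post. This is only a sketch: the helper names are hypothetical, and it assumes glyphIgo.py sits in the current directory and that `python` on your PATH is a Python 2.7 interpreter.

```python
import subprocess

# Exit codes as documented in the glyphIgo usage message.
EXIT_MESSAGES = {
    0: "no error / no missing glyphs in the font file",
    1: "invalid argument(s)",
    2: "font is missing glyphs needed by the ebook",
    4: "minimization/conversion failed",
}

def build_command(font_path, ebook_path, script="glyphIgo.py", python="python"):
    """Build the command line for a font-vs-ebook check."""
    return [python, script, "-f", font_path, "-e", ebook_path]

def check_font(font_path, ebook_path, **kwargs):
    """Run glyphIgo and return (exit_code, human-readable message)."""
    code = subprocess.call(build_command(font_path, ebook_path, **kwargs))
    return code, EXIT_MESSAGES.get(code, "unexpected exit code")

print(" ".join(build_command("font.ttf", "ebook.epub")))
```

Calling `check_font("font.ttf", "ebook.epub")` would then run the tool and translate its exit status.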

Also, glyphIgo does not handle ligatures, if you mean whether it is able to detect them, collapse them and use the appropriate Unicode symbol. However, if a ligature is already specified as a single Unicode character, it is managed properly (as a single Unicode character).
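For reference, a heuristic check for the common Latin ligatures could look like the sketch below. It is not part of glyphIgo and only covers the Unicode presentation forms U+FB00 to U+FB04; fonts that implement ligatures through OpenType substitution with unencoded glyphs would not be described by it. The function name is made up.

```python
import unicodedata

# Map each ligature character (ff, fi, fl, ffi, ffl) to the letters it replaces.
LIGATURES = {chr(cp): unicodedata.normalize("NFKD", chr(cp))
             for cp in range(0xFB00, 0xFB05)}

def ligatures_possibly_needed(text, font_codepoints):
    """Ligature characters a renderer might substitute but the font lacks."""
    return sorted(lig for lig, letters in LIGATURES.items()
                  if letters in text and ord(lig) not in font_codepoints)

# Hypothetical font covering the plain letters plus the 'fi' ligature only.
font = {ord(c) for c in "ofice l"} | {0xFB01}
for lig in ligatures_possibly_needed("office file", font):
    print("U+%04X for '%s'" % (ord(lig), LIGATURES[lig]))
```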

03-30-2013, 09:09 PM   #15
JSWolf
Quote:
Originally Posted by AlPe
Also, glyphIgo does not handle ligatures, if you mean whether it is able to detect them, collapse them and use the appropriate Unicode symbol. However, if a ligature is already specified as a single Unicode character, it is managed properly (as a single Unicode character).
Then glyphIgo needs to be fixed. ADE 2.0 and some versions between 1.7.2 and 2.0 do convert to ligatures and if glyphIgo says everything is good to go in the fonts and it's not, then there could be missing characters.

All times are GMT -4.