01-23-2013, 07:59 PM | #16 |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Tox:
Stupid question: NEVER MIND. DUH.
H
Last edited by Hitch; 01-23-2013 at 08:00 PM. Reason: Really, really stupid. |
01-24-2013, 02:29 AM | #17 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Now I am curious what the question was...
|
01-24-2013, 03:10 AM | #18 |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
In short, it was: for those of us who won't know the final character set until after the ePUB is basically created, and who have 20-30 XHTML files... is there an easier way to obtain all the text than copying and pasting each XHTML file into the box? And is there any way you can see to integrate this with, say, ePUBtweak.exe? That way Font Shrinker could scour the exploded files while you have ePUBtweak open, and obtain the character sets from them.
H. |
01-24-2013, 04:06 AM | #19 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
For now, the only method is copy/pasting. I know that is not always handy, but it was the easiest to implement and I needed the program right away. I see the added value in having it read ePUB and/or XHTML directly, but that will be some work. The main problem would be identifying only the characters required for a given class.
I will take a look at ePUBtweak to see if I can use the output from it. It might be a good idea. |
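As a stopgap for the copy/paste step, the text of every XHTML file can be gathered in one pass, since an ePUB is an ordinary ZIP archive. The following is only a minimal sketch using the Python standard library, not anything Font Shrinker or ePUBtweak actually does; the names `TextCollector` and `epub_characters` are made up for illustration, and it assumes the content files are UTF-8 and end in `.xhtml`, `.html` or `.htm`. It does not look at CSS, so text hidden by a style is still counted.

```python
import zipfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Adds the characters of all text nodes to a shared set, skipping <script>/<style>."""

    SKIPPED = {"script", "style"}

    def __init__(self, chars):
        super().__init__(convert_charrefs=True)  # entities such as &eacute; become real characters
        self.chars = chars
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            # Line breaks and tabs are markup layout, not glyphs the font needs.
            self.chars.update(c for c in data if c not in "\r\n\t")


def epub_characters(epub_file):
    """Return the set of characters used in the text of every (X)HTML file of an ePUB.

    epub_file may be a path or a binary file-like object.
    """
    chars = set()
    with zipfile.ZipFile(epub_file) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextCollector(chars)  # fresh parser per file
                parser.feed(book.read(name).decode("utf-8", errors="replace"))
                parser.close()
    return chars
```

Calling `"".join(sorted(epub_characters("book.epub")))` (with a hypothetical file name) gives a single string that could be pasted into the box in one go.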
01-24-2013, 04:16 AM | #20 |
Sigil developer
Posts: 1,274
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
The next version of Sigil will have a report listing all the characters visible in Book View. It's not by class though, so limiting it to sections of text would still require some work on your part. It may need to be changed to use Code View: that might allow seeing what is in a class, but it would mean guessing what is actually visible in Book View (e.g. if a style hides the text using display:none or similar).
|
01-24-2013, 05:05 AM | #21 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
I have thought about this for a while. I think I will start working on the following over the weekend (depends on a lot of personal stuff...):
- ability to select an ePUB
- parse XHTML to find all characters in use by a certain CSS class
- open the fonts used in the ePUB and shrink each one according to the characters used for that font
- replace the fonts in the ePUB with the shrunk ones.
Don't expect it to be ready soon though; it needs quite some testing, and the most difficult part will probably be parsing the stylesheet to find the classes where a font is defined/used. There might be an intermediate version in which the style class names have to be entered manually. As a special service to JSWolf, I will automatically add the ligatures to the unique characters used. |
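The second step of that plan (finding the characters used by a certain CSS class) could look roughly like the sketch below. It is an illustration under simplifying assumptions, not the planned implementation: it treats text as belonging to a class if any enclosing element lists that class, assumes reasonably well-formed XHTML, and ignores the stylesheet entirely (no selectors, no inherited `font-family`). `ClassTextCollector` and `characters_in_class` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have content; in HTML they may appear without a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collects characters of text that sits inside an element with a given class."""

    def __init__(self, class_name):
        super().__init__(convert_charrefs=True)
        self.class_name = class_name
        self.chars = set()
        self._stack = []  # one flag per open element: are we inside the class?

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        inherited = self._stack[-1] if self._stack else False
        self._stack.append(inherited or self.class_name in classes)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing element such as <br/>: no text inside

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1]:
            self.chars.update(data)


def characters_in_class(xhtml, class_name):
    """Return the set of characters found inside elements carrying class_name."""
    parser = ClassTextCollector(class_name)
    parser.feed(xhtml)
    parser.close()
    return parser.chars
```

Mapping a font to the classes that use it would still require parsing the CSS, which is the part described above as the most difficult.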
01-24-2013, 11:26 AM | #22 | |
Bookmaker & Cat Slave
Posts: 11,482
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
|
Quote:
To have it perfect, later, would be, as I said, amazing, but right this second, what I'd love is if it could just open the ePUB and say, "VOILA!" I don't even care if I have to manually replace the fonts, that's not a big deal. Not that I'd turn DOWN Shrinker with all the extra goodies...just thinking aloud about what I, personally, need most. I realize my needs are probably different than almost everyone else's. OH, also: a way to direct the location of the output of the created subsetted font would be super. While I'm wish-listing. And if I didn't say it loudly enough, before: seriously, you are fabulous. H |
|
01-24-2013, 12:31 PM | #23 |
temp. out of service
Posts: 2,797
Karma: 24285242
Join Date: May 2010
Location: Duisburg (DE)
Device: PB 623
|
I'm really enthusiastic that this particular idea of epub tweaking found that much positive resonance - really no joking here.
Tox: while you work on manipulating the font files, you should consider auto-renaming them: both the filename and the font name stored inside the font file. AFAIR it's often required, even by the licences of free fonts, when they are changed. While it's relatively meaningless for personal use, it's crucial as soon as your tool matures into part of the toolchain used by professional producers. (And aren't better-optimized professional books a goal we all wish for?) |
01-24-2013, 12:47 PM | #24 |
Resident Curmudgeon
Posts: 75,862
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I did report the problem with ligatures and that has been fixed in Calibre.
|
01-24-2013, 12:49 PM | #25 | |
Resident Curmudgeon
Posts: 75,862
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
I've figured out what would work with the current version. Take the ePub, convert it to HTMLZ and run the HTML file through the subsetter and there you go. Last edited by JSWolf; 01-24-2013 at 12:54 PM. |
|
01-24-2013, 12:55 PM | #26 |
Sigil developer
Posts: 1,274
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
|
I'm not sure. Do you have an example epub (a link or just a small file is fine) that contains ligatures? The code literally just reports each unicode character that appears in the text (and if it has an entity name).
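To illustrate what such a report amounts to, here is a small Python sketch. It is not Sigil's code, just an assumed equivalent built on the standard library; `character_report` is a made-up name. It lists each distinct character with its code point, Unicode name and HTML entity name where one exists.

```python
import unicodedata
from html.entities import codepoint2name


def character_report(text):
    """Return (char, code point, Unicode name, entity) rows for each distinct character."""
    rows = []
    for ch in sorted(set(text)):
        if ch.isspace():
            continue  # whitespace is left out of this illustrative report
        entity = codepoint2name.get(ord(ch))
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            f"&{entity};" if entity else "",
        ))
    return rows


for row in character_report("caf\u00e9 \ufb01x"):
    print(*row, sep="\t")
```

A report of this kind only shows a ligature when the text itself contains a precomposed ligature character such as U+FB01 (LATIN SMALL LIGATURE FI). Ligatures that the font substitutes at rendering time never appear in the text, so they cannot show up in it.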
|
01-24-2013, 01:01 PM | #27 | |
Resident Curmudgeon
Posts: 75,862
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
Oh and would it be possible to display each character for a given font for embedded fonts? |
|
01-24-2013, 01:20 PM | #28 |
frumious Bandersnatch
Posts: 7,533
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I'll try to explain. A text typically has no explicit ligatures, it could have some, but it should not, and I've only seen some very old text files with them. What a text has is just normal unicode characters, let's say a text consists of the single word "office", that's only 5 different letters: c, e, f, i, o.
Now, a font could have ligatures defined, and a reading software may use them (although many do not, I'm afraid). Let's say that the font we are dealing with has the ligatures "fi", "ffi" and "fj" defined. Defining a ligature means that the font has a glyph (a character shape) for the combination "fi" and some instructions saying that whenever there's an f and an i in the text, they should be rendered as the "fi" ligature and not as the separate characters (ditto for "ffi" and "fj"). OK, then our text will ideally be displayed as 4 glyphs: "o", "ffi", "c", "e". There are different things a font subsetter could do:
1) Remove everything but "o", "f", "i", "c", "e", including ligatures and their definition. This is not ideal, but it's probably the simplest.
2) Same as 1, but do not remove ligatures or their definition. That's much better, but it leaves unused glyphs, such as "fi" or "fj".
3) Detect ligatures, find out that "i" and "f" are never used alone, and remove everything but "o", "ffi", "c", "e". This is not a good idea, as renderers that do not support ligatures will not be able to display "f" and "i".
4) Remove all unused single characters, and related ligatures. This would remove "fj", since "j" is not in the source text, but leave "fi" since both "f" and "i" are, although the "fi" ligature is never used (because we have "ffi" already). I think this is the perfect combination of subsetting and not too demanding.
5) Remove some or all ligatures (the glyphs), but do not remove their definitions. This is not a good idea either, and I think this was the bug in Calibre. It means a renderer supporting ligatures would believe there is a ligature to use for "ffi", but it wouldn't find it.
So, if you can, go for #4. But things may be significantly harder.
A font (particularly an OTF one) may contain other alternate shapes for glyphs (final forms, swash forms, older variants, small-caps, etc.), those are currently unused by practically all renderers, but there's still hope that some day we'll be able to enjoy some more advanced typesetting options... |
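The selection rule of option 4 can be sketched in a few lines of Python, assuming the font's ligatures are already available as plain strings of their component characters. That is a big assumption: a real font stores them as substitution rules over glyphs, and extracting those is the hard part, not shown here. The function name `subset_plan` is hypothetical.

```python
def subset_plan(text, font_ligatures):
    """Return (characters, ligatures) to keep under option 4.

    text           -- all the text that will be set in the font
    font_ligatures -- component strings of the ligatures the font defines
                      (assumed input, e.g. {"fi", "ffi", "fj"})
    """
    used = set(text)
    # A ligature can only be triggered if every one of its components occurs in the text.
    kept = {lig for lig in font_ligatures if set(lig) <= used}
    return used, kept


chars, ligs = subset_plan("office", {"fi", "ffi", "fj"})
print(sorted(chars))  # ['c', 'e', 'f', 'i', 'o']
print(sorted(ligs))   # ['ffi', 'fi']
```

As in the "office" example, "fj" is dropped because "j" never occurs, while "fi" is kept even though "ffi" would win at rendering time. The check is deliberately cheap: it looks only at which characters occur, not at whether they ever appear next to each other.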
01-24-2013, 01:21 PM | #29 |
temp. out of service
Posts: 2,797
Karma: 24285242
Join Date: May 2010
Location: Duisburg (DE)
Device: PB 623
|
Just what jellby said.
I was slower and less detailed at it. Last edited by Freeshadow; 01-24-2013 at 01:23 PM. |
01-24-2013, 01:33 PM | #30 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
For now I will probably just add the few ligature glyphs. There aren't that many, so the impact on the size is limited. I should think about the small caps too, but that one will be at the bottom of the list.
Let me first work on the list and look for pink bidets later... |
|