03-14-2012, 05:00 AM   #1
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Unique characters used

In another thread about embedding fonts, font file size naturally came up. Fonts are usually large, so they add noticeably to the size of the ePub.
However, it is possible to reduce (subset) a font to only the characters you actually use, for example in headings, notes, or foreign-language passages. This can reduce the size tremendously.
The usual tools are FontSquirrel or FontForge. The biggest problem is usually determining which characters you actually need.

Therefore I created a Word macro that lists the unique characters in a document, either for all fonts or for one specific font. With that list, the font can be subsetted.

Please take care: the licenses of some (most?) fonts do not allow redistribution. Since this produces only a subset, it might be allowed, but to be safe, only use it on free fonts.

* Update: fixed a small error.
Attached Files
File Type: zip UniqueCharacters 1.1.zip (1.4 KB, 210 views)

Last edited by Toxaris; 03-14-2012 at 04:14 PM.
03-14-2012, 05:27 AM   #2
frostschutz
Linux User
Posts: 2,279
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
Well, if the subset is complete enough to cover an entire book, chances are it still won't be allowed.

I wasn't very successful with FontForge when it came to removing characters from a font: somehow the file FontForge saved was several times the size of the original.
03-14-2012, 07:57 AM   #3
Toxaris
Wizard
As long as you actually delete the glyphs rather than just clearing them, the result should definitely be smaller. You can also use FontSquirrel and enter the glyphs you need.
03-14-2012, 10:44 AM   #4
Jellby
frumious Bandersnatch
Posts: 7,514
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I guess you could create a single XHTML document, apply "display:none" to everything except the parts that use the font you want, open the file in a browser, copy the text somewhere else, and identify the unique characters there.
03-14-2012, 12:41 PM   #5
Trouhel
Enthusiast
Posts: 25
Karma: 10
Join Date: Oct 2011
Device: none
@frostschutz:

I don't know if this helps, but here is how I subset fonts with FontForge, using a script that:

1) opens the font with Open()
2) deselects all characters with SelectNone()
3) selects all needed characters with SelectMore()
4) inverts the selection with SelectInvert()
5) clears the selected characters with Clear()
6) generates the new font with Generate()

If some of the needed characters are built from references (accented letters, for example), you need to select both those characters and the glyphs they reference.

As an example: a 1252-character, 138 KB TrueType font subsetted to 145 characters weighs 32 KB.
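The six steps above can be generated from a list of needed characters with a small shell function. This is a sketch, not the script Trouhel used: the function name `make_subset_script` is hypothetical, and it assumes FontForge's native scripting language, its `0uXXXX` notation for Unicode code points, and a UTF-8 locale so that bash's `printf` maps non-ASCII characters to their code points. Glyphs referenced by the needed characters still have to be added to the list by hand, as noted above.

```shell
#!/bin/bash
# Hypothetical helper: print a FontForge native script that keeps only
# the characters in $3 and clears everything else.
#   $1 = input font, $2 = output font, $3 = string of characters to keep
# Assumes a UTF-8 locale so non-ASCII characters map to code points.
make_subset_script() {
  local in=$1 out=$2 chars=$3 i cp
  printf 'Open("%s")\n' "$in"          # 1) open the font
  printf 'SelectNone()\n'              # 2) deselect everything
  for (( i = 0; i < ${#chars}; i++ )); do
    # a leading "'" makes printf print the numeric code of the character
    printf -v cp '%04X' "'${chars:i:1}"
    printf 'SelectMore(0u%s)\n' "$cp"  # 3) add one needed character
  done
  printf 'SelectInvert()\n'            # 4) now only unneeded glyphs are selected
  printf 'Clear()\n'                   # 5) clear them
  printf 'Generate("%s")\n' "$out"     # 6) write the subsetted font
}
```

For example, `make_subset_script MyFont.ttf MyFont-subset.ttf 'ABCabc' > subset.pe` writes the script, which can then be run with `fontforge -lang=ff -script subset.pe` (the file names are placeholders). One `SelectMore()` call per character keeps the arguments unambiguous, since FontForge treats pairs of arguments as ranges.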
03-14-2012, 04:05 PM   #6
SBT
Fanatic
Posts: 580
Karma: 810184
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
Here's how you can find the characters used in an XHTML file (excluding tags) in a Unix bash shell:
Code:
cat file.xhtml|sed -e 's/<[^>]\+>//g' -e 's/./&\n/g'  |sort -u |tr "\n" " "
If you only want the characters in headings, you can try:
Code:
grep "<h[1-4]" OEBPS/vol1/12.xhtml|sed -e 's/<[^>]\+>//g' -e 's/./&\n/g'  |sort -u |tr "\n" " "
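The heading one-liner above only sees lines that contain an opening `<h1>` to `<h4>` tag, so a heading that wraps onto a second source line loses its tail. A sketch that joins the file into one line first and then cuts it after each closing heading tag (the function name `heading_text` is made up; it assumes GNU sed for `\n` in the replacement, headings that are not nested, and it covers `<h1>` to `<h6>`):

```shell
#!/bin/bash
# Hypothetical helper: print the text of each <h1>..<h6> element on its
# own line, even when a heading is split over several source lines.
heading_text() {
  tr '\n' ' ' < "$1" |                  # join the whole file into one line
    sed 's#</h[1-6]>#&\n#g' |           # start a new line after each closing tag
    grep -o '<h[1-6][ >].*</h[1-6]>' |  # keep opening tag .. closing tag
    sed 's/<[^>]*>//g'                  # drop the remaining tags
}
```

Its output can be fed to the same `sed -e 's/./&\n/g' | sort -u` stage as above to get the character list.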
03-14-2012, 04:15 PM   #7
Toxaris
Wizard
That will work, but usually I want to know it for a specific font, and Word is already part of my workflow anyway...

BTW, the macro has been updated. Stupid VBA is not always case-sensitive.
03-15-2012, 04:01 PM   #8
Jellby
frumious Bandersnatch
Those one-liners don't decode entities, I'm afraid (although they can be converted beforehand with recode).
03-16-2012, 03:50 AM   #9
SBT
Fanatic
@Jellby: *Sigh*, isn't there always something... Thanks for pointing it out. Another try:
Code:
cat file.xhtml|xml2asc|sed -e 's/<[^>]\+>//g' -e 's/./&\n/g'  |sort -u |tr "\n" " "
I wonder how many lines you need to do this properly: find which tags use special fonts, extract their content, etc.?
03-16-2012, 12:36 PM   #10
Jellby
frumious Bandersnatch
Well, xml2asc on my box doesn't seem to do what you apparently use it for:

Reads an UTF-8 encoded text from standard input and writes to standard output, converting all non-ASCII characters to &#nnn; entities, so that the result is ASCII-encoded.

Also, consider what to do with an input such as:

Code:
Let a &lt; &epsilon; &gt; b
Do you get <, ε, and > in the character list?
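One way to get `<`, `ε` and `>` right is to strip the tags first and decode the character references afterwards, so a decoded `&lt;` can no longer be mistaken for the start of a tag. The sketch below does that in plain bash. The function names are hypothetical, and it only decodes numeric references and the five XML entities (`&lt; &gt; &amp; &quot; &apos;`); a named HTML entity like `&epsilon;` is left untouched and would still need `recode HTML..UTF-8` (as used in SBT's script) applied after the tag stripping.

```shell
#!/bin/bash
# Hypothetical sketch: decode character references in already tag-free text.
# Handles the five XML entities and numeric references (&#949; / &#x3B5;);
# other named entities are kept as-is. Non-ASCII output needs a UTF-8 locale.
decode_entities() {
  local s=$1 out='' ent rep
  local re='^(#[0-9]+|#[xX][0-9a-fA-F]+|lt|gt|amp|quot|apos);'
  while [[ $s == *'&'* ]]; do
    out+=${s%%'&'*}               # text before the next '&'
    s=${s#*'&'}                   # text after it
    if [[ $s =~ $re ]]; then
      ent=${BASH_REMATCH[1]}
      s=${s:${#BASH_REMATCH[0]}}  # skip the entity name and ';'
      case $ent in
        lt) rep='<' ;;  gt) rep='>' ;;  amp) rep='&' ;;
        quot) rep='"' ;; apos) rep="'" ;;
        '#'[xX]*) printf -v rep "\\U$(printf '%08X' "0x${ent:2}")" ;;
        *) printf -v rep "\\U$(printf '%08X' "$((10#${ent:1}))")" ;;
      esac
      out+=$rep
    else
      out+='&'                    # unknown entity: keep the '&' literally
    fi
  done
  printf '%s' "$out$s"
}

# List the distinct non-space characters in the text of an XHTML file.
unique_chars() {
  local text
  text=$(sed 's/<[^>]*>//g' "$1")   # 1) strip tags first
  decode_entities "$text" |          # 2) then decode references
    grep -o '[^[:space:]]' |         # 3) one character per line
    LC_ALL=C sort -u |               # 4) byte order, no locale-based merging
    tr -d '\n'
  echo
}
```

Sorting with `LC_ALL=C` compares raw bytes, which keeps distinct characters distinct and, for UTF-8 text, orders them by code point.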
03-16-2012, 05:28 PM   #11
SBT
Fanatic
@Jellby: Right... Not exactly a one-liner anymore, and the script has all the clarity of white noise. I couldn't be bothered to double-check @font-face in the CSS, and a tag with a font nested inside another tag with a font is not handled correctly (the outer tag will include the characters from the inner tag). Typical usage:
./script.sh <xhtml-file> <css-file>
If the CSS is inline, it's just
./script.sh <xhtml-file>
It tries to detect the extra fonts in the CSS file and the classes/ids that use them, and lists the characters used for each font.
Code:
#!/bin/bash
file=$1
css=${!#}
xmlns=$(grep -o "xmlns=.[^\"']\+" ${file}|cut -c8-180)
# remove comments
awk 'BEGIN{RS="\(<.\-\-\|\-\->\)"} {if ((NR % 2)==1) print;}' $file > tmp
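# decode named HTML entities (e.g. &epsilon;) with recode, but leave the five
# XML ones (&lt; &gt; &amp; &quot; &apos;) for xmllint: decoding them here
# would produce bare < and & characters that break the XML parse
for x in $(grep -o "&[a-zA-Z0-9]\+;" tmp | sort -u | grep -vxE '&(lt|gt|amp|quot|apos);')
do sed -i "s/${x}/$(echo "$x"|recode HTML..UTF-8)/g" tmp
done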
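# use only the <style> block if $css is the xhtml file itself, else the whole css file
# (grep -q prints nothing, so test its exit status; quote the sed range to keep < away from the shell)
(if grep -q '</style' "$css"
 then sed -n '/<style/,/<\/style/p' "$css"
 else cat "$css"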
fi)|\
tr "\n" " " |\
sed 's/[>}]/&\n/g' |\
grep -v "@font-face" |\
sed -n "/font-family: *[\"']/{s/^ *\(.*\) *{.*font-family: *[\"']\([^\"']\+\).*/\1 \2/;p}" |\
sed -e '/^\./s/^/\*/' -e 's#\(.*\)\.\([^ \]\+\)#a:\1[@class="\2"]#' -e 's/.*#\([^ ]*\)/*[@id="\1"]/'|\
while read line
do
echo
echo  "${line#* }: "
 echo -e "setns a=${xmlns}\ncat //${line%% *}//text()" |\
xmllint --shell tmp |\
sed -e 1d -e '/^\(\/ >\| -\{7\}$\)/d' -e 's/./&\n/g' |\
sort -u |\
sed '/[ \t]/d' |\
sed -n 'H;${x;s/\n//g;p}'
done

Last edited by SBT; 03-16-2012 at 06:12 PM.
03-16-2012, 05:47 PM   #12
Toxaris
Wizard
Looking at those scripts, I'd rather use my Word macro...
03-16-2012, 06:17 PM   #13
SBT
Fanatic
I have to admit it is perhaps not the most aesthetically pleasing program to rest one's weary eyes upon... However, it might come in handy when refurbishing existing ePubs, since it operates on XHTML files.
03-18-2012, 03:52 AM   #14
Toxaris
Wizard
Hmm, someone mentioned small caps. It might be a good idea to support small caps as well. I think I will enhance the macro further later this week.
03-18-2012, 06:02 AM   #15
Jellby
frumious Bandersnatch
The problem with small caps is that you'd like them to match the normal text. If you embed a font for the small caps but not for the rest, you create the possibility of having, for instance, sans-serif text with serif small caps (ugh).

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.