#1

temp. out of service
Posts: 1,981
Karma: 10577116
Join Date: May 2010
Location: Duisburg (DE)
Device: BeBook mini

Working on a way to subset fonts for ePub/KF3
Quote:
The legal and practical need for such a script is currently being discussed in the same thread: http://www.mobileread.com/forums/sho...49#post2240749

#2

Mobile Reader Geek
Posts: 34,226
Karma: 13801376
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad

What language does this script use? Is it something that can easily be used with Windows 7? If so, can you please post it?

#3

Addict
Posts: 207
Karma: 64000
Join Date: May 2006
Location: Oslo, Norway
Device: Sony PRS-650

Quote:

#4

Addict
Posts: 207
Karma: 64000
Join Date: May 2006
Location: Oslo, Norway
Device: Sony PRS-650

@JSWolf: It seems it was Python; it works in Python 2.7. I've probably done horrible things to the Python language, but here it is, with no guarantees about anything:
Code:
import argparse, codecs
parser = argparse.ArgumentParser(description='''This script will accept utf-8 text files and write a list of unique characters to stdout or an output file''')
parser.add_argument("file", nargs='+', help="input (utf-8) file(s) for character counting")
parser.add_argument("-o", "--outfile", help="outputfile")
args = parser.parse_args()
disallowed = set('')  # characters to leave out of the result (none by default)
s = set()
for f in args.file:
    # Add every character of every line, except the disallowed ones
    s = s | set(char for line in codecs.open(f, encoding="UTF-8") for char in line
                if char not in disallowed)
if args.outfile:
    print 'Writing to file: ' + args.outfile
    # The with block closes the file, so no explicit close() is needed
    with codecs.open(args.outfile, "w", "utf-8") as f:
        f.write(u''.join(s))
else:
    print u''.join(s).encode('utf-8')
Use the output file option for Unicode files, as many glyphs won't show in a console. If you have questions about the code I can try to answer, but I heard there are some guys in the calibre forum who probably have a bit more experience with Python.
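For anyone on Python 3, where the print statements in the script above are a syntax error, the same idea can be sketched as two small functions. This is only a hedged port, not the script as posted; the names unique_chars and write_chars are made up for illustration.

```python
import io


def unique_chars(paths, disallowed=frozenset(), encoding="utf-8"):
    """Return the set of characters found in the given text files."""
    found = set()
    for path in paths:
        # io.open decodes while reading; the with block closes the file.
        with io.open(path, encoding=encoding) as handle:
            for line in handle:
                found.update(ch for ch in line if ch not in disallowed)
    return found


def write_chars(chars, outfile, encoding="utf-8"):
    """Write the characters, sorted by code point, to outfile."""
    with io.open(outfile, "w", encoding=encoding) as handle:
        handle.write("".join(sorted(chars)))
```

As in the original script, line breaks count as characters unless they are put into disallowed.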

#5

temp. out of service
Posts: 1,981
Karma: 10577116
Join Date: May 2010
Location: Duisburg (DE)
Device: BeBook mini

Python means FontForge could be fed with it, which is just what I was thinking about.

#6

Grand Sorcerer
Posts: 6,839
Karma: 23400772
Join Date: Jan 2010
Device: Kindle Fire HD, Kindle 2

It'd be nice if the script skipped all characters that occur inside HTML tags. Those wouldn't necessarily need to be part of any embedded font, since they won't be rendered.
__________________
“Politics: A strife of interests masquerading as a contest of principles. The conduct of public affairs for private advantage.”

#7

Addict
Posts: 207
Karma: 64000
Join Date: May 2006
Location: Oslo, Norway
Device: Sony PRS-650

Quote:
Since you might be interested only in special characters, you could just add a bunch of regular characters that you're not interested in to disallowed = set('') in line 6, i.e.
Code:
disallowed = set('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
The script wasn't really intended for publication, so it's unfortunately pretty rough, and I don't really have enough experience to improve it. It works for my needs, though.
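The same exclusion set can also be built from the standard string module instead of typing the alphabet out. A small Python 3 sketch; the helper name filter_special is hypothetical and not part of the script.

```python
import string

# Same 62 characters as the hand-typed literal: a-z, A-Z and 0-9.
disallowed = set(string.ascii_letters + string.digits)


def filter_special(text, disallowed=disallowed):
    """Return the characters of text that are not in disallowed."""
    return set(text) - disallowed
```

Adding string.punctuation or string.whitespace to the set works the same way if those are not of interest either.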

#8

Grand Sorcerer
Posts: 6,839
Karma: 23400772
Join Date: Jan 2010
Device: Kindle Fire HD, Kindle 2

Spoiler:
usage: uniquechars.py [-h] [-c CODEC] [-o OUTFILE] file [file ...]

An attempt to modify the script so that only the text of an HTML document is parsed, and to allow input and output in other charset encodings. The default is utf-8 if no codec is specified on the command line. I got it to work with either utf-8 or windows-1252.
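The modified script itself is not preserved in this copy of the thread, so the following is not that code. It is only a minimal Python 3 sketch of the approach described: decode with a selectable codec, keep only the text outside of tags, and collect the unique characters. TextCollector and html_text_chars are invented names, and skipping the contents of script and style elements is an extra assumption.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect characters from text nodes, ignoring tags and attributes."""

    SKIPPED = ("script", "style")  # contents are never rendered as text

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; arrives as "&"
        self.chars = set()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars.update(data)


def html_text_chars(raw_bytes, codec="utf-8"):
    """Decode raw_bytes with codec and return the set of rendered characters."""
    collector = TextCollector()
    collector.feed(raw_bytes.decode(codec))
    collector.close()  # flush any buffered trailing text
    return collector.chars
```

Reading a file in binary mode and passing its bytes plus the codec name (for example "windows-1252") to html_text_chars would mirror the -c option in the usage line.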

#9

frumious Bandersnatch
Posts: 5,148
Karma: 2505637
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon

I would render the HTML in a browser, copy and paste in a text file, and extract the unique chars from there.

#10

The Grand Mouse
Posts: 24,345
Karma: 73595938
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle 2; iPhone 3G; Bookeen Opus; NOOK ST GlowLight

Quote:
Now we need something that can process a TTF or OTF file and create a sub-set of the font.
__________________
Kai Lung Raises His Voice, now available at Amazon and BooksOnBoard
A new collection of ‘Kai Lung’ stories by Ernest Bramah, including four previously unpublished stories.
Need professional help formatting your ebook? Send me email.
Books I've read in 2013, 2012, 2011, 2010

#11

Wizard
Posts: 1,090
Karma: 2701341
Join Date: Dec 2010
Device: Kindle 3

BTW, Python-challenged ebook designers could simply compile an epub with KindlePreviewer/KindleGen and have a look at the detected Unicode ranges in the log file. For example, if you compile the book mentioned in roger64's post you'll see the following output:
Code:
Info(prcgen):I1045: Computing UNICODE ranges used in the book
Info(prcgen):I1046: Found UNICODE range: Basic Latin [20..7E]
Info(prcgen):I1046: Found UNICODE range: General Punctuation - Windows 1252 [2018..201A]
Info(prcgen):I1046: Found UNICODE range: Latin-1 Supplement [A0..FF]
Info(prcgen):I1046: Found UNICODE range: General Punctuation - other than Windows 1252 [2015..2017]
Info(prcgen):I1046: Found UNICODE range: Latin Extended-A [100..17F]
Info(prcgen):I1046: Found UNICODE range: Basic Greek [370..3FF]
Info(prcgen):I1046: Found UNICODE range: Greek Extended [1F00..1FFF]
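If the ranges are wanted in machine-readable form, the log can be picked apart with a regular expression. A rough Python 3 sketch, assuming the log lines have exactly the shape shown above; parse_unicode_ranges is a made-up helper name and other KindleGen versions may word their log differently.

```python
import re

# Matches e.g. "Found UNICODE range: Basic Latin [20..7E]"
RANGE_RE = re.compile(
    r"Found UNICODE range: (.+?) \[([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+)\]"
)


def parse_unicode_ranges(log_text):
    """Return (name, first, last) tuples for each range reported in the log."""
    return [
        (name, int(first, 16), int(last, 16))
        for name, first, last in RANGE_RE.findall(log_text)
    ]
```

The bounds in the log are hexadecimal, so they are converted to integers here.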

#12

frumious Bandersnatch
Posts: 5,148
Karma: 2505637
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon

#13

The Grand Mouse
Posts: 24,345
Karma: 73595938
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle 2; iPhone 3G; Bookeen Opus; NOOK ST GlowLight

I think that getting the ranges isn't fine-grained enough. The goal is not to check that our fonts cover the characters used, but to trim the fonts so that they cover only the characters used. Of course, making sure that all the needed characters are in the font will be part of this.

#14

Addict
Posts: 207
Karma: 64000
Join Date: May 2006
Location: Oslo, Norway
Device: Sony PRS-650

Quote:
I suspect that most methods of subsetting would also give you a "free" coverage check in the bargain.
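A coverage check on its own is also easy to sketch. The following Python 3 example is an assumption on my part, not something used in this thread: it relies on the third-party fontTools package (TTFont and getBestCmap) to read the font's character map, and the names missing_chars and font_cmap are invented.

```python
def missing_chars(chars, cmap):
    """Return the characters whose code points are absent from cmap.

    cmap maps integer code points to glyph names.
    """
    return sorted(ch for ch in set(chars) if ord(ch) not in cmap)


def font_cmap(font_path):
    """Load a font's best Unicode cmap (requires the fontTools package)."""
    from fontTools.ttLib import TTFont  # third-party, imported lazily

    font = TTFont(font_path)
    try:
        return font.getBestCmap()
    finally:
        font.close()
```

Calling missing_chars(chars, font_cmap("SomeFont.ttf")) with the output of the unique-character script would list what the font lacks. Line breaks and other control characters are usually not mapped by a font, so they are best filtered out first.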

#15

temp. out of service
Posts: 1,981
Karma: 10577116
Join Date: May 2010
Location: Duisburg (DE)
Device: BeBook mini

FontForge can, AFAIR, be controlled by Python scripts.
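As a very rough illustration of what such a script might look like, here is an untested Python 3 sketch. It assumes FontForge's Python bindings are installed and uses their selection, clear and generate calls; codepoints and subset_with_fontforge are hypothetical names. Glyphs that are only reachable through references or ligatures are not handled, and selecting a code point the font does not contain may raise an error.

```python
def codepoints(chars):
    """Return the sorted, de-duplicated code points of chars."""
    return sorted({ord(ch) for ch in chars})


def subset_with_fontforge(src_path, dst_path, chars):
    """Write a copy of src_path that keeps only the glyphs for chars."""
    import fontforge  # only importable where FontForge's bindings exist

    font = fontforge.open(src_path)
    font.selection.none()
    for cp in codepoints(chars):
        # "more" adds to the selection; "unicode" reads cp as a code point.
        font.selection.select(("more", "unicode"), cp)
    font.selection.invert()  # now everything that is NOT needed is selected
    font.clear()             # empty those glyphs
    font.generate(dst_path)  # output format follows the file extension
    font.close()
```

The chars argument would be the content of the file written by the unique-character script.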