06-28-2010, 12:56 AM   #1
eriĉjo
Junior Member
eriĉjo began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Jun 2010
Device: Sony PRS 505
Looking for a tool to help strip fonts of unnecessary characters

I am hand-converting a PDF book to ePub and have run into a problem. I don't want to use the PDF's fonts because they are under a proprietary license, so I'm using the closest open source equivalents I can find. The problem is that I am embedding each entire font in the ePub file, which makes it much larger than it needs to be. What I'd like to do is strip all unnecessary characters from the fonts so that they are as small as possible. I know how to do this with FontForge; what I don't know is how to determine exactly which Unicode characters are used in a given work. The author likes to use various characters here and there beyond the normal Esperanto ones (most of the English alphabet, plus ĉĈĝĜĥĤĵĴŝŜŭŬ). I'm worried about missing some of them: if I guessed, I would have to check the whole document by hand. Is it possible to trick Acrobat Pro into doing it for me (by converting it to PDF and then getting Acrobat to do it)? Are there scripts for detecting which characters are used in a Unicode text file? Any ideas for solving this problem are appreciated.
06-28-2010, 05:18 AM   #2
JvdW
Zealot
JvdW doesn't litter
 
Posts: 115
Karma: 146
Join Date: Jul 2008
Location: Netherlands Veenendaal
Device: Palm T5, Sony PRS-505, Nook Color
The following link might help you get started:
http://scripts.sil.org/cms/scripts/p...CharacterCount

If you're handy with perl you might be able to use the counts to write a FontForge script to automatically generate the stripped font.
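
For anyone who would rather not use Perl, here is a rough Python sketch of the same character-count idea. It is not the SIL script from the link, just a minimal stand-in; the function names are made up for illustration, and it assumes the book text has been saved as UTF-8. It normalizes to NFC first so that a letter stored as "c" plus a combining circumflex is counted as ĉ.

```python
import unicodedata
from collections import Counter


def count_characters(text):
    """Count every character in text, ignoring control characters."""
    # NFC folds "c" + combining circumflex into the single character U+0109.
    text = unicodedata.normalize("NFC", text)
    return Counter(
        ch for ch in text
        # Categories starting with "C" cover newlines, tabs and other controls;
        # the ordinary space is kept because the font needs a glyph for it.
        if not unicodedata.category(ch).startswith("C")
    )


def format_report(counts):
    """One tab-separated line per character: code point, character, count, name."""
    lines = []
    for ch in sorted(counts):
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X}\t{ch}\t{counts[ch]}\t{name}")
    return "\n".join(lines)
```

To use it, read the text file with open(path, encoding="utf-8"), pass the contents to count_characters, and print the result of format_report. The code points in the first column are the ones to keep in the font.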

Regards,

Joop
06-28-2010, 08:44 PM   #3
eriĉjo
Yes, thank you, this script helped a lot. A script to process the font automatically might be nice, but FontForge already lets you select glyphs by Unicode number, so I've decided to do this one by hand. If I end up having to do this often, I'll create a script and share it.
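
For reference, an automated version might look roughly like the sketch below. This is not a tested workflow from this thread: it assumes FontForge's Python module is installed, the function names are made up, and it does not deal with code points the font lacks. The idea is the same as the manual one: select the needed glyphs by Unicode number, invert the selection, and clear everything else.

```python
import unicodedata


def needed_codepoints(text):
    """Sorted Unicode code points of every non-control character in text."""
    text = unicodedata.normalize("NFC", text)
    return sorted({
        ord(ch) for ch in text
        if not unicodedata.category(ch).startswith("C")
    })


def strip_font(src_path, dst_path, codepoints):
    """Write a copy of the font at src_path keeping only the given code points."""
    # Assumption: FontForge's Python bindings are available. The import lives
    # here so needed_codepoints() still works on a machine without them.
    import fontforge

    font = fontforge.open(src_path)
    font.selection.none()
    for cp in codepoints:
        # "more" adds to the current selection; "unicode" makes FontForge
        # treat the integer as a code point rather than an encoding slot.
        font.selection.select(("more", "unicode"), cp)

    # Accented glyphs are often built from references to other glyphs, so
    # turn the selected ones into plain outlines before clearing the rest.
    font.unlinkReferences()

    font.selection.invert()
    font.clear()  # empties every glyph that is now selected
    font.generate(dst_path)
    font.close()
```

Something like strip_font("Original.ttf", "Stripped.ttf", needed_codepoints(book_text)) would then write the smaller font; the file names here are placeholders. It is worth opening the result in FontForge afterwards to check that the accented letters survived.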

Keeping a record for myself and for those who come after me doing the same thing:

I downloaded just about every ePub reader for the Mac, trying to cut and paste the entire book so that I could feed it into the above script. Only Stanza was able to do this adequately. ADE, Calibre, and FBReader were all completely unsuitable for the task. Various word processors also failed to preserve the Esperanto characters. The Mac Terminal with vi worked fine, though.
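
A possible way around the copy-and-paste step, sketched here under some assumptions: an ePub is a zip archive of (X)HTML files, so the text can be pulled out directly with Python's standard library. The sketch assumes the content files are UTF-8 and unencrypted, and the names epub_text and _TextCollector are invented for this example.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def epub_text(path):
    """Return the concatenated text of every (X)HTML file inside an ePub."""
    chunks = []
    with zipfile.ZipFile(path) as book:
        for name in sorted(book.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                # Assumption: content documents are encoded as UTF-8.
                parser.feed(book.read(name).decode("utf-8"))
                parser.close()
                chunks.append("".join(parser.parts))
    return "\n".join(chunks)
```

The returned string can be saved as a UTF-8 text file and fed to a character-count script. Text in the document head, such as titles, is included as well, which at worst adds a few extra characters to the list.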

Last edited by eriĉjo; 06-28-2010 at 09:15 PM.
06-29-2010, 09:44 PM   #4
eriĉjo
It's done! For anyone who likes short stories in Esperanto which are readable on a Sony in the ePub format (no doubt there are millions of you out there), here you go:

http://timwestover.com/marvirinstrato/?page_id=7

And thank you JvdW for your help.

Last edited by eriĉjo; 06-30-2010 at 12:30 AM.
06-30-2010, 03:40 AM   #5
JvdW
Quote:
Originally Posted by eriĉjo View Post
It's done!

And thank you JvdW for your help.
You're welcome. Part of the credit goes to Google, which helped me find it. I have to admit I didn't come up with that link the first time around, but being a bit creative with the search terms and reading between the lines got me there.

The ePub looks nice in ADE, but in Sigil (0.23) it looks like it's using a bitmap font instead of the included TrueType fonts.

Regards,

Joop