10-17-2009, 08:31 AM | #1 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Encoding of Emdash
I have been running into a series of PG books encoded in charset=iso-8859-1. The emdash is encoded as #8212 followed by a soft hyphen #173. I go into the PG file in the editor and replace these with #151.
I would like to add this to the Book Cleaner. The emdash seems to be handled by 2.bcf, which shows: find what: uni(137) replace with: uni(151). I must be misinterpreting something, because I cannot see how uni(137) relates to the emdash. Would someone point me in the right direction? Charlie |
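The manual fix described above can be sketched in a few lines of Python. This is only an illustration of the replacement, not part of Book Designer or the Book Cleaner. It assumes the dash appears in the HTML source as the numeric entity `&#8212;`, optionally followed by `&#173;`; the function name `normalize_emdash` is made up for the example.

```python
import re

# Assumption: the dash is written as the numeric entity &#8212;,
# optionally followed by a soft hyphen entity &#173;.
DASH_PATTERN = re.compile(r"&#8212;(?:&#173;)?")


def normalize_emdash(html_text: str) -> str:
    """Replace '&#8212;' (and a trailing '&#173;', if present) with '&#151;'."""
    return DASH_PATTERN.sub("&#151;", html_text)


sample = "<p>He paused&#8212;&#173;then spoke.</p>"
print(normalize_emdash(sample))
```

A soft hyphen that does not follow the dash is left alone, so ordinary hyphenation hints elsewhere in the file are not touched.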
10-17-2009, 08:48 AM | #2 | |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Quote:
By default, BD converts dashes into hyphens, so this is a "workaround" to stop it from doing so. If you look at "1", you'll see that it replaces #151 with #137, and then "2" replaces #137 with #151 again. The effect of this is to make BD "preserve" dashes. What you need to do is to edit "1.bcf" and tell it to replace your character sequence with #137. |
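The round trip through a placeholder code can be modelled roughly as follows. This is a sketch under assumptions, not BD's actual code: the three functions are hypothetical stand-ins, and the order (1.bcf, then BD's own dash conversion, then 2.bcf) is inferred from the description in this post.

```python
EM_DASH = chr(151)      # the dash code the rules are meant to preserve
PLACEHOLDER = chr(137)  # the stand-in code used by 1.bcf and 2.bcf


def rule_1(text: str) -> str:
    """Stand-in for 1.bcf: hide the dash behind the placeholder."""
    return text.replace(EM_DASH, PLACEHOLDER)


def default_dash_conversion(text: str) -> str:
    """Stand-in for BD's built-in behaviour: dashes become hyphens."""
    return text.replace(EM_DASH, "-")


def rule_2(text: str) -> str:
    """Stand-in for 2.bcf: turn the placeholder back into the dash."""
    return text.replace(PLACEHOLDER, EM_DASH)


def import_with_rules(text: str) -> str:
    return rule_2(default_dash_conversion(rule_1(text)))


source = "wait" + EM_DASH + "what"
print(import_with_rules(source) == source)             # True
print(default_dash_conversion(source) == "wait-what")  # True
```

The dash survives only because it is not a dash while the conversion runs. Any other dash encoding has to be mapped to the placeholder in the first step as well, or it will be turned into a hyphen.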
|
10-17-2009, 11:47 AM | #3 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Harry,
Is this correct? find: uni(8212) replace with uni(137) Charlie |
10-17-2009, 11:53 AM | #4 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Sounds right to me, Charlie, but the best way to find out is to try it!
|
10-17-2009, 03:31 PM | #5 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
It works!
Thanks, Harry. Charlie |
|
10-17-2009, 05:41 PM | #6 |
Resident Curmudgeon
Posts: 75,838
Karma: 134367616
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
10-18-2009, 11:24 AM | #7 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Jon,
I don't see how it could hurt. I don't know how many folks have run into it. I guess it could be the way I process the PG files. I download the complete HTM and the text file. I use the Gutenberg Prettifier to convert the text file to HTML. This gets rid of page numbers and other annoying junk. I then load the HTML file into BD, place the BD display alongside the original complete HTM, and move down page by page to format the BD file. The HTM file shows charset=windows-1252. This should work fine with the BCF as it stands. The HTML file specifies no charset. It seems to be the Gutenberg Prettifier inserting the different codes. I guess only those using the Prettifier will run into this. As to why the Prettifier is suddenly producing these codes, I have yet to determine. I'm not sure it is worth your time and effort to make the change and then distribute the results. Charlie |
10-19-2009, 09:43 AM | #8 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Jon,
Somewhere I have totally screwed up. The attempt to use the BCF to catch the 8212 really doesn't work! I don't know how, but I did. I made the following change:
ENTRY IN 1.BCF
find what: uni(8212) replace by: uni(137)
I built the following test file:
HTML TEST FILE
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
<meta http-equiv="content-type" content="text/html;charset=us-ascii" />
</head>
<body>
<p>Test of emdash code 151: &#151;</p>
<br/>
<br/>
<p>Test of emdash code 8212: &#8212;</p>
</body>
</html>
IE displays the following:
Test of emdash code 151: —
Test of emdash code 8212: —
BOOK DESIGNER displays:
Test of emdash code 151: —
Test of emdash code 8212: -
This file was created with BookDesigner program bookdesigner@the-ebook.org 10/19/2009
I am obviously into something I don't understand. Comments please. Charlie |
10-19-2009, 10:07 AM | #9 |
Resident Curmudgeon
Posts: 75,838
Karma: 134367616
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
On my screen (with Firefox), both encodings look the same. Do you have a file you know is properly encoded? If so, can you zip it and attach it here?
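For anyone wondering why a browser shows the two codes as the same dash: code 151 is the em dash only in Windows-1252, and its Unicode value is 8212. Browsers treat `&#151;` as a Windows-1252 code and map it to that same character. The short Python check below shows the relationship using only the standard library. It says nothing about how BD handles the codes internally, which this thread does not establish.

```python
import html

# Byte 151 (0x97) read as Windows-1252 is the em dash, U+2014 (decimal 8212).
cp1252_151 = bytes([151]).decode("cp1252")
print(ord(cp1252_151))                          # 8212

# The same byte read as ISO-8859-1 is a C1 control character, not a dash.
latin1_151 = bytes([151]).decode("latin-1")
print(hex(ord(latin1_151)))                     # 0x97

# html.unescape follows the HTML5 rule that maps &#151; to U+2014.
same_dash = html.unescape("&#151;") == html.unescape("&#8212;")
print(same_dash)                                # True

# Code 137 in Windows-1252 is the per mille sign; 173 is the soft hyphen.
placeholder = bytes([137]).decode("cp1252")
print(hex(ord(placeholder)))                    # 0x2030
print(hex(ord(html.unescape("&#173;"))))        # 0xad
```

So a tool that works on Unicode code points sees 8212 for a real em dash, while a tool that works on Windows-1252 bytes sees 151, and a find/replace rule written for one will not match the other.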
|
10-19-2009, 10:30 AM | #10 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Couldn't upload as html file. Change .txt to .html.
Charlie |
10-27-2009, 08:31 PM | #11 |
eBook FANatic
Posts: 18,301
Karma: 16071131
Join Date: Apr 2008
Location: Alabama, USA
Device: HP ipac RX5915 Wife's Kindle
|
Jon,
As the old folks used to say, "Even a blind hog will find an acorn sometimes." I have continued to play with the BCF file and think I have it working. I have tested it on several files with no problems. After looking at the BCF file every way I could, including HEX format, I finally inserted a row as the third row of the table instead of adding uni(8212) as the last row. What will be interesting will be to see what happens on the next one. Perhaps it did work at the beginning and somehow I screwed up. Anyway, give it a try if you like. I'll let you know what happens. Charlie |
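The thread does not establish why the row position made a difference. One possible explanation, offered only as an assumption, is that the rows of a find/replace table are applied from top to bottom, so a row placed after another row that already consumes the dash never gets to match. The Python sketch below illustrates that general effect; `apply_rules` and the dash-to-hyphen row are hypothetical and are not taken from any actual BCF file.

```python
def apply_rules(text, rules):
    """Apply (find, replace) pairs one after another, in table order."""
    for find, replace in rules:
        text = text.replace(find, replace)
    return text


U8212 = chr(8212)
PLACEHOLDER = chr(137)

# Hypothetical rows: one turns the dash into a hyphen, one protects it.
dash_to_hyphen = (U8212, "-")
dash_to_placeholder = (U8212, PLACEHOLDER)

text = "wait" + U8212 + "what"
added_last = apply_rules(text, [dash_to_hyphen, dash_to_placeholder])
added_early = apply_rules(text, [dash_to_placeholder, dash_to_hyphen])

print(added_last == "wait-what")                     # True
print(added_early == "wait" + PLACEHOLDER + "what")  # True
```

If something like this is going on, a rule that protects a character has to sit above any rule that would otherwise change it.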
|