10-14-2019, 07:57 AM | #1 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
END OF GUARDED AREA character
Something weird
I had a 'box' character in an ePub, which usually means a missing or undisplayable character. The editor's character display says it's END OF GUARDED AREA. Google says that it's Unicode U+0097 (START OF GUARDED AREA is U+0096). Where did the U+0097s come from (I don't think it's the editor), and how can I get rid of them? |
10-14-2019, 09:38 AM | #2 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Do a search and replace for \u0097
|
10-14-2019, 03:20 PM | #3 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Thanks - did it the hard way by (finally) copying the invisible character and pasting it into Find. I'll do it the right way next time.
Any idea where they came from and how they got into the ePub? I thought they might be from the HTML I loaded, but I still have no idea what they might have been used for. Is that something that Smarten Punctuation should fix? |
10-14-2019, 04:11 PM | #4 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
http://blog.codekills.net/2011/01/22...ry-bad---151-/ it looks like someone wrongly thought it was an em dash (because some renderers display it that way). So do a search for that character like Kovid said, and then replace it with the actual EM DASH —. |
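For anyone who would rather script it than use the editor's Find, the search-and-replace described above can be sketched in Python roughly like this. This is only a sketch: the helper name `fix_stray_em_dashes` and the sample sentence are made up for illustration, and it assumes every U+0097 in the text really was meant to be an em dash.

```python
from typing import Tuple

GUARDED_END = "\u0097"  # END OF GUARDED AREA (C1 control code)
EM_DASH = "\u2014"      # EM DASH

def fix_stray_em_dashes(text: str) -> Tuple[str, int]:
    """Hypothetical helper: return the repaired text and how many characters were swapped."""
    count = text.count(GUARDED_END)
    return text.replace(GUARDED_END, EM_DASH), count

# Made-up sample containing one stray control character.
sample = "He paused\u0097then went on."
fixed, replaced = fix_stray_em_dashes(sample)
print(replaced, repr(fixed))
```

Checking the count before saving is a cheap sanity check that the replacement touched about as many places as you expected.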
|
10-14-2019, 04:20 PM | #5 | |
Not Quite Dead
Posts: 194
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
|
Quote:
|
|
10-14-2019, 05:13 PM | #6 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Ah rats, I replaced it with a space. I thought it looked funny.
I never thought about an em-dash. I can get an unmodified copy from my backups and do it the right way. Thanks everyone |
10-14-2019, 10:32 PM | #7 | ||||||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
See one article on the topic, "When a Comma Isn't Enough": Quote:
Quote:
|
||||||
10-16-2019, 02:05 PM | #8 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
I don't remember the original format; most likely HTML, since this was a PD book.
I might have 'misplaced' the em-dashes during an earlier edit session, since they didn't show. Agree: now that I knew what I was dealing with, when I went to an archive backup copy and re-edited it, I made sure to put the em-dashes outside the quote marks of the dialog that they were breaking. I'm still curious to know what the character is actually used for. Last edited by phossler; 10-16-2019 at 02:20 PM. |
10-16-2019, 02:24 PM | #9 | ||
Not Quite Dead
Posts: 194
Karma: 654170
Join Date: Jul 2015
Device: Paperwhite 4; Galaxy Tab
|
Quote:
From this page: https://mybookcave.com/authorpost/pu...otation-marks/ Quote:
Last edited by Brett Merkey; 10-16-2019 at 02:28 PM. |
||
10-16-2019, 02:38 PM | #10 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
And you should always err on the side of matching the original source when you can... it's not a good idea to make a broad style change like that for no reason. I also reference the Wikipedia page on Em Dashes quite often: https://en.wikipedia.org/wiki/Dash#Em_dash if you want to know about different usages. Quote:
I'll try to piece together some of the research I found on it. It seems like "Start of Protected Area" and "End of Protected Area" were from the ol' terminal days: https://en.wikipedia.org/wiki/C0_and_C1_control_codes which says they were used in Block-Oriented Terminals: https://en.wikipedia.org/wiki/Block-...ented_terminal
They were used while filling out forms and things like that, so that you couldn't go outside of the bounds (imagine a last name limited to 16 characters: trying to type a 17th character wouldn't work and would make a beep). They were also used so that certain blocks of text wouldn't go scrolling off the screen as the terminal moved down.
A lot of those old control codes were included in Unicode for legacy reasons, but outside a small handful, they are barely used nowadays. Last edited by Tex2002ans; 10-16-2019 at 02:44 PM. |
||
10-16-2019, 03:03 PM | #11 | ||
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
Quote:
It would seem that the only em dash rule is that there is no em dash rule Quote:
It would seem that in THIS case ... Code:
<p>“In precisely” — he glanced down at the watch strapped English-style to the underside of his free wrist — “one and one-half minutes ........ ”</p>
But I believe that the original had them inside. Since I edit just to please myself and make a book easier for me to read on the Kindle, I'll go with them outside for things like this. Last edited by phossler; 10-16-2019 at 03:09 PM. Reason: Copy/paste messed up some quote chars |
||
10-16-2019, 03:06 PM | #12 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@Tex2002ans -- thanks for the info. I'm going to assume that the EOPA was intended to be an em dash and not some weird old control character
|
10-16-2019, 04:43 PM | #13 |
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@phossler - I run Reports->Characters before I start editing an epub to check if it came with any spurious characters, and again when I finish editing to check I've not introduced any of my own.
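Outside the editor, the same kind of before-and-after check can be roughly approximated in Python. This is a sketch, not how calibre's report is implemented: the function names, the choice of "suspect" categories (Cc and Cf), and the sample string are all assumptions for illustration.

```python
import unicodedata
from collections import Counter

# Unicode general categories that are usually invisible:
# Cc = control codes, Cf = format characters (e.g. the soft hyphen).
SUSPECT_CATEGORIES = {"Cc", "Cf"}
ALLOWED_CONTROLS = {"\n", "\r", "\t"}  # ordinary whitespace controls are fine

def character_report(text):
    """Return (code point, name, count) rows for every distinct character."""
    rows = []
    for ch, count in sorted(Counter(text).items()):
        # Control characters have no formal Unicode name, so supply a default.
        name = unicodedata.name(ch, "<no name>")
        rows.append((f"U+{ord(ch):04X}", name, count))
    return rows

def suspicious_characters(text):
    """Return code points of invisible characters that deserve a second look."""
    return sorted(
        {f"U+{ord(ch):04X}" for ch in text
         if unicodedata.category(ch) in SUSPECT_CATEGORIES
         and ch not in ALLOWED_CONTROLS}
    )

# Made-up sample: a stray U+0097, a soft hyphen, and a Cyrillic capital Es.
sample = "Caf\u00e9\u0097 soft\u00adhyphen, \u0421yrillic"
for code_point, name, count in character_report(sample):
    print(code_point, name, count)
print(suspicious_characters(sample))
```

Printing the character names is what makes look-alikes stand out: a Cyrillic letter shows up with CYRILLIC in its name even though it is indistinguishable from the Latin one on screen.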
An epub reader on a 3270, I'd like to see that.
BR Last edited by BetterRed; 10-16-2019 at 05:49 PM. |
10-16-2019, 05:27 PM | #14 |
Wizard
Posts: 1,071
Karma: 412718
Join Date: Jan 2009
Location: Valley Forge, PA, USA
Device: Kindle Paperwhite
|
@BR -- good tip. Would have saved a lot of head scratching.
That EOPA char was 0-width and invisible in the edit window. At least it showed as an empty square in Preview, or I never would have known I had a problem |
10-16-2019, 07:09 PM | #15 | ||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
In Calibre, it doesn't put it in Find, but it does jump you to the next spot the character occurs. Side Note: And I also take a quick look at that Report on every book for any strange characters. Usually things like soft hyphens stand out (if the thousands of red squigglies didn't give it away). One of the latest books I worked on accidentally had the Cyrillic letter С instead of the Latin capital C. Quote:
- Windows-1252
- ISO-8859-1
- Unicode
While they're mostly the same... the obscure control code points just so happen to be where many differences lie. So when you make (wrong) assumptions about encoding:
EN DASH (original) -> "Start of Protected Area" (Unicode)
EM DASH (original) -> "End of Protected Area" (Unicode)
Programs botch encoding along the way!
https://stackoverflow.com/questions/...h-151-and-8212
https://stackoverflow.com/questions/...nicode-in-java
https://unix.stackexchange.com/quest...cter-in-a-file
https://stackoverflow.com/questions/...rea-characters
Doesn't help that many browsers/renderers also decide to be helpful and assume you were a dunce... and display those characters instead of keeping them invisible (look at the "Browser" column): https://www.fileformat.info/info/uni...ement/list.htm
So it can easily still LOOK like an EM DASH (U+2014), even though under the surface it's the END OF GUARDED AREA (U+0097). Last edited by Tex2002ans; 10-16-2019 at 07:19 PM. |
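The encoding mix-up described above is easy to reproduce in Python. A minimal sketch, assuming the file was saved as Windows-1252 and later read as ISO-8859-1 (nobody in the thread confirmed that this is what happened to this particular book); `repair_mojibake` is a hypothetical helper name.

```python
# Byte 0x97 is how Windows-1252 stores an em dash.
RAW = b"wait\x97what"

as_cp1252 = RAW.decode("cp1252")   # right guess: 0x97 becomes U+2014 EM DASH
as_latin1 = RAW.decode("latin-1")  # wrong guess: 0x97 becomes U+0097 control code

def repair_mojibake(text: str) -> str:
    """Hypothetical helper: undo an ISO-8859-1 misread of Windows-1252 bytes."""
    # Re-encode to get the original bytes back, then decode with the right codec.
    return text.encode("latin-1").decode("cp1252")

print(repr(as_cp1252))
print(repr(as_latin1))
print(repr(repair_mojibake(as_latin1)))
```

The round trip only works on text that was misread in exactly this way. It raises an error if the text contains characters above U+00FF (such as a real em dash) or one of the few bytes Windows-1252 leaves undefined, so try it on a copy first.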
||
|