10-10-2017, 05:35 AM   #12
Tex2002ans
Quote:
Originally Posted by AlanHK View Post
I like to replace ellipses with spaced periods, since I don't like the usual ellipsis glyph, and it doesn't allow variations like . . . . or . . . ? or . . . !
I agree. These edge cases are why I also avoid using the ellipsis character.

It is also all too common to run across fonts where the dots and spacing inside the ellipsis glyph look vastly different from a run of single periods:

.… (PERIOD + ELLIPSIS)
.... (FOUR PERIODS)

Arial Narrow
.…
....

Courier New
.…
....

Garamond
.…
....

Verdana
.…
....

Georgia
.…
....
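
For anyone who wants to automate the "spaced periods" approach from the quote, here is a minimal Python sketch. The function name `space_out_ellipses` is made up for illustration, and joining the periods with a NO-BREAK SPACE (so the dots never wrap apart) is my assumption, not something the quoted post specifies. It leaves any following `?` or `!` untouched.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE, assumed here so the dots stay on one line

def space_out_ellipses(text: str) -> str:
    """Replace U+2026 and runs of three or more periods with spaced periods."""
    # Expand the single-glyph ellipsis into three plain periods first.
    text = text.replace("\u2026", "...")
    # Then join every run of 3+ periods with no-break spaces.
    return re.sub(r"\.{3,}", lambda m: NBSP.join(m.group(0)), text)
```

A period followed by an ellipsis becomes four spaced periods, while single periods and two-period runs are left alone.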


Quote:
Originally Posted by AlanHK View Post
Also to space between nested quotemarks, which otherwise look like a triple mark ’” but with space ’ ”.
I see some books just use a normal space, but that allows a linewrap to occur, which should never be.
So I have been using &nbsp;
Aside from being no-break, otherwise it acts the same as a normal space; and so it stretches or compresses when the text is justified, and that sometimes looks odd.

I just looked at a Random House epub that used thin spaces:  
Which looks better I think. However, is it treated as a no-break space, in all formats -- epub and Kindle?
  • Typographically, the correct space between inner/outer quotes would be a THIN SPACE (or more rarely, a HAIR SPACE).
  • Depending on the tools at hand, it might be better/easier to use a NO-BREAK SPACE. For maximum compatibility, this is the choice to go with.
  • Ultimately, that minor spacing issue should be handled by the kerning tables in the fonts themselves OR by the rendering software. So your source would say ’” and the renderer would pop out ’ ”.
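
As a rough illustration of the options above, here is a small Python sketch that inserts a gap between curly quote marks that sit directly next to each other. The function name `separate_nested_quotes` and the default of NO-BREAK SPACE (the "maximum compatibility" choice) are assumptions for the example; pass `THIN_SPACE` if your renderers handle it well.

```python
import re

NBSP = "\u00a0"        # NO-BREAK SPACE
THIN_SPACE = "\u2009"  # THIN SPACE

# Curly single and double quote marks that can collide when quotes nest.
QUOTE_MARKS = "\u2018\u2019\u201c\u201d"

def separate_nested_quotes(text: str, gap: str = NBSP) -> str:
    """Insert `gap` between directly adjacent curly quote marks."""
    # The lookahead leaves the second mark unconsumed, so a run of three
    # marks gets a gap at every boundary.
    pattern = rf"([{QUOTE_MARKS}])(?=[{QUOTE_MARKS}])"
    return re.sub(pattern, lambda m: m.group(1) + gap, text)
```

Marks that already have any character between them are not touched, so running it twice does not double the gap.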

Side Note: Things also get more complicated with language-/country-specific rules. For example, French may use a NARROW NO-BREAK SPACE inside opening/closing guillemets... but Canadian French, a THIN SPACE. (See, for example, LibreOffice's article explaining how to substitute in more compatible spaces, "Non Breaking Spaces Before Punctuation In French".)

Quote:
Originally Posted by AlanHK View Post
While looking into this, I found this list of 17 Unicode space characters:
http://www.fileformat.info/info/unic...ry/Zs/list.htm

[...]

Are all these valid in ebooks?
Not really. The most widely supported whitespace would be SPACE + NO-BREAK SPACE. Anything outside of that will be in fewer fonts, and may be more prone to trouble (either showing "missing glyph" boxes or not rendering properly).

The next most common character would probably be the THIN SPACE, because it is officially used in a heck of a lot of languages (French, for example). But again, it may not render/display properly, so a NO-BREAK SPACE is a valid substitute.

Many of those other "fixed-width spaces" like the EN QUAD, EM QUAD, EN SPACE, TWO-EM QUAD, [...] were mostly included for backwards compatibility with Xerox's standard character encoding... these SHOULD NOT be used for manual spacing in modern documents.

Side Note: The only time these would be used in modern documents is in the VERY RARE case of Mathematics. See this fantastic post on the LaTeX Stack Exchange about using the proper spacing in Mathematics (also references the fantastic book, "Mathematics into Type").

Side Note #2: The fixed-width spaces were also measurements way back when things were manually typeset (think shoving metal boxes onto a rod). Putting them into documents now would be like manually pressing Enter at the end of each line. It is POSSIBLE, but strongly discouraged. :P It would probably cause a lot more harm than good.

Side Note #2.5: Hmmm... I would also be interested to test Text-to-Speech and see if these weird spaces might confuse it.
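
To find out whether a book's text contains any of these rarer spaces, a quick scan along the following lines might help. This is only a sketch: the name `find_unusual_spaces` is hypothetical, and treating just SPACE and NO-BREAK SPACE as "safe" follows the rule of thumb above rather than any standard.

```python
import unicodedata

# The two spaces described above as the most widely supported.
SAFE_SPACES = {"\u0020", "\u00a0"}

def find_unusual_spaces(text: str):
    """Return (index, code point, name) for every Zs character outside SAFE_SPACES."""
    hits = []
    for i, ch in enumerate(text):
        # "Zs" is the Unicode general category "Separator, space".
        if unicodedata.category(ch) == "Zs" and ch not in SAFE_SPACES:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits
```

Each hit can then be reviewed by hand, or swapped for a NO-BREAK SPACE where compatibility matters more than exact width.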

Quote:
Originally Posted by AlanHK View Post
And aside from nbsp, which are no-break?
These are considered No-Break:

NO-BREAK SPACE
NARROW NO-BREAK SPACE

See "Unicode Line Breaking Algorithm" (Unicode Standard Annex #14):

https://www.unicode.org/reports/tr14/

(For example, another non-breaking space is the FIGURE SPACE.)

If you take a look at Table 1, it gives all the line-breaking classes + recommended rules, along with a breakdown of each class.

But these are RECOMMENDATIONS, not what the renderers WILL actually do. For example, if you take a look at my Post #48, I came up with 3 test cases that broke a THIN SPACE differently. I didn't test on ereaders specifically, but I did test on Word/LibreOffice/Notepad++, InDesign, and Firefox/Chrome/IE. Some rendered it as non-breaking, others rendered it as breaking; some added a break between punctuation, others did not. I bet ereaders are an even bigger mess when dealing with these rarer spaces.
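
Python's standard `unicodedata` module does not expose the UAX #14 line-breaking classes, but the character database does tag some characters with a `<noBreak>` compatibility decomposition. The sketch below uses that tag as a rough proxy for "Unicode considers this a no-break space"; it is an approximation I am assuming for illustration, not the line-breaking algorithm itself, and the name `no_break_spaces` is made up.

```python
import sys
import unicodedata

def no_break_spaces():
    """List Zs characters whose compatibility decomposition is tagged <noBreak>."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Only look at characters in the "Separator, space" category.
        if unicodedata.category(ch) != "Zs":
            continue
        # The decomposition string starts with its tag, e.g. "<noBreak> 0020".
        if unicodedata.decomposition(ch).startswith("<noBreak>"):
            found.append(f"U+{cp:04X} {unicodedata.name(ch)}")
    return found
```

On a current Python this should come back with the same three named above (NO-BREAK SPACE, FIGURE SPACE, NARROW NO-BREAK SPACE). As noted, it says nothing about how a given renderer will behave.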

Quote:
Originally Posted by AlanHK View Post
I assume that only the first two are elastic in size, is that correct?
Generally correct. To quote the "Unicode Line Breaking Algorithm" above:

Quote:
From Unicode Standard Annex #14:
When expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. When expanding or compressing intercharacter space, the presence of U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER is always ignored.
or a different part of the Unicode standard:

Quote:
The fixed-width space characters (U+2000..U+200A) are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. However, where they are used, as, for example, in typesetting mathematical formulae, their width is generally font-specified, and they typically do not expand during justification. The exception is U+2009 THIN SPACE, which sometimes gets adjusted.
... but you always have odd cases (like monospaced fonts)... or fonts that don't have correct spaces... or cases where other layers on top (like CSS or font kerning) take priority over Unicode itself.
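
To make the quoted justification rule concrete, here is a toy Python sketch that decides which spaces in a line would absorb extra (or missing) width. The sets mirror the UAX #14 passage quoted above; the function name `distribute_extra_width`, the even split, and always treating THIN SPACE as expandable are simplifying assumptions, and no real renderer is claimed to work this way.

```python
# Per the quoted passage: only SPACE and NO-BREAK SPACE compress;
# those two plus (occasionally) THIN SPACE expand.
COMPRESSIBLE = {"\u0020", "\u00a0"}
EXPANDABLE = COMPRESSIBLE | {"\u2009"}

def distribute_extra_width(line: str, extra: float) -> dict:
    """Map the index of each elastic space to its share of `extra`.

    A positive `extra` stretches the line, a negative one compresses it.
    Fixed-width spaces (EM SPACE, EN QUAD, ...) never appear in the result.
    """
    elastic = EXPANDABLE if extra >= 0 else COMPRESSIBLE
    slots = [i for i, ch in enumerate(line) if ch in elastic]
    if not slots:
        return {}
    share = extra / len(slots)
    return {i: share for i in slots}
```

For example, in a line containing a SPACE, an EM SPACE, and a NO-BREAK SPACE, only the first and last receive any of the extra width.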

Last edited by Tex2002ans; 10-10-2017 at 06:20 AM.