MobileRead Forums


AlanHK 10-09-2017 05:49 AM

break/no-break and other spaces
 
I like to replace ellipses with spaced periods, since I don't like the usual ellipsis glyph, and it doesn't allow variations like . . . . or . . . ? or . . . !
I also put a space between nested quote marks, which otherwise look like a triple mark (’”), whereas with a space they read ’ ”.


I see some books just use a normal space, but that allows a line wrap to occur there, which should never happen.
So I have been using &nbsp;
Aside from being no-break, it acts the same as a normal space, so it stretches or compresses when the text is justified, and that sometimes looks odd.

I just looked at a Random House epub that used thin spaces: &thinsp;
Which looks better, I think. However, is it treated as a no-break space in all formats, both epub and Kindle?


While looking into this, I found this list of 17 Unicode space characters:
http://www.fileformat.info/info/unic...ry/Zs/list.htm

U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
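
The same list can be reproduced from Python's bundled Unicode database. This is only a minimal sketch: it assumes a Python 3 build whose `unicodedata` tables follow Unicode 6.3 or later (older tables also counted U+180E as a space separator), and `space_separators` is a hypothetical helper name, not something from the linked page.

```python
import sys
import unicodedata


def space_separators():
    """Return (code point, name) pairs for every character in category Zs."""
    return [
        (cp, unicodedata.name(chr(cp)))
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


if __name__ == "__main__":
    # Print the list in the same "U+XXXX NAME" form used above.
    for cp, name in space_separators():
        print(f"U+{cp:04X} {name}")
```

Category Zs only says that a character is a space separator. It says nothing about line breaking or justification, which are separate properties.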

Are all these valid in ebooks?
I assume that only the first two are elastic in size, is that correct?
And aside from nbsp, which are no-break?

Apologies if this is a FAQ; please link to it if there is one.

JSWolf 10-09-2017 08:27 AM

If you make a sample ePub, I'll test it with ADE 2.0.1.

Notjohn 10-09-2017 12:16 PM

I don't like the glyph, either, and to my eye an ebook doesn't look right with spacing. I use space / three dots / space for an interrupted sentence, while ending sentences with four dots (i.e., one full stop followed by the three for the ellipsis).

For the print edition, I revert to the traditional spacing, while ensuring an ellipsis is never broken at the end of a line.

Be aware that under your plan you would have to use a normal space following a three-dot ellipsis, for fear of forcing hyphenation where you don't want it.

JSWolf 10-09-2017 12:44 PM

I prefer the ellipsis character with no space. That solves the problem.

RbnJrg 10-09-2017 02:26 PM

Try this:

1. In your .css stylesheet:

Code:

.nowrap {
    text-indent: 0;
    display: inline-block;
}

2. In your .xhtml file:

Code:

<p>Nullam ut massa rutrum dolor placerat tempor accumsan eget <span class="nowrap">purus.&thinsp;.&thinsp;.</span></p>
As you can see, the word and its ellipsis must both sit inside the span with the "nowrap" style. It works fine with ADE 2.x, 3.x and 4.x.
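
For a whole book, the wrapping could be scripted instead of applied by hand. The following is a rough Python sketch under stated assumptions: `wrap_ellipses` and its pattern are hypothetical, it only handles a word immediately followed by three or four plain periods, and it runs a regex over raw text without parsing the XHTML, so it would also touch matching text inside attributes or comments.

```python
import re

# A word immediately followed by three or four plain periods.
WORD_ELLIPSIS = re.compile(r"\b(\w+)(\.{3,4})")


def wrap_ellipses(xhtml):
    """Wrap each word plus its ellipsis in <span class="nowrap">,
    separating the periods with &thinsp; entities."""

    def repl(match):
        word, dots = match.group(1), match.group(2)
        # Joining the individual periods yields ".&thinsp;.&thinsp;."
        spaced = "&thinsp;".join(dots)
        return f'<span class="nowrap">{word}{spaced}</span>'

    return WORD_ELLIPSIS.sub(repl, xhtml)


if __name__ == "__main__":
    print(wrap_ellipses("<p>accumsan eget purus...</p>"))
```

Once the periods are separated by entities the pattern no longer matches them, so running the function twice on the same text does not nest the spans.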

Regards
Rubén

AlanHK 10-09-2017 03:41 PM

Quote:

Originally Posted by Notjohn (Post 3591006)
Be aware that under your plan you would have to use a normal space following a three-dot ellipsis, for fear of forcing hyphenation where you don't want it.

Yes, that's what I do.

Quote:

Originally Posted by RbnJrg (Post 3591078)
Try this:

That would work, but at the cost of making the code more complex.

I guess if you think that's necessary then thinsp is otherwise a breaking space? I'll stay with nbsp if so.

BetterRed 10-09-2017 05:42 PM

Whitespace character - Wikipedia

BR

RbnJrg 10-09-2017 07:04 PM

Quote:

Originally Posted by AlanHK (Post 3591114)
Yes, that's what I do.


That would work, but at the cost of making the code more complex.

I guess if you think that's necessary then thinsp is otherwise a breaking space?

Yes, thinsp is a breaking space. But if you work in Sigil, you can create a clip that encloses the ellipsis (periods separated by thinsp) together with the preceding word. Then you can apply the code in a blink (just a click of the mouse).

BetterRed 10-09-2017 07:26 PM

Quote:

Originally Posted by RbnJrg (Post 3591200)
Yes, thinsp is a breaking space. But if you work in Sigil, you can create a clip that encloses the ellipsis (periods separated by thinsp) together with the preceding word. Then you can apply the code in a blink (just a click of the mouse).

Curious: could you use a figure space or a non-breaking thin space between the dots? I've used the former for things like telephone numbers or part numbers in blog posts.

BR

RbnJrg 10-09-2017 08:54 PM

Quote:

Originally Posted by BetterRed (Post 3591205)
Curious: could you use a figure space or a non-breaking thin space between the dots? I've used the former for things like telephone numbers or part numbers in blog posts.

BR

Even with a non-breaking thin space between the dots, you need to enclose the dots together with the preceding word (by applying the respective style) to avoid things like:

Code:

some words here
...

With the style "display: inline-block;" you would get

Code:

some words
here...

Regards
Rubén

BetterRed 10-10-2017 01:35 AM

Quote:

Originally Posted by RbnJrg (Post 3591244)
Even with a non-breaking thin space between the dots, you need to enclose the dots together with the preceding word (by applying the respective style) to avoid things like:

Code:

some words here
...

With the style "display: inline-block;" you would get

Code:

some words
here...

Regards
Rubén

gotcha - ta

BR

Tex2002ans 10-10-2017 06:35 AM

Quote:

Originally Posted by AlanHK (Post 3590855)
I like to replace ellipses with spaced periods, since I don't like the usual ellipsis glyph, and it doesn't allow variations like . . . . or . . . ? or . . . !

I agree. These edge cases are why I also avoid using the ellipsis character.

It is also too common to run across fonts where the three-periods+spacing in the ellipsis looks vastly different than the single period:

.… (PERIOD + ELLIPSIS)
.... (FOUR PERIODS)

Arial Narrow
.…
....

Courier New
.…
....

Garamond
.…
....

Verdana
.…
....

Georgia
.…
....


Quote:

Originally Posted by AlanHK (Post 3590855)
I also put a space between nested quote marks, which otherwise look like a triple mark (’”), whereas with a space they read ’ ”.
I see some books just use a normal space, but that allows a line wrap to occur there, which should never happen.
So I have been using &nbsp;
Aside from being no-break, it acts the same as a normal space, so it stretches or compresses when the text is justified, and that sometimes looks odd.

I just looked at a Random House epub that used thin spaces: &thinsp;
Which looks better, I think. However, is it treated as a no-break space in all formats, both epub and Kindle?

  • Typographically, the correct space between inner/outer quotes would be a THIN SPACE (or more rarely, a HAIR SPACE).
  • Depending on the tools at hand, it might be better/easier to use a NO-BREAK SPACE. For maximum compatibility, this is the choice to go with.
  • Ultimately, that minor spacing issue would be something handled by kerning tables in the fonts themselves OR handled by the rendering software. So your source would say ’” and the renderer would pop out ’ ”.

Side Note: Things also get more complicated with language-/country-specific rules. For example, in French, they may use a NARROW NO-BREAK SPACE between opening/closing guillemets... but in Canadian French, a THIN SPACE. (See, for example, LibreOffice's article on substituting more compatible spaces, "Non Breaking Spaces Before Punctuation In French".)

Quote:

Originally Posted by AlanHK (Post 3590855)
While looking into this, I found this list of 17 Unicode space characters:
http://www.fileformat.info/info/unic...ry/Zs/list.htm

[...]

Are all these valid in ebooks?

Not really. The most widely supported whitespace would be SPACE + NO-BREAK SPACE. Anything outside of that will be in fewer fonts, and may be more prone to trouble (either showing "missing font glyphs" or not rendering properly).

The next most common character would probably be the THIN SPACE, because that is officially used in a heck of a lot of languages (French, for example). But again, it may not render/display properly, so a NO-BREAK SPACE is a valid substitute.

Many of those other "fixed-width spaces" like the EN QUAD, EM QUAD, EN SPACE, EM SPACE, [...] were mostly used for backwards compatibility with Xerox's standard character encoding... these SHOULD NOT be used for manual spacing in modern documents.

Side Note: The only time these would be used in modern documents is in the VERY RARE case of Mathematics. See this fantastic post on the LaTeX Stack Exchange about using the proper spacing in Mathematics (also references the fantastic book, "Mathematics into Type").

Side Note #2: The fixed-width spaces were also measurements way back when things were manually typeset (think shoving metal boxes onto a rod). Putting them into documents now would be like manually pressing Enter at the end of each line. It is POSSIBLE, but strongly discouraged. :P It would probably cause a lot more harm than good.

Side Note #2.5: Hmmm... I would also be interested to test Text-to-Speech and see if these weird spaces might confuse it.

Quote:

Originally Posted by AlanHK (Post 3590855)
And aside from nbsp, which are no-break?

These are considered No-Break:

NO-BREAK SPACE
NARROW NO-BREAK SPACE

See "Unicode Line Breaking Algorithm" (Unicode Standard Annex #14):

https://www.unicode.org/reports/tr14/

(For example, another non-breaking space is the FIGURE SPACE.)
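
As an illustration, the three no-break spaces named here can be collected in a small Python sketch and used to build a spaced ellipsis. The set is hard-coded from the names in this post rather than read from the Unicode line-breaking data, the helper names are hypothetical, and (as noted below) a given renderer may still not honor the no-break behavior.

```python
import unicodedata

# The space characters named above as no-break (hard-coded assumption).
NO_BREAK_SPACES = {
    "\u00A0": "NO-BREAK SPACE",
    "\u2007": "FIGURE SPACE",
    "\u202F": "NARROW NO-BREAK SPACE",
}


def is_no_break_space(ch):
    """True if ch is one of the no-break spaces listed above."""
    return ch in NO_BREAK_SPACES


def spaced_ellipsis(terminator="", joiner="\u00A0"):
    """Return three periods, plus an optional terminator such as '?' or '!',
    joined by the chosen no-break space."""
    if not is_no_break_space(joiner):
        raise ValueError("joiner must be one of NO_BREAK_SPACES")
    parts = [".", ".", "."]
    if terminator:
        parts.append(terminator)
    return joiner.join(parts)


if __name__ == "__main__":
    # Show the code points so the invisible joiners can be checked.
    for ch in spaced_ellipsis("?", joiner="\u202F"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```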

If you take a look at Table 1, it gives all the line-breaking categories + recommended rules, with a breakdown of each category.

But these are RECOMMENDATIONS, not necessarily what the renderers WILL do. For example, if you take a look at my Post #48, I came up with 3 test cases that broke a THIN SPACE differently. I didn't test on ereaders specifically, but I did test on Word/LibreOffice/Notepad++, InDesign, Firefox/Chrome/IE. Some rendered it as non-breaking, others rendered it as breaking, and some added a break between punctuation while others did not. I bet ereaders are an even bigger mess when dealing with these rarer spaces.

Quote:

Originally Posted by AlanHK (Post 3590855)
I assume that only the first two are elastic in size, is that correct?

Generally correct. To quote the "Unicode Line Breaking Algorithm" above:

Quote:

When expanding or compressing interword space according to common typographical practice, only the spaces marked by U+0020 SPACE and U+00A0 NO-BREAK SPACE are subject to compression, and only spaces marked by U+0020 SPACE, U+00A0 NO-BREAK SPACE, and occasionally spaces marked by U+2009 THIN SPACE are subject to expansion. All other space characters normally have fixed width. When expanding or compressing intercharacter space, the presence of U+200B ZERO WIDTH SPACE or U+2060 WORD JOINER is always ignored.

or a different part of the Unicode standard:

Quote:

The fixed-width space characters (U+2000..U+200A) are derived from conventional (hot lead) typography. Algorithmic kerning and justification in computerized typography do not use these characters. However, where they are used, as, for example, in typesetting mathematical formulae, their width is generally font-specified, and they typically do not expand during justification. The exception is U+2009 THIN SPACE, which sometimes gets adjusted.
... but you always have odd cases (like Monospaced fonts)... or fonts that don't have correct spaces... or cases where layers above Unicode itself (like CSS or font kerning) take priority.
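
The two quoted passages can be condensed into a small lookup, sketched here in Python. The three labels and the `describe_spaces` helper are hypothetical, and the result only restates the Unicode recommendation quoted above, not what any particular reader does.

```python
import unicodedata

# Per the quoted text: SPACE and NO-BREAK SPACE may compress or expand.
COMPRESSIBLE = frozenset("\u0020\u00A0")
# THIN SPACE is only "occasionally" subject to expansion.
EXPANDABLE = COMPRESSIBLE | {"\u2009"}


def describe_spaces(text):
    """List each space separator in text with its recommended
    justification behaviour."""
    report = []
    for ch in text:
        if unicodedata.category(ch) != "Zs":
            continue  # not a space separator
        if ch in COMPRESSIBLE:
            behaviour = "elastic"
        elif ch in EXPANDABLE:
            behaviour = "may expand"
        else:
            behaviour = "fixed width"
        report.append((unicodedata.name(ch), behaviour))
    return report


if __name__ == "__main__":
    for name, behaviour in describe_spaces("a b\u2009c\u2003d"):
        print(f"{name}: {behaviour}")
```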


All times are GMT -4.