MobileRead Forums

MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   [old thread] non breaking spaces (&#160; and &#xA0;) automatically removed (https://www.mobileread.com/forums/showthread.php?t=206482)

artoros 02-22-2013 10:44 AM

[old thread] non breaking spaces (&#160; and &#xA0;) automatically removed
 
Hello,
I have a really big problem with non-breaking spaces that are written not as "&nbsp;" but as "&#160;" or "&#xA0;", which, as far as I understand it, is just the decimal or hexadecimal way of writing a non-breaking space.

In the 0.5.x versions Sigil automatically replaced these with "&nbsp;", which was OK, since it is the same thing.

But in the 0.6 and now the 0.7 versions Sigil just replaces them with "normal" spaces. When you use non-breaking spaces for your layout, this is a big problem, since in HTML more than one space in a row is treated as a single space.

Is there a way to disable this behaviour? I have to work with EPUB files that were not created by me and that have these non-breaking spaces in them.

I tried to uncheck the checkboxes in the preferences "Automatically clean and format html source", but that does not work.

Sigil does that replacement when I open the EPUB, so there is no way of getting around it, by replacing the spaces myself with a search/replace or something like that.

Does anyone have an idea, because that bug (or feature?) makes it impossible for me to use newer Sigil versions and I have to use the old 0.5.3 version :-(

Thanks and bye
Artoros

meme 02-22-2013 12:44 PM

I can see the issue. &#160; is converted to a space instead of being left as &#160; or converted to &nbsp;. This is primarily an issue with nbsp since it's special in that Book View always changes the actual nbsp character to a space - so we have code that converts the nbsp character to the &nbsp; entity. Need to check if we need to do the same for &#160; or handle it in a different way.
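
For illustration only, here is a minimal sketch of the kind of pre-processing described above: rewriting the literal nbsp character and its numeric references as the &nbsp; entity before an editing widget can flatten them. This is not Sigil's actual code; the function name `preserve_nbsp` and the regular expression are assumptions made up for the example.

```python
import re

NBSP = "\u00a0"  # the literal non-breaking space character

# Hypothetical pattern: matches the decimal (&#160;) and hexadecimal (&#xA0;)
# references, allowing leading zeros and either letter case.
_NUMERIC_NBSP = re.compile(r"&#(?:0*160|[xX]0*[aA]0);")


def preserve_nbsp(source: str) -> str:
    """Rewrite every form of non-breaking space as the named entity."""
    source = _NUMERIC_NBSP.sub("&nbsp;", source)
    return source.replace(NBSP, "&nbsp;")


print(preserve_nbsp("a&#160;b&#xA0;c\u00a0d"))
```

Ordinary spaces are left alone, so only the three nbsp spellings end up as the named entity.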

BobC 02-27-2013 05:46 AM

On a related note I am getting errors with non-breaking spaces when validating books - the error reported is
Code:

entity 'nbsp' not found
.

When I examine the html the error points to various entries such as :
Code:

‘But . . . I don’t—’
.

Which looks valid HTML to me.

While I can get rid of the "errors" by cleaning the file I can't see why it needs cleaning.

BobC

meme 02-27-2013 03:57 PM

This usually indicates a problem in your header's document type. The type is defined for something that doesn't know how to deal with an &nbsp; entity. Just compare the header from the uncleaned and the cleaned versions to see the difference.
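
A rough way to reproduce this kind of error outside Sigil: a generic XML parser only knows the five predefined XML entities, so &nbsp; is undefined unless the document type supplies the XHTML entity definitions, while the numeric reference &#160; always works. The sketch below uses Python's standard `xml.etree.ElementTree` as a stand-in; it is not the validator Sigil uses, and the helper name `parses_as_plain_xml` is made up for the example.

```python
import xml.etree.ElementTree as ET


def parses_as_plain_xml(fragment: str) -> bool:
    """Return True if the fragment parses without any DTD-defined entities."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Raised, among other things, for an undefined entity such as &nbsp;
        return False


print(parses_as_plain_xml("<p>But&nbsp;. . .</p>"))  # named entity, no DTD
print(parses_as_plain_xml("<p>But&#160;. . .</p>"))  # numeric reference
```

The first call fails and the second succeeds, which is why the same text can validate or not depending on the header.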

sjkramer 08-28-2014 07:58 PM

Is there something that I can "switch off" to prevent my nonbreaking spaces from being turned into regular old spaces?

DiapDealer 08-28-2014 10:59 PM

Quote:

Originally Posted by sjkramer (Post 2908737)
Is there something that I can "switch off" to prevent my nonbreaking spaces from being turned into regular old spaces?

The character or the entity?

The character will always be converted to an entity. It can't survive the Qt Widget being used (it will be changed to a normal space), so Sigil converts it to an entity first (when opening the epub) so that it can still fulfill its non-breaking purpose. The entity will usually stay an entity as long as you forgo any editing whatsoever in Book View. Look at Book View ... Edit in Code View. And in the Clean Source preferences, make sure "Pretty Print" is selected instead of HTML Tidy.

If you can build Sigil from source, there's been a few new patches accepted -- one of which provides a "Preserve Entities" feature that will help make sure non-breaking space entities don't get zapped when editing in book view.

Notjohn 08-29-2014 07:15 AM

Quote:

Originally Posted by BobC (Post 2438561)
‘But . . . I don’t—’

IMHO you'd be better off shortening your ellipses to ... as at least some of the Big Five publishers are doing in their digital editions, while retaining the traditional mode in print.

I know how irritating it is when somebody answers a question by replying to a different question, but I went through this issue years ago when the Kindle was first introduced (the ellipses breaking at the end of a line), until I decided that on the digital "page" the space looked rather silly.

eschwartz 08-29-2014 06:09 PM

Quote:

Originally Posted by Notjohn (Post 2909104)
IMHO you'd be better off shortening your ellipses to ... as at least some of the Big Five publishers are doing in their digital editions, while retaining the traditional mode in print.

I know how irritating it is when somebody answers a question by replying to a different question, but I went through this issue years ago when the Kindle was first introduced (the ellipses breaking at the end of a line), until I decided that on the digital "page" the space looked rather silly.

…Or use an actual ellipsis… (Like I just did.)

BetterRed 08-29-2014 10:52 PM

Quote:

Originally Posted by Notjohn (Post 2909104)
IMHO you'd be better off shortening your ellipses to ... as at least some of the Big Five publishers are doing in their digital editions, while retaining the traditional mode in print.

Ay, there's the rub, IMO what works well on paper - "blah blah blah . . .! More blah blah." - doesn't always work so well on digital media, especially if it spans two lines as in:

blah blah blah . .
.! More blah blah

On paper I prefer single curly quotes for dialogue, on digital I prefer double curly quotes.

Maybe ebooks could be user configurable ;)

BR

theducks 08-29-2014 10:56 PM

Quote:

Originally Posted by BetterRed (Post 2909938)
Ay, there's the rub, IMO what works well on paper - "blah blah blah . . .! More blah blah." - doesn't always work so well on digital media, especially if it spans two lines as in:

blah blah blah . .
.! More blah blah

On paper I prefer single curly quotes for dialogue, on digital I prefer double curly quotes.

Maybe ebooks could be user configurable ;)

BR

&hellip; is a single character (and is available from the omega icon tool), so no more break worries.

Tex2002ans 08-30-2014 01:02 AM

Quote:

Originally Posted by theducks (Post 2909939)
&hellip; is a single character (and is available from the omega icon tool), so no more break worries.

Not necessarily. There is the situation that can occur like this:

Quote:

"etc., etc.
…"

There are also many books that use a "four-dot ellipsis", and some older books used even more dots. The ellipsis character does not work well in that situation.

Some fonts also have oddities with the ellipsis character, in which the "dots" don't match your typical period, or do not have similar kerning to the default period, making it look quite odd when they are near each other.

I had a large post typed up covering my personal annoyances with ellipses in EPUBs, but then scrapped it.

Here are some more resources on the topic:

https://english.stackexchange.com/qu...s-for-ellipses
http://www.thebookdesigner.com/2013/...dobe-indesign/
https://tex.stackexchange.com/questi...xetex-document

Different Style Guides and different languages also have different rules.

Ultimately, ellipses are a huge pain in the bottom, and these "Smarten Punctuation" algorithms completely mangle them.

Notjohn 08-30-2014 11:56 AM

Actually, I think in standard book-making, all ellipses are three characters. When a fourth is added, then it is a full stop (period) and not technically part of the ellipsis. (It could be a question mark, exclamation mark, or even a comma or semi-colon or colon instead.)

I was assuming that three or even four dots without a space between would be regarded as a single word by most or all e-book platforms. Am I wrong about that?

The only case I can think of where a four-dot ellipsis would change under my e-book formula is where the omission comes at the beginning of the following sentence. In a printed book, I would go dot/space/dot/space/dot/space/dot/space, but in an e-book I would go dot/space/dot/dot/dot/space.

Jellby 08-30-2014 12:32 PM

Quote:

Originally Posted by Notjohn (Post 2910393)
I was assuming that three or even four dots without a space between would be regarded as a single word by most or all e-book platforms. Am I wrong about that?

Probably. I've seen too many linebreaks before/after question/quote marks, with or without a hyphen, to keep any faith I initially had in the linebreaking algorithms of ebook readers.

Tex2002ans 08-31-2014 08:35 PM

Quote:

Originally Posted by Notjohn (Post 2910393)
Actually, I think in standard book-making, all ellipses are three characters. When a fourth is added, then it is a full stop (period) and not technically part of the ellipsis.

In modern English typography... perhaps. Different Style Guides may or may not agree.

Also, the typographer of an older book may have had older Style Guides that completely disagreed with the rules now given by the modern versions. You also have to keep in mind that other languages/countries may have their own typography and Style Guides.

I don't have any samples of these older books on hand, although I do remember it in a handful of Archive.org scans I have digitized.

Side Note: This reminds me of this fascinating article, "Why two spaces after a period isn’t wrong (or, the lies typographers tell about history)". The author goes through even these older Style Guides themselves (like new/older versions of the Chicago Manual of Style) and demolishes this "double-spacing" myth! I assume something similar could be said about ellipses:

http://www.heracliteanriver.com/?p=324

Similarly, asterisks were used along the same lines in the middle of paragraphs:

Quote:

Here is an ending sample sentence. * * * * * Here are some more sentences of text. And continuing.

If I recall correctly, these were used as:
  • Rough way for certain typographers to be able to squeeze/push widows/orphans off of further pages
  • Help "square off" the bottom of pages
  • Section breaks
  • Alternative to show "missing text"

In many cases, it is hard to tell exactly what the typographer was thinking, so it is very hard to "reverse" or "modernize" the decision.

Is this ACTUALLY missing text, or was it a pause, was it a punctuation mark in the original text that is being quoted, was it just a design decision, ...?

Quote:

Originally Posted by Notjohn (Post 2910393)
I was assuming that three or even four dots without a space between would be regarded as a single word by most or all e-book platforms. Am I wrong about that?

According to my testing, three or four periods in a row with NO SPACES between would be ok. Although as Jellby stated, I have seen the "ellipsis + period" or "period + ellipsis" break according to the linebreak algorithms.

If you wanted to stick with the ellipsis character, you would have to insert a non-breaking space to connect the ellipsis to the period before/after in order for them to stick together. So, "ellipsis + nbsp + period" and "period + nbsp + ellipsis" should work... although again having a space there isn't necessarily the proper way according to certain Style Guides.
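
As a sketch of the "ellipsis + nbsp + period" idea, the snippet below glues an ellipsis character to a neighbouring period, question mark, or exclamation point with a non-breaking space. The function name `glue_ellipsis` and the choice of punctuation are assumptions for illustration, and whether a space belongs there at all still depends on your Style Guide.

```python
import re

ELLIPSIS = "\u2026"  # the single-character ellipsis
NBSP = "\u00a0"      # non-breaking space


def glue_ellipsis(text: str) -> str:
    """Join an ellipsis to adjacent . ? ! with a non-breaking space."""
    # punctuation + ellipsis, with or without an ordinary space between them
    text = re.sub(rf"([.?!]) ?{ELLIPSIS}", rf"\1{NBSP}{ELLIPSIS}", text)
    # ellipsis + punctuation, with or without an ordinary space between them
    text = re.sub(rf"{ELLIPSIS} ?([.?!])", rf"{ELLIPSIS}{NBSP}\1", text)
    return text


# Write the result out with &nbsp; entities so the joins are visible.
print(glue_ellipsis("etc., etc.\u2026").replace(NBSP, "&nbsp;"))
```

An ellipsis that is not next to one of those punctuation marks is left untouched.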

Quote:

Originally Posted by Notjohn (Post 2910393)
The only case I can think of where a four-dot ellipsis would change under my e-book formula is where the omission comes at the beginning of the following sentence. In a printed book, I would go dot/space/dot/space/dot/space/dot/space, but in an e-book I would go dot/space/dot/dot/dot/space.

And you also have to keep in mind all of the spacing rules for certain punctuation. How do you handle not just periods before/after, but commas, quotation marks, question marks, exclamation points, brackets, parentheses, etc.? The situation gets a lot hairier than you first expect, and many of these cases are hard and have to be decided on a case-by-case basis according to context.

As I said, a huge pain in the butt! :D

Quote:

Originally Posted by Jellby (Post 2910442)
Probably. I've seen too many linebreaks before/after question/quote marks, with or without a hyphen, to keep any faith I initially had in the linebreaking algorithms of ebook readers.

Yep, and I believe when I first started, I saw the "ellipsis + period/question mark/exclamation point/quote mark" change into a "period + linebreak + punctuation", which is another reason why I abandoned using the ellipsis character.

To fix this, you would probably have to insert something like a zero-width non-breaking character (the word joiner, U+2060, rather than the plain zero-width space, which actually allows a break), although this would create HIDEOUSLY ugly code... and devices probably do not have very good support for that, so these characters will show up as "missing character" boxes or quotation marks. Bleh!

Also, I just thought of another thing devices might break on: SEARCH. It is much easier to search for three or four periods in a row than it is to search for text with an ellipsis character.
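
To illustrate the search problem, here is a small sketch of how a reader application could fold the ellipsis character into three periods before matching. This is only an assumption about what such an app might do, not the behaviour of any particular device, and both function names are made up.

```python
ELLIPSIS = "\u2026"


def normalize_for_search(text: str) -> str:
    """Fold the single ellipsis character into three plain periods."""
    return text.replace(ELLIPSIS, "...")


def contains(haystack: str, needle: str) -> bool:
    """Search that treats the ellipsis character and '...' as equivalent."""
    return normalize_for_search(needle) in normalize_for_search(haystack)


print(contains("blah blah blah\u2026 More blah", "blah..."))
```

Without that folding step, a reader typing three periods would not find text that was typeset with the ellipsis character.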

BetterRed 09-01-2014 02:20 AM

@Tex2002ans - the author of the Heraclitean River blog and I are like-minded in that he and I would prefer 1.5 spaces between sentences rather than one or two.

I just feel more comfortable if the space between sentences exceeds the space between words. I edit to two spaces using a regular space & a non-breaking space. If I wanted 1.5 spaces, what would you suggest I use?

My 'target font' is Times Roman 12 point, if that has anything to do with the price of fish.

Thanks.

BR


All times are GMT -4.