[old thread] non breaking spaces (&#160; and &#xA0;) automatically removed
Hello,
I have a really big problem with non-breaking spaces that are written not as "&nbsp;" but as "&#160;" or "&#xA0;", which, as far as I understand it, are just the decimal and hexadecimal ways of writing a non-breaking space. In the 0.5.x versions, Sigil automatically replaced these with "&nbsp;", which was OK, since it is the same thing. But in the 0.6 and now the 0.7 versions, Sigil just replaces them with "normal" spaces. When you use non-breaking spaces for your layout, this is a big problem, since in HTML more than one space is treated as a single space.
Is there a way to disable this behaviour? I have to work with EPUB files that were not created by me and that contain these non-breaking spaces. I tried unchecking "Automatically clean and format html source" in the preferences, but that does not work. Sigil does the replacement when I open the EPUB, so there is no way of getting around it by replacing the spaces myself with a search/replace or something like that.
Does anyone have an idea? This bug (or feature?) makes it impossible for me to use newer Sigil versions, so I have to use the old 0.5.3 version :-(
Thanks and bye
Artoros |
I can see the issue. &#160; is converted to a space instead of being left as &#160; or converted to &nbsp;. This is primarily an issue with nbsp, since it's special in that Book View always changes the actual nbsp character to a space, so we have code that converts the nbsp character to the entity. We need to check whether to do the same for &#160; or handle it in a different way.
|
On a related note I am getting errors with non-breaking spaces when validating books - the error reported is
Code:
entity 'nbsp' not found
When I examine the HTML, the error points to various entries such as:
Code:
‘But . . . I don’t—’
Which looks like valid HTML to me. While I can get rid of the "errors" by cleaning the file, I can't see why it needs cleaning.
BobC |
This usually indicates a problem in your header's document type. The type is defined for something that doesn't know how to deal with an &nbsp; entity. Just compare the header from the uncleaned and the cleaned versions to see the difference.
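To illustrate why the document type matters, here is a minimal Python sketch. It does not reproduce whatever validator reported the error above; it only assumes a plain XML parser with no DTD loaded (the standard library's ElementTree) and shows that such a parser rejects the named entity while accepting the equivalent numeric reference. The helper name parses_as_xml and the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if a DTD-less XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Without a DTD, only the five predefined XML entities are known,
        # so &nbsp; is reported as an undefined entity.
        return False


# Hypothetical sample markup, loosely modelled on the line quoted above.
named = "<p>But&nbsp;.&nbsp;.&nbsp;. I don't</p>"
numeric = "<p>But&#160;.&#160;.&#160;. I don't</p>"

print(parses_as_xml(named))    # named entity needs a DTD that declares it
print(parses_as_xml(numeric))  # numeric reference works without any DTD
```

With an XHTML doctype that declares nbsp, a validating parser would accept the first form as well, which is presumably what the "cleaned" header provides.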
|
Is there something that I can "switch off" to prevent my nonbreaking spaces from being turned into regular old spaces?
|
The character will always be converted to an entity. It can't survive the Qt Widget being used (it will be changed to a normal space), so Sigil converts it to an entity first (when opening the epub) so that it can still fulfill its non-breaking purpose. The entity will usually stay an entity as long as you forgo any editing whatsoever in Book View. Look at Book View ... Edit in Code View. And in the Clean Source preferences, make sure "Pretty Print" is selected instead of HTML Tidy. If you can build Sigil from source, a few new patches have been accepted -- one of which provides a "Preserve Entities" feature that will help make sure non-breaking space entities don't get zapped when editing in Book View. |
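For anyone who wants to pre-process files outside Sigil, the "convert to an entity first" idea can be sketched in a few lines of Python. This is not Sigil's actual code, just an assumption-laden illustration: the function name nbsp_to_entity is made up, and it simply rewrites the raw U+00A0 character and its decimal and hexadecimal references to &nbsp;.

```python
import re

# Matches the raw non-breaking space character (U+00A0) as well as its
# decimal (&#160;) and hexadecimal (&#xA0;) numeric character references.
_NBSP_FORMS = re.compile(r"\u00a0|&#0*160;|&#[xX]0*[aA]0;")


def nbsp_to_entity(source: str) -> str:
    """Rewrite every form of non-breaking space as the named entity."""
    return _NBSP_FORMS.sub("&nbsp;", source)


sample = "a\u00a0b&#160;c&#xA0;d"
print(nbsp_to_entity(sample))
```

Note that this only makes sense for documents whose doctype declares the nbsp entity; otherwise the numeric form is the safer target.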
I know how irritating it is when somebody answers a question by replying to a different question, but I went through this issue years ago when the Kindle was first introduced (the ellipses breaking at the end of a line), until I decided that on the digital "page" the space looked rather silly. |
blah blah blah . . .! More blah blah
On paper I prefer single curly quotes for dialogue; on digital I prefer double curly quotes. Maybe ebooks could be user configurable ;)
BR |
Some fonts also have oddities with the ellipsis character, in which the "dots" don't match your typical period, or do not have similar kerning to the default period, making it look quite odd when they are near each other. I had a large post typed up covering my personal annoyances with ellipses in EPUBs, but then scrapped it. Here are some more resources on the topic:
https://english.stackexchange.com/qu...s-for-ellipses
http://www.thebookdesigner.com/2013/...dobe-indesign/
https://tex.stackexchange.com/questi...xetex-document
Different Style Guides and different languages also have different rules. Ultimately, ellipses are a huge pain in the bottom, and these "Smarten Punctuation" algorithms completely mangle them. |
Actually, I think in standard book-making, all ellipses are three characters. When a fourth is added, then it is a full stop (period) and not technically part of the ellipsis. (It could be a question mark, exclamation mark, or even a comma or semi-colon or colon instead.)
I was assuming that three or even four dots without a space between would be regarded as a single word by most or all e-book platforms. Am I wrong about that? The only case I can think of where a four-dot ellipsis would change under my e-book formula is where the omission comes at the beginning of the following sentence. In a printed book, I would go dot/space/dot/space/dot/space/dot/space, but in an e-book I would go dot/space/dot/dot/dot/space. |
Also, the typographer of an older book may have had older Style Guides that completely disagreed with the rules now given by the modern versions. You also have to keep in mind that other languages/countries may have their own typography and Style Guides. I don't have any samples of these older books on hand, although I do remember it in a handful of Archive.org scans I have digitized. Side Note: This reminds me of this fascinating article, "Why two spaces after a period isn’t wrong (or, the lies typographers tell about history)". The author goes through even these older Style Guides themselves (like new/older versions of the Chicago Manual of Style) and demolishes this "double-spacing" myth! I assume something similar could be said about ellipses: http://www.heracliteanriver.com/?p=324 Similarly, asterisks were used along the same lines in the middle of paragraphs: Quote:
In many cases, it is hard to tell exactly what the typographer was thinking, so it is very hard to "reverse" or "modernize" the decision. Is this ACTUALLY missing text, or was it a pause, was it a punctuation mark in the original text that is being quoted, was it just a design decision, ...?
If you wanted to stick with the ellipsis character, you would have to insert a non-breaking space to connect the ellipsis to the period before/after in order for them to stick together. So, "ellipsis + nbsp + period" and "period + nbsp + ellipsis" should work... although again having a space there isn't necessarily the proper way according to certain Style Guides.
As I said, a huge pain in the butt! :D
To fix this, you would probably have to insert something like a zero-width space, although this would create HIDEOUSLY ugly code...... and devices probably do not have very good support for that, so zero-width spaces will show up as "missing character" boxes or quotation marks. Bleh! Also, I just thought of another thing devices might break on: SEARCH. It is much easier to search for three or four periods in a row than it is to search for text with an ellipsis character. |
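As a rough sketch of the "ellipsis + nbsp + period" idea and of the search problem mentioned in the post above, something like the following Python could be used when cleaning up a book's text. The names glue_ellipsis and ANY_ELLIPSIS are invented for this example, and the patterns are assumptions about which forms you want to catch, not a rule from any Style Guide.

```python
import re

NBSP = "\u00a0"      # non-breaking space
ELLIPSIS = "\u2026"  # single-character ellipsis


def glue_ellipsis(text: str) -> str:
    """Join an ellipsis character to an adjacent period with a non-breaking space."""
    # "period [optional space] ellipsis" -> "period + nbsp + ellipsis"
    text = re.sub(r"\.[ \u00a0]?\u2026", "." + NBSP + ELLIPSIS, text)
    # "ellipsis [optional space] period" -> "ellipsis + nbsp + period"
    text = re.sub(r"\u2026[ \u00a0]?\.", ELLIPSIS + NBSP + ".", text)
    return text


# For searching: one ellipsis character, or three or four dots that are
# optionally separated by regular or non-breaking spaces.
ANY_ELLIPSIS = re.compile(r"\u2026|\.(?:[ \u00a0]?\.){2,3}")

print(repr(glue_ellipsis("blah\u2026 . More blah")))
print(ANY_ELLIPSIS.findall("a... b . . . c\u2026d"))
```

A pattern like ANY_ELLIPSIS only helps in tools that support regular expressions; the built-in search on most reading devices will still treat the ellipsis character and three periods as different text.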
@Tex2002ans - the author of the Heraclitean River blog and I are like-minded in that he and I would prefer 1.5 spaces between sentences rather than one or two.
I just feel more comfortable if the space between sentences exceeds the space between words. I edit to two spaces using a regular space and a non-breaking space. If I wanted 1.5 spaces, what would you suggest I use? My 'target font' is Times Roman 12 point, if that has anything to do with the price of fish. Thanks. BR |