08-06-2017, 10:02 AM | #1 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Representing the no-break space
Hi
Here are the different ways to write it. We are living in a complex and threatening world... http://www.fileformat.info/info/unic...00a0/index.htm
No-break spaces are essential for French users, which is why this subject is so sensitive for us. A French book typeset according to the typographic rules of the Imprimerie Nationale may contain thousands of no-break spaces (not to mention narrow no-break spaces). I am not joking.
As everybody knows, the use of Code:
- The Calibre editor uses the Unicode character Code:
\u00a0 Code:
& #160; When, coming from the Calibre editor, I open an EPUB with Sigil, my Code:
\u00a0 Code:
& #160; Code:
|& #160; Code:
\u00a0
I know this is nearly an article of faith, and I do not wish to start a religious war, but maybe there is room for compromise, for peaceful coexistence. Would it be possible to add a small option in Preferences where users could choose their own representation of the no-break space? Sigil would keep using its smart decimal HTML entity by default Code:
& #160;
Note*: on this forum, I wrote the entity with a plain space after the ampersand so that it displays.
Last edited by roger64; 08-06-2017 at 11:45 AM. |
08-06-2017, 12:30 PM | #2 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
They are not two signs. They all represent the same character. In Unicode, \u00a0 denotes the character whose code point is 160 in decimal notation and 0xA0 in hexadecimal.
So \u00a0 == & nbsp; == & #160; == & #xa0; all refer to the exact same character. The & nbsp; form is called a named entity or named character reference. The & #160; and & #xa0; forms are numeric entities (decimal and hexadecimal). In epub3 named entities are no longer allowed (except for the XML reserved named entities), so Sigil uses & #160; for epub3 and & nbsp; for epub2. Calibre should easily be able to handle either.
If you want Sigil to use another mapping in Code View for non-breaking spaces, you need to change its Preserve Entities setting. Just do not use a named entity for it with epub3. Using no entity makes the non-breaking space indistinguishable from a regular space and can even cause problems in Code View (actually a bug in Qt), which is why Sigil always tries to replace them with a proper entity.
Last edited by KevinH; 08-06-2017 at 12:36 PM. |
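To illustrate the point that these notations all name one character, here is a minimal Python sketch. It uses the standard library's html.unescape to decode each form; the entities are written without the extra space that the forum posts insert after the ampersand. The variable names are only for illustration.

```python
import html

# The four notations from the post, without the forum's extra space.
NAMED = "&nbsp;"      # named entity (epub2 only)
DECIMAL = "&#160;"    # decimal numeric entity
HEX = "&#xa0;"        # hexadecimal numeric entity
LITERAL = "\u00a0"    # the character itself

# Decoding every form yields a single distinct character.
decoded = {html.unescape(s) for s in (NAMED, DECIMAL, HEX, LITERAL)}

print(decoded == {"\u00a0"})             # True
print(ord(LITERAL), hex(ord(LITERAL)))   # 160 0xa0
```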
08-06-2017, 12:38 PM | #3 |
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
I (and I suspect Kevin as well) would love to be able to support the Unicode character for the no-break space in Sigil. The current parser (Google's Gumbo) could handle it just fine. Unfortunately, the Unicode character cannot survive Qt's QTextEdit environment, which Sigil uses for Code View/Book View: it gets changed to a "normal" space. Hence the entity, to keep that from happening.
Kovid has found a way around the Qt issue in calibre's editor, but he has the advantage of not having to deal with (by his own wise choice) a WYSIWYG Book View in his editor. We're not so lucky. We've inherited Book View. We hope to be able to use the same kind of workaround for this issue, but while a WYSIWYG Book View editor is part of Sigil ... it's not likely to happen (unless Qt miraculously chooses to support Unicode no-break space characters in QTextEdit someday).
Until then ... both Sigil and calibre's editor can handle each other's content just fine. If you use both, you'll just have to accept that your no-break space entities may change (calibre's no-break space characters will be converted to entities in Sigil, and you can easily change them all back to characters in calibre).
Last edited by DiapDealer; 08-06-2017 at 12:43 PM. |
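The character-to-entity conversion described here can be sketched in a few lines of Python. This is not Sigil's code: the function name and the epub_version parameter are hypothetical, and the choice of entity per EPUB version simply follows what post #2 states (decimal entity for epub3, named entity for epub2).

```python
NBSP = "\u00a0"  # literal no-break space


def nbsp_to_entity(text: str, epub_version: int = 3) -> str:
    """Replace literal U+00A0 with an entity so it stays visible in the source.

    Hypothetical helper: the mapping mirrors post #2 and nothing more.
    """
    entity = "&#160;" if epub_version >= 3 else "&nbsp;"
    return text.replace(NBSP, entity)


print(nbsp_to_entity("Oui\u00a0!"))      # Oui&#160;!
print(nbsp_to_entity("Oui\u00a0!", 2))   # Oui&nbsp;!
```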
08-06-2017, 12:45 PM | #4 | |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
Further, things look a little muddled for the user who wishes to insert no-break spaces using a set of regexes (for example, I have a group of ten saved searches for this purpose). Though, as you say, they are exactly the same character, they are nevertheless counted differently. This situation, and the changing face of the no-break space, is very confusing for beginners.
My wish would be that Sigil honor the Entities to preserve setting even if the character wasn't coded as an entity in the original file. That would mean that the non-breaking space would be converted to whatever entity was defined in Edit > Preferences > Preserve Entities > Entities to preserve.
Last edited by roger64; 08-06-2017 at 12:48 PM. Reason: preserve |
|
08-06-2017, 12:53 PM | #5 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
I just read DiapDealer's comments.
OK and thanks for the technical explanations. |
08-06-2017, 03:05 PM | #6 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
For the record ... the Qt "bug" is here: Qt/qtbase/src/gui/text/qtextdocument.cpp
whose toPlainText() routine (which is used by QPlainTextEdit and QTextEdit) does the following and thereby changes all nbsp to normal spaces. Code:
QString QTextDocument::toPlainText() const
{
    Q_D(const QTextDocument);
    QString txt = d->plainText();

    QChar *uc = txt.data();
    QChar *e = uc + txt.size();

    for (; uc != e; ++uc) {
        switch (uc->unicode()) {
        case 0xfdd0: // QTextBeginningOfFrame
        case 0xfdd1: // QTextEndOfFrame
        case QChar::ParagraphSeparator:
        case QChar::LineSeparator:
            *uc = QLatin1Char('\n');
            break;
        case QChar::Nbsp:
            *uc = QLatin1Char(' ');
            break;
        default:
            ;
        }
    }
    return txt;
}
This Qt bug has been reported numerous times but it has never been fixed as they don't consider this a bug. Strange that they only mess with nbsp as no other char is played with.
Last edited by KevinH; 08-06-2017 at 03:08 PM. |
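For readers who do not read C++, the effect of that switch statement can be modelled roughly in Python. This is an illustration of the quoted excerpt only, not Qt code; the function name is made up, and the code points for the paragraph and line separators (U+2029 and U+2028) are the standard Unicode values.

```python
# Characters the quoted routine turns into a newline.
NEWLINE_LIKE = {
    "\ufdd0",  # QTextBeginningOfFrame
    "\ufdd1",  # QTextEndOfFrame
    "\u2029",  # QChar::ParagraphSeparator
    "\u2028",  # QChar::LineSeparator
}
NBSP = "\u00a0"  # QChar::Nbsp


def to_plain_text_model(text: str) -> str:
    """Rough model of the character mapping in the quoted toPlainText()."""
    out = []
    for ch in text:
        if ch in NEWLINE_LIKE:
            out.append("\n")
        elif ch == NBSP:
            out.append(" ")   # the no-break space is lost here
        else:
            out.append(ch)    # everything else passes through unchanged
    return "".join(out)


print(repr(to_plain_text_model("Quoi\u00a0?")))   # 'Quoi ?'
print(repr(to_plain_text_model("Quoi\u202f?")))   # 'Quoi\u202f?'
```

In this model, as in the excerpt, only U+00A0 is rewritten; the narrow no-break space U+202F is not one of the listed cases and is left alone.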
08-07-2017, 02:07 AM | #7 | |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
The target is identified. Who knows? If they receive a number of protests from (presumably French) users, it might make them change their minds about it. Or a polite kind of petition. I'll think about it... |
|
08-07-2017, 05:02 AM | #8 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Presumably it's an issue for French in particular because French requires a non-breaking space before punctuation marks like "?" and ";"? (or should I say ";" ? )
|
08-07-2017, 12:00 PM | #9 | |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
We can manage it -mostly- with a set of regexes. |
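As a hedged illustration of what one such regex could look like (this is not one of the saved searches from the thread), the Python sketch below replaces ordinary spaces in front of ?, !, ; and : with a no-break space. The function name, the punctuation set, and the default of the narrow no-break space U+202F are all assumptions.

```python
import re

NBSP = "\u00a0"    # no-break space
NNBSP = "\u202f"   # narrow no-break space

# One or more ordinary spaces directly followed by high punctuation.
_BEFORE_PUNCT = re.compile(r" +(?=[?!;:])")


def protect_punctuation(text: str, space: str = NNBSP) -> tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return _BEFORE_PUNCT.subn(space, text)


fixed, count = protect_punctuation("Quoi ? Rien ; vraiment !")
print(repr(fixed), count)
```

Returning the count alongside the text makes it easy to compare how many spaces each pass touched. A real book needs more care than this, for example to skip colons inside URLs or markup.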
|
08-08-2017, 05:08 AM | #10 |
Guru
Posts: 667
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
|
Not just French; I routinely change ellipses (either the single glyph or just 3 or 4 dots) to stops with nbsp. Looks nicer (IMHO) and consistent for 3 or 4 dot ellipses, or those with ? or ! following. Also for spacing between quotes when embedded. -- ‘ “
And I hate numeric codes. Even assembly code has mnemonics; why do I have to remember what number is assigned to each glyph? Now enforced for epub3. |
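The ellipsis substitution mentioned in this post could be sketched as follows in Python. The exact output the poster produces is not shown in the thread, so the layout here (full stops joined by U+00A0, three stops for the single glyph) is an assumption, and the function name is made up.

```python
import re

NBSP = "\u00a0"

# Either the single ellipsis glyph or a run of three or four dots.
_ELLIPSIS = re.compile(r"\u2026|\.{3,4}")


def space_ellipsis(text: str) -> str:
    """Rewrite ellipses as full stops separated by no-break spaces."""
    def repl(match: re.Match) -> str:
        found = match.group()
        dots = 3 if found == "\u2026" else len(found)
        return NBSP.join("." * dots)
    return _ELLIPSIS.sub(repl, text)


print(repr(space_ellipsis("Wait\u2026")))
print(repr(space_ellipsis("The end....")))
```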
08-08-2017, 11:15 PM | #11 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
|
08-09-2017, 02:09 AM | #12 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
You're welcome. I use them to tweak French EPUBs.
Here they are in .json format with French titles. You can import the group into the Calibre editor. They are numbered from one to ten. If you are coming back from Sigil, you can also begin with this one:
S: |& #160;
R: \u00a0
Some comments before using them:
- Save your EPUB before use.
- The Calibre editor lets you see the individual count for each regex when you perform a group search (see "Show details").
- I can begin (or not) with 01, suppressing everything close to a no-break or narrow no-break space, to start from a clean (and empty) slate. The other nine recreate them step by step.
- I mostly use narrow no-break spaces ("fines insécables" in French), represented by \u202F in the replace part. If you want "normal" no-break spaces, replace this term with \u00a0 in the replace part.
- If the count for 01 is greater than half the group total (i.e. if you destroy more than you create), you should find out the reason for the difference and check the book again. You may be missing something useful.
Any improvement welcome.
Last edited by roger64; 08-09-2017 at 02:25 AM. |
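For anyone who wants to try the "coming back from Sigil" step outside an editor, here is a minimal Python sketch of the same idea: turn no-break space entities back into the literal character and report how many were changed. The pattern is one possible way to match the named, decimal, and hexadecimal forms; it is not necessarily the saved search given in the post, and the function name is hypothetical.

```python
import re

# Named, decimal, and hexadecimal entities for U+00A0.
_NBSP_ENTITY = re.compile(r"&(?:nbsp|#0*160|#[xX]0*[aA]0);")


def entities_to_nbsp(text: str) -> tuple[str, int]:
    """Return the text with entities replaced by U+00A0, plus the count."""
    return _NBSP_ENTITY.subn("\u00a0", text)


converted, count = entities_to_nbsp("a&nbsp;b&#160;c&#xA0;d")
print(repr(converted), count)
```

The count plays the same role as the per-regex figure shown under "Show details" in the Calibre editor: it tells you how much each pass actually changed.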