Old 08-06-2017, 10:02 AM   #1
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Representing the no-break space

Hi

Here are the different ways to write it. We are living in a complex and threatening world...

http://www.fileformat.info/info/unic...00a0/index.htm

No-break spaces matter particularly to French users, which explains why this subject is so sensitive for us. A French book typeset according to the typographic rules of the Imprimerie Nationale may contain thousands of no-break spaces (not to mention narrow no-break spaces). I am not joking.

As everybody knows, the use of
Code:
& nbsp;
has been deprecated for EPUB3. I remember the glorious times when this obnoxious named entity wreaked havoc among Sigil files. That is behind us now, and this is just where the fun begins. For ebook users, the situation looks much the same as the one for European users, most of whom use the metric system while some others use the Imperial system. So, to make it short, for the same purpose in their code:
- The Calibre editor uses the Unicode character
Code:
\u00a0
- Sigil uses the decimal html entity*
Code:
& #160;
Using different signs for the same purpose can cause some friction. That's why, when you switch from one editor to the other because they are complementary, you need to adapt to this difference.

When, coming from the Calibre editor, I open an EPUB with Sigil, my
Code:
\u00a0
are automatically changed to
Code:
& #160;
and I use a plain regex when I come back to the Calibre editor to restore the previous situation. This regex is a little like a genuflection or a sign of the cross: you do it without thinking too much about it. This is mine:
Code:
& nbsp;|& #160;
Code:
\u00a0
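Outside an editor, the same search and replace can be sketched in Python. This is only an illustration under assumptions: the function name is made up, the first alternative of the search is taken to be the named entity, and the entities are written without the extra space that this forum requires.

```python
import re

# Entity spellings of the no-break space (no space after "&" in a real file).
NBSP_ENTITY = re.compile(r"&nbsp;|&#160;")


def entities_to_unicode(xhtml):
    """Replace no-break space entities with the U+00A0 character itself."""
    return NBSP_ENTITY.sub("\u00a0", xhtml)


sample = "<p>Bonjour&#160;! Vraiment&nbsp;?</p>"
converted = entities_to_unicode(sample)
# ascii() makes the otherwise invisible character show up as \xa0.
print(ascii(converted))
```

Running it on a copy of the file first is the safe choice, since the replacement is applied everywhere, including inside attribute values.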
But this situation is really puzzling (unpleasant) for beginners. I do not say that one is right and the other is wrong. Both editors are right in their own way, but I can't help thinking that offering a common choice would be nice. Even though the two signs are rendered the same way and are recognized everywhere, I have a preference for the Unicode one, if only for cosmetic reasons.

I know this is nearly an article of faith and I do not wish to start a religious war, but maybe there is room for compromise, for peaceful co-existence. Would it be possible to make a small place in Preferences where users could choose their own representation of the no-break space? Sigil would go on using by default its smart decimal html entity
Code:
& #160;
but the users would be offered the possibility to customize it and make it display another choice.

Note*: on this forum, I wrote the 160 entity with a plain space after the ampersand so that it shows up.

Last edited by roger64; 08-06-2017 at 11:45 AM.
Old 08-06-2017, 12:30 PM   #2
KevinH
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
They are not two signs. They all represent the same character. In Unicode, \u00a0 represents the character whose code point is 160 in decimal notation and 0xA0 in hex.

So \u00a0 == & nbsp; == & #160; == & #xa0;

these all refer to the exact same character.
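A quick way to check this is sketched below in Python. It has nothing to do with how Sigil or calibre work internally; it only uses the standard library's html.unescape to decode each spelling (written here without the extra space the forum needs) and compares the results.

```python
import html

# The four spellings discussed above: literal character, named entity,
# decimal numeric entity, hexadecimal numeric entity.
forms = ["\u00a0", "&nbsp;", "&#160;", "&#xa0;"]

decoded = [html.unescape(form) for form in forms]

for form, char in zip(forms, decoded):
    # ascii() keeps the invisible character readable in the output.
    print(ascii(form), "->", "U+%04X" % ord(char))

all_same = len(set(decoded)) == 1
print("all the same character:", all_same)
```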

The & nbsp; form is referred to as a named entity or named character reference. The & #160; and & #xa0; forms are numeric entities. In epub3, named entities are no longer allowed (except for the XML reserved named entities).

So Sigil uses & #160; for epub3 and & nbsp; for epub2. Calibre should easily be able to handle either.
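That rule can be pictured with the following Python sketch. It is not Sigil's actual code; the function names and the version strings are assumptions made for the illustration, and the entities are written without the forum's extra space.

```python
NBSP = "\u00a0"


def nbsp_entity(epub_version):
    """Pick the entity spelling: numeric for epub3, named for epub2."""
    return "&#160;" if epub_version.startswith("3") else "&nbsp;"


def protect_nbsp(text, epub_version):
    """Replace every literal no-break space with the chosen entity."""
    return text.replace(NBSP, nbsp_entity(epub_version))


print(protect_nbsp("Vraiment\u00a0?", "3.0"))
print(protect_nbsp("Vraiment\u00a0?", "2.0"))
```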

If you want Sigil to use another mapping in Codeview for non-breaking spaces, you need to change its Preserve Entities setting. Just do not use a named entity for it with epub3. Using no entity makes the non-breaking space indistinguishable from a regular space and can even cause problems in Codeview (actually a bug in Qt), which is why Sigil always tries to replace them with a proper entity.

Last edited by KevinH; 08-06-2017 at 12:36 PM.
Old 08-06-2017, 12:38 PM   #3
DiapDealer
Grand Sorcerer
Posts: 27,465
Karma: 192992430
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I (and I suspect Kevin, as well) would love to be able to support the use of the unicode character for the no-break space in Sigil. The current parser (Google's Gumbo) could handle it just fine. Unfortunately, the unicode character cannot survive Qt's QTextEdit environment, which Sigil uses for Code View/Book View. It gets changed to a "normal" space. Hence the entity, to keep that from happening.

Kovid has found a way around the Qt issue in calibre's editor, but he has the advantage of not having to deal with--by his own (wise) choice--a WYSIWYG Book View in his editor. We're not so lucky. We've inherited Book View.

We hope to be able to utilize the same kind of workaround technique for this issue, but while a WYSIWYG Book View editor is a part of Sigil ... it's not likely to happen (unless Qt miraculously chooses to support unicode no-break space characters in QTextEdit someday).

Until then ... both Sigil and Calibre's editor can handle each other's content just fine. If you use both, you'll just have to accept the fact that your no-break space entities may change (and calibre's no-break space characters will be converted to entities in Sigil. You can easily change them all back to characters in Calibre).

Last edited by DiapDealer; 08-06-2017 at 12:43 PM.
Old 08-06-2017, 12:45 PM   #4
roger64
Wizard
Quote:
Originally Posted by KevinH
So Sigil uses & #160; for epub3 and & nbsp; for epub2. Calibre should easily be able to handle either.
The Calibre editor does indeed handle either. It does not modify any & #160; coming from Sigil and displays them normally. It does not convert them automatically the way Sigil does for the unicode character \u00a0. While we are at it, I do not understand why Sigil should not behave in the same way.

Further, things look a little muddled for the user, when he wishes to insert no-break spaces using a set of regexes (for example I have a group of ten saved searches for this purpose). Though, as you say, they are exactly the same character, they are nevertheless counted differently.
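The counts differ because a search sees only the spelling, not the character behind it. A small Python sketch with a made-up sample string shows this, and shows that counting becomes consistent once everything is normalized to one spelling first (entities are written without the forum's extra space):

```python
import html

# Hypothetical sample mixing literal characters and decimal entities.
sample = "Oui\u00a0! Non&#160;? Si&#160;; bon\u00a0:"

raw_count = sample.count("\u00a0")     # matches only the literal characters
entity_count = sample.count("&#160;")  # matches only the decimal entities

# Decode the entities first, then count the single remaining spelling.
total = html.unescape(sample).count("\u00a0")

print(raw_count, entity_count, total)  # prints: 2 2 4
```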

This situation and the changing of face of the no-break space is very confusing for beginners.

My wish would be that Sigil honor the Entities to preserve setting even if the character wasn't coded as an entity in the original file. That would mean that the non-breaking space would be converted to whatever entity was defined in the Edit > Preferences > Preserve Entities > Entities to preserve setting.

Last edited by roger64; 08-06-2017 at 12:48 PM. Reason: preserve
Old 08-06-2017, 12:53 PM   #5
roger64
Wizard
I just read DiapDealer's comments.
OK and thanks for the technical explanations.
Old 08-06-2017, 03:05 PM   #6
KevinH
Sigil Developer
For the record ... the Qt "bug" is here: Qt/qtbase/src/gui/text/qtextdocument.cpp
whose toPlainText() routine (which is used by QPlainTextEdit and QTextEdit) does the following and thereby changes all nbsp to normal spaces.

Code:
QString QTextDocument::toPlainText() const
{
    Q_D(const QTextDocument);
    QString txt = d->plainText();

    QChar *uc = txt.data();
    QChar *e = uc + txt.size();

    for (; uc != e; ++uc) {
        switch (uc->unicode()) {
        case 0xfdd0: // QTextBeginningOfFrame
        case 0xfdd1: // QTextEndOfFrame
        case QChar::ParagraphSeparator:
        case QChar::LineSeparator:
            *uc = QLatin1Char('\n');
            break;
        case QChar::Nbsp:
            *uc = QLatin1Char(' ');
            break;
        default:
            ;
        }
    }
    return txt;
}
Kovid subclasses the problematic Qt class and provides his own version of toPlainText(), in which he uses a text cursor to select the entire document and then copies the text out, so he does not have to use the normal toPlainText() call.

This Qt bug has been reported numerous times but it has never been fixed as they don't consider this a bug. Strange that they only mess with nbsp as no other char is played with.
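To make the effect concrete, here is a plain-Python model of the switch statement quoted above. It does not call Qt at all; the function name is made up for the illustration, and the code points are the Unicode values behind the QChar constants.

```python
# Code points that the quoted routine turns into a newline:
# frame markers (U+FDD0, U+FDD1), paragraph separator, line separator.
TO_NEWLINE = {0xFDD0, 0xFDD1, 0x2029, 0x2028}
NBSP = 0x00A0


def to_plain_text_model(raw):
    """Mimic the character substitutions of the quoted toPlainText()."""
    out = []
    for ch in raw:
        code_point = ord(ch)
        if code_point in TO_NEWLINE:
            out.append("\n")
        elif code_point == NBSP:
            out.append(" ")  # this is where the no-break space is lost
        else:
            out.append(ch)
    return "".join(out)


raw = "Vraiment\u00a0?\u2029Oui\u00a0!"
plain = to_plain_text_model(raw)
print(ascii(raw))
print(ascii(plain))
```

After the call there is no way to tell which spaces used to be no-break spaces, which is why an entity (or a replacement for toPlainText()) is needed.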

Last edited by KevinH; 08-06-2017 at 03:08 PM.
Old 08-07-2017, 02:07 AM   #7
roger64
Wizard
Quote:
Originally Posted by KevinH
This Qt bug has been reported numerous times but it has never been fixed as they don't consider this a bug. Strange that they only mess with nbsp as no other char is played with.
@Kevin

The target is identified.

Who knows? If they receive a number of protests from (presumably French) users, it may make them change their minds about it. Or a polite kind of petition. I'll think about it...
Old 08-07-2017, 05:02 AM   #8
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Presumably it's an issue for French in particular because French requires a non-breaking space before punctuation marks like "?" and ";"? (or should I say ";" ? )
Old 08-07-2017, 12:00 PM   #9
roger64
Wizard
Quote:
Originally Posted by HarryT
Presumably it's an issue for French in particular because French requires a non-breaking space before punctuation marks like "?" and ";"? (or should I say ";" ? )
Indeed, but not only there. Here is a Canadian French text with many examples where there should be no-break spaces (represented here with _).

We can manage it -mostly- with a set of regexes.
Old 08-08-2017, 05:08 AM   #10
AlanHK
Guru
Posts: 667
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Not just French; I routinely change ellipses (either the single glyph or just 3 or 4 dots) to stops with nbsp. Looks nicer (IMHO) and consistent for 3 or 4 dot ellipses, or those with ? or ! following. Also for spacing between quotes when embedded. -- ‘ “
And I hate numeric codes. Even assembly code has mnemonics; why do I have to remember what number is assigned to each glyph? Now enforced for epub3.
Old 08-08-2017, 11:15 PM   #11
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by roger64
Further, things look a little muddled for the user, when he wishes to insert no-break spaces using a set of regexes (for example I have a group of ten saved searches for this purpose).
Mind sharing these Saved Searches?
Old 08-09-2017, 02:09 AM   #12
roger64
Wizard
Quote:
Originally Posted by Tex2002ans
Mind sharing these Saved Searches?
You're welcome. I use them to tweak French EPUBs.
Here they are in .json format with French titles. You can import the group in the Calibre editor. They are numbered from one to ten.

You also can begin with this one, if you are coming back from Sigil:
S: & nbsp;|& #160;
R: \u00a0

Some comments before using them.
- save your EPUB before use.
- the Calibre editor allows you to know the individual count for each regex when you perform a group search (see "Show details").
- I can begin (or not) with 01, which removes everything resembling a no-break or narrow no-break space, so as to start from a clean and empty slate. The other nine recreate them step by step.
- I mostly use narrow no-break spaces ("fines insécables" in French), represented by \u202F in the replace part. If you wish to get "normal" no-break spaces, replace this last term with \u00a0 in the replace part.
- If the number counted in 01 is greater than half the group total (i.e. if you destroy more than you create), you may need to find the reason for this difference and check the book again. You may be missing something useful.
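
As a rough idea of what one such search does, here is a Python sketch. It is not taken from the attached file: the pattern, the function name and the sample sentence are assumptions made for illustration. It turns an ordinary space before high punctuation into a narrow no-break space and reports how many replacements were made, much like the count the editor shows.

```python
import re

NNBSP = "\u202f"  # narrow no-break space; use "\u00a0" for the normal one

# An ordinary space directly followed by ; : ! or ?
BEFORE_PUNCT = re.compile(r" (?=[;:!?])")


def add_narrow_nbsp(text):
    """Return (new_text, number_of_replacements)."""
    return BEFORE_PUNCT.subn(NNBSP, text)


fixed, count = add_narrow_nbsp("Vraiment ? Oui ; non !")
# ascii() shows the inserted characters as \u202f.
print(ascii(fixed), count)
```

Because the pattern only touches a space that is already there, punctuation with nothing in front of it (for example the colon in a URL) is left alone.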

Any improvement welcome.
Attached Files
File Type: zip narrow no-break spaces.json.zip (910 Bytes, 200 views)

Last edited by roger64; 08-09-2017 at 02:25 AM.