If you create an epub with Sigil and use it to insert a non-breaking space, it will use the HTML entity format (&nbsp;). This will be preserved through various edits using Sigil or other simple text editors.
The problem arises when a file is being edited where the non-breaking space is the Unicode character U+00A0 (160 decimal). In this form the non-breaking space simply shows up as a space when the epub is opened with Sigil, but it is removed if the file is saved (it is not replaced with a normal <space>). If the Unicode form is still present in the epub when it arrives on the reader, it will usually be correctly interpreted and not create a problem.
The real problem occurs if you need to edit a file with this "invisible" character in it, as it will almost certainly show up as a simple <space>. Libre Office is capable of displaying the non-breaking space as a highlighted space, which makes it visible when editing. If Sigil could do the same or similar, then it wouldn't be necessary to use the HTML entity form. Even in code view, Sigil displays the U+00A0 as a simple space.
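As a workaround until an editor shows the character, the literal U+00A0 can be swapped for something visible before editing. The following is only a minimal sketch: `count_nbsp` and `protect_nbsp` are made-up helper names, not part of Sigil or Calibre, and the default replacement is the numeric reference `&#160;`, which I am assuming is preferable because it needs no entity declaration. Pass `"&nbsp;"` instead if your files declare that entity.

```python
# Hypothetical helpers, not part of Sigil, Calibre or Writer2Epub.
NBSP = "\u00a0"  # the literal non-breaking space character (160 decimal)


def count_nbsp(xhtml: str) -> int:
    """Return how many literal U+00A0 characters the text contains."""
    return xhtml.count(NBSP)


def protect_nbsp(xhtml: str, replacement: str = "&#160;") -> str:
    """Replace each literal U+00A0 with a visible character reference."""
    return xhtml.replace(NBSP, replacement)


sample = "<p>Chapter\u00a01 of\u00a0the book</p>"
print(count_nbsp(sample))    # number of hidden non-breaking spaces
print(protect_nbsp(sample))  # same markup with the spaces made visible
```

To run it over a real file from an unpacked epub, read and write the file as UTF-8 text and pass the contents through `protect_nbsp`.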
Unfortunately, Calibre and Writer2Epub both use the Unicode form. As roger64 points out, writer2xhtml has an option to use named entities rather than Unicode, but this affects other characters as well.
From a quick read of the EPUB and XML specs, it should be possible to declare nbsp as an entity in the XHTML document, so that it can be used without falling foul of validation.
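The idea of declaring the entity can be tried out with Python's standard XML parser. This is only a sketch: it shows that an internal DTD subset declaring nbsp makes the document well-formed for a non-validating parser, and it says nothing about whether epubcheck or a particular reader accepts such a DOCTYPE. The sample markup and the `parses` helper are my own illustrations.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Document that declares nbsp in an internal DTD subset.
WITH_DECL = (
    "<!DOCTYPE html [\n"
    '  <!ENTITY nbsp "&#160;">\n'
    "]>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>10&nbsp;km</p></body></html>"
)

# Same document without the declaration.
WITHOUT_DECL = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>10&nbsp;km</p></body></html>"
)


def parses(doc: str) -> bool:
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses(WITH_DECL))     # entity is declared
print(parses(WITHOUT_DECL))  # entity is undeclared

# With the declaration, the reference expands to the literal U+00A0.
paragraph = ET.fromstring(WITH_DECL).find(f".//{XHTML_NS}p")
print(repr(paragraph.text))
```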
There may also be other characters, such as the soft hyphen (&shy;), that exhibit similar behaviour in that they are not normally visible but affect the text flow.
BobC
Last edited by BobC; 05-16-2013 at 06:42 PM.
Reason: Get rid of unintended smilies!