Old 12-08-2013, 08:28 AM   #242
jackie_w
The non-breaking space problem

@kovid,

Please could you take another look at the nbsp issue raised by Perkin: the Unicode character \xA0 (U+00A0) vs. the &nbsp; entity. I am using calibre v1.14, not running from source, so if the following is no longer relevant please ignore it.

The attached epub contains 14 Unicode \xA0 chars. When viewed in the calibre Viewer, all is OK.
  1. When opened in new Tweak, just as you said, the \xA0 chars are syntax highlighted.
    Problem 1: Preview is displaying them as if they were normal spaces.
  2. Change anything in the text and File-Save.
    Problem 2: Although the \xA0 chars still appear to be syntax highlighted in the code view, after quitting Tweak and viewing the edited epub it seems that the \xA0 chars have been converted to normal spaces during the File-Save. This can be confirmed by reopening the edited epub in Tweak.
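For anyone who wants to confirm Problem 2 without reopening the book in Tweak, here is a minimal sketch that counts the U+00A0 chars in each text file of an epub. The function name count_nbsp and the list of file extensions are my own choices for illustration, and it assumes the files are UTF-8 encoded.

```python
import zipfile

NBSP = "\u00a0"  # the Unicode no-break space, i.e. \xA0


def count_nbsp(epub, extensions=(".xhtml", ".html", ".htm")):
    """Return {member name: number of U+00A0 chars} for the text files in an epub.

    `epub` may be a path or a file-like object.
    Assumption: every matching file is UTF-8 encoded.
    """
    counts = {}
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith(extensions):
                text = zf.read(name).decode("utf-8")
                counts[name] = text.count(NBSP)
    return counts
```

Running count_nbsp on the epub before and after a File-Save and comparing the two results should show whether any \xA0 chars were lost.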

This is a similar problem to one which existed in Sigil for a very long time, although it is fixed in the current version.

A possible way to avoid both Problem 1 & 2 above may be to convert the Unicode \xA0 chars to entities (&nbsp; or &#160;) during Tweak's File-Open and convert them back to Unicode during the File-Save. I think current Sigil does the former (and complains about 'not well formed' because the DTD is missing) but not the latter.
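To make the suggestion concrete, here is a rough sketch of the round trip. It is not how Tweak or Sigil actually work internally; the function names are hypothetical. It uses the numeric reference &#160; on the way in, since that is well-formed XML even without a DTD, whereas the named &nbsp; entity needs one.

```python
NBSP = "\u00a0"      # the Unicode no-break space, i.e. \xA0
ENTITY = "&#160;"    # numeric character reference, valid without a DTD


def nbsp_to_entity(text):
    """On File-Open: make every U+00A0 visible as an entity."""
    return text.replace(NBSP, ENTITY)


def entity_to_nbsp(text):
    """On File-Save: turn the common nbsp spellings back into U+00A0."""
    for ent in ("&nbsp;", "&#160;", "&#xA0;", "&#xa0;"):
        text = text.replace(ent, NBSP)
    return text
```

This is plain string replacement, so it would also touch entities inside comments or CDATA sections; a real implementation would probably want to be more careful there.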

I've done a brief test of the theory by manually converting the \xA0 chars to &nbsp; before opening in Tweak. After an edit/save, Tweak has converted the &nbsp; to \xA0 and the edited epub displays correctly.

Obviously I have no idea of any wider ramifications to Tweak of &nbsp; entities being temporarily present during the editing process, but given the high percentage of retail epubs which use 'empty &nbsp; paragraphs' to create scene breaks etc., IMHO something needs to be done to avoid the current 'silent stripping'. If I've missed something obvious which would have avoided these problems, feel free to call me an idiot.
Attached Files
File Type: epub uni_nbsp.epub (2.5 KB, 132 views)

Last edited by jackie_w; 12-08-2013 at 08:30 AM.