Old 05-16-2013, 07:15 AM   #46
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
@BobC

Glad it worked.

English uses far fewer named entities than French, so using them is not much of an inconvenience, at least for the purposes of this thread.
Old 05-16-2013, 04:41 PM   #47
Arios
A curiosus lector!
Posts: 463
Karma: 2015140
Join Date: Jun 2012
Device: Sony PRS-T1, Kobo Touch
@ meme

I am puzzled, because I have created many French EPUBs with Sigil 0.7.2 containing a lot of non-breaking spaces (of course...) and I never lost one of them.

So I did some quick tests with Writer2xhtml (Win 7 64-bit and LMDE 64-bit) and compared the EPUBs with those produced by AWP (through Wine in LMDE). The immediate difference is that Writer2xhtml uses an "xhtml" file extension (not a scoop, I know!) and AWP an "html" one for the sections in the EPUB.

From these few tests, my hypothesis is that exporting with the xhtml extension triggers the loss of non-breaking spaces, simply because there is no loss when using AWP.

Moreover, even though they are invisible, it seems that the non-breaking spaces produced by Writer2xhtml and AOO are still active (tested on the PRS-T1).

@ Roger

Roger (or someone else), is it possible to "force" Writer2xhtml to use the "html" extension instead of "xhtml", to see whether my guess is good enough? If the files are simply renamed after export, the non-breaking spaces remain invisible.
Old 05-16-2013, 05:52 PM   #48
BobC
Guru
Posts: 691
Karma: 3026110
Join Date: Dec 2008
Location: Lancashire, U.K.
Device: BeBook 1, BeBook Pure, Kobo Glo, (and HD),Energy Sistem EReader Pro +
If you create an EPUB with Sigil and use it to insert a non-breaking space, it will use the HTML entity form (&nbsp;). This will be preserved through various edits using Sigil or other simple text editors.

The problem arises when editing a file in which the non-breaking space is the Unicode character U+00A0 (160 decimal). In this form the non-breaking space simply shows up as a space when the EPUB is opened with Sigil, but it is removed when the file is saved (it is not even replaced with a normal <space>). If the Unicode form is still present in the EPUB when it arrives on the reader, it will usually be interpreted correctly and not cause a problem.

The real problem occurs if you need to edit a file containing this "invisible" character, as it will almost certainly show up as a simple <space>. LibreOffice can display the non-breaking space as a highlighted space, making it visible when editing. If Sigil could do the same or something similar, it wouldn't be necessary to use the HTML entity form. Even in Code View, Sigil displays U+00A0 as a simple space.

Unfortunately, Calibre and Writer2Epub both use the Unicode form. As roger64 points out, Writer2xhtml has an option to use named entities rather than Unicode, but this affects other characters as well.

From a quick read of the EPUB and XML specs, it should be possible to declare nbsp as an entity in the XHTML document, allowing it to be used without falling foul of validation.
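A minimal sketch of what such a declaration could look like, checked here with Python's standard XML parser rather than with Sigil or an EPUB validator. The sample document and the variable names are made up for illustration, and whether a given EPUB checker or reading system accepts an internal DTD subset like this is an assumption that would need testing.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML document that declares nbsp itself in the
# internal DTD subset, so a plain XML parser can resolve &nbsp;.
XHTML_WITH_DECLARED_NBSP = """<!DOCTYPE html [
  <!ENTITY nbsp "&#160;">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p>10&nbsp;km</p></body>
</html>"""

root = ET.fromstring(XHTML_WITH_DECLARED_NBSP)
paragraph = root.find(".//{http://www.w3.org/1999/xhtml}p")

# The parser expands the declared entity to the real U+00A0 character.
print(repr(paragraph.text))
```

The parsed paragraph text contains the actual no-break space character, which suggests the declaration is enough for a generic XML parser.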

There may also be other characters, such as the soft hyphen (&shy;), that behave similarly in that they are not normally visible but affect the text flow.
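As a possible workaround outside Sigil, the raw characters could be turned into numeric character references before editing, so they become visible in any text editor. This is only a sketch: the function name and the sample text are invented for illustration, and a plain string replace like this also touches text inside CDATA sections or embedded stylesheets, where a reference would not be expanded.

```python
# Invisible characters mapped to their numeric character references.
INVISIBLE_TO_REFERENCE = {
    "\u00a0": "&#160;",  # no-break space
    "\u00ad": "&#173;",  # soft hyphen
}


def make_invisibles_explicit(xhtml_text):
    """Replace raw U+00A0 and U+00AD with visible numeric references."""
    for char, reference in INVISIBLE_TO_REFERENCE.items():
        xhtml_text = xhtml_text.replace(char, reference)
    return xhtml_text


# Hypothetical sample: a French sentence with a no-break space before
# the exclamation mark and a soft hyphen inside a word.
sample = "<p>Bonjour\u00a0! long\u00adtemps</p>"
converted = make_invisibles_explicit(sample)
print(converted)
```

Running it over each XHTML file of an unzipped EPUB (read and written as UTF-8) would leave the other characters untouched.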


BobC

Last edited by BobC; 05-16-2013 at 06:42 PM. Reason: Get rid of unintended smilies!
Old 05-16-2013, 06:24 PM   #49
BobC
Guru
Quote:
Originally Posted by meme

The file is encoded okay as UTF-8, and the nbsp characters are stored in the file as two bytes, C2 A0 (or 302 240 octal), as you can see when you examine the file with a hex editor. When Sigil opens the file it correctly identifies the file as UTF-8 and then asks Qt to convert the file to Unicode. Unfortunately, for some reason, although it appears to convert everything else OK, Qt converts the two bytes to a standard space (20) instead of the nbsp character (A0). So the nbsp characters are removed before the rest of Sigil can see them and convert them to the &nbsp; entity.
@meme

Are you sure the nbsp is stored as C2 A0? When I looked at it with a hex editor it showed as A0 00 (presumably swap the bytes for endianness and read it as simply 00 A0, or 000 160 decimal). This would be correct:


From http://www.w3.org/TR/html4/sgml/entities.html :
Code:
<!ENTITY nbsp   CDATA "&#160;" -- no-break space = non-breaking space, U+00A0 ISOnum -->
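The two hex dumps need not contradict each other. As a quick check, here is how U+00A0 comes out in two encodings; that the hex editor showing A0 00 was displaying a UTF-16 little-endian view is only an assumption, and the helper name is made up for this sketch.

```python
NBSP = "\u00a0"  # no-break space, code point 160 decimal

utf8_bytes = NBSP.encode("utf-8")
utf16le_bytes = NBSP.encode("utf-16-le")


def as_hex(data):
    """Format bytes the way a hex editor would show them."""
    return " ".join(f"{byte:02X}" for byte in data)


print("UTF-8       :", as_hex(utf8_bytes))
print("UTF-16-LE   :", as_hex(utf16le_bytes))
# The same UTF-8 bytes written in octal, one group per byte.
print("UTF-8 octal :", " ".join(f"{byte:03o}" for byte in utf8_bytes))
```

So C2 A0 is what a UTF-8 file on disk should contain, while A0 00 is the same character seen as UTF-16 little-endian.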
BobC

Last edited by BobC; 05-16-2013 at 06:43 PM.
Old 05-16-2013, 11:51 PM   #50
Arios
A curiosus lector!
@ BobC

What you say is consistent with my observations, and Emacs could probably be added to the list, along with Calibre and w2xhtml.

I did not encounter the same problem as roger64 because "Use named character entities" has been enabled in my writer2xhtml setup for a long time.

Consequently, it is quite possible that this is not strictly connected to the file extension, but rather to the way some software exports entities. Still, if w2xhtml can export to "html", it could be fun to see what happens.
Old 05-28-2013, 09:25 AM   #51
roger64
Wizard
Hi

@meme

Could HTML Tidy be of some help in the meantime?

Here, I've found the following two HTML Tidy options about &nbsp; and entities, but I do not know how to change their values.
[Attached thumbnails: Tidy option.png, Tidy-entities.png]
Old 05-28-2013, 02:19 PM   #52
PeterT
Grand Sorcerer
Posts: 12,160
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
@roger64: from looking at the Sigil source it would appear that the configuration options passed into tidy are hard-coded within Sigil.
Old 05-28-2013, 03:34 PM   #53
roger64
Wizard
@PeterT

This is bad news. It explains why Tidy is so rigid. It's a pity, because it was designed with a lot of configuration parameters.

Maybe a Linux user more skilled than me would know how to use the command-line tool, which can be customized to tidy the XHTML files.
Old 05-28-2013, 05:57 PM   #54
BobC
Guru
Quote:
Originally Posted by roger64

Maybe a Linux user better than me would know how to use the command-line tool. This one can be customized to tidy the xhtml files.
Is there not a GUI front end to Tidy for Linux, like there is for Windows?

BobC
Old 05-29-2013, 12:28 AM   #55
roger64
Wizard
Quote:
Originally Posted by BobC
Is there not a GUI front end to Tidy for Linux like there is for Windows ?

BobC
Maybe, but I did not see one on their site. Anyway, it's not that complicated to tweak. In the configuration file:

- you only modify what you do not like (it can be as few as two or three values)
- most of the options are booleans (yes/no) or have a limited set of preset choices.
Old 12-22-2013, 07:41 PM   #56
RingEbooks
Junior Member
RingEbooks began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Dec 2013
Device: iPad
Use &#160; not &nbsp;

You should use &#160;, not &nbsp;, in XML.
"nbsp" isn't a predefined XML entity. This works on all tablets I've tested it on.

Last edited by RingEbooks; 12-22-2013 at 07:51 PM. Reason: not showing characters
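The difference can be seen with any strict XML parser. The sketch below uses Python's standard library as a stand-in for a reading system's parser, which is an assumption, and the helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment):
    """Return True if the fragment is accepted by a plain XML parser."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# A numeric character reference needs no declaration at all.
numeric_ok = parses_as_xml("<p>10&#160;km</p>")

# The named entity is undefined unless a DTD declares it.
named_ok = parses_as_xml("<p>10&nbsp;km</p>")

print("numeric reference accepted:", numeric_ok)
print("named entity accepted:", named_ok)
```

Only the five entities lt, gt, amp, apos and quot are predefined in XML, which is why the numeric form is the safer choice when no DTD is available.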