Bug with zero-width space Unicode
Well, I wanted to officially submit an issue, but the Sigil website says I should just fix the bug myself and submit a patch. As I'm no programmer, I guess this is the next best thing!
I've got a book with a few paragraphs that are meant to have no spaces in them. I told the ebook formatting people to put zero-width spaces where the word boundaries would normally be, so the lines still wrap. Here is what the html looks like in a text editor:
< p>Ah& #8203;but& #8203;they& #8203;were& #8203;left& #8203;behind& #8203;It& #8203;is& #8203;obvious& #8203;from& #8203;then& #8203;ature& #8203;of& #8203;the& #8203;bond& #8203;But& #8203;where& #8203;where& #8203;where& #8203;where& #8203;Setoff& #8203;Obvious& #8203;Realization& #8203;like& #8203;a& #8203;pricity& #8203;They& #8203;are& #8203;with& #8203;the& #8203;Shin& #8203;We& #8203;must& #8203;find& #8203;one& #8203;Can& #8203;we& #8203;make & #8203;to& #8203;use& #8203;a& #8203;Truthless& #8203;Can& #8203;we& #8203;craft& #8203;a& #8203;weapon< /p>
(I put a space in each html entity and tag so it will display here, but there's no space in the html itself.)
This looks perfect in all ebook readers except amzn-mobi, where we get around the issue with a media query.
The problem is how it displays in Sigil. Here is how that paragraph displays in Sigil, in the html view (not wysiwyg):
< p>AhbuttheywereleftbehindItisobviousfrom thenatureضthebondButwherewherewherewher eSetoff/˘∂ص≥Realizationlikeapricity4®•πarewiththe ShinWemustfindoneCanwemaketouseaTruthl ess#°Ćwecraftaweapon< /p>
So as you can see, some of the individual words get changed to garbage characters. (Also, the forum software here is adding some spaces.)
However, this is only the way it displays. The underlying text looks normal—Sigil is converting all of the zero-width html entities to actual zero-width Unicode characters. And the intervening characters that look like garbage above are not actually garbage—if you save the file and look at it in a text editor, those characters look fine. And if I look at it in a hex editor, each zero-width space is E2808B exactly as expected for zero-width space Unicode.
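For anyone who wants to repeat the hex-editor check without a hex editor, here is a minimal Python sketch. It is not anything Sigil itself runs. It only confirms that the decimal entity 8203 decodes to U+200B and that U+200B is the three bytes E2 80 8B in UTF-8. The entity is built in code rather than typed out so the forum software doesn't swallow it.

```python
import html

ZWSP = "\u200b"                      # ZERO WIDTH SPACE, code point U+200B
ENTITY = "&#{};".format(ord(ZWSP))   # the decimal numeric entity (8203)

# Decode a short sample the way an HTML parser would.
decoded = html.unescape("Ah" + ENTITY + "but")

# UTF-8 bytes of the character, as an uppercase hex string.
utf8_hex = ZWSP.encode("utf-8").hex().upper()

print(ord(ZWSP))                        # decimal code point
print(decoded == "Ah" + ZWSP + "but")   # entity decoded to the raw character
print(utf8_hex)                         # bytes a hex editor would show
```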
But this is not very useful in Sigil. It looks buggy, as I mentioned above, and it runs all the text together as if the zero-width spaces aren't there. (Running the text together is how it's supposed to look to the end user, but not in the html code view, which should give some indication that the zero-width spaces are there so an editor can do something with them.)
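Until the code view shows them, one way to see where the zero-width spaces are is to turn the raw characters back into entities in a copy of the file and read that copy in a plain text editor. The sketch below is only an illustration. The helper names `show_zwsp` and `count_zwsp` are made up for this post, and since Sigil converts the entities to raw characters, it would presumably do so again if the result were reopened there.

```python
ZWSP = "\u200b"                      # ZERO WIDTH SPACE, code point U+200B
ENTITY = "&#{};".format(ord(ZWSP))   # the decimal numeric entity (8203)

def show_zwsp(markup: str) -> str:
    """Return a copy of markup with each raw zero-width space written as its entity."""
    return markup.replace(ZWSP, ENTITY)

def count_zwsp(markup: str) -> int:
    """Count raw zero-width spaces, to confirm none were lost on save."""
    return markup.count(ZWSP)

# Hypothetical sample: three words joined by raw zero-width spaces.
sample = "<p>Ah" + ZWSP + "but" + ZWSP + "they</p>"

print(count_zwsp(sample))   # number of invisible characters in the sample
print(show_zwsp(sample))    # same markup with the characters made visible
```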