MobileRead Forums - View Single Post

Toxaris · 01-14-2016, 03:50 AM

Quote:

Originally Posted by man2010

These problems are related to conversion of Sigil-generated EPUB files to Kindle or adaptation of HTML code from a Kindle ebook to EPUB via Sigil.

Sigil usually converts the code of Unicode symbols into symbols themselves in UTF-8 files (when you import an HTML file into Sigil); later, when you try to convert such an EPUB file to Kindle, these symbols may get mangled.

Let's say your HTML file contains this line of code:

Code:

&#x2318; Hello &#x2192;

After you edit this file in Notepad++ (changing some other lines of code) and save it, this line of code will stay the same:

Code:

&#x2318; Hello &#x2192;

In contrast, if you import this HTML file into Sigil, and then save the EPUB file, this line of code will get converted to:

Code:

⌘ Hello →

Thus, Sigil has created a possible encoding problem. There is no way to turn off the automatic conversion of HTML entities into symbols in Sigil, except by specifying individual HTML entities to preserve (inconvenient). Some EPUB readers may support HTML entities but not the resulting symbols. For instance, Internet Explorer (not an EPUB reader) reads the HTML entity for ’ but cannot read the corresponding symbol in UTF-8 files. KindleGen 2.3 and 2.4 used to have a built-in validation tool that produced warnings for various symbols; these warnings disappeared after you replaced the symbols with HTML entities or Unicode codes. This observation leads me to believe that the Kindle platform prefers HTML entities to symbols.
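If you want to turn the symbols back into Unicode codes after Sigil has saved the file, something like the following Python sketch could do it. It assumes you run it on the text of an XHTML file extracted from the EPUB; the function name `to_numeric_refs` is made up for illustration and is not part of Sigil or KindleGen.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference.

    ASCII characters (including markup such as < and &) are left untouched.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


original = "&#x2318; Hello &#x2192;"
decoded = html.unescape(original)    # the kind of conversion described above
restored = to_numeric_refs(decoded)  # back to numeric character references
print(restored == original)
```

The standard library can do nearly the same with `text.encode("ascii", "xmlcharrefreplace")`, which writes decimal references instead of hexadecimal ones.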

As another example, a poster on another forum reported that the musical score software Finale 2012 produces an EPUB3 file that converts to Kindle without problems, but if you import this EPUB file into Sigil, edit it, and save it, the resulting EPUB file cannot be converted to Kindle (most musical symbols get corrupted). This problem existed 3 years ago. During recent testing, I was unable to find examples of such symbols (I don't have Finale 2012). It is possible that this bug was fixed in the latest version of KindleGen. Nevertheless, I found some strange behavior in Sigil with respect to Unicode symbols. Test this list of symbols:

Code:

&#8986; &#x23DB; &#x23DA; &#x23F0; &#x2655; &#x26C4; &#x23F3; &#x263B; &#x263C; &#x266A; &#x23CE; &#x23CF; &#x2284; &#x2286; &#x22A5; &#x220F; &#x2209; &#x24F3;

In Sigil, all these pieces of code get automatically (and successfully) converted into corresponding Unicode symbols, except Sigil cannot show many of them (you see a white square in HTML and book views). When you convert the resulting EPUB into Kindle, these symbols get converted successfully and are visible in the MOBI file (despite not being visible in the EPUB).

Whether Sigil can show a Unicode symbol depends on the font used for display, not on Sigil itself. If the symbol is not in the font, it will not be shown.
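To check a font for the symbols in question, it helps to know exactly which code points are involved. A minimal Python sketch, assuming you paste the numeric references into a string (the helper name `describe` is hypothetical):

```python
import html
import re
import unicodedata

# Matches decimal (&#8986;) and hexadecimal (&#x23DB;) character references.
REF = re.compile(r"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);")


def describe(markup: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each numeric reference found."""
    rows = []
    for ref in REF.findall(markup):
        ch = html.unescape(ref)  # decode the reference to the character
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return rows


for code_point, name in describe("&#8986; &#x266A; &#x2286;"):
    print(code_point, name)
```

This only lists what the references stand for. Whether a given font actually contains those glyphs still has to be checked in a font tool or character map.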

There are many issues with entities; &nbsp; in particular is known to cause problems. AFAIK this is partly due to Book View. For one thing, it depends on the DOCTYPE, which is not required in ePUB (although recommended for ePUB2). The Unicode notation is universal and should be supported by everything, even if it is not user/programmer friendly. Again, the characters of course need to be in the font. No HTML viewer should have issues with this notation.
Sigil uses the Unicode notation, which shows as the symbol in the editor. If you want to keep seeing the entity, you will need to add it to the Preserve Entities list.
Since Sigil is on the route to supporting ePUB3, the issues with entities can be solved by using the Unicode notation. Named entities, apart from the five predefined XML ones, are no longer supported in ePUB3, which uses the XML syntax of HTML5.
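As a rough illustration of switching to the Unicode notation, the Python sketch below rewrites named entities as numeric references and leaves the five predefined XML entities alone. It is only a sketch under assumptions: the function name `named_to_numeric` is made up, and it relies on the standard library's HTML 4 entity table, so names outside that table are left unchanged.

```python
import re
from html.entities import name2codepoint

# The five entities that XML itself defines; these remain valid without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(markup: str) -> str:
    """Rewrite named entities such as &nbsp; as decimal character references."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep XML entities and unknown names as-is
        return f"&#{name2codepoint[name]};"
    return NAMED.sub(repl, markup)


print(named_to_numeric("Tom&nbsp;&amp;&nbsp;Jerry&rsquo;s"))
```

A regular expression is enough for a quick check like this, but it does not know about CDATA sections or comments, so treat it as a starting point rather than a finished converter.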