MobileRead Forums - View Single Post

man2010 · 01-13-2016, 11:18 PM

These problems relate to converting Sigil-generated EPUB files to Kindle format, or to adapting HTML code from a Kindle ebook to EPUB via Sigil.

When you import an HTML file, Sigil usually converts numeric Unicode codes (character references such as &#x2318;) into the symbols themselves, saved as UTF-8. Later, when you try to convert such an EPUB file to Kindle, these symbols may get mangled.

Let's say your HTML file contains this line of code:

Code:

&#x2318; Hello &#x2192;

If you edit other lines of this file in Notepad++ and save it, this line stays the same:

Code:

&#x2318; Hello &#x2192;

In contrast, if you import this HTML file into Sigil and then save the EPUB file, this line is converted to:

Code:

⌘ Hello →

Thus, Sigil has introduced a potential encoding problem. There is no way to turn off the automatic conversion of HTML entities into symbols in Sigil, other than listing individual entities to preserve, which is inconvenient. Some EPUB readers may support HTML entities but not the resulting symbols. For instance, Internet Explorer (not an EPUB reader) displays the HTML entity for ’ (for example, &rsquo;) but cannot display the corresponding literal symbol in UTF-8 files. KindleGen 2.3 and 2.4 used to include a built-in validation tool that produced warnings for various symbols; these warnings disappeared once the symbols were replaced with HTML entities or numeric Unicode codes. This observation leads me to believe that the Kindle platform prefers HTML entities to literal symbols.
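
Until Sigil offers such an option, one workaround is to post-process the saved XHTML and turn non-ASCII characters back into numeric character references. The Python sketch below is only an illustration, not a Sigil or KindleGen feature: `to_ncr` is a hypothetical helper name, and the Windows-1252 round trip merely demonstrates one common way literal symbols get mangled (UTF-8 bytes read in a legacy encoding). That failure mode is an assumption, not a confirmed description of what happens inside KindleGen.

```python
import html


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hex numeric character reference.

    Hypothetical helper: it reverses the entity-to-symbol conversion described
    above, so "⌘" becomes "&#x2318;". ASCII text, including markup, is untouched.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


line = "\u2318 Hello \u2192"  # what Sigil saves: the literal symbols ⌘ and →

escaped = to_ncr(line)
print(escaped)  # &#x2318; Hello &#x2192;

# Both forms describe the same characters once a parser decodes the references.
assert html.unescape(escaped) == line

# Assumed failure mode, for illustration: UTF-8 bytes misread as Windows-1252
# turn each symbol into several wrong characters, while the ASCII-only
# reference form survives the same misreading unchanged.
print(line.encode("utf-8").decode("cp1252"))     # âŒ˜ Hello â†’
print(escaped.encode("utf-8").decode("cp1252"))  # &#x2318; Hello &#x2192;
```

Run it only after the last save in Sigil, since Sigil converts the references back into symbols when the file is imported again.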

Another example: on another forum, a poster reported that the music notation software Finale 2012 produces an EPUB3 file that converts to Kindle without problems, but if you import this EPUB file into Sigil, edit it, and save it, the resulting EPUB file cannot be converted to Kindle (most musical symbols get corrupted). That was three years ago. In recent testing, I was unable to find examples of such symbols (I don't have Finale 2012). It is possible that this bug has been fixed in the latest version of KindleGen. Nevertheless, I found some strange behavior in how Sigil handles Unicode symbols. Try this list of symbols:

Code:

&#8986; &#x23DB; &#x23DA; &#x23F0; &#x2655; &#x26C4; &#x23F3; &#x263B; &#x263C; &#x266A; &#x23CE; &#x23CF; &#x2284; &#x2286; &#x22A5; &#x220F; &#x2209; &#x24F3;

Sigil automatically (and correctly) converts all of these codes into the corresponding Unicode symbols, but it cannot display many of them: you see a white square in both the HTML view and Book View. When you convert the resulting EPUB to Kindle, these symbols convert successfully and are visible in the MOBI file, even though they were not visible in Sigil.
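
To check what each code in the list stands for, independently of whether a given font or viewer can draw it, the Python sketch below decodes every reference with the standard `html.unescape` and looks up its code point and official name with `unicodedata.name`. The helper name `describe` is hypothetical. Reading Sigil's white squares as missing font glyphs rather than wrong characters is an assumption, though it fits the symbols surviving conversion to MOBI.

```python
import html
import unicodedata

# The test list from the post, as numeric character references.
CODES = (
    "&#8986; &#x23DB; &#x23DA; &#x23F0; &#x2655; &#x26C4; &#x23F3; &#x263B; "
    "&#x263C; &#x266A; &#x23CE; &#x23CF; &#x2284; &#x2286; &#x22A5; &#x220F; "
    "&#x2209; &#x24F3;"
)


def describe(refs: str) -> list[tuple[str, str, str]]:
    """Decode each reference and return (reference, code point, Unicode name)."""
    rows = []
    for ref in refs.split():
        ch = html.unescape(ref)  # turns the reference into the literal character
        rows.append((ref, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return rows


for ref, cp, name in describe(CODES):
    print(f"{ref:10} {cp:8} {name}")
```

Note that the first entry is written in decimal: &#8986; is the same character as &#x231A;.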

The bottom line is that it would be a good idea for Sigil to have an option that disables the conversion of HTML entities and Unicode codes into symbols. It would also be great if Sigil could display all popular Unicode characters (probably a lot of work for the developers).

Another problem: the HTML table of contents that Sigil generates is single-spaced, which makes it inconvenient to use on smartphones. You have to add extra spacing by hand to make it usable.
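
Adding that spacing can be scripted. The Python sketch below inserts a `<style>` element just before `</head>` in the TOC file. The helper name `add_toc_spacing`, the `div, p` selectors, and the `0.8em` margins are all hypothetical choices, because the exact markup of the generated TOC may vary between Sigil versions; check the result on the target device after conversion.

```python
import re

# Hypothetical spacing rule; the selectors and 0.8em margins are assumptions.
# Adjust them to match the elements your generated TOC actually uses.
SPACING_CSS = '<style type="text/css">div, p { margin-top: 0.8em; margin-bottom: 0.8em; }</style>'


def add_toc_spacing(xhtml: str, css: str = SPACING_CSS) -> str:
    """Insert a <style> element just before </head>.

    Returns the input unchanged if no </head> tag is found.
    """
    match = re.search(r"</head\s*>", xhtml, flags=re.IGNORECASE)
    if match is None:
        return xhtml
    pos = match.start()
    return xhtml[:pos] + css + "\n" + xhtml[pos:]


toc = '<html><head><title>Contents</title></head><body><div><a href="ch1.xhtml">Chapter 1</a></div></body></html>'
print(add_toc_spacing(toc))
```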

Another pitfall: a multilevel NCX does not work on all Kindle devices and reading apps (especially older Kindle devices). Even among modern reading apps, Kindle for PC shows only the first level, and Kindle for Android supports only two levels. Flight Crew or the help files should include some kind of warning about this for users who plan to convert an EPUB to Kindle. Many users assume that an EPUB that passes epubcheck is 100% ready for Kindle conversion. In reality, there are many small pitfalls, listed in my Kindle formatting manual (search for "EPUB" in the ebook).
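
As an illustration of the kind of check such a warning could rest on, the Python sketch below measures how deeply `navPoint` elements are nested in an NCX file, using the standard-library ElementTree parser and the NCX namespace `http://www.daisy.org/z3986/2005/ncx/`. The names `ncx_depth` and `SAFE_LEVELS` are hypothetical, and this is not an existing Flight Crew feature; the threshold of 1 follows the observation above that Kindle for PC shows only the first level.

```python
import xml.etree.ElementTree as ET

NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"
SAFE_LEVELS = 1  # per the post: Kindle for PC shows only the first NCX level


def ncx_depth(ncx_xml: str) -> int:
    """Return the deepest navPoint nesting level in an NCX document (0 if none)."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find(f"{NCX_NS}navMap")
    if nav_map is None:
        return 0

    def depth(node: ET.Element) -> int:
        # Each direct navPoint child adds one level below this node.
        return max((1 + depth(child) for child in node.findall(f"{NCX_NS}navPoint")), default=0)

    return depth(nav_map)


SAMPLE = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="part1" playOrder="1">
      <navLabel><text>Part 1</text></navLabel>
      <content src="part1.xhtml"/>
      <navPoint id="ch1" playOrder="2">
        <navLabel><text>Chapter 1</text></navLabel>
        <content src="ch1.xhtml"/>
      </navPoint>
    </navPoint>
  </navMap>
</ncx>"""

levels = ncx_depth(SAMPLE)
if levels > SAFE_LEVELS:
    print(f"Warning: NCX has {levels} levels; some Kindle devices and apps show fewer.")
```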

Post title: Some problems/bugs in Sigil · Last edited by man2010; 01-13-2016 at 11:40 PM.