Quote:
Originally Posted by charleski
The first ('Sigil test Original.epub') was deliberately created using ANSI encoding in Notepad++ and (correctly) specifies ISO-8859-1 encoding in the XML declaration. The accented 'é' appears correctly in both ADE and Calibre's epub reader.
The second ('Sigil test opened in Sigil.epub') is the same file, which has simply been opened in Sigil and immediately saved without any editing. The 'é' has now become a '?' in ADE and Calibre, because Sigil assumed that the encoding was UTF-8, disregarded the encoding specified in the file, and changed the encoding attribute in the declaration.
And there's your problem.
Try loading the XHTML file from the first epub directly. The accented "e" is preserved, since the encoding is correctly detected and the file is converted to UTF-8, just like I said.
If you load the epub file, the accented "e" becomes a question mark. Why?
Because what you have is an ISO-8859-1 encoded XHTML file inside an epub file, and that's against the epub specification. The only encodings the epub specification allows for XHTML files are UTF-8 and UTF-16. You are not allowed to use anything else (like ISO-8859-1).
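To see why the 'é' turns into a question mark, it helps to compare the bytes. The short Python snippet below is only an illustration (it is not what Sigil or ADE actually run): it encodes 'é' both ways, then decodes the ISO-8859-1 byte as if it were UTF-8, which is roughly what a reader does when it assumes UTF-8 as the spec entitles it to. Whether the resulting replacement character is finally shown as '?' or as some other glyph depends on the reading system.

```python
# How 'é' (U+00E9) is stored in each encoding
latin1_bytes = "é".encode("iso-8859-1")
utf8_bytes = "é".encode("utf-8")
print(latin1_bytes)  # b'\xe9'
print(utf8_bytes)    # b'\xc3\xa9'

# A reader that assumes UTF-8 finds 0xE9 with no continuation bytes after it,
# so the byte cannot be decoded; errors="replace" substitutes U+FFFD instead
decoded = latin1_bytes.decode("utf-8", errors="replace")
print(decoded == "\ufffd")  # True
```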
Quote:
Originally Posted by OPS Specification
1.4.1.2: XHTML Content Document Requirements
A conformant XHTML Content Document must meet these conditions:
- it is a well-formed XML document (as defined by XML 1.1); and
- it is encoded in UTF-8 or UTF-16; and
- it is a valid XML document according to the NVDL schema interaction provided in Appendix A; and
- it has a MIME media type of either application/xhtml+xml or text/x-oeb1-document (deprecated); and
- all XHTML elements and attributes not contained in an Inline XML Island are drawn from the XHTML subset identified in this document.
So your file is bad. Sigil is doing the correct thing by assuming the XHTML files in the epub will be either UTF-8 or UTF-16.
But I'm going to change that. I'm going to perform the same encoding detection analysis on XHTML files in the epubs as I do when an (X)HTML file is loaded directly. Why? Because someone not familiar with the epub spec will do the same thing you did and expect everything to work. Sigil should be able to detect this error and correct it, as it already does for broken markup.
And it will, next version onwards.
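A rough sketch of that kind of detect-and-convert step is shown below, in Python purely for illustration (Sigil itself is C++, and this is not its code). It trusts a byte-order mark first, then the encoding named in the XML declaration, and otherwise falls back to UTF-8, which is XML's default when neither is present. The names `detect_encoding` and `to_utf8` are hypothetical.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_ENCODING = re.compile(
    rb'^(<\?xml[^>]*?encoding\s*=\s*["\'])([A-Za-z0-9._-]+)(["\'])'
)

def detect_encoding(raw: bytes) -> str:
    """Guess an XHTML file's encoding from its BOM or XML declaration."""
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"  # UTF-8 with BOM; the codec strips the BOM
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"     # the codec reads the BOM to pick byte order
    match = _DECL_ENCODING.match(raw)
    if match:
        return match.group(2).decode("ascii").lower()
    # No BOM and no declared encoding: XML says the document is UTF-8
    return "utf-8"

def to_utf8(raw: bytes) -> bytes:
    """Decode with the detected encoding, re-encode as UTF-8, fix the declaration."""
    text = raw.decode(detect_encoding(raw))
    data = text.encode("utf-8")
    # Make the declared encoding agree with the bytes we now produce
    return _DECL_ENCODING.sub(rb"\1UTF-8\3", data, count=1)
```

Run on the ISO-8859-1 test file, `to_utf8` would keep the 'é' intact and leave a declaration that says UTF-8. A Latin-1 file with no declaration at all would typically make `decode()` raise `UnicodeDecodeError`; handling that case needs heuristics beyond this sketch.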
EDIT: This is now in trunk.