Thread: Just FYI, ä
03-02-2013, 04:42 PM #6
user_none
Sigil & calibre developer
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by Tex2002ans
Here is an example of a massive EPUB. I created this EPUB in Sigil 0.5.3 with Unicode filenames:

...

Sigil 0.5.3 allowed me to save, open, and rename files with Unicode characters perfectly fine (and, if I recall correctly, FlightCrew said nothing about potential filename errors). I imported them into Sigil 0.5.3 using the typical Add Existing File dialog.

My Nook is able to read these EPUBs fine, even the articles with Unicode in the filenames.

Later versions of Sigil are stricter about following the EPUB 2 spec, and the file you've attached is invalid. According to the spec, the filenames in the archive must be UTF-8 encoded, and Sigil decodes them as UTF-8. The filenames in this file are _not_ UTF-8 encoded.
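To see which entries a strict reader would reject, one can recover each entry's stored name bytes and try to decode them as UTF-8. The sketch below assumes Python's standard `zipfile` module opened with its default settings (no `metadata_encoding`), which decodes names as UTF-8 when the ZIP UTF-8 flag (general-purpose bit 11) is set and as CP437 otherwise. `raw_name_bytes` and `non_utf8_entries` are hypothetical helper names, not part of Sigil.

```python
import zipfile
from typing import Iterable, List

UTF8_FLAG = 0x800  # general-purpose bit 11: "filename is UTF-8"


def raw_name_bytes(info: zipfile.ZipInfo) -> bytes:
    """Recover the filename bytes as stored in the ZIP entry.

    Assumes the archive was opened without metadata_encoding, so zipfile
    decoded the name as UTF-8 if bit 11 was set and as CP437 otherwise.
    Re-encoding with that same codec gives back the original bytes.
    """
    codec = "utf-8" if info.flag_bits & UTF8_FLAG else "cp437"
    return info.filename.encode(codec)


def non_utf8_entries(infos: Iterable[zipfile.ZipInfo]) -> List[str]:
    """Return names of entries whose stored bytes are not valid UTF-8.

    These are the entries a reader that always decodes as UTF-8
    (as the EPUB 2 container spec requires) cannot name correctly.
    """
    bad = []
    for info in infos:
        try:
            raw_name_bytes(info).decode("utf-8")
        except UnicodeDecodeError:
            bad.append(info.filename)
    return bad
```

On a real book this would be called as `non_utf8_entries(zipfile.ZipFile("book.epub").infolist())`, where `book.epub` is a placeholder path.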

So here's what's happening: Sigil gets the list of files from the OPF, and those names don't match the names it decodes from the archive, so Sigil thinks the files do not exist.
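The mismatch can be reproduced by resolving each manifest `href` against the OPF's directory and checking it against the decoded archive names. This is a minimal Python sketch, assuming an EPUB 2 OPF in the `http://www.idpf.org/2007/opf` namespace; `missing_manifest_items` is a hypothetical helper, not Sigil's (C++) implementation.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Iterable, List, Union
from urllib.parse import unquote

OPF_NS = "http://www.idpf.org/2007/opf"


def missing_manifest_items(
    opf_xml: Union[str, bytes], opf_path: str, archive_names: Iterable[str]
) -> List[str]:
    """Return archive paths listed in the OPF manifest but absent from the ZIP.

    opf_path is the OPF's own path inside the archive (e.g.
    "OEBPS/content.opf"); manifest hrefs are relative to its directory.
    archive_names are the entry names as decoded from the archive.
    """
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_path)
    names = set(archive_names)
    missing = []
    for item in root.findall("opf:manifest/opf:item", {"opf": OPF_NS}):
        # hrefs are relative URLs, so undo any %-escaping before comparing.
        href = unquote(item.get("href", ""))
        path = posixpath.normpath(posixpath.join(base, href))
        if path not in names:
            missing.append(path)
    return missing
```

If the archive names were decoded with the wrong code page, the affected entries come back as missing even though the files are physically present in the ZIP, which is the symptom described above.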

Some EPUB readers are more relaxed and don't really care about the filename encoding. They either ignore UTF-8 and use the standard ZIP filename encoding (IBM code page 437), or they check whether the ZIP UTF-8 flag (general-purpose bit 11) is set and use UTF-8 only in that case. I've made a change to Sigil for 0.7.1 to check that flag and, if it's not set, use the standard ZIP filename encoding instead. With this change the example file opens properly.
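The flag-based fallback described above can be sketched in a few lines of Python. This is an illustration of the rule, not Sigil's actual code; `decode_zip_filename` is a hypothetical name, and it assumes the archive's creator used CP437 whenever the flag is unset. If the creator actually used some other code page (for example a Windows ANSI code page), the fallback will still produce wrong names.

```python
UTF8_FLAG = 0x800  # general-purpose bit 11, the ZIP "language encoding" flag


def decode_zip_filename(raw: bytes, flag_bits: int) -> str:
    """Decode a stored ZIP filename, honouring the UTF-8 flag.

    If bit 11 is set the name is treated as UTF-8; otherwise fall back
    to the historical ZIP default, IBM code page 437.
    """
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    # cp437 defines all 256 byte values, so this branch never raises.
    return raw.decode("cp437")
```

Python's own `zipfile` module applies the same rule by default when it reads an archive.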



Quote:
Originally Posted by Tex2002ans
When running the EPUB through EPUBCheck 3.0, you get this output:

Code:
File name contains non-ascii characters: óá. Consider changing filename
This recommendation exists because of exactly this situation. These characters must be decoded properly, otherwise they won't match what's in the OPF. A reading system can either (A) follow the spec and expect the defined encoding, or (B) use whatever encoding the archive says it uses. With A, we end up in this situation. With B, we're assuming the encoding was marked properly. Either way, you're going to run into this problem when using non-ASCII characters.
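A check in the spirit of the EPUBCheck warning, plus one possible way to derive an ASCII-only replacement name, might look like this in Python. Both helpers are hypothetical and are not EPUBCheck's implementation. The renaming approach (Unicode NFKD decomposition, then dropping anything that isn't ASCII) is only one option: characters with no ASCII decomposition are dropped entirely, and two different names can collapse into the same result.

```python
import unicodedata
from typing import Iterable, List


def non_ascii_names(names: Iterable[str]) -> List[str]:
    """Return the names that contain any non-ASCII character."""
    return [n for n in names if not n.isascii()]


def ascii_safe_name(name: str) -> str:
    """Hypothetical renaming helper: strip accents, keep base letters.

    NFKD splits "ó" into "o" plus a combining acute accent; encoding to
    ASCII with errors="ignore" then discards the accent (and any other
    character that has no ASCII form).
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Renaming a file also means updating its `href` in the OPF manifest and every link that points to it from the content and the NCX.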