Help - need epub sample that uses non-ascii file names and paths
Sigil has received a bug report saying it could not open an epub made by InDesign that used non-ascii file names.
I have tried the official non-ascii file name testcase from the epub3 samples page and Sigil had no problems with it.
But that testcase used Japanese, and I cannot tell whether Japanese has the same issues with combining characters and Unicode normalization (NFC vs NFD, etc.) that European non-ascii characters often have.
So can anyone either provide or point me to a good test case: an epub whose non-ascii filenames use combining forms (an accent added to a base character in decomposed form), or characters that can be decomposed into two or more separate ones (typically a base character plus one or more accents or other diacritical marks)?
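To make the composed vs decomposed distinction concrete, here is a minimal Python sketch using the standard unicodedata module. The file name and the helper normalization_forms are made up for illustration only; nothing here comes from the actual bug report.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point (composed, NFC)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (decomposed, NFD)

def normalization_forms(name):
    """Report whether a string is already in NFC and/or NFD form (hypothetical helper)."""
    return {form: unicodedata.normalize(form, name) == name
            for form in ("NFC", "NFD")}

# The two strings render the same but do not compare equal.
print(composed == decomposed)
# Normalizing the decomposed form to NFC yields the composed form.
print(unicodedata.normalize("NFC", decomposed) == composed)
# An illustrative file name that uses the decomposed form.
print(normalization_forms("caf" + decomposed + ".xhtml"))
```

A file name that is plain ascii is in both forms at once, so only names with accented or otherwise decomposable characters can trigger a mismatch.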
If anyone has one, I would love to have a copy of it, even if the sample epub contains just a single non-ascii filename.
In addition, is this a known problem with InDesign: either not properly Unicode normalizing its non-ascii filenames to NFC as the spec calls for, or not properly URL encoding them in the manifest? Or does the epub zip archive produced by InDesign not properly set the flag that says file names are stored in utf-8?
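For anyone who has such an epub and wants to check these points, a rough sketch with Python's zipfile module might look like the following. The function name report_names and the sample file name are my own invention, and the in-memory archive is only a stand-in for a real InDesign epub.

```python
import io
import unicodedata
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is stored as utf-8

def report_names(zip_source):
    """Return (name, has_utf8_flag, is_nfc) for every non-ascii entry name.

    Note: when the flag is missing, zipfile decodes the name as cp437,
    so the name (and the NFC check) may not be meaningful in that case.
    """
    rows = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            name = info.filename
            if name.isascii():
                continue
            has_flag = bool(info.flag_bits & UTF8_FLAG)
            is_nfc = unicodedata.normalize("NFC", name) == name
            rows.append((name, has_flag, is_nfc))
    return rows

# Stand-in archive with one entry whose name is in decomposed (NFD) form.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("OEBPS/cafe\u0301.xhtml", "<html/>")
buf.seek(0)

for row in report_names(buf):
    print(row)
```

Passing the path of a real epub instead of the buffer should list which entries have the flag set and which names are not in NFC.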
Also, inside a zip archive that uses the utf-8 flag, whether characters are stored decomposed or composed would matter. Does anyone know what the normalization rule is for file names inside a zip archive? On macOS, filesystem paths are typically stored decomposed (NFD), but the web and elsewhere seem to all use NFC (composed). So could a zip archive built on a Mac by InDesign be storing its file names/paths in NFD form, when in fact they should probably be in NFC form?
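If that is what is happening, the failure would look roughly like the sketch below: the manifest href decodes to an NFC string while the zip entry name is NFD, so a plain string comparison misses. The helper find_entry is hypothetical and is not how Sigil resolves paths; it also assumes the href has already been made relative to the zip root.

```python
import unicodedata
from urllib.parse import unquote

def find_entry(href, zip_names):
    """Match a percent-encoded href to a zip entry name, ignoring NFC/NFD differences."""
    wanted = unicodedata.normalize("NFC", unquote(href))
    for name in zip_names:
        if unicodedata.normalize("NFC", name) == wanted:
            return name
    return None

zip_names = ["OEBPS/cafe\u0301.xhtml"]   # entry name stored decomposed (NFD)
href = "OEBPS/caf%C3%A9.xhtml"           # manifest href, percent-encoded NFC

# A direct comparison fails because the code points differ.
print(unquote(href) in zip_names)
# Comparing after normalizing both sides finds the entry.
print(find_entry(href, zip_names) == zip_names[0])
```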
I just do not use InDesign, so I have no idea if this is an InDesign issue or a hidden Sigil bug.
Thanks,
KevinH
Last edited by KevinH; 05-04-2024 at 04:50 PM.