Some of your arguments have merit, but an epub produced on one platform must be readable and searchable on many different e-readers.
Precomposed form works with multiple accents as well.
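To illustrate the point about multiple accents, here is a small sketch using Python's standard unicodedata module. The sample letter (e with circumflex and dot below, as used in Vietnamese) is my own choice for the example, not something taken from the discussion.

```python
import unicodedata

# "e" + combining dot below (U+0323) + combining circumflex (U+0302)
decomposed = "e\u0323\u0302"

# NFC folds the base letter and both marks into a single code point.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))            # 3 1
print([f"U+{ord(ch):04X}" for ch in composed])   # ['U+1EC7']
```

Both strings render as the same glyph; only the stored code points differ.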
Researchers who study dead languages and the use of accents and diacritics in historic texts work only from primary handwritten or typed sources, as old as possible. That is not the realm of epubs, because someone other than the original author had to choose a form for digital storage. So most of those points are moot. In addition, the fonts chosen for the epub have a greater impact on any visual or stylistic interpretation than invisible normalization forms that do not lose accents.
Yes, text in mixed normalization forms cannot be searched reliably. And once mixed forms have been converted, they cannot easily be converted back.
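A rough sketch of why conversion is one-way, again with Python's unicodedata. The sample word is just an illustration: once a mixed text is normalized, nothing records which occurrences were originally precomposed and which were decomposed.

```python
import unicodedata

nfc_word = "caf\u00e9"     # precomposed é (U+00E9)
nfd_word = "cafe\u0301"    # e + combining acute accent (U+0301)
mixed = nfc_word + " " + nfd_word

# Convert the mixed text to NFC, then try to go back via NFD.
as_nfc = unicodedata.normalize("NFC", mixed)
as_nfd = unicodedata.normalize("NFD", as_nfc)

first, second = as_nfc.split(" ")
same_after_nfc = first == second   # both occurrences are now identical
round_trips = as_nfd == mixed      # the original mix is not recovered

print(same_after_nfc, round_trips)  # True False
```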
Even though the form is different, the actual text is visually *identical*. The reader of an epub cannot tell which normalization form is being used. Only when searching do the issues become obvious.
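Here is a minimal sketch of that search problem. The helper name nfc and the sample strings are made up for the example, and this is not Sigil's actual search code, just the general idea in Python.

```python
import unicodedata

def nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)

book_text = "Un cafe\u0301 noir"   # stored decomposed: e + U+0301
query = "caf\u00e9"                # typed precomposed: U+00E9

# The two look the same on screen, but a plain substring search misses.
raw_hit = query in book_text

# Normalizing both sides to the same form makes the search succeed.
normalized_hit = nfc(query) in nfc(book_text)

print(raw_hit, normalized_hit)  # False True
```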
As I said before, the latest version of Sigil, currently in its own repo branch, handles text copied and pasted into its find field and works to prevent these search issues.
So you appear to be arguing against changes that actually help epubs be more universally searchable.
In addition, decomposed text has become an important attack vector: it is used to disguise website URLs so that users are redirected away from the real websites. NFC and precomposing is the right way to handle that, along with Unicode variants being made visually differentiable. So the push toward NFC will probably continue.
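As a rough illustration of the URL problem (the host names are invented and use the reserved .example domain; this is not a description of how any browser or Sigil checks URLs): NFC makes a decomposed spelling comparable with the precomposed one, while a look-alike letter from another script stays different and needs separate handling.

```python
import unicodedata

def nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)

trusted = "b\u00fccher.example"          # precomposed ü (U+00FC)
decomposed = "bu\u0308cher.example"      # u + combining diaeresis (U+0308)
homoglyph = "b\u00fc\u0441her.example"   # Cyrillic с (U+0441) instead of Latin c

# Raw comparison treats the decomposed spelling as a different host.
raw_equal = decomposed == trusted

# After NFC the two spellings compare equal.
nfc_equal = nfc(decomposed) == nfc(trusted)

# NFC does not merge look-alike letters from different scripts.
homoglyph_equal = nfc(homoglyph) == nfc(trusted)

print(raw_equal, nfc_equal, homoglyph_equal)  # False True False
```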
Work will continue on this.
If demand warrants, we can add an environment variable that lets Sigil users control this, but no support requests or bug reports for failing searches will be accepted from anyone who uses the environment variable approach.
In the meantime, use Sigil-2.1.0 if you do not want NFC conversions.
Last edited by KevinH; 06-28-2024 at 04:54 PM.