You seem to misunderstand.
The epub3 spec did not remove it, they focused it on urls and file paths where it matters most for string comparison and matching for the epub links to function at all.
In order for search to work in general, all of the text being searched and all of the search strings need to form text characters from unicode using an identical sequence. If more than 1 sequence creates the same character, the mixing them in the same text causes problems as does searching with one sequence in find but the text uses the other sequence for the same word or words.
And in either form, *no* text is ever lost or unreadable. It will be 100% searchable only when the search string and all the text to be searched are in the same form.
As for Sigil, and Calibre going away, the code to convert a file between normalization forms in python is trivial (less than 5 lines) same in Qt and it is available in all good string libraries.
And finally, the world seems to be standardizing on NFC forms for just the reasons above.
So no worries.
Last edited by KevinH; 06-27-2024 at 09:25 AM.
|