MobileRead Forums - View Single Post - Will Sigil support the entire Unicode System?

DNSB · 07-13-2021, 04:00 PM

Quote:

Originally Posted by arakish

<?xml version="1.0" encoding="utf-16"?>

the XHTML file saved, but it was completely goobly-doo with Asian ideograms instead of english latin characters. I have an Ebook project that would be fantastic if I could use UTF characters above the UTF-8. Otherwise, I do not look forward to making a bunch of PNGs of the characters I wish to use. But will if I have to... ...

UTF-16 assumes that your entire document is encoded in 2-byte code units (one per character, or two for characters outside the Basic Multilingual Plane), whereas UTF-8 uses variable-length sequences of 1 to 4 bytes. When you attempted to force UTF-16, the file's bytes were presumably still UTF-8, so every pair of bytes was interpreted as a single character, which would give, uummm, interesting results. I.e., instead of the bytes 0x4A, 0x7E being shown as 'J~', they would be shown as a single '䩾' character or a single '繊' character, depending on whether big-endian or little-endian interpretation was used.
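If you want to see the effect for yourself, here is a small Python sketch (my own illustration, not anything Sigil does internally) that decodes the same two bytes, 0x4A and 0x7E, three different ways. The variable names are just ones I picked for the example.

```python
# The same two bytes, 0x4A and 0x7E, decoded under three different assumptions.
raw = bytes([0x4A, 0x7E])

as_utf8 = raw.decode("utf-8")          # two 1-byte characters: 'J' and '~'
as_utf16_be = raw.decode("utf-16-be")  # one 2-byte code unit: U+4A7E
as_utf16_le = raw.decode("utf-16-le")  # one 2-byte code unit: U+7E4A

for label, text in [
    ("utf-8", as_utf8),
    ("utf-16-be", as_utf16_be),
    ("utf-16-le", as_utf16_le),
]:
    # Print code points rather than glyphs so the output is safe on any console.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label:10} {len(text)} character(s): {code_points}")
```

Same bytes on disk, three different results. Only the interpretation changes, which is why editing the encoding declaration alone scrambles the text.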

Given that UTF-8 is capable of encoding the entire Unicode character set, neither UTF-16 nor UTF-32 is very useful, IMHO.
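As a quick sanity check on that claim, the following Python sketch encodes a handful of code points, from plain ASCII up to the very last one in Unicode (U+10FFFF), as UTF-8 and round-trips them. The sample list and the helper name `utf8_hex` are my own choices for illustration.

```python
# Sample code points: 1-, 2-, 3- and 4-byte UTF-8 cases, ending at the
# highest code point Unicode defines (U+10FFFF).
samples = [0x41, 0xE9, 0x4A7E, 0x1F600, 0x10FFFF]


def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 bytes of one code point as space-separated hex."""
    return chr(code_point).encode("utf-8").hex(" ")


for cp in samples:
    encoded = chr(cp).encode("utf-8")
    # Round trip: decoding the bytes must give back the same code point.
    assert ord(encoded.decode("utf-8")) == cp
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {utf8_hex(cp)}")
```

Nothing needs more than 4 bytes, so characters beyond the Basic Multilingual Plane can go straight into a UTF-8 XHTML file without switching the encoding.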