MobileRead Forums - View Single Post - Will Sigil support the entire Unicode System?

Tex2002ans · 07-13-2021, 02:09 PM

Quote:

Originally Posted by arakish

What I know how to do is write the Unicode characters in this format:

Where/How are you initially writing these documents?

Sigil automatically handles the UTF-16 -> UTF-8 conversion upon opening.

... but it would probably be better to keep your source documents in UTF-8 in the first place.
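If you already have UTF-16 source files sitting around, re-encoding them yourself is only a few lines. Here's a minimal sketch in Python (the function name `utf16_to_utf8` and the file names are made up for illustration, and this is not what Sigil does internally):

```python
from pathlib import Path


def utf16_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a UTF-16 text file as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec reads the byte-order mark to pick the endianness
    # and drops it from the decoded text.
    text = src.read_bytes().decode("utf-16")
    # Working on raw bytes leaves the line endings exactly as they were.
    dst.write_bytes(text.encode("utf-8"))


# Example call (assumed file names):
# utf16_to_utf8(Path("chapter1-utf16.xhtml"), Path("chapter1.xhtml"))
```

Note that this only changes the bytes. If the file starts with an XML declaration that says encoding="UTF-16", you'd still have to change that to UTF-8 by hand.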

Quote:

Originally Posted by arakish

I do it using the &#[number]; format for HTML web documents. Tried it in Sigil. Worked until saving the file with UTF-16.

Using the UTF-8, and Sigil will not even show the characters.

Thus, next question: Is there software that would convert a Unicode number such as "&#127766;" (Waning Gibbous Moon) into the UTF-8 equivalent?

Sigil handles/displays all those characters perfectly fine.

If you typed the HTML Entities in your original code:

&#127764; = WAXING GIBBOUS MOON
&#127766; = WANING GIBBOUS MOON

Sigil helpfully converts everything into the actual, human-readable characters:

🌔 (U+1F314) = WAXING GIBBOUS MOON
🌖 (U+1F316) = WANING GIBBOUS MOON
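If you want the same conversion outside of Sigil, it can be scripted. A rough sketch in Python, assuming decimal (&#NNN;) or hex (&#xHHH;) references; the names `expand_numeric_refs` and `KEEP` are my own, and this is not Sigil's actual code:

```python
import re

# Code points left as references in this sketch (an assumption):
# < > & would break the markup, and a literal no-break space is invisible.
KEEP = {0x3C, 0x3E, 0x26, 0xA0}

# Matches &#NNN; (decimal) or &#xHHH; (hex). Named entities such as &lt;
# are not matched, so they pass through untouched.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def expand_numeric_refs(text: str) -> str:
    """Replace numeric character references with the literal characters."""
    def repl(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Skip kept characters, surrogates, and values outside Unicode.
        if code in KEEP or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)

    return _NUMERIC_REF.sub(repl, text)


# The Waning Gibbous Moon reference becomes four UTF-8 bytes.
print(expand_numeric_refs("&#127766;").encode("utf-8"))  # b'\xf0\x9f\x8c\x96'
```

Decimal and hex give the same result, so &#127766; and &#x1F316; both end up as U+1F316.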

Everything is converted to its actual character except:

&gt; = Greater Than
&lt; = Less Than
&amp; = Ampersand
&nbsp; or &#160; = Non-Breaking Spaces

Quote:

Originally Posted by arakish

⅜ is the Fractional Three-Eighths character, but Sigil will only show them if I use UTF-16 or UTF-32 in the XML tag.

Not a good idea to use Vulgar Fractions.

See my post in 2019: "I'm assuming it's the font's fault, but just in case ..."

Quote:

Originally Posted by arakish

No not any asian script language. I want to use the characters on this Code Chart or this one for example. There are other Code Charts I want to use, but it seems Sigil will only show such with decimal numbers below 1024, perhaps 2048.

You can enter the hex or decimal form, and Sigil will automatically convert to the characters for you...

Or even better:

You can insert the character directly using your OS's Character Map (or a similar program). Personally, on Windows, I like to use BabelMap.

Or copy/paste characters from Fileformat.info's Unicode Search. For example, here was my search for "Gibbous Moon".

Quote:

Originally Posted by KevinH

Using rarely seen unicode characters in an epub will almost always require embedding a font that supports it so that readers can show it properly.

I can guarantee a symbol like:

🜊 (U+1F70A) = ALCHEMICAL SYMBOL FOR VINEGAR

doesn't exist in an ereader's built-in fonts.

Follow code practices similar to the ones I showed in the Japanese font thread. Do something like:

Code:

Vinegar <span class="alchemy">🜊</span> is an acidic thing.

then embed a font specifically for those symbols.

Symbola is a font that contains many of those obscure symbols.
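To figure out which characters in a chapter would need that kind of span + embedded font treatment, you could scan the text for anything outside the usual ranges. A small sketch in Python; the `unusual_chars` name and the 0x2000 cutoff are arbitrary assumptions on my part, so adjust for your book:

```python
import unicodedata


def unusual_chars(text: str, threshold: int = 0x2000) -> dict:
    """Map each distinct character at or above `threshold` to a label.

    The threshold is a guess at where built-in font coverage gets patchy,
    not a rule any ereader follows.
    """
    found = {}
    for ch in text:
        if ord(ch) >= threshold and ch not in found:
            name = unicodedata.name(ch, "<unnamed>")
            found[ch] = f"U+{ord(ch):04X} {name}"
    return found


line = 'Vinegar <span class="alchemy">\U0001F70A</span> is an acidic thing.'
for label in unusual_chars(line).values():
    print(label)
```

Each character it reports is a candidate for wrapping in a class like "alchemy" and checking against the font you embed.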