07-13-2021, 12:46 PM | #1 |
Member
Will Sigil support the entire Unicode System?
I did some searches and found nothing.
I noticed that the XHTML files in Sigil start with this XML declaration:
<?xml version="1.0" encoding="utf-8"?>
Will Sigil support these declarations?
<?xml version="1.0" encoding="utf-16"?>
<?xml version="1.0" encoding="utf-32"?>
When I tried using <?xml version="1.0" encoding="utf-16"?>, the XHTML file saved, but it was complete gobbledygook: Asian ideograms instead of English (Latin) characters.
I have an ebook project that would be fantastic if I could use Unicode characters above UTF-8. Otherwise, I'm not looking forward to making a bunch of PNGs of the characters I want to use, but I will if I have to...
Any help is much appreciated. Thanks.
adeg
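The "Asian ideograms" symptom is a classic sign of bytes being decoded with the wrong encoding. A plausible explanation (not confirmed in this thread) is that the file's bytes were really UTF-8 while the declaration claimed UTF-16, so each pair of Latin letters was read back as a single 16-bit unit, and many such units fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF). This Python sketch reproduces the effect; `misread_as_utf16` is a hypothetical helper, and little-endian byte order is an assumption:

```python
import unicodedata


def misread_as_utf16(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as UTF-16LE."""
    raw = text.encode("utf-8")
    if len(raw) % 2:
        # UTF-16 code units are two bytes; drop a dangling odd byte for this demo.
        raw = raw[:-1]
    return raw.decode("utf-16-le")


for ch in misread_as_utf16("basalt"):
    # Print names rather than glyphs so the output is plain ASCII.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

For "basalt" the three resulting code points are U+6162, U+6173 and U+746C, all CJK ideographs. The fix is to make the declaration match the bytes, not to change the characters.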
07-13-2021, 12:53 PM | #2 |
Evangelist
UTF-8 covers all defined Unicode codepoints. Just convert your source to UTF-8.
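If you want to do the conversion yourself rather than letting Sigil do it on import, a minimal Python sketch could look like this. It assumes the source starts with a UTF-16 byte order mark (BOM) and that the only thing to patch is the `encoding` value in the XML declaration; `utf16_to_utf8` and `convert_file` are hypothetical names:

```python
import re
from pathlib import Path

# Matches encoding="utf-16" (or single-quoted) inside the XML declaration.
_DECL = re.compile(r"""(<\?xml[^>]*?encoding=)(["'])utf-16\2""", re.IGNORECASE)


def utf16_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 bytes (byte order taken from the BOM) and re-encode as UTF-8."""
    text = raw.decode("utf-16")  # the "utf-16" codec consumes the BOM
    # Keep the declaration honest: it must name the encoding actually used.
    text = _DECL.sub(r"\1\2utf-8\2", text, count=1)
    return text.encode("utf-8")  # plain UTF-8, no BOM


def convert_file(src: Path, dst: Path) -> None:
    dst.write_bytes(utf16_to_utf8(src.read_bytes()))
```

A regex is a shortcut here; a file with an unusual declaration would need checking by hand.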
07-13-2021, 01:05 PM | #3 |
Running with scissors
07-13-2021, 01:07 PM | #4 |
Grand Sorcerer
Sigil supports reading UTF-16 files, but it'll save most files as UTF-8.
For more information, see this thread.
07-13-2021, 01:13 PM | #5 |
Sigil Developer
When you import or load an epub into Sigil, it will automatically grok UTF-16 (and many other encodings) and convert it to UTF-8, which is now an industry standard. The problem with the other UTF encodings is that they are endian-dependent (little vs. big endian). So you would need to specify either UTF-16 little-endian or UTF-16 big-endian, typically by using the appropriate Byte Order Mark (BOM) to indicate endianness.
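To see why endianness matters, here is the same character encoded both ways in Python, plus a hypothetical `detect_utf16_order` helper that reads the BOM. This only illustrates the byte layouts; it is not how Sigil detects encodings:

```python
def detect_utf16_order(raw: bytes) -> str:
    """Return the byte order announced by a UTF-16 byte order mark (BOM)."""
    if raw.startswith(b"\xff\xfe"):
        return "little"
    if raw.startswith(b"\xfe\xff"):
        return "big"
    raise ValueError("no UTF-16 BOM: the byte order is ambiguous")


text = "\u215c"  # VULGAR FRACTION THREE EIGHTHS

little = text.encode("utf-16-le")  # low byte first, no BOM
big = text.encode("utf-16-be")     # high byte first, no BOM
with_bom = text.encode("utf-16")   # BOM followed by the machine's native order

print(little.hex(" "), "|", big.hex(" "), "|", with_bom.hex(" "))
print(detect_utf16_order(with_bom))

# Reading little-endian bytes as big-endian silently yields a different character.
assert little.decode("utf-16-be") != text
```

On a little-endian PC the first line prints `5c 21 | 21 5c | ff fe 5c 21`. UTF-8 has no such ambiguity, because it is defined as a sequence of single bytes.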
There is no such thing as characters above the UTF-8 code point encodings: UTF-8 can represent the full range of Unicode code points.
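A quick way to check this is to encode a few code points from this thread, up to the highest one Unicode allows (U+10FFFF). In this Python sketch (`utf8_len` is a hypothetical helper), UTF-8 uses one to four bytes per code point and round-trips every one:

```python
def utf8_len(cp: int) -> int:
    """Number of bytes UTF-8 needs for one code point.

    Surrogates (U+D800 to U+DFFF) are not characters and raise UnicodeEncodeError.
    """
    return len(chr(cp).encode("utf-8"))


samples = [
    (0x0041, "LATIN CAPITAL LETTER A"),
    (0x215C, "VULGAR FRACTION THREE EIGHTHS"),
    (0x1F316, "WANING GIBBOUS MOON SYMBOL"),
    (0x1F70A, "ALCHEMICAL SYMBOL FOR VINEGAR"),
    (0x10FFFF, "highest code point"),
]

for cp, label in samples:
    encoded = chr(cp).encode("utf-8")
    assert encoded.decode("utf-8") == chr(cp)  # round trip loses nothing
    print(f"U+{cp:04X} {label}: {utf8_len(cp)} byte(s), {encoded.hex(' ')}")
```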
07-13-2021, 01:32 PM | #6 | |
Wizard
You can just use the actual characters and embed an Asian font if needed. Just use proper HTML and mark your languages properly. I even wrote a tutorial/thread about this a few months ago: "Japanese characters not showing up on some devices".
In your case, since the entire book is going to be Japanese (or some other Asian language), you'd mark the lang + xml:lang on your <html> element. So where every chapter's file has this:
Code:
<html xmlns="http://www.w3.org/1999/xhtml">
you'd change it to:
Code:
<html xmlns="http://www.w3.org/1999/xhtml" lang="ja" xml:lang="ja">
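If the book has many chapter files, the same edit can be scripted outside Sigil. A rough Python sketch, assuming each file has a single `<html ...>` start tag and that any existing `lang`/`xml:lang` should be left alone; `add_lang` is a hypothetical helper, and a regex is used as a shortcut rather than a real XML parser:

```python
import re

_HTML_TAG = re.compile(r"<html\b([^>]*)>")


def add_lang(xhtml: str, lang: str = "ja") -> str:
    """Add lang and xml:lang to the root <html> start tag if it has neither."""

    def patch(m: re.Match) -> str:
        attrs = m.group(1)
        if re.search(r"\b(xml:)?lang=", attrs):
            return m.group(0)  # leave existing language markup alone
        return f'<html{attrs} lang="{lang}" xml:lang="{lang}">'

    # Only the first match: the root element.
    return _HTML_TAG.sub(patch, xhtml, count=1)
```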
07-13-2021, 01:33 PM | #7 |
Member
What I know how to do is write the Unicode characters in this format:
&#8540; (⅜) is the Fraction Three-Eighths character, but Sigil will only show such characters if I use UTF-16 or UTF-32 in the XML declaration. I do it using the &#[number]; format for HTML web documents. I tried it in Sigil, and it worked until I saved the file with UTF-16. With UTF-8, Sigil won't even show the characters.
Thus, next question: is there software that would convert a Unicode number, such as the one for the Waning Gibbous Moon, into its UTF-8 equivalent?
Additionally, with only the UTF-8 declaration, Sigil will only show its own Special Characters. None others...
I'm lost as to what to do. I've been a geologist/volcanologist my whole life and have now started writing books using Sigil (great software, by the way...).
adeg
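To answer the conversion question: a numeric entity, the character itself, and its UTF-8 bytes are just different spellings of the same code point. A small Python sketch (`describe_code_point` is a hypothetical name) that produces each form for a given number:

```python
from __future__ import annotations

import html


def describe_code_point(cp: int) -> dict[str, str]:
    """Show one Unicode code point in the forms discussed in this thread."""
    ch = chr(cp)
    return {
        "character": ch,
        "code_point": f"U+{cp:04X}",
        "decimal_entity": f"&#{cp};",
        "hex_entity": f"&#x{cp:X};",
        "utf8_bytes": ch.encode("utf-8").hex(" "),
    }


for number in (8540, 0x1F316):  # three-eighths fraction, waning gibbous moon
    info = describe_code_point(number)
    print(info["code_point"], info["decimal_entity"], info["hex_entity"], info["utf8_bytes"])

# Going the other way: numeric entities decode back to the same character.
assert html.unescape("&#8540;") == "\u215c"
```

Since the file is UTF-8 either way, pasting the character itself into Code View should be equivalent to typing the entity.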
07-13-2021, 01:39 PM | #8 | |
Member
adeg |
07-13-2021, 01:41 PM | #9 |
Sigil Developer
*All* Unicode characters are representable in UTF-8. If a character does not appear for some reason, the issue is that the font being used does not have a glyph for that character.
Very few fonts support all the characters in Unicode, as many are rarely if ever used (they are for dead languages, etc.). So you can use Sigil to embed a font that does support the specific characters you want. UTF-8 vs. UTF-16 vs. UTF-32 really has nothing to do with that.
07-13-2021, 01:50 PM | #10 |
Sigil Developer
Note that named character entities are only supported in EPUB 2. EPUB 3 supports only numeric entities (apart from XML's five predefined ones: &amp; &lt; &gt; &quot; &apos;). You should be able to use a numeric entity like &#xXXXX; to represent a Unicode code point (XXXX in hex). But that character may not appear if the font being used does not support it.
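If your source text uses named entities (like `&eacute;` or `&frac38;`) and you are targeting EPUB 3, one approach is to rewrite them as numeric ones. This Python sketch uses the standard library's HTML5 entity table (`html.entities.html5`) and leaves XML's five predefined entities alone; `named_to_numeric` is a hypothetical helper, and it assumes there are no comments or CDATA sections you'd want left untouched:

```python
import re
from html.entities import html5

# XML predefines these five, so they stay valid in EPUB 3 XHTML as-is.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(xhtml: str) -> str:
    """Replace HTML named entities with hex numeric ones, e.g. &frac38; -> &#x215C;."""

    def repl(m: re.Match) -> str:
        name = m.group(1)
        if name in XML_PREDEFINED:
            return m.group(0)
        chars = html5.get(name + ";")
        if chars is None:
            return m.group(0)  # unknown name: leave it for a human to check
        # A few HTML5 entities expand to two code points, so emit one entity per character.
        return "".join(f"&#x{ord(c):X};" for c in chars)

    return _NAMED.sub(repl, xhtml)
```

Numeric entities such as `&#8540;` don't match the pattern, so they pass through unchanged.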
Using rarely seen Unicode characters in an epub will almost always require embedding a font that supports them, so that readers can show them properly.
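Before choosing a font to embed, it helps to know exactly which unusual characters the book uses. A rough Python sketch (`unusual_characters` is a hypothetical helper) that lists everything above U+024F, the end of Latin Extended-B, with its Unicode name. The cutoff is an arbitrary assumption, and typographic quotes and dashes (U+2018, U+2014, etc.) will show up in the list too:

```python
from __future__ import annotations

import unicodedata
from collections import Counter


def unusual_characters(text: str, max_cp: int = 0x024F) -> list[tuple[str, int, str]]:
    """List (code point, count, name) for characters above max_cp, most frequent first."""
    counts = Counter(ch for ch in text if ord(ch) > max_cp)
    return [
        (f"U+{ord(ch):04X}", n, unicodedata.name(ch, "<unnamed>"))
        for ch, n in counts.most_common()
    ]
```

Running it over the text of each chapter gives a checklist to compare against a candidate font's coverage (for example, in BabelMap).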
07-13-2021, 02:09 PM | #11 | |||||
Wizard
Sigil automatically handles the UTF-16 to UTF-8 conversion upon opening... but it would probably be better to keep your source documents in UTF-8 in the first place.
If you typed the HTML Entities in your original code:
Sigil helpfully converts everything into the actual, human-readable characters:
All are converted to their actual characters besides:
See my post in 2019: "I'm assuming it's the font's fault, but just in case ..."
Or even better, you can insert the character directly using your OS's Character Map (or a similar program). Personally, on Windows, I like to use BabelMap. Or copy/paste characters from Fileformat.info's Unicode Search. For example, here was my search for "Gibbous Moon".
I can guarantee a symbol like 🜊 (U+1F70A, ALCHEMICAL SYMBOL FOR VINEGAR) doesn't exist in e-readers' fonts. Follow code practices similar to what I showed in the Japanese font thread. Do something like:
Code:
Vinegar <span class="alchemy">🜊</span> is an acidic thing.
Symbola is a font that contains many of those obscure symbols.
Last edited by Tex2002ans; 07-13-2021 at 03:12 PM.
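The span markup in the post above can also be generated. A minimal Python sketch, assuming the input is plain text (it escapes `&`, `<` and `>`) and that you already have a CSS rule for the `alchemy` class pointing at an embedded font such as Symbola; `wrap_symbols` is a hypothetical helper:

```python
from __future__ import annotations

import html


def wrap_symbols(text: str, symbols: set[str], css_class: str = "alchemy") -> str:
    """Wrap each occurrence of a chosen symbol in <span class="...">.

    Only the wrapped spans would use the embedded font via CSS; the rest of the
    text keeps the reader's own font. Adjacent symbols each get their own span.
    """
    out = []
    for ch in text:
        if ch in symbols:
            out.append(f'<span class="{css_class}">{ch}</span>')
        else:
            out.append(html.escape(ch, quote=False))
    return "".join(out)
```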
07-13-2021, 04:00 PM | #12 | |
Bibliophagist
Given that UTF-8 is capable of encoding the entire Unicode character set, neither UTF-16 nor UTF-32 is very useful, IMHO.
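The size side of this is easy to check. This Python sketch (`encoded_sizes` is a hypothetical helper) compares byte counts, using the `-le` codecs so that no BOM is counted:

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in UTF-8, UTF-16 and UTF-32 (little-endian, no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


samples = {
    "English": "Volcanic ash",
    "Japanese": "火山灰",
    "Moon symbol": "\U0001F316",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

For Latin text, UTF-16 and UTF-32 double or quadruple the size, and since XHTML markup itself is ASCII, that overhead applies to every tag too. UTF-16 only comes out ahead on text that is mostly CJK, like the Japanese sample (6 bytes vs. 9).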
07-13-2021, 04:05 PM | #13 | ||
Grand Sorcerer
07-13-2021, 04:28 PM | #14 | ||
Wizard
Calibre Editor's Edit > Insert Special Character... is absolutely fantastic. It displays all Unicode characters by category, and even lets you search. Absolutely amazing, and it beats the pants off most paid/professional programs too!
LibreOffice's Insert > Special Character is pretty great too, but it only searches characters within the selected font (so Symbola is a good choice there).
Microsoft Word's Insert > Symbol is absolute crap.
BabelMap also lets you search, and the great thing about BabelMap is that you can use Fonts > Font Coverage to show exactly which fonts on your computer have those obscure symbols.
Or use Clips to insert the numeric codes, then Tools > Reformat > Mend and Prettify HTML Files, and Sigil will convert all those numeric codes into the actual characters (like the gibbous example I gave above).
Last edited by Tex2002ans; 07-13-2021 at 05:03 PM.
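The name search that Calibre, BabelMap and Fileformat.info offer can be approximated with Python's `unicodedata` module. A sketch (`search_unicode` is a hypothetical helper); the results depend on the Unicode version bundled with your Python, and it says nothing about which fonts actually contain the glyphs:

```python
from __future__ import annotations

import sys
import unicodedata


def search_unicode(query: str) -> list[tuple[str, str, str]]:
    """Return (code point, character, name) for every character whose name contains query."""
    query = query.upper()
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # unassigned code points have no name
        if query in name:
            hits.append((f"U+{cp:04X}", chr(cp), name))
    return hits


for code, _ch, name in search_unicode("gibbous"):
    print(code, name)
```

Scanning all 1.1 million code points takes on the order of a second, which is fine for an occasional lookup.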