Go Back   MobileRead Forums > E-Book Software > Sigil

Old 07-13-2021, 12:46 PM   #1
arakish
Member
Posts: 14
Karma: 1064
Join Date: Jul 2018
Device: PC
Will Sigil support the entire Unicode System?

Did searches and found nothing.

Notice that the XHTML files in Sigil have this opening XML tag:

<?xml version="1.0" encoding="utf-8"?>

Will Sigil support these opening tags?

<?xml version="1.0" encoding="utf-16"?>

<?xml version="1.0" encoding="utf-32"?>

When I tried using this tag:

<?xml version="1.0" encoding="utf-16"?>

the XHTML file saved, but it was completely goobly-doo with Asian ideograms instead of english latin characters. I have an Ebook project that would be fantastic if I could use UTF characters above the UTF-8. Otherwise, I do not look forward to making a bunch of PNGs of the characters I wish to use. But will if I have to... ...

Any help is much appreciated.

Thanks.

adeg
Old 07-13-2021, 12:53 PM   #2
Sarmat89
Evangelist
Posts: 482
Karma: 2267928
Join Date: Nov 2015
Device: none
UTF-8 covers all defined Unicode codepoints. Just convert your source to UTF-8.
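For anyone who wants to do that conversion by hand, here is a minimal Python sketch. It assumes the source bytes are UTF-16 with a byte order mark; the function name and the sample text are made up for illustration.

```python
import re

def utf16_xhtml_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 XHTML bytes as UTF-8 and update the XML declaration."""
    # The plain "utf-16" codec reads the byte order mark to pick the endianness.
    text = raw.decode("utf-16")
    # Make the declared encoding match what the bytes will actually be.
    text = re.sub(r'encoding="utf-16"', 'encoding="utf-8"', text,
                  count=1, flags=re.IGNORECASE)
    return text.encode("utf-8")

# Hypothetical sample: a one-line document containing U+1F316.
sample = '<?xml version="1.0" encoding="utf-16"?>\n<p>Moon: \U0001F316</p>'
converted = utf16_xhtml_to_utf8(sample.encode("utf-16"))
print(converted.decode("utf-8"))
```

The character itself is untouched by the conversion; only the byte representation and the declaration change.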
Old 07-13-2021, 01:05 PM   #3
hobnail
Running with scissors
Posts: 1,552
Karma: 14325282
Join Date: Nov 2019
Device: none
https://stackoverflow.com/questions/...-16-and-utf-32
Old 07-13-2021, 01:07 PM   #4
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by arakish View Post
Did searches and found nothing.
Sigil supports reading utf-16 files, but it will save most files as utf-8.
For more information, see this thread.
Old 07-13-2021, 01:13 PM   #5
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
When you import or load an epub into Sigil, it will automatically grok utf-16 (and many other encodings) and convert it to utf-8, which is now an industry standard. The problem with the other utf- encodings is that they are endian dependent (little vs. big endian). So you would need to specify either utf-16 little endian or utf-16 big endian and then use the appropriate Byte Order Mark (to indicate endianness).

There is no such thing as characters above the utf-8 code point encodings. Utf-8 can represent the full range of unicode codepoints.
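A quick way to see both points is to encode one code point above U+FFFF in each form. This is only an illustrative Python sketch; the variable names are arbitrary.

```python
moon = "\U0001F316"  # WANING GIBBOUS MOON, a code point above U+FFFF

# UTF-8 has a single byte order; UTF-16 and UTF-32 each come in two.
encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]
for name in encodings:
    data = moon.encode(name)
    print(f"{name:10} {data.hex(' ')}")
    # Every form round-trips to the same single character.
    assert data.decode(name) == moon

# The plain "utf-16" codec prepends a byte order mark so a reader can tell
# the two byte orders apart; which one you get depends on the machine.
with_bom = moon.encode("utf-16")
print(with_bom[:2].hex(" "))
```

All five encodings carry the same character; they differ only in how many bytes they use and in what order.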
Old 07-13-2021, 01:32 PM   #6
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by arakish View Post
Otherwise, I do not look forward to making a bunch of PNGs of the characters I wish to use. But will if I have to.
???

You can just use the actual characters and embed an Asian font if needed.

Just use proper HTML + mark your languages properly.

I even wrote a tutorial/thread about this a few months ago:

"Japanese characters not showing up on some devices"

In your case, since the entire book is going to be Japanese (or some other Asian language), you'll mark the lang + xml:lang in your <html>.

So where every chapter's file has this:

Code:
<html xmlns="http://www.w3.org/1999/xhtml">
you'll change to this:

Code:
<html xmlns="http://www.w3.org/1999/xhtml" lang="ja" xml:lang="ja">
This says: "Hey, this HTML file is written in Japanese".
Old 07-13-2021, 01:33 PM   #7
arakish
Member
Posts: 14
Karma: 1064
Join Date: Jul 2018
Device: PC
What I know how to do is write the Unicode characters in this format:

⅜ is the Fractional Three-Eighths character, but Sigil will only show them if I use UTF-16 or UTF-32 in the XML tag.

I do it using the &#0[number]; format for HTML web documents. Tried it in Sigil. Worked until saving the file with UTF-16.

Using the UTF-8, and Sigil will not even show the characters.

Thus, next question: Is there software that would convert a Unicode number such as "🌖" (Waning Gibbous Moon) into the UTF-8 equivalent?

Additionally, with only the UTF-8 attribute, Sigil will only show its own Special Characters. None others...

I'm lost as for what to do. Been a Geologist/Volcanologist my whole life and now started writing books using Sigil (great software by the way...).

adeg
Old 07-13-2021, 01:39 PM   #8
arakish
Member
Posts: 14
Karma: 1064
Join Date: Jul 2018
Device: PC
Quote:
Originally Posted by Tex2002ans View Post
In your case, since the entire book is going to be Japanese (or some other Asian language), you'll mark the lang + xml:lang in your <html>.
No, not any Asian script language. I want to use the characters on this Code Chart or this one for example. There are other Code Charts I want to use, but it seems Sigil will only show such with decimal numbers below 1024, perhaps 2048.

adeg
Old 07-13-2021, 01:41 PM   #9
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
*All* unicode characters are representable in utf-8. If a character does not appear for some reason the issue is that the font being used does not support the glyphs for that character.

Very few fonts support all the characters in unicode as many are rarely if ever used since they are for dead languages, etc.

So you can use Sigil to embed a font that does support the specific characters you desire. The choice of utf-8 vs utf-16 vs utf-32 has nothing really to do with that.
Old 07-13-2021, 01:50 PM   #10
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Note that named character entities are only supported in epub2. epub3 supports only numeric entities. You should be able to use a numeric entity like &#xXXXX; to represent the unicode codepoint (in hex XXXX). But that character may not appear if the font being used does not support it.

Using rarely seen unicode characters in an epub will almost always require embedding a font that supports it so that readers can show it properly.
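If you have the character but need its numeric entity, the number is just the code point. A small Python sketch, with function names chosen only for this example:

```python
def to_hex_reference(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"

def to_decimal_reference(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"

# Three-eighths fraction, waning gibbous moon, alchemical vinegar.
for ch in "\u215C\U0001F316\U0001F70A":
    print(ch, to_hex_reference(ch), to_decimal_reference(ch))
```

Both forms refer to the same code point, so either should work wherever a numeric entity is accepted.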
Old 07-13-2021, 02:09 PM   #11
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by arakish View Post
What I know how to do is write the Unicode characters in this format:
Where/How are you initially writing these documents?

Sigil automatically handles the UTF-16 -> UTF-8 conversion upon opening.

... but it would probably be better to keep your source documents in UTF-8 in the first place.

Quote:
Originally Posted by arakish View Post
I do it using the &#0[number]; format for HTML web documents. Tried it in Sigil. Worked until saving the file with UTF-16.

Using the UTF-8, and Sigil will not even show the characters.

Thus, next question: Is there software that would convert a Unicode number such as "🌖" (Waning Gibbous Moon) into the UTF-8 equivalent?
Sigil handles/displays all those characters perfectly fine.

If you typed the HTML Entities in your original code:
  • &#x1f314; = WAXING GIBBOUS MOON
  • &#x1f316; = WANING GIBBOUS MOON

Sigil helpfully converts everything into the actual, human-readable characters:
  • 🌔 (U+1F314) = WAXING GIBBOUS MOON
  • 🌖 (U+1F316) = WANING GIBBOUS MOON

All are converted to their actual characters besides:
  • &gt; = Greater Than
  • &lt; = Less Than
  • &amp; = Ampersand
  • &nbsp; or &#160; = Non-Breaking Spaces
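The same idea can be sketched outside Sigil in a few lines of Python. This is not Sigil's code, just an assumed approximation of the behaviour described above: numeric references become real characters, except the ones that must stay escaped. All names here are made up.

```python
import re

# Code points left as references: & < > and the non-breaking space.
KEEP_AS_REFERENCE = {0x26, 0x3C, 0x3E, 0xA0}

# Matches &#xHEX; or &#DECIMAL; (named entities such as &amp; are not matched).
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def expand_numeric_references(markup: str) -> str:
    """Replace numeric character references with the characters they name."""
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        is_surrogate = 0xD800 <= code_point <= 0xDFFF
        if code_point in KEEP_AS_REFERENCE or is_surrogate or code_point > 0x10FFFF:
            return match.group(0)  # leave the reference exactly as written
        return chr(code_point)
    return NUMERIC_REF.sub(replace, markup)

print(expand_numeric_references("<p>&#x1f316; &#8540; A &amp; B&#160;&lt;</p>"))
```

Saving the result as UTF-8 then gives a file with the actual characters in it.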

Quote:
Originally Posted by arakish View Post
⅜ is the Fractional Three-Eighths character, but Sigil will only show them if I use UTF-16 or UTF-32 in the XML tag.
Not a good idea to use Vulgar Fractions.

See my post in 2019: "I'm assuming it's the font's fault, but just in case ..."

Quote:
Originally Posted by arakish View Post
No, not any Asian script language. I want to use the characters on this Code Chart or this one for example. There are other Code Charts I want to use, but it seems Sigil will only show such with decimal numbers below 1024, perhaps 2048.
You can enter the hex or decimal form, and Sigil will automatically convert to the characters for you...

Or even better:

You can insert the character directly using your OS's Character Map (or a similar program). Personally, on Windows, I like to use BabelMap.

Or copy/paste characters from Fileformat.info's Unicode Search. For example, here was my search for "Gibbous Moon".

Quote:
Originally Posted by KevinH View Post
Using rarely seen unicode characters in an epub will almost always require embedding a font that supports it so that readers can show it properly.


I can guarantee a symbol like:

🜊 (U+1F70A) = ALCHEMICAL SYMBOL FOR VINEGAR

doesn't exist in e-readers' fonts.

Follow similar code practices like I showed in the Japanese font thread. Do something like:

Code:
Vinegar <span class="alchemy">🜊</span> is an acidic thing.
then embed a font specifically for those symbols.

Symbola is a font that contains many of those obscure symbols.
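For the stylesheet side, here is a rough Python sketch that prints the kind of CSS this needs. The font path "../Fonts/Symbola.ttf" and the helper name are assumptions for illustration; adjust them to wherever the font actually sits in your epub.

```python
def font_face_css(family: str, font_path: str, css_class: str) -> str:
    """Build an @font-face rule plus a class that applies the embedded font."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{font_path}");\n'
        "}\n"
        f".{css_class} {{\n"
        f'  font-family: "{family}", serif;\n'
        "}\n"
    )

# Assumed location of the embedded font relative to the stylesheet.
print(font_face_css("Symbola", "../Fonts/Symbola.ttf", "alchemy"))
```

Scoping the font to a class keeps the body text in the reader's chosen font and only switches fonts for the symbols.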

Last edited by Tex2002ans; 07-13-2021 at 03:12 PM.
Old 07-13-2021, 04:00 PM   #12
DNSB
Bibliophagist
Posts: 35,400
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by arakish View Post
<?xml version="1.0" encoding="utf-16"?>

the XHTML file saved, but it was completely goobly-doo with Asian ideograms instead of english latin characters. I have an Ebook project that would be fantastic if I could use UTF characters above the UTF-8. Otherwise, I do not look forward to making a bunch of PNGs of the characters I wish to use. But will if I have to... ...
UTF-16 assumes that your entire document is encoded in 2-byte units, whereas UTF-8 uses variable-length sequences of 1 to 4 bytes. When you attempted to force UTF-16, every pair of bytes was interpreted as a single character, which would give, uummm, interesting results. I.e., instead of seeing the byte string 0x4A, 0x7E as 'J~', it would be shown as a single '䩾' character (U+4A7E) or a single '繊' character (U+7E4A), depending on whether you used big- or little-endian interpretation.

Given that UTF-8 is capable of encoding the entire Unicode character set, neither UTF-16 nor UTF-32 is very useful here, IMHO.
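The two-byte example can be reproduced in Python by decoding the same bytes three ways. This is just a sketch to illustrate the effect:

```python
raw = bytes([0x4A, 0x7E])  # the two ASCII bytes for "J~"

print(raw.decode("utf-8"))      # read one byte at a time
print(raw.decode("utf-16-be"))  # read as one big-endian 16-bit unit
print(raw.decode("utf-16-le"))  # read as one little-endian 16-bit unit
```

The bytes never change; only the declared encoding does, which is why a file of Latin text turns into ideograms when it is relabelled as UTF-16.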
Old 07-13-2021, 04:05 PM   #13
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by arakish View Post
⅜ is the Fractional Three-Eighths character, but Sigil will only show them if I use UTF-16 or UTF-32 in the XML tag.
Sigil has an Insert Special Character tool that allows you to insert vulgar fractions and other special characters.



Quote:
Originally Posted by arakish View Post
Thus, next question: Is there software that would convert a Unicode number such as "🌖" (Waning Gibbous Moon) into the UTF-8 equivalent?
You could define 4 clipbar entries for the moon phases. For details, see this post. (Simply paste the actual characters into the Name and Text fields.)

Old 07-13-2021, 04:28 PM   #14
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Doitsu View Post
Sigil has an Insert Special Character tool that allows you to insert vulgar fractions and other special characters.
Sigil's Insert > Special Character window only has a handful of special characters in there. (I believe it's every named HTML 2.0 Entity?)

Calibre Editor's Edit > Insert Special Character... is absolutely fantastic.

It displays all Unicode characters by category, and even lets you search:

[Attached images: Calibre.-.Insert.Character.-.Gibbous.Moons.png, Calibre.-.Insert.Character.-.Alchemy.png]

Absolutely amazing, and beats the pants off of most paid/professional programs too!

LibreOffice's Insert > Special Character is pretty great too, but it only searches characters within the selected font (so Symbola is a good choice there):

[Attached image: LibreOffice.-.Insert.Character.-.Alchemy.png]

Microsoft Word's Insert > Symbol is absolute crap:

[Attached image: Microsoft.Word.-.Insert.Symbol.png]

BabelMap also lets you search:

[Attached image: BabelMap.-.Search.-.Alchemy.png]

and the great thing about BabelMap is that you can use Fonts > Font Coverage to show exactly which fonts on your computer have those obscure symbols.

Quote:
Originally Posted by Doitsu View Post
You could define 4 clipbar entries for the moon phases. For details, see this post. (Simply paste the actual characters into the Name and Text fields.)
That works too.

Or define clips that insert the numeric codes, then run Tools > Reformat > Mend and Prettify HTML Files, and Sigil will convert all those numeric codes into the actual characters (like the gibbous example I gave above).

Last edited by Tex2002ans; 07-13-2021 at 05:03 PM.
Tags
unicode, utf, utf-16, utf-32, utf-8

