MobileRead Forums, Sigil: https://www.mobileread.com/forums/showthread.php?t=312747

Barto 11-20-2018 08:20 AM

Automatic change of coded HTML Entities into special chars in Sigil 0.9.9 upon saving
 
Hi all!
I am working on an unusual book that uses a lot of math characters. Most of them are included in the fonts we use, like ¬, ∈, ∀ or ∃. But some are not, e.g. ℕ (the symbol for the natural numbers).
So those are also not visible in readers like ADE or Calibre, or on devices.

Now I have used an HTML entity code for this in Sigil: &#8469[semicolon] (I am writing it like this because otherwise you would see the symbol)
Sigil displays it as the symbol given above, but after I save the file, Sigil automatically converts the code into a special character such as this: Attachment 167848 (alt="bold, empty N"). This character is not visible in ADE, and I do not know how to stop Sigil from converting the code into special characters.

Can this be done at all in this version?

Thanks
Bart

Barto 11-20-2018 08:42 AM

Have already found the answer here:
https://www.mobileread.com/forums/sh...d.php?t=277821

Thanks and sorry to bother you.
Bart

DiapDealer 11-20-2018 09:55 AM

No problem. But keep in mind that there's really no difference between the unicode character and the html entity in terms of what ADE is capable of inherently displaying. The glyphs are either included in the available fonts on the reader (or in the app), or they are not. If you're not using embedded fonts for certain characters, using the html entity for those characters will not make them magically show up where the same unicode character fails to properly display. If the glyph is not included in ADE's default fonts, embedding is the only way you can guarantee them to display properly.

In short: Sigil changing your html entity to a unicode character is not the reason the character doesn't display in different rendering engines. They don't display because the font sets in those different engines don't include the glyph for the character you want to show. You need to embed a font that includes all the special characters you require for your content.
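The point above can be checked outside Sigil. Here is a minimal sketch using Python's standard library (Python is chosen only for illustration; nothing in this thread depends on it). Once a parser has decoded the numeric entity, it is the same code point as the literal character, so a renderer ends up looking for the same glyph either way.

```python
import html
import unicodedata

# The entity Barto typed and the character Sigil writes on save.
entity = "&#8469;"
literal = "\u2115"  # the double-struck N

# A parser decodes the entity before any font lookup happens.
decoded = html.unescape(entity)

print(decoded == literal)         # same code point either way
print(f"U+{ord(decoded):04X}")    # code point in hex
print(unicodedata.name(decoded))  # official Unicode character name
```

If the comparison prints True, the only remaining variable is whether the font in use contains a glyph for that code point.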

Tex2002ans 11-20-2018 04:16 PM

Quote:

Originally Posted by DiapDealer (Post 3776558)
You need to embed a font that includes all the special characters you require for your content.

For math-heavy documents, there are the "STIX Two Text" + "STIX Two Math" fonts:

https://stixfonts.org/

They include every obscure math symbol you'll ever need.

Last year someone had issues with the Alef ℵ in their math document, and I gave examples of some code + font embedding:

https://www.mobileread.com/forums/sh...44#post3575244

Just make sure all your math/variables are properly marked up with classes.
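Before picking a font to embed, it helps to know exactly which special characters the book uses. The following is a rough sketch, not something taken from the linked post: the function names `special_characters` and `report` are made up for illustration, and the sample string stands in for the text of a real chapter. Checking the resulting list against what a given font actually contains still has to be done with a font tool.

```python
import unicodedata
from collections import Counter

def special_characters(text):
    """Count every non-ASCII character in text, keyed by the character."""
    return Counter(ch for ch in text if ord(ch) > 127)

def report(text):
    """Return one line per special character: code point, name, count."""
    lines = []
    for ch, count in sorted(special_characters(text).items()):
        name = unicodedata.name(ch, "UNKNOWN")
        lines.append(f"U+{ord(ch):04X} {name} x{count}")
    return lines

# Hypothetical sample; in practice, read the text of each XHTML file.
sample = "¬(∀x ∈ ℕ)(∃y ∈ ℕ) y < x"
for line in report(sample):
    print(line)
```

Every code point in the report needs a glyph in either the reader's default font or the embedded one.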

un_pogaz 11-21-2018 04:15 AM

I understand that automatically converting all HTML entities to their Unicode character is more readable.
However, if we have written an HTML entity (and more specifically a Unicode entity), it is because we want it kept as it is.
So seeing them disappear automatically is a bad surprise.
This is especially troublesome for alternative space characters ("Narrow No-Break Space", for example) and technical characters.

It would be nice to have an option to disable the automatic conversion of entities completely (or partially: only Unicode Entities are kept, the other ones are converted)
For me, having to ask the user to manually enter the exceptions (without any other alternative) is not a good feature.

DiapDealer 11-21-2018 08:19 AM

It is what it is. There will be no change in this regard. The manual entry of the entities you wish to preserve is the ONLY way we can do it. We didn't decide willy-nilly to convert all entities to characters. The manual method to preserve them WAS the compromise. The only alternative is having no choice whatsoever.

KevinH 11-21-2018 12:13 PM

Then add them to your preserve entities setting and all will be fine. Use only numeric entities for epub3, as named entities are no longer allowed under html5.
Nothing is lost as the conversion to and from entities is a one to one mapping. Your settings will be respected.
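To illustrate the one-to-one mapping, here is a small sketch in Python. It is not Sigil's code and does not reproduce how Sigil stores its setting; `to_numeric_entities` and the `preserve` set are hypothetical names used only to show the idea of writing a chosen list of characters as numeric entities and decoding them back without loss.

```python
import html

def to_numeric_entities(text, preserve):
    """Rewrite each character in `preserve` as a decimal numeric entity."""
    return "".join(f"&#{ord(ch)};" if ch in preserve else ch for ch in text)

# Characters to keep as entities: narrow no-break space and double-struck N.
preserve = {"\u202f", "\u2115"}

# "n ∈ ℕ" with narrow no-break spaces around the ∈ sign.
source = "n\u202f\u2208\u202f\u2115"
encoded = to_numeric_entities(source, preserve)

print(encoded)                           # preserved characters shown as &#...;
print(html.unescape(encoded) == source)  # decoding restores the original text
```

Characters outside the `preserve` set are left as literal characters, which mirrors the compromise described above: only the entities you list are kept.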


Quote:

Originally Posted by un_pogaz (Post 3776876)
I understand that automatically converting all HTML entities to their Unicode character is more readable.
However, if we have written an HTML entity (and more specifically a Unicode entity), it is because we want it kept as it is.
So seeing them disappear automatically is a bad surprise.
This is especially troublesome for alternative space characters ("Narrow No-Break Space", for example) and technical characters.

It would be nice to have an option to disable the automatic conversion of entities completely (or partially: only Unicode Entities are kept, the other ones are converted)
For me, having to ask the user to manually enter the exceptions (without any other alternative) is not a good feature.


theducks 11-21-2018 03:10 PM

Quote:

Originally Posted by un_pogaz (Post 3776876)
I understand that automatically converting all HTML entities to their Unicode character is more readable.
However, if we have written an HTML entity (and more specifically a Unicode entity), it is because we want it kept as it is.
So seeing them disappear automatically is a bad surprise.
This is especially troublesome for alternative space characters ("Narrow No-Break Space", for example) and technical characters.

It would be nice to have an option to disable the automatic conversion of entities completely (or partially: only Unicode Entities are kept, the other ones are converted)
For me, having to ask the user to manually enter the exceptions (without any other alternative) is not a good feature.

Blame the DEVICE manufacturers that decided to NOT support entities any longer. Calibre and Sigil went with the most bulletproof code for those that don't WANT TO LEARN the alternatives.

radius 11-27-2018 11:27 AM

If I may be permitted to digress for a moment...

How are people writing up math? I initially thought MathML would be an option, but after trying a few tests with that and SVG, I ended up using MS Word and then creating a png image.

I find anything more complicated than a single, in-one-line equation with some sub- or superscripts is way beyond my HTML coding capabilities.

KevinH 11-27-2018 11:34 AM

There was a nice GUI equation editor that was part of OpenOffice/LibreOffice that allowed the user to build the equation and then save the final result to MathML. I think there was a standalone Java (not JavaScript) program that did something similar.

Perhaps one or both still exist. They make creating SVG, PNG, and MathML versions of an equation quite simple.

Sorry, I can't remember what they were called, but a Google search should help.


Quote:

Originally Posted by radius (Post 3779766)
If I may be permitted to digress for a moment...

How are people writing up math? I initially thought MathML would be an option, but after trying a few tests with that and SVG, I ended up using MS Word and then creating a png image.

I find anything more complicated than a single, in-one-line equation with some sub- or superscripts is way beyond my HTML coding capabilities.


st_albert 11-27-2018 03:22 PM

Quote:

Originally Posted by KevinH (Post 3779771)
There was a nice GUI equation editor that was part of OpenOffice/LibreOffice that allowed the user to build the equation and then save the final result to MathML. I think there was a standalone Java (not JavaScript) program that did something similar.

Perhaps one or both still exist. They make creating SVG, PNG, and MathML versions of an equation quite simple.

Sorry, I can't remember what they were called, but a Google search should help.

It's in LibreOffice Math. From a LO document, open LO help, then use the drop-down box at the top left of the help window, and choose "Libre Office Math."

That'll get you started.

Albert

Tex2002ans 11-27-2018 07:01 PM

Quote:

Originally Posted by radius (Post 3779766)
How are people writing up math? I initially thought MathML would be an option, but after trying a few tests with that and SVG, I ended up using MS Word and then creating a png image.

I wrote a topic on this: Tutorial: Formulas to PNG.

That used LibreOffice Math -> PDF -> PNG.

Nowadays, I use LaTeX -> PDF -> PNG (explained those methods further in the topic).

Doing it this way will allow you to easily generate proper vector + bitmap images directly from the source.

And in the future, as MathML support gets better, you have your equations sitting in a nice source format, and can (easily) convert to MathML.

Note: MathML currently only works in certain readers + newer devices. You have to keep in mind all the old devices out there (and if you want Kindle, that's a no go). So you have to create these bitmap fallback images anyway.

Note #2: At ebookcraft 2018, Peter Krautzberger also gave a workshop "Equations in ebooks" (Slides here, no audio/video online though).

His presentation pretty much came to similar conclusions (needing all the fallbacks because of all the different/ancient devices out there). He discusses slightly different methods/tools, and compatibility tests with different readers/renderers.

jhowell 11-27-2018 10:00 PM

Quote:

Originally Posted by radius (Post 3779766)
How are people writing up math? I initially thought MathML would be an option, but after trying a few tests with that and SVG, I ended up using MS Word and then creating a png image.

I have the latest version of MS Word, and with that I can highlight an equation in the equation editor, select copy, and then paste it into a text editor (such as Notepad). The equation will paste as MathML. I don't know if this works for older versions of Word.

Quote:

Originally Posted by Tex2002ans (Post 3779957)
Note: MathML currently only works in certain readers + newer devices. You have to keep in mind all the old devices out there (and if you want Kindle, that's a no go). So you have to create these bitmap fallback images anyway.

The situation on Kindle is unclear. According to the Amazon Kindle Publishing Guidelines section 9.6 (MathML Support): "Enhanced Typesetting supports MathML." That means KFX format. There is no mention of how MathML is treated in the older (MOBI/KF8) formats. I assume a fallback would be needed but there is no information on how to do that.

Tex2002ans 11-27-2018 10:50 PM

Quote:

Originally Posted by jhowell (Post 3780031)
I have the latest version of MS Word, and with that I can highlight an equation in the equation editor, select copy, and then paste it into a text editor (such as Notepad). The equation will paste as MathML. I don't know if this works for older versions of Word.

1. You can only copy/paste out MathML in newer versions of Word (>2007?), from when they introduced the newer Equation Editor. I think you can only copy/paste one equation at a time.

When you open your DOCX with equations + Save As HTML, Word exports tiny PNGs. (According to what I know you don't even have control over final image size, etc.)

Toxaris can probably pop in and explain details... (I think automated MathML export is only available via APIs and can't be exported directly within Word? But don't quote me on that. See "How to parse mathML in output of WordOpenXML?" on Stack Exchange.)

Note: (I highly recommend reading all the articles by MurrayS3 on Microsoft's Blog about OfficeMath. He's one of the chief engineers for adding/enhancing the Equation Editor over the decades, and has all the technical details.)

2. Toxaris's EPUBTools can also export MathML+SVG for you. This is the easiest/best way I know of currently.

The subpar thing about those two workflows is that all the equations will be numbered sequentially:
  • image001.png for Word.
    • (I'm assuming equation images will get smushed together with normal images?)
  • Equation001.mml + Equation001.svg for EPUBTools

When you fully control the entire workflow from the start, you can give each equation human-readable names (VERY important when going to edit/add/change books in the future), and can control ALL the variables separately:
  • Font
  • DPI
  • Mathematical formatting conventions
    • Bold or italic vectors
  • [...].

This allows you to easily regenerate whatever materials you need.
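As a rough sketch of what "controlling the workflow from the start" can look like, here is a small Python example that keeps each equation under a human-readable name and derives the LaTeX source and the conversion commands from it. Everything specific in it is an assumption for illustration and not necessarily my actual setup: the `standalone` document class, the `stix2` LaTeX package, `pdflatex`, poppler's `pdftocairo`, and the names `Equation`, `tex_source` and `build_commands`. The commands are only built, not run.

```python
from dataclasses import dataclass

@dataclass
class Equation:
    name: str   # human-readable file stem, e.g. "euler-identity"
    latex: str  # equation body in LaTeX math syntax

def tex_source(eq, font_package="stix2"):
    """Return a standalone LaTeX document that renders one equation."""
    return "\n".join([
        r"\documentclass[border=2pt]{standalone}",
        rf"\usepackage{{{font_package}}}",  # assumed font package
        r"\begin{document}",
        rf"$\displaystyle {eq.latex}$",
        r"\end{document}",
        "",
    ])

def build_commands(eq, dpi=300):
    """Return command lines for LaTeX -> PDF -> PNG (not executed here)."""
    return [
        ["pdflatex", "-interaction=nonstopmode", f"{eq.name}.tex"],
        ["pdftocairo", "-png", "-singlefile", "-r", str(dpi),
         f"{eq.name}.pdf", eq.name],
    ]

eq = Equation(name="euler-identity", latex=r"e^{i\pi} + 1 = 0")
print(tex_source(eq))
for cmd in build_commands(eq, dpi=600):
    print(" ".join(cmd))
```

Because the font and DPI are parameters, changing either one and regenerating every image is a matter of rerunning the script over the list of equations.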

Quote:

Originally Posted by jhowell (Post 3780031)
The situation on Kindle is unclear. According to the Amazon Kindle Publishing Guidelines section 9.6 (MathML Support): "Enhanced Typesetting supports MathML." That means KFX format. There is no mention of how MathML is treated in the older (MOBI/KF8) formats. I assume a fallback would be needed but there is no information on how to do that.

No normal people can get access to it, similar to audio/video enhancements (remember "Kindle in Motion"... only 28 books actually available). You would have to be one of the huge publishers that can FTP files directly onto Amazon's servers.

It's extremely hard to even get real info from Amazon about MathML (What are the EXACT books that use this? And how is it done?). Here's what the Digital Reader said about it in April:

Quote:

Yesterday's changelog for the Kindle for PC update mentioned that you could use the app to zoom in on math equations, and now we know why.

[...]

Edit: Actually, it doesn't support MathML at all. Amazon was being misleading when they said it was supported; what Amazon actually does is convert the equation to an image and display that.

Quote:

what a joke to call "we'll run MathJax in an unspecified configuration and then generate jpg out of it" anything like "support". Makes my point from Ebookcraft that you can simply do this yourself and retain better control.

— Peter Krautzberger (@pkrautz) April 20, 2018

And according to what I've seen, absolutely nothing changed. (Although it is hard to even pin down technical info, because the blogs are FLOODED with parroting Amazon's marketing material about it.)

From what I've gathered over the months, it only works on the Kindle for PC with NVDA installed. And it uses a horrendously outdated/buggy/crippled MathJax.

And there's STILL no way to limit the files to only modern devices (or even Kindle for PC only). You'll STILL have to code all the fallbacks for normal KF8+KF7.

Overall, I agree with Peter's quote above. To say "it's supported" is a joke.

jhowell 11-27-2018 11:23 PM

Quote:

Originally Posted by Tex2002ans (Post 3780052)
Overall, I agree with Peter's quote above. To say "it's supported" is a joke.

To me it looks like they did just enough to mark off “MathML support” on a checklist without actually doing enough to make it useful.

