
MobileRead Forums > E-Book Software > Sigil

12-26-2022, 05:04 PM   #1
philja
Zealot
Posts: 121
Karma: 10
Join Date: Nov 2015
Location: Europe EEC
Device: none
Difficulty with Numerical entities and Prettify in 1.9.20

I was showing a friend how they could improve the appearance of a solid block of text and HTML tags covering many paragraphs by using Tools > Reformat HTML > Mend and Prettify all HTML files. As a demo model I used one of my own epubs (epub2), from which I had removed the spacing and blank lines.

This example contained some non-breaking spaces, inserted either with Insert > Special Character > nbsp or by manually typing the numerical entity X#160;. It also contained an ampersand written as the numerical entity X#38; (I had been using X#38; instead of &amp; because Epubcheck raised objections to &amp;).

[I have had to replace the & in the numerical entities with an X; otherwise the whole entity disappears when I preview the post, even if I wrap it in quotes or code tags.]

When I ran the Prettify command, the prettification itself worked, with spaces and blank lines inserted, but the effect on the entities was strange.

The nbsp, which had been displayed in Code View as _., became X#38;#160; in Code View, and the Preview panel displayed it as X#160;

X#38;, which had been displayed in Code View as X#38; and in Preview as &, became X#38;amp; in Code View and was displayed in Preview as &amp;

I thought I had understood the Sigil User Guide for 1.9 (Preferences > Preserve Entities). I had entered both X#160; and X#38; in the Preserve Entities list to prevent them from being converted to Unicode characters.

If I remove those entries from Preferences > Preserve Entities, the numerical entities are converted to _. and &, which is exactly what I had been trying to prevent by following (as I thought) the instructions in the User Guide.

I get the same result in UbuntuStudio and in Windows 10.

Is there something I have misunderstood or is there a bug here?
12-26-2022, 06:24 PM   #2
KevinH
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
Epubcheck should never complain about the proper use of &amp; for &. As a standard XML entity it is always legal, and a numeric version should never be required.

So my guess is that you are misusing it, perhaps by using it as a direct child of the body tag, or as a direct child of a tag that does not hold text.

Please show the exact error message from epubcheck and the corresponding line or lines of source code.

Preserve Entities purposely does not handle the unneeded numeric equivalents of the standard XML entities, since the standard XML entities are perfectly legal XML and XHTML.

Preserve Entities requires an entity, not a Unicode character escape. Numeric entities always start with &#, and in hexadecimal form the # is followed by a lowercase x.

For example, the named entity &nbsp; is only valid in epub2. epub3 requires numeric entities instead of named entities (other than the standard XML entities). So under epub3, nbsp would be written as either of:

Code:
&#160;
&#xA0;
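To see why both numeric forms work while the named one does not outside epub2, here is a quick check that uses Python 3's standard-library XML parser (`xml.etree.ElementTree`) as a stand-in for a strict XML reader. The helper name `parses` is purely illustrative. It also shows that the hexadecimal form needs a lowercase x.

```python
import xml.etree.ElementTree as ET

def parses(xml: str) -> bool:
    """True if a plain XML parser (no DTD loaded) accepts the string."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False

# Decimal and hexadecimal character references both resolve to U+00A0.
dec = ET.fromstring("<p>a&#160;b</p>").text
hexa = ET.fromstring("<p>a&#xA0;b</p>").text
print(dec == hexa == "a\u00a0b")   # True

# &nbsp; is defined by the XHTML DTD that epub2 documents declare,
# not by XML itself, so a parser without that DTD rejects it.
print(parses("<p>a&nbsp;b</p>"))   # False

# XML only accepts a lowercase x in hexadecimal references.
print(parses("<p>a&#XA0;b</p>"))   # False
```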

The standard XML entities are &amp; &lt; &gt; &quot; and &apos;

Do not use Preserve Entities to change any of these standard XML entities into something else, especially &amp;, as & is the escape character that begins every entity.
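A similar sketch for the five standard XML entities, again using Python's `xml.etree.ElementTree` as a stand-in parser: they resolve without any DTD, and the numeric &#38; gives exactly the same character as &amp;, which is why a numeric version is never needed.

```python
import xml.etree.ElementTree as ET

# The five entities predefined by XML itself; no DTD is needed.
STANDARD = {"&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&apos;": "'"}

# Parse each one inside a <p> and collect the character it resolves to.
parsed = {name: ET.fromstring(f"<p>{name}</p>").text for name in STANDARD}
print(parsed == STANDARD)               # True

# The numeric &#38; resolves to the very same "&" as &amp;.
numeric_amp = ET.fromstring("<p>&#38;</p>").text
print(numeric_amp == parsed["&amp;"])   # True
```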

The Code View editor uses XHTML syntax colouring and highlighting to flag a number of whitespace-only Unicode characters that are non-breaking, such as the no-break space (U+00A0).
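As a rough illustration of what "non-breaking whitespace" means here, the sketch below reports a few such characters by their Unicode names. The set `NON_BREAKING` and the function `find_non_breaking` are hypothetical; the exact characters Sigil's Code View highlights may differ.

```python
import unicodedata

# Hypothetical selection of space characters that never allow a line break.
NON_BREAKING = {"\u00a0", "\u2007", "\u202f"}

def find_non_breaking(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each listed character found in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in NON_BREAKING]

print(find_non_breaking("10\u00a0km at 5\u202fpm"))
# [(2, 'NO-BREAK SPACE'), (10, 'NARROW NO-BREAK SPACE')]
```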

If they are correctly listed in the Preserve Entities preference, they are converted to their entity equivalents in a final pass by both Mend and Prettify.
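The sketch below is a deliberately simplified, hypothetical model of such a final pass (`preserve_entities` is an illustrative name, not Sigil's actual code). It assumes the serializer has already written the nbsp as a raw U+00A0 and the ampersand as &amp;, and then swaps each listed character for its entity. Listing & alongside the nbsp reproduces the X#38;#160; and X#38;amp; results described in the first post.

```python
def preserve_entities(serialized: str, keep: dict[str, str]) -> str:
    """Hypothetical last pass: replace each listed character with its entity,
    in the order the entries appear in `keep`."""
    for char, entity in keep.items():
        serialized = serialized.replace(char, entity)
    return serialized

# Assumed serializer output: raw U+00A0 for the nbsp, &amp; for the ampersand.
html = "<p>a\u00a0b &amp; c</p>"

# Listing only the nbsp leaves existing entities alone.
good = preserve_entities(html, {"\u00a0": "&#160;"})
print(good)   # <p>a&#160;b &amp; c</p>

# Also listing "&" re-escapes the "&" that begins every entity.
bad = preserve_entities(html, {"\u00a0": "&#160;", "&": "&#38;"})
print(bad)    # <p>a&#38;#160;b &#38;amp; c</p>
```

Leaving & out of the list, as in `good`, keeps every existing entity intact.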

Last edited by KevinH; 12-27-2022 at 10:10 AM.
12-27-2022, 01:37 AM   #3
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Hopefully the following examples will help you better understand your problem:

Code:
<p>Johnson & Johnson</p>
will cause error messages, because an ampersand is a special character in HTML files.

An ampersand on its own needs to be written as &amp;

Code:
<p>Johnson &amp; Johnson</p>
You most likely ended up with this:

Code:
<p>Johnson &amp;amp; Johnson</p>
This is valid, but most likely not what you wanted, because it'll be rendered as:

Code:
Johnson &amp; Johnson
To fix this, simply search for &amp;amp; and replace it with &amp;, the correctly escaped ampersand.
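The same three cases can be checked in Python, as a minimal sketch that uses the standard library's `xml.sax.saxutils.escape` to stand in for one round of escaping; applying it twice mimics how a double-escaped &amp;amp; arises.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A bare "&" is not well-formed, so a strict parser rejects it.
try:
    ET.fromstring("<p>Johnson & Johnson</p>")
    bare_ok = True
except ET.ParseError:
    bare_ok = False

once = escape("Johnson & Johnson")   # correct:        Johnson &amp; Johnson
twice = escape(once)                 # double-escaped: Johnson &amp;amp; Johnson

# What a reader displays is the parsed text, which still contains "&amp;".
shown = ET.fromstring(f"<p>{twice}</p>").text

# The fix: replace &amp;amp; with &amp;.
fixed = twice.replace("&amp;amp;", "&amp;")

print(bare_ok)        # False
print(shown)          # Johnson &amp; Johnson
print(fixed == once)  # True
```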
12-27-2022, 09:30 AM   #4
philja
Zealot
Posts: 121
Karma: 10
Join Date: Nov 2015
Location: Europe EEC
Device: none
KevinH and Doitsu, thank you both for your replies.

My mistake was to include the numerical entity for the ampersand in the Preserve Entities section of Preferences. I had not realised that &amp; was a standard XML entity. And, of course, since the ampersand begins every numerical entity, including it in Preserve Entities guaranteed bad results. Once I removed that value (&#38;) from the list, everything worked correctly when Prettify and Mend were run.

I had taken to using numerical entities for both the ampersand and nbsp after reading that they were required for epub3 but not banned from epub2. I thought it might be better to standardise on numerical entities only.

A couple of years ago, checks on a completed epub did call the use of &amp; into question. I remembered that as coming from epubcheck, but I cannot reproduce it today, so I must have been mistaken about that.

My concern was that I was afraid to recommend Prettify to friends in case it caused them problems.
All times are GMT -4.

