12-26-2022, 05:04 PM | #1 |
Zealot
Posts: 121
Karma: 10
Join Date: Nov 2015
Location: Europe EEC
Device: none
|
Difficulty with Numerical entities and Prettify in 1.9.20
I was showing a friend how they could improve the appearance of a solid block of text and html tags covering many paragraphs by the use of Tools > Reformat HTML > Mend and Prettify all HTML files. I was using as an example one of my epubs (epub2) in which I had removed the spacing and blank lines to provide a demo model.
This example contained some non-breaking spaces, inserted either with Insert > Special Character > nbsp or by manually typing the numerical entity X#160;. It also contained an ampersand written as the numerical entity X#38; (I had been using X#38; instead of &amp; because Epubcheck raised objections to &amp;). [I have had to replace the & in the numerical entities with an X; otherwise the whole entity disappears when I preview the post, even if I wrap it in quotes or code tags.]
When I ran the Prettify command, the prettification itself worked, with spaces and blank lines inserted, but the effect on the entities was strange:
nbsp, which had been displayed in Code View as _. , became X#38;#160; and was displayed in the Preview panel as X#160;
X#38;, which had been displayed in Code View as X#38; and in Preview as &, became X#38;amp; in Code View and was displayed in Preview as &amp;
I thought I had understood the Sigil User Guide for 1.9, section Preferences, Preserve Entities. I had entered both X#160; and X#38; in the Preserve Entities section to prevent them from being converted to Unicode characters. If I remove those entries from Preferences, Preserve Entities, the numerical entities are converted to _. and &, which is what I had been trying to prevent by following, as I thought, the instructions in the User Guide.
I get the same result in UbuntuStudio and in Windows 10. Is there something I have misunderstood, or is there a bug here? |
12-26-2022, 06:24 PM | #2 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Epubcheck should never complain about the proper use of &amp; for & unless it is used improperly. As it is an XML entity, it is always legal, and using a numeric version should never be required.
So my guess is that you are misusing it, perhaps as a direct child of the body tag or as a direct child of a tag that does not hold text. Please show the exact error message from epubcheck and the corresponding line or lines of source code.
Preserve Entities purposely does not handle the unneeded numeric equivalents of the standard XML entities, as the standard XML entities are perfectly legal XML and XHTML.
Preserve Entities requires an entity, not a Unicode character escape. Numeric entities always have a #, which in the hexadecimal form is followed by x.
For example, the named entity &nbsp; is only valid in epub2. epub3 requires numeric entities instead of named entities (other than the XML standard entities). So under epub3, nbsp would be written as (ignore the extra spaces):
Code:
& # 1 6 0 ;
& # x A 0 ;
The standard XML entities include &amp; &lt; &gt; &quot; &apos;
Do not use Preserve Entities to change any of these standard XML entities to something else, especially the "&", as it is the escape character for all entities.
The Code View editor uses XHTML syntax colouring and highlighting to attempt to indicate a number of pure-whitespace Unicode characters that are non-breaking in nature. If the Preserve Entities dialog properly includes them, they are converted to their entity equivalents as a last pass by both Mend and Prettify.
Last edited by KevinH; 12-27-2022 at 10:10 AM. |
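As a quick check of the decimal and hexadecimal forms mentioned above, the following is a minimal Python sketch (not part of Sigil) that uses only the standard library's `html.unescape`. The helper name `decode` is made up for illustration. It shows that both numeric forms resolve to the same character, U+00A0, and that the five standard XML entities resolve to the characters they escape.

```python
import html

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def decode(reference: str) -> str:
    """Resolve character references with the standard library (hypothetical helper)."""
    return html.unescape(reference)


decimal_form = decode("&#160;")  # decimal form: 160
hex_form = decode("&#xA0;")      # hexadecimal form: A0 is 160 in base 16

# Both numeric forms name the same character.
print(decimal_form == hex_form == NBSP)

# The five standard XML entities resolve to the characters they escape.
print(decode("&amp; &lt; &gt; &quot; &apos;"))
```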
12-27-2022, 01:37 AM | #3 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Hopefully the following examples will help you better understand your problem:
Code:
<p>Johnson & Johnson</p>
An ampersand on its own needs to be written as &amp;
Code:
<p>Johnson &amp; Johnson</p>
Code:
<p>Johnson &amp; Johnson</p>
Code:
Johnson & Johnson
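To see why a bare ampersand is a problem for an XML parser, here is a small Python sketch using only the standard library (`xml.etree.ElementTree` and `xml.sax.saxutils.escape`). The helper name `is_well_formed` is an assumption made for this illustration and has nothing to do with how Sigil or epubcheck are implemented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A bare ampersand starts an entity reference that never completes.
raw = "<p>Johnson & Johnson</p>"

# escape() rewrites & (and < and >) as the standard XML entities.
escaped = "<p>" + escape("Johnson & Johnson") + "</p>"

print(is_well_formed(raw))
print(escaped)
print(is_well_formed(escaped))

# After parsing, the text content contains the plain character again.
print(ET.fromstring(escaped).text)
```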
|
12-27-2022, 09:30 AM | #4 |
Zealot
Posts: 121
Karma: 10
Join Date: Nov 2015
Location: Europe EEC
Device: none
|
KevinH and Doitsu, thank you both for your replies.
My mistake was to include the numerical entity for ampersand in the Preserve Entities section of Preferences. I had not realised that &amp; was a standard XML entity. And, of course, with the ampersand being part of all numerical entities, my inclusion of it in the Preserve Entities section guaranteed bad results. Once I removed that value (& # 38 ; ) from the list, everything works when Prettify and Mend are run.
I had taken to using the numerical entity for both ampersand and nbsp after I read that numerical entities were required for epub3 but not banned from epub2. I thought it might be better to standardise on numerical entities only.
A couple of years ago, I did have occasions when checks on a completed epub called the use of &amp; into question. I had remembered that as coming from epubcheck, but I cannot reproduce it today, so I must have been mistaken.
My concern was that I would be afraid to recommend Prettify to friends in case it caused them any problems. |
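The double escaping described in this thread can be reproduced with a toy model. The sketch below is an assumption for illustration only, not Sigil's actual code: it treats Preserve Entities as a list of character-to-entity replacements applied one after another (the function name `preserve` is hypothetical). Once the ampersand is on the list, it also rewrites the & that begins every entity already in the text.

```python
def preserve(text: str, entities: dict) -> str:
    """Toy model: replace each listed character with its entity, in order.

    This is NOT Sigil's implementation, only an assumed simplification.
    """
    for char, entity in entities.items():
        text = text.replace(char, entity)
    return text


# Source with a literal no-break space and a correctly escaped ampersand.
source = "A\u00a0B &amp; C"

# Only nbsp on the list: the no-break space becomes &#160; and &amp; is untouched.
good = preserve(source, {"\u00a0": "&#160;"})
print(good)

# nbsp and ampersand on the list: every & is rewritten, including the one
# that starts &#160; and the one that starts &amp;.
bad = preserve(source, {"\u00a0": "&#160;", "&": "&#38;"})
print(bad)
```

Under this model, the second result contains `&#38;#160;` and `&#38;amp;`, the same strings reported in the first post.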