|  11-18-2019, 01:36 AM | #1 | 
| just an egg            Posts: 1,848 Karma: 8006346 Join Date: Mar 2015 Device: Kindle, iOS | 
				
				Entities oddities / 0.9.991 bug?
			 
			
			I may have found another bug in 0.9.991, but I'm struggling with how to describe it.

I have Preferences set to Mend on Open and Preserve Entities only for #160. I have one epub (so far) that, when loaded into 0.9.991, keeps all of its character entities (quotes, apostrophes, etc.), even though Prefs are set to preserve only #160. Running "Mend All HTML Files" fixes this: all the character entities are properly converted, and all is well. I have another epub where the quotes and apostrophes are converted, but the non-breaking spaces show up as #x00A0. Again, running Mend fixes this: #x00A0 gets converted to #160 and all is well.

Now, when I load the exact same epubs into 0.9.18, all the character entities (except #160) are automatically and properly converted without my having to do anything extra. So why is 0.9.991 struggling with these character entities, requiring me to run Mend manually, when 0.9.18 handles it all seamlessly and automatically right off the bat?

Note 1: the first epub, where none of the character entities converted, had <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">, which was corrected on Mend along with the character entities.

Note 2: Both of these epubs originated as AZW3 and were brought into Sigil via the KindleImport plugin. At first I thought it was a KindleImport plugin problem, but when I saved the epubs and then re-opened them, the character entities continued to persist. Manually running Mend fixed things. So it seems like Mend wasn't being run on Open, despite the Preference settings?

I will play with this more tomorrow to see if I can find more clues, but I wanted to throw this out there. Also, if anyone has suggestions for what I should look for, let me know.
|  11-18-2019, 08:22 AM | #2 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Sigil 0.9.991 is not struggling with character entities. Remember, the goal of Sigil 0.9.991 is to load any epub "as-is", in other words to make no changes. What you are seeing is how that epub currently handles its entities. So as long as the xhtml files are well-formed (or you do not have Mend on Open set), Sigil will not touch the files.

The Preserve Entities setting is only used by Mend in Sigil: the gumbo parser that Sigil uses to mend removes all entities and converts them to their character equivalents. After mending, your Preserve Entities settings are used to determine which ones should be converted back to entities.

So all of this is expected/desired behaviour; in other words, we do not want Sigil to touch or alter valid html source code. So if you want only your own set of entities to be used, just run Mend as you discovered. Mend is also run to update xhtml links, so any rename or move will effectively do the same thing. So will Standardizing to the Sigil norm.

Hope this helps,
Kevin
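The two-step behaviour described above (the parser decodes every entity, then the Preserve Entities list decides which characters are re-encoded) can be sketched for a single text node as follows. This is only an illustration: it uses Python's `html.unescape` as a stand-in for the gumbo parser, and `mend_text_node` and the `preserve` mapping are hypothetical names, not Sigil's actual code or API.

```python
import html


def mend_text_node(raw_text: str, preserve: dict[int, str]) -> str:
    """Decode all entities, then re-encode only the preserved code points.

    `preserve` (hypothetical) maps a Unicode code point to the entity text
    to emit, e.g. {160: "&#160;"} for a no-break space.
    """
    # Step 1: like the mending parser, turn every character reference
    # (&ldquo;, &rsquo;, &#x00A0;, ...) into the character it stands for.
    decoded = html.unescape(raw_text)

    out = []
    for ch in decoded:
        if ch == "&":
            out.append("&amp;")  # markup-significant characters are always escaped
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        elif ord(ch) in preserve:
            # Step 2: only characters on the preserve list go back to entities.
            out.append(preserve[ord(ch)])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    raw = "&ldquo;It&rsquo;s&#x00A0;here&rdquo;"
    print(mend_text_node(raw, {160: "&#160;"}))  # prints: “It’s&#160;here”
```

In this sketch the `&#x00A0;` input comes out as `&#160;`, mirroring what the original poster saw after running Mend; with an empty `preserve` mapping nothing is re-encoded, so a no-break space stays a literal character.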
|  11-18-2019, 01:31 PM | #3 | 
| just an egg            Posts: 1,848 Karma: 8006346 Join Date: Mar 2015 Device: Kindle, iOS | 
			
			I am confused now. I have Preferences set to Mend on Open. But you're saying there is a difference between Preferences > Mend on Open and Tools > Reformat HTML > Mend All HTML Files? Having Preferences set to Mend on Open will not convert entities? The two Mend commands are different? Thank you.
|  11-18-2019, 01:56 PM | #4 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			No. We test each file on import, and only if it is not well-formed do we run Mend on it (and then if and only if you allow that in Preferences). If it is valid xhtml, we do not touch it on import. Again, the point here is not to mess up a dev's existing code unless really necessary.

Last edited by KevinH; 11-18-2019 at 02:13 PM.
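A minimal sketch of that import rule, assuming Python's `xml.etree.ElementTree` parser as a stand-in for Sigil's own well-formedness check (Sigil's real check may differ). `on_open` and the `mend` callable are hypothetical names used only for illustration. Note that `ElementTree` does not know HTML named entities such as `&ldquo;`, so this stand-in would flag those as errors.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def is_well_formed(xhtml: str) -> bool:
    """Return True if the document parses as XML (stand-in for Sigil's check)."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError:
        return False
    return True


def on_open(xhtml: str, mend_on_open: bool, mend: Callable[[str], str]) -> str:
    """Hypothetical import hook: leave valid files untouched, mend only broken ones."""
    if mend_on_open and not is_well_formed(xhtml):
        return mend(xhtml)
    # Well-formed (or Mend on Open disabled): load the file as-is,
    # so any entities it contains are left exactly as authored.
    return xhtml


def fake_mend(text: str) -> str:
    """Placeholder for the real Mend step."""
    return "<mended/>"


if __name__ == "__main__":
    print(on_open("<p>&#8220;Hi&#8221;</p>", True, fake_mend))  # unchanged
    print(on_open("<p>unclosed", True, fake_mend))              # mended
```

This is why the first epub kept its entities on open: a well-formed file never reaches the Mend step, so the Preserve Entities list is never applied to it.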
|  11-18-2019, 03:49 PM | #5 | 
| just an egg            Posts: 1,848 Karma: 8006346 Join Date: Mar 2015 Device: Kindle, iOS | 
			
			Ah. Okay. So the Mend function is the same, whether it's run through Preferences > Mend on Open or Tools > Reformat HTML > Mend All HTML Files. And checking "Mend on Open" in Preferences only causes Mend to run on Open if the file is not well-formed. If the file is well-formed, then Mend doesn't happen on Open, even if Preferences > Mend on Open is checked. Am I understanding correctly now? Thank you.
|  11-18-2019, 04:44 PM | #6 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Yes, in other words we do not mend what isn't broken anymore unless you invoke Mend yourself.
|  11-18-2019, 05:17 PM | #7 | 
| just an egg            Posts: 1,848 Karma: 8006346 Join Date: Mar 2015 Device: Kindle, iOS | 
			
			Thank you for taking the time to explain. I think I get it now.
|  11-18-2019, 06:42 PM | #8 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Perhaps we should alter the wording slightly to clarify what's happening when opening/saving or when manually executing Mend? I can see how it might be a bit confusing to some. Maybe "Attempt to Fix XHTML Errors on Open" or something?

Or... since it seems that many of Sigil's other main features (Save [if Mend on Save is checked], Rename, Split, Merge, Restructure, etc.) will trigger the entity substitution/preservation anyway, can we just make Mend on Open behave like Mend on Save does regarding entity preservation? I have no idea what that would entail (and I'm certainly not trying to cause extra work), but there would be a certain symmetry/logic to Mend on Open/Save following the same roadmap RE entities when checked/unchecked, no?

In case that's not clear, what I'm suggesting is:
1) Mend on Save/Open: handle entity preservation/substitution based on whether the option is checked or not in Preferences (that's how Mend on Save behaves now, for what it's worth).
2) The manual Reformat HTML > Mend would continue to behave as it does.

That's all "if possible", of course.

Last edited by DiapDealer; 11-18-2019 at 06:46 PM.
|  11-18-2019, 07:06 PM | #9 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Also (and this may be my fault), 0.9.991 is converting all Unicode no-break-space characters to a no-break-space entity upon opening if the Preserve Entities list is completely empty (EPUB2).

EDIT: maybe not my fault. I was thinking about my suggestion to do this. I think. https://github.com/Sigil-Ebook/Sigil...339e73d8e9d0ef
|  11-18-2019, 07:33 PM | #10 | |
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Yes, we could change that option description to "Mend XHTML Errors on Open". We could also just run the preserve-entities step by itself, without Mend, on every imported file, but that seems to go against the as-is change. As for Mend on Save, we could modify it not to pass everything to Mend and instead parse for errors, and only mend the files with errors, just like we do when importing an epub. I will look into that tomorrow.
|  11-18-2019, 08:30 PM | #11 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			I'm OK with either, actually. Consistency is mostly what I'm thinking of. So if modifying Mend on Save to skip entity substitution (like Mend on Open does now) is feasible, that would serve just as well, I think.
|  11-18-2019, 11:45 PM | #12 | |
| just an egg            Posts: 1,848 Karma: 8006346 Join Date: Mar 2015 Device: Kindle, iOS |
			That is clear, consistent, and also alerts users like me that this function has changed since 0.9.18. As long as the manual Reformat HTML > Mend continues to run preserve entities, I am happy and can adjust. Thank you!
|  11-19-2019, 07:24 AM | #13 | 
| Grand Sorcerer            Posts: 28,866 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | |
|  11-19-2019, 08:52 AM | #14 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			I will look into it. As I remember, the default Preserve Entities list in Settings has only 160 in it. I did not think it was ever special-cased after the code to visually show the spaces went in. I will check.
|  11-19-2019, 10:11 AM | #15 | 
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			I have now pushed the following to master:
- rewording of Prefs to make it clear that only broken xhtml files will be mended on Open and Save, if selected
- changed Save to only run Mend on broken xhtml files (if selected in Prefs) to match how we handle it on Open