MobileRead Forums


Thasaidon 10-09-2020 02:57 AM

Question
 
Hi KevinH and DiapDealer,

I do not know quite how to classify this query.

I was near the end of editing a book. I had some small errors, so I tried Mend and Prettify. This cleared the errors but put the following artefact in the affected lines.

<!—?p—>

This was not detected by F7 or by ePubcheck but was detected by ePubtidy. It may also have been detected when I used Preview (the big panel at the top of the screen), but I forgot to take note.

No biggie. I found the problem lines and fixed them, but should F7 or ePubcheck pick these errors up?

DiapDealer 10-09-2020 06:57 AM

It's not an error. It looks like a syntactically valid (if not contextually helpful) html comment to me. It is ePubTidy that is in the wrong for flagging it, IMO.

https://www.w3schools.com/html/html_comments.asp

KevinH 10-09-2020 11:08 AM

Yes, DiapDealer is 100% correct. That is a valid xhtml comment. Never heard of ePubtidy but it is a bug if they say xhtml comments are incorrect.

Is ePubtidy a derivative of the old html tidy (the one that Sigil used in Sigil 0.7.X timeframe and dropped because it "tidied" too much and code was lost)?

If gumbo gets too confused it may convert tags to text, but I have never actually seen it create a comment. Either way, it should never actually lose anything. That is why we moved to gumbo (a self-correcting parser) over tidy. Also, html tidy was not html5 capable at the time.

Please file a bug report with ePubtidy's devs so that they can get that fixed.

Thasaidon 10-09-2020 11:20 AM

Quote:

Originally Posted by KevinH (Post 4045063)
Yes, DiapDealer is 100% correct. That is a valid xhtml comment. Never heard of ePubtidy but it is a bug if they say xhtml comments are incorrect.

Please file a bug report with ePubtidy's devs so that they can get that fixed.

Thank You. At least this question resulted in no more work for you.:D

When I finish editing a book I like Everything to agree that there are no problems. Hence my question.

I also did not recognize the code, so I wondered what it was. I will follow your advice about raising a bug report with the ePubtidy developers.

DiapDealer 10-09-2020 12:46 PM

Actually... I take back what I said. That's NOT a properly formatted html comment. Those look to be em dashes or en dashes instead of the two consecutive dashes that an html comment requires. Sigil's Preview Window, F7 Well-formedness check, and Epubcheck all correctly flag <!—?p—> as invalid for me.

Can you verify that the code in your epub is actually what you posted instead of two consecutive dashes?

Does the ePubTidy plugin attempt to "smarten" double-dashes to an emdash?

Also ... Sigil's "Mend" will correct the situation and turn that back into a valid html comment, so the question is whether this is something that ePubTidy is causing, or if it is already in your epub's code before running ePubTidy and Sigil is fixing it when you mend.

And the answer is, "yes." The ePubTidy plugin is breaking valid html comments with its default settings. I just checked. Keep in mind that ePubTidy is not a validator. As such, it cannot "agree" with other validators about problems. Validators don't make changes to code.
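To illustrate the difference described above, here is a minimal Python sketch. It uses the standard library's XML parser as a stand-in for a well-formedness check; it is not how Sigil or Epubcheck is implemented, and the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Comment delimited by two ASCII hyphens (U+002D) on each side.
good = "<p><!--?p--></p>"
# Same text with each pair of hyphens replaced by one em dash (U+2014).
bad = "<p><!\u2014?p\u2014></p>"

print(is_well_formed(good))  # the hyphen version parses
print(is_well_formed(bad))   # the em dash version is rejected
```

The parser accepts the first fragment and rejects the second, because "<!" followed by an em dash does not start any valid XML construct.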

KevinH 10-09-2020 01:07 PM

I spent some time studying the gumbo code to see under what condition it tries to create a comment.

It seems that an xml "processing instruction" is now illegal in html and html5 and so gumbo will convert those to comments <!-- blah --> so that they are not lost.

So someplace in the original xml was a <?p> xml processing instruction, meant for an external xsl stylesheet, that got converted to <!--?p-->.

Something in ePubTidy or some other plugin that "smartens" things must have converted the -- to emdashes or endashes, given what DiapDealer says above, as gumbo just produces normal dashes when creating a comment.
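The conversion described above can be modelled roughly in Python. This is only a simplified sketch of the observable result (a processing instruction turned into a comment so nothing is lost); it is not gumbo's actual code, and the function name pi_to_comment is an assumption made for the example.

```python
import re

# Assumption: a simplified model of the behaviour, not gumbo's real logic.
# Matches "<?" up to the next ">" and captures what lies between.
PI_PATTERN = re.compile(r"<\?(.*?)>", re.DOTALL)


def pi_to_comment(markup: str) -> str:
    """Rewrite each <?...> construct as a comment so its text is kept."""
    return PI_PATTERN.sub(lambda m: "<!--?" + m.group(1) + "-->", markup)


print(pi_to_comment("<p><?p>Some text</p>"))
# prints: <p><!--?p-->Some text</p>
```

Note that the comment produced this way uses plain ASCII hyphens, which matches the point that the dashes must have been changed by a later step.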

Thasaidon 10-09-2020 10:58 PM

I am sorry, but I have now sorted out the book concerned and cannot remember the exact sequence of plugins I used. I have been to sleep since then.:)

I cut and pasted the string into notebook. I then deleted the strings in the book and made the posting here. I cannot now remember if I was able to use the original cut or cut and pasted from notebook.

I will keep my eyes open for this in the future and immediately document what plugins I used and when. We can then speak with certainty. Watch this space.

I'll be back.

Thasaidon 10-09-2020 11:18 PM

It just occurred to me. If the <!—?p—> is not valid should it be picked up by ePubCheck or Check well formed HTML?

KevinH 10-09-2020 11:47 PM

As long as they are normal dashes, that line is a valid xml comment. Nothing should complain about it. If it was generated by Mend in Sigil it was created with normal dashes and is correct.

If however you run a smarten plugin and it does not understand xml/xhtml/html comments, it will change the normal dashes to endashes or emdashes. That would then make that line illegal.

epubcheck will not complain about valid xml, xhtml, or html comments. It should complain if the dashes are changed to endashes or emdashes.
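As a rough illustration of the failure mode, the Python sketch below contrasts a naive "smarten" step with one that skips comments. Both functions are hypothetical and written only for this example; they are not taken from ePubTidy or any other plugin, and a real tool would also need to protect other constructs such as CDATA sections and attribute values.

```python
import re

# Captures whole comments so that re.split keeps them in the result list.
COMMENT = re.compile(r"(<!--.*?-->)", re.DOTALL)


def naive_smarten(text: str) -> str:
    """Replace every double hyphen with an em dash, comments included."""
    return text.replace("--", "\u2014")


def comment_aware_smarten(text: str) -> str:
    """Smarten double hyphens only outside comments."""
    parts = COMMENT.split(text)
    # With one capturing group, odd indexes hold the comments themselves.
    return "".join(
        part if index % 2 else naive_smarten(part)
        for index, part in enumerate(parts)
    )


sample = "<p>Wait--what?<!--?p--></p>"
print(naive_smarten(sample))          # comment delimiters are damaged
print(comment_aware_smarten(sample))  # comment delimiters are preserved
```

In both cases the double hyphen in the visible text becomes an em dash; only the comment-aware version leaves the comment delimiters as ASCII hyphens.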

DiapDealer 10-10-2020 01:38 AM

Quote:

Originally Posted by Thasaidon (Post 4045338)
It just occurred to me. If the <!—?p—> is not valid should it be picked up by ePubCheck or Check well formed HTML?

That's what I mentioned above. Your posted code (copied and pasted into Sigil as is) IS picked up by ePubCheck, and the well-formed check (and Preview) for me.

AlanHK 10-10-2020 02:07 AM

Quote:

Originally Posted by KevinH (Post 4045344)
As long as they are normal dashes.

Note: not "dashes", hyphens.
Hyphens are the simple ASCII characters on the keyboard. Dashes are typographic glyphs you need some combination keys to enter.

- hyphen
– en-dash
— em-dash


"Smart quotes" functions often convert a spaced hyphen ( - ) to an en-dash and a double hyphen (--), as used in an HTML comment, to an em-dash.
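Because the three characters can look alike on screen, it can help to check their code points. A small Python sketch using the standard unicodedata module (the helper name describe is made up for this example):

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("-", "\u2013", "\u2014"):
    print(describe(ch))
# prints:
# U+002D HYPHEN-MINUS
# U+2013 EN DASH
# U+2014 EM DASH
```

Only the first of these, doubled, can open or close a comment.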

Thasaidon 10-10-2020 05:42 AM

Quote:

Originally Posted by DiapDealer (Post 4045359)
That's what I mentioned above. Your posted code (copied and pasted into Sigil as is) IS picked up by ePubCheck, and the well-formed check (and Preview) for me.

I get what you and KevinH are saying now. I will keep an eye out for this in future and will properly document what I have done, so that if there is a buglet hiding here we can track it down.

I have already amended my workflow to accommodate.

DiapDealer 10-10-2020 02:04 PM

I did note that the ePubTidy plugin was indeed taking valid HTML comments and converting the two HYPHENS :rolleyes: into en- or emdash characters for me. I just used whatever default settings the plugin had set up out of the box.


All times are GMT -4.
