#1 |
Member
Posts: 21
Karma: 10
Join Date: Jul 2016
Location: Fremont, CA
Device: Kindle Paperwhite Signature Edition
|
Sigil 2.6.2 - help with html error
As I posted in a recent thread, I got an error message while loading an html file into Sigil. The message said "line 30: Expected '>' or '/', but got ':'".
I traced this back to a script block which displayed a license notice (shown below). When I deleted this script block, the file loaded into Sigil with no problems, and I was able to convert it to epub with no further issues. However, this block does *not* appear to contain any colons?!?! Line 30 is at the @licend line... why was I getting this error??
Code:
<script nonce="1103b75ea8a534e00bd01d677f2ea330" >
/* @licstart The following is the entire license notice for the
 * JavaScript code in this page.
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 *
 * @licend The above is the entire license notice
 * for the JavaScript code in this page.
 */
</script>
|
#2 |
Sigil Developer
Posts: 9,026
Karma: 6361556
Join Date: Nov 2009
Device: many
|
'/* ' is not an xhtml comment, and a URL wrapped in angle brackets is not a self-closed tag:
'see <http://www.gnu.org/licenses/>'
All of this should have been inside a separate javascript file and not inlined. Or all of the < and > characters in that comment would have to be xml escaped to be included in xhtml.
Last edited by KevinH; Yesterday at 02:13 PM. |
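The xml-escaping option mentioned above can be sketched in Python. This is only an illustration with the standard library's strict XML parser standing in for Sigil's own xhtml handling (an assumption); the helper name `is_well_formed` is made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The one line of the license comment that contains angle brackets.
LICENSE_LINE = " * along with this program. If not, see <http://www.gnu.org/licenses/>."


def is_well_formed(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# As found in the file: the bracketed URL is read as the start of a tag.
raw_block = f"<script>/*\n{LICENSE_LINE}\n */</script>"

# With '<' and '>' xml escaped, the same text is ordinary character data.
escaped_block = f"<script>/*\n{escape(LICENSE_LINE)}\n */</script>"

print(is_well_formed(raw_block))
print(is_well_formed(escaped_block))
```

The first check fails and the second passes, because `escape` turns the bracketed URL into `&lt;http://www.gnu.org/licenses/&gt;`.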
#3 |
Sigil Developer
|
Please do not open a new thread for every reply. Simply add a new comment to the existing thread.
|
#4 |
Member
|
Okay, sorry about that; I felt that this was a different question than the previous thread ('why did this block fail??', as opposed to 'what is causing that error?'), but the distinction is probably obtuse, and I could have merged them together.
|
#5 |
Member
|
So you are saying that the licstart/licend block is not valid?? It was inserted by Internet Archive; I would have expected that they knew how to make their headers... but it appears that isn't always the case...
I will consider this thread answered, and it can be closed. |
#6 |
Sigil Developer
|
Note that html is much, much more accepting of embedded < and > chars and has a much less strict syntax, but it requires a very forgiving and specialized parser. So what the Internet Archive gets away with in spaghetti html is not always valid in an epub.
The epub spec uses much stricter xhtml/xml parsing rules and can therefore use a much faster and simpler parser than html needs. Sigil's Mend is a forgiving html parser that can create valid xhtml, which is why it is recommended to enable Mend when opening html files. |
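The difference between forgiving html parsing and strict xml parsing can be seen with Python's standard library. This is a rough sketch only: `html.parser` and `xml.etree.ElementTree` stand in here for a browser-style parser and an epub-style parser (an assumption, they are not what Sigil uses), and the class and function names are invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

SNIPPET = "<script>/* see <http://www.gnu.org/licenses/>. */</script>"


class ScriptTextCollector(HTMLParser):
    """Collects the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup: str) -> str:
    """Parse as html and return whatever text sat inside script tags."""
    parser = ScriptTextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def xml_error(markup: str):
    """Parse as strict xml; return the error message, or None if it parsed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


print(repr(html_script_text(SNIPPET)))  # html treats script content as plain text
print(xml_error(SNIPPET))               # strict xml reports a parse error
```

The html parser hands the bracketed URL back untouched as script text, while the xml parser stops with an error at the same spot.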
#7 |
Member
|
However, this *does* lead back to my other question...
Code:
Mend Not Well Formed HTML Source code On: |
#8 |
Sigil Developer
|
Probably because that open and close refers only to Sigil opening an epub and saving an epub. It looks more like you are trying to import a standalone html file that is not well formed. Just edit the file, remove that script open and close tag, and replace it with an xhtml comment if you want the license info. Otherwise remove that script tag and its contents completely.
Another option is to open that html file in a text editor, copy and paste it into a blank xhtml document inside Sigil, and then run Mend on it. Probably easier just to remove that bloody script tag and its contents, though. |
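For anyone who prefers to script the "replace it with an xhtml comment" suggestion, here is a minimal sketch. It assumes the only script block in the file is the license notice (it would also rewrite real javascript, which is not wanted), and `script_to_xhtml_comment` is a made-up helper name, not anything in Sigil.

```python
import re
import xml.etree.ElementTree as ET

# Matches a whole script element, including attributes such as nonce="...".
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def script_to_xhtml_comment(markup: str) -> str:
    """Replace each <script>...</script> block with an xhtml comment."""

    def to_comment(match: re.Match) -> str:
        text = match.group(1).strip()
        # Drop the javascript comment delimiters around the notice.
        text = text.removeprefix("/*").removesuffix("*/").strip()
        # "--" is not allowed inside an xml comment, so break up any runs.
        text = re.sub(r"-(?=-)", "- ", text)
        return f"<!-- {text} -->"

    return SCRIPT_BLOCK.sub(to_comment, markup)


SAMPLE = (
    '<head><script nonce="1103b75ea8a534e00bd01d677f2ea330" > /*\n'
    " * along with this program. If not, see <http://www.gnu.org/licenses/>.\n"
    " */ </script></head>"
)

fixed = script_to_xhtml_comment(SAMPLE)
ET.fromstring(fixed)  # raises ParseError if the result is not well formed
print(fixed)
```

Inside an xhtml comment the < and > characters need no escaping, so the URL can stay exactly as written.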
#9 |
Sigil Developer
|
I just took a look at the code that uses AddExisting to load an html file. It should detect that the file is not well formed and alert you with that info; then, if you have mend on open set, it should repair the file and add it to your current epub.
If you have Sigil preferences set to clean/mend on open, you should see the fixed file in Sigil's BookBrowser after you dismiss the error alert. So it sounds like this is not working. It could be a bug. Would you please zip up a copy of that html file and attach it to your reply to this post, so we can use it to try to recreate your error and fix it if needed? Are you sure you do not see the mended version of that file in BookBrowser after you dismiss the error alert? |
#10 |
Member
|
Quote:
I just checked this morning, and Sigil definitely is not opening the file at all.
BTW, I *did* simply delete that entire script block, which allowed me to import the file and generate an epub from it, as I wished... However, I decided to pursue these discussions here, so that I would have a better understanding of what the problems were, because I don't have much experience with xml or epub formats at all... Thank you for all of your insights here!!
Later note: I don't seem to be able to attach a file here, and "Go Advanced" isn't working for me, so I just dropped the file onto my website, you can download it from there: https://derelllicht.42web.io/files/ldsv.testing.zip |
|
#11 |
Sigil Developer
|
I grabbed it, thank you. If I can recreate the error with it, then I should be able to track it down and fix it.
|
#12 |
Sigil Developer
|
There is a bug in Sigil's ImportHTML module that occurs when there is no fix found.
I will hopefully get that fixed for the next release. For the record, the correct fix is the following:
Code:
 * You should have received a copy of the GNU Affero General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
Also, for what it is worth, that is a huge file. For performance's sake, especially on old e-readers, you should insert Sigil split markers at proper demarcation points and split this file into many separate chapters or sections. Some old epub2-only e-readers could only handle 320k of xhtml/html before slowing down (or even crashing), so this has become a reasonable maximum file size for epub chapters.
Hope this explains things. I will try to track down and fix the import bug.
Last edited by KevinH; Today at 09:57 AM. |
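A quick way to spot files over that size before splitting is sketched below. It is only an illustration: reading "320k" as 320 * 1024 bytes is an assumption, and `oversized_files` and the file names are invented for the example.

```python
# "320k" from the post, interpreted here as 320 KiB (an assumption).
LIMIT_BYTES = 320 * 1024


def oversized_files(sizes: dict[str, int], limit: int = LIMIT_BYTES) -> list[str]:
    """Return the names of files whose size in bytes exceeds the limit, sorted."""
    return sorted(name for name, size in sizes.items() if size > limit)


# Hypothetical file names and sizes, purely for illustration.
example_sizes = {
    "whole_book.xhtml": 2_500_000,
    "chapter01.xhtml": 180_000,
    "chapter02.xhtml": 330_000,
}

print(oversized_files(example_sizes))
```

In practice the sizes could be gathered with `pathlib.Path(...).stat().st_size` for each xhtml file in the unpacked epub; anything the function reports is a candidate for Sigil split markers.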
#13 |
Sigil Developer
|
It turns out that
/* <http://www.gnu.org/licenses/> */
inside a script tag is enough to break even Google's gumbo parser and prevent it from "fixing" this file.
I am not sure if this is worth a "fix" or not, as using illegal '<' and '>' characters inside a javascript comment inside a script tag is not a good idea. Using a normal xhtml comment to include this information, or including it as a separate javascript file, is much better. It is just something I have never seen before and not something Google tested in its huge testing effort on literally millions of websites.
An xhtml parser would have to be able to successfully parse all of javascript to even detect that this is a comment and not code. Making gumbo "change" javascript inside a script tag is not something we want to do, as it is simply a bad idea. I think instead we will chalk this up as "a really dumb thing to do in general": something html parsing will accept, but not something anyone would ever want in an epub, as there is no way for a normal e-reader user to ever see this license. We can revisit this in the future if ever needed.
Thank you for your interesting test case. |
#14 |
Member
|
I agree that I don't think any code modification is worth the effort in this edge case... I am quite happy just to have all this insight into what is going on in this example!! Thank you very much!
|
#15 |
Member
|
In case you are curious about how this html file came to be, it is part of my new technique for converting PDF documents/books to epub...
In the past I would run the PDF through an OCR converter, which generated a text file (that usually required a ton of cleanup). The biggest headache of the resulting text file was that all the lines ended up with hard CR/LF characters at the end of every line, which needed to be removed if I wanted to make the pages flow smoothly with changing screen dimensions.
But it recently occurred to me that if I just wrap basic html constructs around the text file (html, head, title, body), then the newline issue completely vanishes, because html and derivatives ignore those breaks!! So all I have to do then is walk through the file, deleting all hard-coded pagination lines, insert <p> at end of each paragraph, and I'm done; just import into Sigil to generate the epub, and I'm ready to publish...
My mistake here was that I wanted to retain Internet Archive's signatures, so anyone looking at the code would know where I got it... so I took that header from some other file on the IA page (for this book) and imported it into my document... but I didn't realize until now that I had some traps to look out for!!
I also wasn't aware of the issues with a large html file, which you pointed out to me here... I just went back and added page breaks at all the new-chapter points.
Last edited by Derell Licht; Today at 01:09 PM. |
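The wrapping technique described in this post can be roughed out in a few lines of Python. This is a sketch under stated assumptions, not the poster's actual workflow: it assumes paragraphs in the OCR text are separated by blank lines, it wraps each one in a full <p>...</p> pair, and `text_to_xhtml` is a made-up name. It also escapes &, < and >, which avoids the kind of parse error discussed earlier in the thread.

```python
import re
from html import escape


def text_to_xhtml(text: str, title: str) -> str:
    """Wrap OCR text in basic html constructs, one <p> per blank-line-separated block."""
    normalized = text.replace("\r\n", "\n")
    paragraphs = []
    for block in re.split(r"\n\s*\n", normalized):
        # Join the hard-wrapped lines of one paragraph into a single line.
        joined = " ".join(line.strip() for line in block.split("\n") if line.strip())
        if joined:
            paragraphs.append(f"<p>{escape(joined, quote=False)}</p>")
    body = "\n".join(paragraphs)
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{escape(title, quote=False)}</title></head>\n"
        f"<body>\n{body}\n</body>\n"
        "</html>\n"
    )


sample = "First line\r\nsecond line\r\n\r\nTom & Jerry <3\r\n"
print(text_to_xhtml(sample, "Demo"))
```

Hard-coded pagination lines would still need to be deleted by hand or with a separate filter, since what they look like depends on the OCR output.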