#1
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
Sigil's Infamous "colon" Error on File Split
Well, this fault shows up fairly often in EpubCheck validation whenever you use the file splitter in Sigil. Here is the error message EpubCheck gives on the IDPF validator:
ERROR(RSC-005): Error while parsing file 'value of attribute "id" is invalid; must be an XML name without colons'.

Of course, there are no colons in the id to speak of, so at first glance that error message looks like complete hogwash and is no help at all. I kept getting this annoying error after Sigil file splits. Then one day, while browsing the rules for epubs on the IDPF site, I read something interesting which said, more or less: if you use a 32-hex-digit id in an 8-4-4-4-12 configuration as an id in the epub's XML structure, then the first hex character must be a letter. To illustrate:

This id will fail IDPF EpubCheck because it starts with a digit: 7d0d5c28-5743-40c1-bafa-048c5bba8e6f

But this id will pass because it starts with a letter: ed0d5c28-5743-40c1-bafa-048c5bba8e6f

So if you get this EpubCheck error after a file split, just check the split file's id in the OPF manifest and its idref in the spine and, if necessary, change the first character from a digit to a letter in the range a to f (because it's hex). Do this for both the manifest id and the spine idref and the problem will be resolved.

It would also be quite nice if this problem were fixed in Sigil, since it has been with us for such a long time. Can someone please fix it?

By the way, the book id in the metadata section is a different can of beans: that uuid is the text of the identifier, not an id attribute in the epub structure, so it doesn't matter whether it starts with a digit or a letter. Last edited by slowsmile; 10-25-2016 at 06:40 AM.
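The rule underneath the error is that an `id` attribute value must be an XML name without colons (an NCName), and such a name cannot begin with a digit; the "colons" part of the message is just part of that definition. Below is a rough Python sketch of the check and of the "change the first digit to a-f" fix described above. `is_ncname` and `fix_leading_digit` are hypothetical helper names, and the pattern is a simplified ASCII-only approximation (the real XML name rules also allow many non-ASCII letters).

```python
import re

# Simplified ASCII-only approximation of an XML NCName: a letter or
# underscore first, then letters, digits, '.', '-' or '_'. No colons.
NCNAME = re.compile(r'[A-Za-z_][A-Za-z0-9._-]*\Z')

def is_ncname(value):
    """Return True if value looks like a valid (ASCII) NCName."""
    return NCNAME.match(value) is not None

def fix_leading_digit(value):
    """Replace a leading decimal digit with a hex letter a-f.

    Digit d maps to 'abcdef'[d % 6]. This keeps a hex uid looking like
    hex but does NOT check uniqueness, so compare the result against the
    other ids in the OPF before using it.
    """
    if value and value[0] in '0123456789':
        return 'abcdef'[int(value[0]) % 6] + value[1:]
    return value

print(is_ncname('7d0d5c28-5743-40c1-bafa-048c5bba8e6f'))   # False
print(is_ncname('ed0d5c28-5743-40c1-bafa-048c5bba8e6f'))   # True
print(fix_leading_digit('7d0d5c28-5743-40c1-bafa-048c5bba8e6f'))
# bd0d5c28-5743-40c1-bafa-048c5bba8e6f
```

Whatever new value you pick, the same string has to go into both the manifest `id` and the spine `idref`.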
#2
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
This is not a Sigil bug; it's a case of GIGO.
Except when generating a TOC, Sigil does not change or add id values. It's up to Sigil users to ensure that ids in epub2 files start with a letter. (You can use ids that start with a number in epub3 files.)
#3
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
Sorry Doitsu, I'm not sure what you mean by GIGO. I'm not talking about TOC generation; I'm talking about the uid that is generated the first time you do a file split in Sigil. That 32-hex-digit uid is generated automatically by Sigil. The fault is hit and miss, since Sigil's uid generator can produce a uid that starts with either a letter or a digit. This should really be fixed so that Sigil's uid generator only produces uids that start with a letter on a file split. That's why this is a Sigil problem (which isn't helped much by EpubCheck's crappy and misleading error message).
Last edited by slowsmile; 10-25-2016 at 05:20 AM.
#4
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
The epubcheck error message that you got was triggered by ids that start with a number in (X)HTML or NCX files.
Sigil-generated book ids that start with a number in the .opf or .ncx files won't trigger that message. For example, I just generated a new epub2 book that was assigned a hex value starting with a number, and it wasn't flagged. content.opf Spoiler:
toc.ncx Spoiler:
If you still believe that Sigil-generated ids cause epubcheck error messages, please provide step-by-step instructions that allow the developers to reproduce this issue.
#5
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
I'm not talking about TOCs or book ids.
I'm talking about the 32-hex-digit uid that is generated when you do a file split. Please forget about TOCs and book uids; you're travelling down the wrong road. Try this: open one of your ebooks in Sigil, choose a file in the Book Browser and split that file anywhere you like using the File Splitter button on the Sigil toolbar. After you have split the file, check content.opf and you will see the rather large uid that Sigil has automatically generated in the spine and in the manifest because of the split. That's what I'm talking about. If that large uid starts with a digit, the book will fail IDPF EpubCheck validation online with the "colon" error. Try it for yourself. Last edited by slowsmile; 10-25-2016 at 05:52 AM.
#6
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Split at Cursor does indeed generate item id attributes that can start with a number, and those trigger epubcheck error messages. This is a bug. As a workaround, select Insert > Split Marker followed by Edit > Split at Markers, or press CTRL+SHIFT+RETURN followed by F6. This ensures that the id of the split file starts with the file name.
#7
mostly an observer
Posts: 1,518
Karma: 987654
Join Date: Dec 2012
Device: Kindle
Is this what we're talking about?
`<dc:identifier opf:scheme="UUID" id="BookId">urn:uuid:3f219299-e69b-41b7-b163-17aeb2668e9b</dc:identifier>` It's an epub2, and it passes Epubcheck. I always split by placing the cursor, left-clicking, then clicking on the file-split icon in the second menu line. If that's a bad idea, why is the option there and so easy to use?
#8
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
No, Notjohn, we aren't talking about book ids. We're talking about the ids generated when you split a file at the cursor. This generates a uid in the OPF; see the item id starting with 12c14ef3 below:
Code:
```xml
<manifest>
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  <item id="styles_css" href="Styles/stylesheet.css" media-type="text/css"/>
  <item id="cover" href="Text/cover.xhtml" media-type="application/xhtml+xml"/>
  <item id="contents" href="Text/contents.xhtml" media-type="application/xhtml+xml"/>
  <item id="Section0002.xhtml" href="Text/Section0002.xhtml" media-type="application/xhtml+xml"/>
  <item id="cover.jpg" href="Images/cover.jpg" media-type="image/jpeg"/>
  <item id="body1" href="Text/Chapter_1.xhtml" media-type="application/xhtml+xml"/>
  <item id="body2" href="Text/Chapter_2.xhtml" media-type="application/xhtml+xml"/>
  <item id="body3" href="Text/Chapter_3.xhtml" media-type="application/xhtml+xml"/>
  <item id="body4" href="Text/Chapter_4.xhtml" media-type="application/xhtml+xml"/>
  <item id="body5" href="Text/Chapter_5.xhtml" media-type="application/xhtml+xml"/>
  <item id="body6" href="Text/Chapter_6.xhtml" media-type="application/xhtml+xml"/>
  <item id="body7" href="Text/Chapter_7.xhtml" media-type="application/xhtml+xml"/>
  <item id="body8" href="Text/Chapter_8.xhtml" media-type="application/xhtml+xml"/>
  <item id="body9" href="Text/Chapter_9.xhtml" media-type="application/xhtml+xml"/>
  <item id="body10" href="Text/Chapter_10.xhtml" media-type="application/xhtml+xml"/>
  <item id="imag25849" href="Images/image001.jpg" media-type="image/jpeg"/>
  <item id="imag70213" href="Images/image002.jpg" media-type="image/jpeg"/>
  <item id="Title.xhtml" href="Text/Title.xhtml" media-type="application/xhtml+xml"/>
  <item id="12c14ef3-e5d0-4c7f-af43-9a5fa134c2bf" href="Text/Section0001.xhtml" media-type="application/xhtml+xml"/>
</manifest>
<spine toc="ncx">
  <itemref idref="cover"/>
  <itemref idref="contents"/>
  <itemref idref="Title.xhtml"/>
  <itemref idref="12c14ef3-e5d0-4c7f-af43-9a5fa134c2bf"/>
  <itemref idref="Section0002.xhtml"/>
  <itemref idref="body1"/>
  <itemref idref="body2"/>
  <itemref idref="body3"/>
  <itemref idref="body4"/>
  <itemref idref="body5"/>
  <itemref idref="body6"/>
  <itemref idref="body7"/>
  <itemref idref="body8"/>
  <itemref idref="body9"/>
  <itemref idref="body10"/>
</spine>
```
Last edited by slowsmile; 10-25-2016 at 06:46 AM.
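Fixing a split like this by hand means editing the manifest `id` and the spine `idref` together. Here is a minimal Python sketch of doing that automatically, assuming the OPF is available as a string. `fix_opf_ids` is a hypothetical helper, not part of Sigil. It prefixes an `x` rather than swapping the first character, so the new id cannot collide with another existing uid. It only rewrites `id`, `idref` and `toc` attribute values, not every place an EPUB might reference a manifest id.

```python
import re
import xml.etree.ElementTree as ET

OPF_ITEM = '{http://www.idpf.org/2007/opf}item'
DIGITS = '0123456789'

def fix_opf_ids(opf_text):
    """Prefix 'x' to every manifest item id that starts with a digit and
    update the id, idref and toc attributes that carry the old value.

    Returns (new_opf_text, {old_id: new_id}).
    """
    root = ET.fromstring(opf_text)
    ids = {item.get('id') for item in root.iter(OPF_ITEM)}
    renamed = {}
    for old in sorted(i for i in ids if i):
        if old[0] in DIGITS:
            new = 'x' + old
            # keep prefixing until the new id is not already in use
            while new in ids or new in renamed.values():
                new = 'x' + new
            renamed[old] = new
    for old, new in renamed.items():
        # match only the whole quoted value of id/idref/toc attributes
        pattern = r'\b(id|idref|toc)="%s"' % re.escape(old)
        opf_text = re.sub(pattern,
                          lambda m: '%s="%s"' % (m.group(1), new),
                          opf_text)
    return opf_text, renamed

opf = '''<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="Title.xhtml" href="Text/Title.xhtml" media-type="application/xhtml+xml"/>
    <item id="12c14ef3-e5d0-4c7f-af43-9a5fa134c2bf" href="Text/Section0001.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="Title.xhtml"/>
    <itemref idref="12c14ef3-e5d0-4c7f-af43-9a5fa134c2bf"/>
  </spine>
</package>'''

new_opf, renamed = fix_opf_ids(opf)
print(renamed)
# {'12c14ef3-e5d0-4c7f-af43-9a5fa134c2bf': 'x12c14ef3-e5d0-4c7f-af43-9a5fa134c2bf'}
```

Editing the text with targeted substitutions, instead of re-serializing the parsed tree, leaves the rest of the OPF (prefixes, formatting, `opf:` attributes) byte-for-byte untouched.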
#9
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
If you found this bug quite a while ago, you could have reported it back then, and it would probably have been fixed by now. And calling it 'infamous' is not correct in any way, and quite harsh. It is not well known at all, and it is not a matter of bad quality, just a silly requirement of the specs that has been corrected in ePUB3. Last edited by Toxaris; 10-25-2016 at 08:12 AM.
#10
Grand Sorcerer
Posts: 28,341
Karma: 203719646
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Guys, I can't get Sigil to create a uuid manifest id (valid or otherwise) no matter how I split a file: at markers; at cursor (in Book View or Code View).
EDIT: never mind... I see it. It only seems to happen when there's only one manifested xhtml file in the epub. If more than one file exists, the new split file gets assigned the generated unique file name as its manifest id. I almost never work with one-file epubs, so it's easy to see how this would escape detection, especially if no one ever reports it. In the meantime, to work around the issue until it gets resolved: add a blank html file, do your splits, and then delete the blank html file. Last edited by DiapDealer; 10-25-2016 at 12:13 PM.
#11
Sigil Developer
Posts: 8,439
Karma: 5703082
Join Date: Nov 2009
Device: many
Hi All,
Of course this would only be reported the day AFTER we close the tree to changes so that translators get a chance to update their translations for Sigil-0.9.7. It always seems to happen that way ;-) If we can fix this without messing up the source line numbers used to key the translations too much, we will try to sneak the fix into the upcoming Sigil-0.9.7; otherwise it will be the first bug fixed in the follow-on release. Thanks for the bug report. KevinH
#12
Guru
Posts: 697
Karma: 150000
Join Date: Feb 2010
Device: none
Fascinating!
I've been splitting a single file containing multiple chapters (InDesign CS4 export) for years and never got bitten by this bug. Why? Because (1) I always use "split at markers"; and (2) I always rename the split files to something like "chapter002" instead of "Section0001_0001", which also fixes the ids in the OPF. I must be living right! Albert
#13
Grand Sorcerer
Posts: 28,341
Karma: 203719646
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
The easiest workaround is to rename the single xhtml file to anything other than "Section0001.xhtml" before doing any Split at Cursor. In short: no files named "Section000?.*" and you won't get bitten.
#14
Witchman
Posts: 628
Karma: 788808
Join Date: May 2013
Location: Philippines
Device: Android S5
My thanks to KevinH and DiapDealer for recognizing this as a fault.
Some added info that might help: I'm using Sigil v0.9.6 on Windows. This fault occurs randomly, hit and miss, whenever I use the File Splitter button (to the right of the Text/HTML view button) on the toolbar. I've always corrected it by directly changing the first character of the uid to a letter within the OPF file itself. I've also written several Python apps that deal with conversion to epub and, as a precaution against this problem, several uid generators that only generate uids starting with a letter. I don't know if this will be of any help to KevinH or DiapDealer, but an example is given below: Code:
```python
from random import choice

HEX_DIGITS = '0123456789abcdef'
HEX_LETTERS = 'abcdef'

#========================================================#
#
# generates a 32 hex char uid with an alpha char start.
#
def getUID():
    """ Generates a 32 hex char uid in 8-4-4-4-12 grouping
        with an alpha char always as the start character.
        This is necessary to avoid epubcheck "colon" errors
        occurring from structural uids used within epub xml files.
    """
    # build each group of the 8-4-4-4-12 grouping from random hex
    # digits; choice() draws with replacement, so digits may repeat
    # (random.sample() would draw without replacement and '0' was missing)
    groups = [''.join(choice(HEX_DIGITS) for _ in range(n))
              for n in (8, 4, 4, 4, 12)]

    # force the first char to be a random hex letter (a-f)
    groups[0] = choice(HEX_LETTERS) + groups[0][1:]

    # join the groups with hyphens to build the uid
    return '-'.join(groups)
```
Last edited by slowsmile; 10-25-2016 at 08:59 PM.
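If you'd rather not hand-roll the randomness, here is a sketch of an alternative that uses Python's standard `uuid` module: draw version-4 UUIDs until one starts with a hex letter. About 6 in 16 draws qualify, so this usually takes two or three tries, and the result is still a standard RFC 4122 UUID string. `get_uid` is a hypothetical name for illustration, not Sigil code.

```python
import uuid

def get_uid():
    """Return a random version-4 UUID string (8-4-4-4-12 hex groups)
    whose first character is a hex letter a-f, so it is a valid XML id.
    """
    while True:
        uid = str(uuid.uuid4())   # lowercase hex with hyphens
        if uid[0] in 'abcdef':    # reject ids that start with a digit
            return uid

print(get_uid())
```

Redrawing instead of overwriting the first character keeps every generated value a genuine uuid4 rather than a modified one.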
#15
Sigil Developer
Posts: 8,439
Karma: 5703082
Join Date: Nov 2009
Device: many
Our bug is in ResourceObjects/OPFResource.cpp in GetUniqueID, and it is only hit when a preferred id already exists somewhere (i.e. splitting at the cursor when the original file name is no longer enough to be unique).
Checking whether the first character is a digit is quite easy, and if so, prepending a non-digit will work just fine. Code:
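```cpp
QString OPFResource::GetUniqueID(const QString &preferred_id, const OPFParser& p) const
{
    // If the preferred id is already taken, fall back to a freshly
    // generated UUID. That UUID can begin with a digit, which is
    // not a valid XML id in epub2.
    if (p.m_idpos.contains(preferred_id)) {
        return Utility::CreateUUID();
    }
    return preferred_id;
}
```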
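The change described above can be sketched in Python to show the logic. This is only a mirror of the C++ for illustration. `get_unique_id`, the `existing_ids` set and the `x` prefix are assumptions, not Sigil's actual fix or API. Applying the digit check to both branches also covers a preferred id (file name) that itself starts with a number.

```python
import uuid

DIGITS = '0123456789'

def get_unique_id(preferred_id, existing_ids):
    """Use the preferred id if it is free, otherwise a new UUID, and
    prepend a non-digit if the chosen id would start with a digit.
    """
    if preferred_id in existing_ids:
        candidate = str(uuid.uuid4())   # may start with 0-9
    else:
        candidate = preferred_id
    if candidate and candidate[0] in DIGITS:
        candidate = 'x' + candidate     # 'x' is an arbitrary letter choice
    return candidate

print(get_unique_id('Section0002.xhtml', {'Section0001.xhtml'}))  # Section0002.xhtml
print(get_unique_id('Section0001.xhtml', {'Section0001.xhtml'}))  # a UUID that starts with a letter
```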