MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   Auto-correcting "Page-map.xml" (https://www.mobileread.com/forums/showthread.php?t=154925)

thomasbelrod 10-27-2011 05:36 PM

Auto-correcting "Page-map.xml"
 
2 Attachment(s)
I'm using Sigil at the end of our QA process to clean up a couple of things in our EPUBs, which are currently being put together by an outside vendor. As an academic press, we've decided to maintain the page breaks and page numbers from our printed editions. Adobe Digital Editions has an extension for doing this called "page-map," where the information is kept in an XML file. (I know that this is possible to do with the NCX, too, but I'm fairly new here and this decision was made before my time. I'm working on changing it so that we aren't married to ADE, but for now this is what we've got.)

However, when I open an EPUB in Sigil, the "page-map.xml" file is almost totally erased. All that remains are a couple of HTML tags. (I'm attaching "before and after" versions of this file so you can see.) While I can always manually place the original XML file back into the EPUB after working in Sigil, I'd rather not have to worry about it. Is there a way to set Sigil so that it doesn't automatically wipe this XML file? (For what it's worth, EPUBs won't open in ADE if the XML file is empty. I'm not a fan of ADE, but again, this is what we've got for now.)
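
The manual workaround described here (putting the original XML file back into the EPUB after editing) could probably be scripted. What follows is only a sketch in Python, not anything Sigil provides: `restore_entry` is a made-up helper name, and the path `OEBPS/page-map.xml` in the usage note is an assumption about where the file sits inside the archive. A zip entry cannot be replaced in place with the standard `zipfile` module, so the sketch writes a new archive, keeping `mimetype` first and uncompressed as the EPUB container format expects.

```python
import zipfile


def restore_entry(original_epub, edited_epub, output_epub, entry_name):
    """Write output_epub as a copy of edited_epub, but take entry_name
    (for example the page-map file) from original_epub instead.

    Arguments may be file paths or binary file-like objects.
    """
    with zipfile.ZipFile(original_epub) as orig, \
         zipfile.ZipFile(edited_epub) as edited, \
         zipfile.ZipFile(output_epub, "w") as out:
        replacement = orig.read(entry_name)
        names = edited.namelist()

        # EPUB readers expect "mimetype" to be the first entry, stored
        # without compression.
        if "mimetype" in names:
            out.writestr("mimetype", edited.read("mimetype"),
                         compress_type=zipfile.ZIP_STORED)

        for name in names:
            if name == "mimetype":
                continue
            data = replacement if name == entry_name else edited.read(name)
            out.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)

        # If the editor dropped the file entirely, add it back.
        if entry_name not in names:
            out.writestr(entry_name, replacement,
                         compress_type=zipfile.ZIP_DEFLATED)
```

A call might look like `restore_entry("book-original.epub", "book-edited.epub", "book-fixed.epub", "OEBPS/page-map.xml")`, with the file names adjusted to your own layout.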

---
Tom Elrod
UNC Press

Serpentine 10-27-2011 08:43 PM

Hmmm, yeah. It seems Sigil wants to use the file as part of the book. If you change the extension to something strange, Sigil will accept it, and the relevant nodes seem fine too. However, I'm not sure whether ADE would have issues then.

Putting in a ticket over at Sigil's bug tracker would be a good idea.

daubnet 10-28-2011 08:03 AM

The problem is probably more complex than that, but please note that your "page-map BEFORE.xml" is not valid XML. It is missing at least the XML header (the XML declaration):
Code:

<?xml version="1.0" ?>
An excerpt from your NCX might also help.

user_none 10-28-2011 10:45 AM

Any XML file missing the XML header is rewritten automatically and it results in a blank file.
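
To find affected files before opening a book in Sigil, a quick check along these lines might help. It is a sketch only: `has_xml_declaration` is a hypothetical helper, it assumes the file is UTF-8 or another ASCII-compatible encoding, and it only looks at the start of the file rather than parsing it.

```python
import codecs


def has_xml_declaration(data: bytes) -> bool:
    """Return True if data starts with an XML declaration such as
    <?xml version="1.0" ?>, ignoring an optional UTF-8 byte order mark.

    Assumes an ASCII-compatible encoding (UTF-16 files are not handled).
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # "<?xml" must be followed by whitespace; this keeps processing
    # instructions like <?xml-stylesheet ...?> from being mistaken for one.
    return data[:5] == b"<?xml" and data[5:6] in (b" ", b"\t", b"\r", b"\n")
```

Reading the file in binary mode and passing the bytes (for example `has_xml_declaration(open("page-map.xml", "rb").read())`) would tell you whether the header is present.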

Serpentine 10-28-2011 10:57 AM

I forgot to say: I did test that, and also tried it with a relevant doctype. Both resulted in the same thing. It was also my first guess :)

The other bits I lifted from:
https://wiki.mobileread.com/wiki/Adob...PUB_extensions

thomasbelrod 10-28-2011 02:40 PM

Thanks. Not sure why the XML header hasn't been included before. I'll check with our vendor to make sure they start doing that.

-Tom

Serpentine 10-28-2011 04:09 PM

Quote:

Originally Posted by thomasbelrod (Post 1807122)
Thanks. Not sure why the XML header hasn't been included before. I'll check with our vendor to make sure they start doing that.

-Tom

Did adding the header fix it for you? (Though it would be a good idea for them to add it nevertheless.)

daubnet 10-28-2011 05:14 PM

Quote:

Originally Posted by user_none (Post 1806664)
Any XML file missing the XML header is rewritten automatically and it results in a blank file.

I'm unsure what you mean by that. A file that's missing the XML header is not an XML file, so I'm taking wild guesses:
  • Any file that "somewhat resembles" an XML file but lacks the header gets rewritten
  • Any file that is linked from the NCX and which should be XML but isn't, gets rewritten
  • Any file that lacks the XML header is rewritten

Since we know that Sigil leaves other non-XML files alone (e.g. JPEGs), I assume it's one of the other two. I must admit I haven't looked at the code yet, but I'd be curious which conditions trigger that behaviour.

The more I think about it, the more it makes sense to me to either just mark files as invalid/display an error, or to send them through some kind of XML sanitizer (like TagSoup/Tagger).
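
The first option (report an error instead of rewriting) could look roughly like the sketch below. This is not how Sigil works internally; `check_well_formed` is a hypothetical name, and the check uses Python's standard-library parser purely as an illustration. Note that this particular parser accepts a document without an XML declaration, so it would flag unparseable files but would not by itself catch a missing header.

```python
import xml.etree.ElementTree as ET


def check_well_formed(data: bytes):
    """Return None if data parses as XML, otherwise a short error message.

    The input is never modified; the caller decides what to do with it.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return f"not well-formed XML: {exc}"
    return None
```

A tool using this would show the returned message to the user and leave the file on disk untouched.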

user_none 10-28-2011 05:22 PM

Any file that is known or required to be XML, such as page-map.xml, is run through an XML sanitizer, which unfortunately gets confused by the missing XML declaration (which gives the XML version) and produces a "blank" file.
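
Until the files are produced with the declaration in place, one possible stopgap is to prepend it yourself before opening the book in Sigil. A minimal sketch, assuming the file content is read and written back as UTF-8; `ensure_xml_declaration` is a hypothetical helper, not part of Sigil.

```python
import re

# "<?xml" followed by whitespace marks an XML declaration.
_DECLARATION = re.compile(r"<\?xml\s")


def ensure_xml_declaration(text: str) -> str:
    """Return text with an XML declaration at the very start.

    Text that already begins with a declaration is returned unchanged
    (apart from a stripped byte order mark). The added declaration
    names UTF-8, so the result should be saved with that encoding.
    """
    body = text.lstrip("\ufeff")
    if _DECLARATION.match(body):
        return body
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Applied to the content of page-map.xml, this leaves the page entries as they are and only adds the first line.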

daubnet 10-28-2011 05:40 PM

Thanks a lot for clarifying.

thomasbelrod 11-01-2011 09:33 AM

Quote:

Did adding the header fix it for you? (Though it would be a good idea for them to add it nevertheless.)
Yes, it did. Thanks a lot!

