09-16-2009, 11:09 AM   #29
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
Quote:
Originally Posted by Jellby
I think it should be pretty obvious: the XML parsing is done by XMLStarlet, which uses XPath expressions (I had no knowledge of XPath until yesterday). This is what is needed:

Open the META-INF/container.xml file. There should be a <rootfile> element with a full-path attribute. The value of this attribute is the path to the main OPF file.

Open the main OPF file. There should be a <spine> element there. The <spine> contains a list of <itemref> elements, each of them with an idref attribute. Get the values of these attributes in the order they are defined.

In the OPF file there should be a <manifest> element too. For each idref obtained in the previous step, there should be an <item> element inside the <manifest> with an id attribute identical to the idref. The href attribute of each <item> gives the file path and name (relative to the directory where the OPF file is located).

Now you have the ordered list of all the files in the ePUB (actually, assuming there are no fallback items).

To get the "bookstyle.css": Find, in the OPF file, the <metadata> element, and inside it a <meta> element whose name attribute has the value "prince-style". The content attribute of this element is the id that you have to look for in the <manifest>, as done above for the items in the <spine>.

"default.css" and "output.pdf" are command-line or configuration arguments; they are not read from the XML.
Well, as soon as I have time (maybe tonight, but definitely in the next few days), I'll whip up a Python script for that... and will, once completed, relinquish it to you!
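
For what it's worth, the steps quoted above map fairly directly onto Python's standard library. What follows is only a rough sketch, not the finished script: the function name read_epub_layout is made up for illustration, it assumes the usual container and OPF 2.0 namespaces, it uses href values as-is (no percent-decoding), and, like the quoted description, it ignores fallback items.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# Namespaces assumed for META-INF/container.xml and for the OPF package file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def read_epub_layout(epub_path):
    """Return (spine_files, style_file) for the ePUB at epub_path.

    spine_files: content file paths inside the archive, in spine order.
    style_file:  path of the item named by the "prince-style" <meta>,
                 or None if there is no such <meta>.
    Hypothetical helper for illustration; fallback items are ignored.
    """
    with zipfile.ZipFile(epub_path) as z:
        # Step 1: container.xml points at the main OPF file.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        package = ET.fromstring(z.read(opf_path))

    # hrefs in the manifest are relative to the directory holding the OPF.
    opf_dir = posixpath.dirname(opf_path)

    # Step 3 (done first): map each manifest id to its resolved path.
    manifest = {
        item.get("id"): posixpath.normpath(
            posixpath.join(opf_dir, item.get("href"))
        )
        for item in package.findall("opf:manifest/opf:item", NS)
    }

    # Step 2: walk the spine in document order and look up each idref.
    spine_files = [
        manifest[ref.get("idref")]
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]

    # Stylesheet: <meta name="prince-style" content="some-manifest-id"/>.
    style_file = None
    for meta in package.findall("opf:metadata/opf:meta", NS):
        if meta.get("name") == "prince-style":
            style_file = manifest.get(meta.get("content"))
            break

    return spine_files, style_file
```

Calling it would look something like `files, style = read_epub_layout("book.epub")`, where "book.epub" is a placeholder name. The returned paths are relative to the root of the zip archive, so they can be passed straight to `ZipFile.read` or used after extracting the archive.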

If you picked up XPath as quickly as you did, you'll probably get Python easily enough as well. It's a great language, though you might have to make peace with some of its oddities.

Doesn't the CSS stuff compromise the final PDF output? In a LaTeX context, my intuition would be to assume less is more and ignore the CSS clowning around in favour of the LaTeX class defaults (whether customized or not).

- Ahi