|  09-14-2009, 06:12 PM | #16 | |||||
| Wizard            Posts: 1,213 Karma: 12890 Join Date: Feb 2009 Location: Amherst, Massachusetts, USA Device: Sony PRS-505 | Quote: 
 I was going to say I didn't think there was anything wrong with using sed... but upon further reflection, I realized it is rather dangerous if some of the entries in the .opf or .ncx file have line breaks in the middle of a tag or element. E.g., to test this, I made an epub with an .opf that had a part that looked like this:
Code: <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
<item href="test.html" id="html" media-type="application/xhtml+xml"/>
Code: <item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
<item href="test.html" id="html" media-type="application/xhtml+xml"/>
Code: prince: ./:1: error: Document is empty
prince: ./:1: error: Start tag expected, '<' not found
prince: ./: error: could not load input file
Quote: 
 Code: body {
  font-size: 9.9pt;
  font-family: serif; 
  text-align: justify;
  prince-image-resolution: 166dpi;
  hyphens: auto;
}
Code: body {
  font-size: 9.9pt;
  font-family: serif !important; 
  text-align: justify;
  prince-image-resolution: 166dpi;
  hyphens: auto;
}
Code: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta name="author" content="frabjous" />
<meta name="title" content="Prince Test" />
<title>Prince Test</title>
<style type="text/css">
body { font-family: Georgia; }
</style>
</head>
<body>
<p>The quick brown fox jumps over the lazy dog. 0123456789</p>
</body>
</html>
Code: ebook-convert test.html test.epub
epub2pdf.sh test.epub test.pdf
Code: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta name="author" content="frabjous" />
<meta name="title" content="Prince Test" />
<title>Prince Test</title>
<style type="text/css">
.myparagraph { font-family: Georgia; }
</style>
</head>
<body>
<p class="myparagraph">The quick brown fox jumps over the lazy dog. 0123456789</p>
</body>
</html>
(I know calibre mucks with the CSS in conversion to epub, but I got the same results running prince directly on the HTML files with the "-s ~/.epub2pdf/default.css" option.) Again, not a huge deal, since the usual place for the font specification would be under "body", and if something further down changed it, it probably had a good reason. I mainly worry about epubs made with WYSIWYG editors and suchlike that might place the font-family property anywhere. Quote: 
 Quote: 
 Quote: 
 | |||||
|   |   | 
|  09-15-2009, 07:00 AM | #17 | ||
| frumious Bandersnatch            Posts: 7,570 Karma: 20150435 Join Date: Jan 2008 Location: Spaniard in Sweden Device: Cybook Orizon, Kobo Aura | Quote: 
  This does not mean it's not possible to override anything (and everything) in the book, just that it may not be as straightforward and "drag-and-drop" as one might wish. In this particular case I guess you could include in your default.css:
Code: .myparagraph { font-family: inherit !important; }
Quote: 
 | ||
|   |   | 
|  09-15-2009, 11:57 AM | #18 | |||
| Wizard            Posts: 1,213 Karma: 12890 Join Date: Feb 2009 Location: Amherst, Massachusetts, USA Device: Sony PRS-505 | Quote: 
 But what I had in mind above is something that goes through all the CSS of the source and changes any "font-family: XXX" declarations to "font-family: inherit" (and strips any obsolete <font face="XXX"> tags), or something like that. I recognize (as I admitted in an earlier post) that this might be dangerous, since there might be a good reason for changing the font in a particular portion (e.g., in a multilingual document). The idea is that this would be an optional feature of the script, one that the user would have to enable with "force font change" or something like that. I don't think that would require a full CSS parser/editor. A couple of regex search-and-replace passes should handle it, no? Actually, I think I could probably alter your script accordingly with a few sed lines. Quote: 
 Quote: 
 | |||
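The "force font change" idea from the post above could look roughly like this in Python. This is only a minimal sketch, not part of the script in this thread: the function names `force_font_inherit` and `drop_font_face` are made up for illustration, and instead of stripping whole <font> tags it only drops their face attribute, which avoids having to match closing tags.

```python
import re

# A font-family declaration, up to (but not including) the closing ';' or '}'.
# Caveat: this also rewrites font-family inside @font-face blocks and discards
# any "!important" flag, so it really should stay an opt-in feature.
FONT_FAMILY_RE = re.compile(r"font-family\s*:\s*[^;}]+", re.IGNORECASE)

# The face="..." attribute of an obsolete <font> tag. [^>] and \s also match
# newlines, so a tag broken across several lines is still handled.
FONT_FACE_RE = re.compile(
    r"""(<font\b[^>]*?)\s+face\s*=\s*("[^"]*"|'[^']*')""",
    re.IGNORECASE,
)


def force_font_inherit(css: str) -> str:
    """Replace every font-family value in a CSS string with 'inherit'."""
    return FONT_FAMILY_RE.sub("font-family: inherit", css)


def drop_font_face(html: str) -> str:
    """Remove the face attribute from <font> tags, keeping the tags themselves."""
    return FONT_FACE_RE.sub(r"\1", html)
```

Because the whole file is passed in as one string, line breaks inside a declaration or tag do not matter here, unlike with a line-by-line sed pass.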
|   |   | 
|  09-15-2009, 01:52 PM | #19 | ||
| frumious Bandersnatch            Posts: 7,570 Karma: 20150435 Join Date: Jan 2008 Location: Spaniard in Sweden Device: Cybook Orizon, Kobo Aura | Quote: 
 Quote: 
 | ||
|   |   | 
|  09-15-2009, 02:10 PM | #20 | 
| Wizard            Posts: 1,213 Karma: 12890 Join Date: Feb 2009 Location: Amherst, Massachusetts, USA Device: Sony PRS-505 | 
			
			From what I've read, Stanza, at any rate, allows the user to change the font on the fly, and it reaches down and changes it at every level. It would be great if someone from the IDPF or somesuch developed a system of standard ePub class names that would get modified or added to only when necessary. I don't know, but they already have official recommendations for good ePub CSS practices, like using relative sizes rather than absolute sizes for subsidiary elements. Calibre, on the other hand, at present generates a whole bunch of custom class names when it processes any document. Every new use of style="..." inside a tag gets turned into a new calibre class, which is kind of messy. | 
|   |   | 
|  09-15-2009, 02:35 PM | #21 | |
| Wizard            Posts: 1,790 Karma: 507333 Join Date: May 2009 Device: none | Quote: 
 - Ahi | |
|   |   | 
|  09-15-2009, 02:36 PM | #22 | 
| frumious Bandersnatch            Posts: 7,570 Karma: 20150435 Join Date: Jan 2008 Location: Spaniard in Sweden Device: Cybook Orizon, Kobo Aura | 
			
			I've updated the script uploaded in post #11. Now it uses XMLStarlet to parse the OPF file (I believe that's more robust, and no need for dos2unix now), and I've changed the special meta@name to "prince-style". The default.css file has also been changed a bit: I moved the "@page title" style to the book-specific stylesheet, where it rather belongs, and changed my preferred fonts (grew tired of the default "st" ligatures in FreeSerif).
		 | 
|   |   | 
|  09-15-2009, 02:38 PM | #23 | |
| Wizard            Posts: 1,790 Karma: 507333 Join Date: May 2009 Device: none | Quote: 
 - Ahi | |
|   |   | 
|  09-16-2009, 09:07 AM | #24 | 
| Wizard            Posts: 1,213 Karma: 12890 Join Date: Feb 2009 Location: Amherst, Massachusetts, USA Device: Sony PRS-505 | 
			
			Thanks for the new script. Unfortunately, I can't test it right now because I'm having trouble installing/configuring XMLStarlet. (I know, I know... I'm the one who recommended it... I should really try something myself before I do that...) I'm determined to get it working though, so I'll let you know. | 
|   |   | 
|  09-16-2009, 09:50 AM | #25 | |
| Wizard            Posts: 1,790 Karma: 507333 Join Date: May 2009 Device: none | Quote: 
 If so, it might be fairly simple to turn it into a rather plain Python script (which, I am given to understand, can then be turned into an .exe with reasonable ease). The preliminary HTML parsing part of pacify should be more than up to the task of fishing a couple of attribute values out of barely structured XML. - Ahi | |
|   |   | 
|  09-16-2009, 10:26 AM | #26 | ||
| frumious Bandersnatch            Posts: 7,570 Karma: 20150435 Join Date: Jan 2008 Location: Spaniard in Sweden Device: Cybook Orizon, Kobo Aura | Quote: 
 Quote: 
 Code: prince -s default.css -s bookstyle.css -o output.pdf Cover.xhtml Chapter-01.xhtml Chapter-02.xhtml ... | ||
|   |   | 
|  09-16-2009, 10:29 AM | #27 | |
| Wizard            Posts: 1,790 Karma: 507333 Join Date: May 2009 Device: none | Quote: 
 - Ahi | |
|   |   | 
|  09-16-2009, 10:53 AM | #28 | 
| frumious Bandersnatch            Posts: 7,570 Karma: 20150435 Join Date: Jan 2008 Location: Spaniard in Sweden Device: Cybook Orizon, Kobo Aura | 
			
			I think it should be pretty obvious. The XML parsing is done by XMLStarlet, which uses XPath expressions (I had no knowledge of XPath until yesterday). This is what is needed:
1. Open the META-INF/container.xml file. There should be a <rootfile> element with a full-path attribute. The value of this attribute is the path to the main OPF file.
2. Open the main OPF file. There should be a <spine> element there. The <spine> contains a list of <itemref> elements, each of them with an idref attribute. Get the values of these attributes in the order they are defined.
3. In the OPF file there should be a <manifest> element too. For each idref obtained in the previous step, there should be an <item> element inside the <manifest> with an id attribute identical to the idref. The href attribute of each <item> holds the file path and name (relative to the directory where the OPF file is located).
Now you have the ordered list of all the files in the ePUB (assuming there are no fallback items).
To get the "bookstyle.css": Find, in the OPF file, the <metadata> element, and inside it a <meta> element whose name attribute has the value "prince-style". The content attribute of this element is the id that you have to look for in the <manifest>, as done above for the items in the <spine>.
"default.css" and "output.pdf" are command-line or configuration arguments; they are not read from XML. | 
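For anyone attempting the Python port, the lookup described in the post above can be sketched with the standard library alone. This is an illustration under assumptions, not a translation of the actual script: the function names (`opf_path`, `spine_files`, `prince_style_file`) are invented here, the XML is expected to be passed in already read from the unzipped ePUB, fallback items are ignored, and href values are assumed not to need URL-decoding.

```python
import posixpath
import xml.etree.ElementTree as ET

# Standard namespaces for container.xml and for OPF package documents.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def opf_path(container_xml):
    """Step 1: return the full-path attribute of the <rootfile> element."""
    root = ET.fromstring(container_xml)
    return root.find(".//c:rootfile", CONTAINER_NS).get("full-path")


def _manifest_hrefs(opf_root):
    """Map each manifest item's id to its href."""
    items = opf_root.findall("opf:manifest/opf:item", OPF_NS)
    return {item.get("id"): item.get("href") for item in items}


def spine_files(opf_xml, opf_location):
    """Steps 2 and 3: spine idrefs, in order, resolved to file paths."""
    root = ET.fromstring(opf_xml)
    hrefs = _manifest_hrefs(root)
    base = posixpath.dirname(opf_location)  # hrefs are relative to the OPF
    return [
        posixpath.join(base, hrefs[ref.get("idref")])
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


def prince_style_file(opf_xml, opf_location):
    """Path of the stylesheet named by <meta name="prince-style">, or None."""
    root = ET.fromstring(opf_xml)
    hrefs = _manifest_hrefs(root)
    for meta in root.findall("opf:metadata/opf:meta", OPF_NS):
        if meta.get("name") == "prince-style":
            href = hrefs.get(meta.get("content"))
            if href is not None:
                return posixpath.join(posixpath.dirname(opf_location), href)
    return None
```

Since a real XML parser does the work, an <item> whose attributes are split across several lines is read the same way as one written on a single line.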
|   |   | 
|  09-16-2009, 11:09 AM | #29 | |
| Wizard            Posts: 1,790 Karma: 507333 Join Date: May 2009 Device: none | Quote: 
 If you picked up XPath as quickly as you did, you'll probably get Python easily enough as well. It's a great language, though you might have to make peace with some of its oddities. Doesn't the CSS stuff compromise the final PDF output? In a LaTeX context, my intuition would be to assume less is more and ignore CSS clowning around, in favour of LaTeX class defaults (whether customized or not). - Ahi | |
|   |   | 
|  09-16-2009, 11:47 AM | #30 | 
| frumious Bandersnatch            Posts: 7,570 Karma: 20150435 Join Date: Jan 2008 Location: Spaniard in Sweden Device: Cybook Orizon, Kobo Aura | 
			
			Not if the CSS is well designed. The intent is not "fixing" arbitrary ePUBs, but converting good ePUBs into good PDFs. If the CSS is so bad that one would be better off dropping it, one could pass --no-author-style to prince. I guess I could add an option for this in the script, which should address Frabjous's worries about fonts as well. | 
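An opt-in switch like the one suggested above might be wired up as follows. This is a hypothetical sketch: `prince_command` and its parameters are invented for illustration, and it only uses the prince options already mentioned in this thread (-s, -o and --no-author-style). It builds the argument list without running anything.

```python
def prince_command(files, output, default_css, book_css=None,
                   no_author_style=False):
    """Build the prince argument list for an ordered list of content files."""
    cmd = ["prince", "-s", default_css]
    if book_css:
        # Book-specific stylesheet, e.g. the one found via "prince-style".
        cmd += ["-s", book_css]
    if no_author_style:
        # Ignore the stylesheets that ship inside the ePUB itself.
        cmd.append("--no-author-style")
    cmd += ["-o", output]
    cmd += list(files)  # spine order matters, so keep the given order
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.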
|   |   | 