MobileRead Forums - View Single Post - Prince XML for creating mobile reader-sized PDFs?

frabjous · 09-14-2009, 07:12 PM

Quote:

Originally Posted by Jellby

This is one of the things that need some work, at the moment I process the spine with sed scripts, which rely on "correct" newlines (that's why I needed dos2unix in some cases). Ideally, the .opf file should be processed with some XML tool, do you know any?

I'm not at all experienced in any such things (I just like to pretend that I am), but maybe something like XMLStarlet? Of course, Perl or Python probably have libraries for it.

I was going to say I didn't think there was anything wrong with using sed though... but upon further reflection, I realized it is rather dangerous if some of the entries in the .opf or .ncx file have linebreaks in the middle of a tag or element.

E.g., to test this, I made an epub with an .opf that had a part that looked like this:

Code:

<item href="titlepage.xhtml" 
id="titlepage" media-type="application/xhtml+xml"/>
<item href="test.html" id="html" 
media-type="application/xhtml+xml"/>

rather than this:

Code:

<item href="titlepage.xhtml" id="titlepage" media-type="application/xhtml+xml"/>
<item href="test.html" id="html" media-type="application/xhtml+xml"/>

Running your script generated errors such as:

Code:

prince: ./:1: error: Document is empty
prince: ./:1: error: Start tag expected, '<' not found
prince: ./: error: could not load input file

I don't think a well-made .opf would look like that, however, and FWIW, ADE can choke on stuff like this too.
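For what it's worth, here is a minimal sketch of reading the spine with a real XML parser instead of sed, using Python's standard xml.etree.ElementTree. It is only an illustration, not part of epub2pdf.sh: the function name spine_hrefs and the sample package are made up, and it assumes the .opf uses the usual OPF namespace (http://www.idpf.org/2007/opf). Because the parser works on elements rather than lines, line breaks in the middle of an <item> tag shouldn't matter.

```python
import xml.etree.ElementTree as ET

# Assumption: the .opf declares the standard OPF 2.0 namespace.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_hrefs(opf_text):
    """Return the manifest hrefs in spine (reading) order."""
    root = ET.fromstring(opf_text)
    # Map each manifest id to its href.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Follow the spine's idref values in order, skipping unknown ids.
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.get("idref") in manifest
    ]


# Hypothetical sample with line breaks in the middle of the tags.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item href="titlepage.xhtml"
          id="titlepage" media-type="application/xhtml+xml"/>
    <item href="test.html" id="html"
          media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="titlepage"/>
    <itemref idref="html"/>
  </spine>
</package>"""

if __name__ == "__main__":
    for href in spine_hrefs(SAMPLE_OPF):
        print(href)
```

The hrefs come back relative to the .opf file, so a real script would still have to join them with the directory the .opf lives in.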

Quote:

The standard .epub settings (not those in the "special" pdf-style file) can be overridden by adding !important to the default.css file, at least according to the documentation. I could add another option to specify highest-priority rules (it would be just adding another .css after the book-specific one in the prince command-line).

Playing around with this, it sort of works. E.g., I only changed your default.css to make:

Code:

body {
  font-size: 9.9pt;
  font-family: serif; 
  text-align: justify;
  prince-image-resolution: 166dpi;
  hyphens: auto;
}

into:

Code:

body {
  font-size: 9.9pt;
  font-family: serif !important; 
  text-align: justify;
  prince-image-resolution: 166dpi;
  hyphens: auto;
}

I then took the following simple HTML file (test.html) for testing:

Code:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta name="author" content="frabjous" />
<meta name="title" content="Prince Test" />
<title>Prince Test</title>
<style type="text/css">
body { font-family: Georgia; }
</style>
</head>
<body>
<p>The quick brown fox jumps over the lazy dog. 0123456789</p>
</body>
</html>

I then ran (ebook-convert is from calibre):

Code:

ebook-convert test.html test.epub
epub2pdf.sh test.epub test.pdf

The resulting PDF used Droid, as per default.css, not Georgia. However, if I start instead with:

Code:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta name="author" content="frabjous" />
<meta name="title" content="Prince Test" />
<title>Prince Test</title>
<style type="text/css">
.myparagraph { font-family: Georgia; }
</style>
</head>
<body>
<p class="myparagraph">The quick brown fox jumps over the lazy dog. 0123456789</p>
</body>
</html>

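Then the PDF used Georgia, not Droid, so the !important flag is not "cascading down", so to speak. (That matches how CSS works: !important only decides between rules that apply to the same element, and a value inherited from body loses to any rule set directly on the paragraph.)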

(I know calibre mucks with the CSS in conversion to epub, but I got the same results using prince directly on the HTML files with the "-s ~/.epub2pdf/default.css" option.)
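To make the behaviour in the two tests a bit more concrete, here is a toy model of how one property gets picked for one element. It is a deliberately simplified sketch, not how Prince is implemented: the function name resolve is made up, and it ignores selector specificity and stylesheet origin entirely. The only point is that !important is compared among rules on the same element, while inheritance is just a fallback.

```python
def resolve(own_declarations, inherited_value):
    """Pick the value of one property for one element.

    own_declarations: (value, important) pairs from rules that match this
    element itself, in stylesheet order. Specificity and origin are ignored.
    """
    if not own_declarations:
        # Nothing targets the element directly, so it inherits.
        return inherited_value
    important = [value for value, flag in own_declarations if flag]
    if important:
        # Among rules on the same element, the last !important one wins.
        return important[-1]
    # Otherwise the last plain declaration wins.
    return own_declarations[-1][0]


# body: default.css says "serif !important", the book says "Georgia".
body_font = resolve([("serif", True), ("Georgia", False)], inherited_value=None)

# First test file: no rule targets <p>, so it inherits from body.
plain_p = resolve([], inherited_value=body_font)

# Second test file: .myparagraph targets <p> directly.
classed_p = resolve([("Georgia", False)], inherited_value=body_font)

if __name__ == "__main__":
    print("body:", body_font)
    print("plain <p>:", plain_p)
    print("<p class=myparagraph>:", classed_p)
```

In this model the plain paragraph ends up with the default.css font and the classed paragraph with Georgia, which is the same pattern as in the two PDFs.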

Again, not a huge deal, since the usual place for the font specification would be under "body", and if something further down changes it, it has probably got a good reason. I mainly worry about epubs made with WYSIWYG editors and suchlike that might place the font-family property anywhere.
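The extra highest-priority stylesheet you mention could be wired in roughly like this. This is only a sketch of assembling the prince command line from Python: the function name prince_command and the file names book.css and override.css are hypothetical, and I'm assuming that a stylesheet given later with -s takes precedence over earlier ones for otherwise equal rules, as with normal CSS ordering.

```python
import shlex


def prince_command(html_files, output_pdf, stylesheets):
    """Build the argument list for a prince run.

    stylesheets are passed with -s in the order given, so the
    highest-priority one should come last (assumption, see above).
    """
    cmd = ["prince"]
    for css in stylesheets:
        cmd += ["-s", css]
    cmd += list(html_files)
    cmd += ["-o", output_pdf]
    return cmd


if __name__ == "__main__":
    cmd = prince_command(
        ["test.html"],
        "test.pdf",
        # Hypothetical file names: defaults, book-specific, then overrides.
        ["default.css", "book.css", "override.css"],
    )
    # Print the command instead of running it; pass cmd to
    # subprocess.run(cmd, check=True) to actually invoke prince.
    print(shlex.join(cmd))
```

Even then, the override sheet would only beat a book's rules where its selectors hit the same elements, so it wouldn't by itself catch a font-family set on some arbitrary class.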

Quote:

Yes, feel free to code it

When I have more free time, I might try it, though I have almost no experience with Python myself. Still, this is the best way to learn, eh?

Quote:

Oooh... a GUI, it makes me shudder

I think that's quite beyond my goal at the moment, but of course, it would be welcome.

It would really be out of altruism, since a lot more people would use a tool like this if there were a GUI running under Windows/Mac. But it might also further the cause that properly sized PDFs are an ebook format after all!

Quote:

For the moment, let's see if the introduction of this <meta name="pdf-style"> has any acceptance...

I'll cross my fingers!