View Full Version : Script for converting ePUB to PDF using Prince


Jellby
11-22-2009, 07:49 AM
Some time ago I wrote the epub2pdf (http://www.mobileread.com/forums/showpost.php?p=591684&postcount=11) script to render ePUB files into PDF, using the excellent abilities of Prince.

Well, I think it's about time to have it with its own thread in the ePUB forum :)

It has only been tested on Linux, but I guess it should also work on Windows under Cygwin and on Mac OS. The script uses unzip, XMLStarlet (http://xmlstar.sourceforge.net/) and Prince (http://www.princexml.com). These are the usage notes:

epub2pdf.sh [options] input.epub output.pdf

Where the options are:
-s "style.css" Use "style.css" as stylesheet (default is "default.css")
-S "style.css" Use "style.css" as highest-priority stylesheet
Stylesheets will be searched in the current directory first, and then in ~/.epub2pdf
-v Verbose output
-h Show this help

So you see, it's fairly easy. Place the included "default.css" file in ~/.epub2pdf and it will be used automatically for all conversions. You can also specify another CSS file (or modify default.css as you wish); it will be searched for in ~/.epub2pdf too, so you can keep different "profiles" there.
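For anyone who wants to drive the script from Python, here is a minimal sketch that assembles a command line from the options listed above. The function name build_epub2pdf_command and the assumption that epub2pdf.sh can be called by that name are mine, not part of the script.

```python
def build_epub2pdf_command(epub, pdf, style=None, top_style=None,
                           verbose=False, script="epub2pdf.sh"):
    """Build an argument list following the usage notes quoted above.

    `script` is an assumed location; adjust it to wherever epub2pdf.sh lives.
    """
    cmd = [script]
    if style:
        cmd += ["-s", style]        # replaces default.css
    if top_style:
        cmd += ["-S", top_style]    # highest-priority stylesheet
    if verbose:
        cmd.append("-v")
    cmd += [epub, pdf]
    return cmd


# The list can be passed to subprocess.run(cmd, check=True) to run the conversion.
print(" ".join(build_epub2pdf_command("input.epub", "output.pdf",
                                      style="style.css", verbose=True)))
```

Building a list rather than a single string avoids quoting problems with file names that contain spaces.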

I added a feature to use a book-specific stylesheet if found. This stylesheet should be included in the .epub and referenced thus:

1.- Include a .css with rules and selectors for Prince. These are not going to be used in the normal ePUB rendering, only when processing with Prince, so you can use everything supported by Prince (use !important to override the standard css rules).

2.- As with every file you include in the epub, there must be an entry in the <manifest> (in the .opf file).

3.- Add a <meta name="prince-style" content="XXXXX"> to the <metadata> block of the .opf file, where "XXXXX" is the id of the above .css file.

That's all, epub2pdf will use this .css file included in the .epub in addition to the default.css or whatever you use. The ePUB files I've uploaded here have all this epub2pdf-specific style.
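To make the three steps concrete, here is a rough Python sketch of the lookup they imply: read the prince-style entry from the metadata, then find the manifest item with that id. This is not the code epub2pdf uses (the script relies on XMLStarlet); the sample .opf, the id "prince-css" and the function name find_prince_stylesheet are invented for illustration.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"

# Hypothetical minimal .opf content; ids and file names are made up.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
    <meta name="prince-style" content="prince-css"/>
  </metadata>
  <manifest>
    <item id="main-css" href="style.css" media-type="text/css"/>
    <item id="prince-css" href="prince.css" media-type="text/css"/>
  </manifest>
</package>
"""


def find_prince_stylesheet(opf_text):
    """Return the href of the stylesheet named by <meta name="prince-style">."""
    root = ET.fromstring(opf_text)
    style_id = None
    for meta in root.iter(OPF_NS + "meta"):
        if meta.get("name") == "prince-style":
            style_id = meta.get("content")
            break
    if style_id is None:
        return None
    for item in root.iter(OPF_NS + "item"):
        if item.get("id") == style_id:
            return item.get("href")
    return None


print(find_prince_stylesheet(SAMPLE_OPF))
```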

Current version is 3.1

GUI version
Calibre plugin version

frabjous
11-26-2009, 06:19 PM
I've said it before, but I'll say it again... This script is fantastic.

Works great on linux anyway. Has anyone tried it with Cygwin, though? I have my doubts.

Valloric
01-01-2010, 11:22 AM
OK, feedback coming your way.

First off, bloody amazing tool! I can finally get justified text on my PRS-505. I love epub, but I'm not a big fan of the ADE renderer on the 505.

Secondly, using "-v" showed I was missing "pdftk" and "recode". sudo apt-get install fixed that, but you may want to add it to the list of dependencies.

Thirdly, I hate the horizontal ruler and the page counter at the top. I know I can edit "default.css" (and I have), but it would be wise to provide a switch for this.

Lastly, you could turn this into a very nice little OSS project with a little bit of effort: switch from bash to python, include all the dependencies in a package/installer (I recommend InstallJammer) and provide it stand-alone for the three major platforms. Host it on google code or something. Don't forget to note that because of Prince it could only be used for personal uses and that they'd have to buy a license for commercial ones.

It would be very nice indeed.

Jellby
01-01-2010, 11:51 AM
Secondly, using "-v" showed I was missing "pdftk" and "recode". sudo apt-get install fixed that, but you may want to add it to the list of dependencies.

Well... the script works fine without them, they're only used for setting the metadata. They are listed as optional dependencies at the top of the script.

Lastly, you could turn this into a very nice little OSS project with a little bit of effort: [...]

I'm afraid your "little bit of effort" is too much for me :D I lack the knowledge and energy for efficiently creating, testing and maintaining something like what you suggest. I know it's probably easier than it seems, but even so, I'm not motivated enough...

But anyone is welcome to do that, converting to python (or whatever) should be relatively easy indeed. Hey, you could even add it to Sigil ;)

Thirdly, I hate the horizontal ruler and the page counter at the top. I know I can edit "default.css" (and I have), but it would be wise to provide a switch for this.

I know, I know... I expected each user to "develop" his/her own default.css, and only provided mine as an example. Of course, sharing CSS files is always an option ;)

Valloric
01-01-2010, 12:19 PM
Hey, you could even add it to Sigil ;)

I would if Prince XML were open source.

Valloric
01-22-2010, 08:18 PM
Jellby, I've noticed that my output PDFs are somehow screwing up the italics... I'm using embedded fonts in the epub file (roman, bold, italic) and they get picked up just fine, but it seems Prince slants the italic text even though it uses the italic font I embedded.

So, the italic font is used, but slanted some more. It's as if Prince is trying to create a "fake" italic font, but since it also uses the real italic font, the results look ugly and weird.

Is this a bug in Prince or in the bash script/default.css?

BTW the epub book displays just fine in ADE.

frabjous
01-22-2010, 11:23 PM
It would probably help to see the relevant portions of default.css and the tags used in the html. The problem might be with Prince but I'd bet there was a way around it; perhaps by declaring the true italic as its own font-family, but calling it with font-style: normal rather than font-style: italic.

Jellby
01-23-2010, 03:57 AM
I'm using embedded fonts in the epub file (roman, bold, italic) and they get picked up just fine, but it seems Prince slants the italic text even though it uses the italic font I embedded.

[...]

Is this a bug in Prince or in the bash script/default.css?

It's a bug in Prince, see here (http://www.princexml.com/bb/viewtopic.php?f=4&t=2571).

Valloric
01-23-2010, 09:46 AM
It's a bug in Prince, see here (http://www.princexml.com/bb/viewtopic.php?f=4&t=2571).

Thanks Jellby. I actually managed to figure out last night that Prince determines whether to apply faux bold or italic to a font based on the font properties in the file itself, not how the font is invoked.

But it's good to know they're working on fixing this bug.

Sullivan
01-26-2010, 05:33 AM
wow, I did not even know this existed (new to ePub myself) - will put some new sparkle in my rusty PRS-505. thank you!

tonhou
07-11-2010, 08:02 PM
Just to say I have used this script quite extensively with excellent results. I use Linux on the desktop and an Ectaco Jetbook (5").
I have found that public domain epubs from Feedbooks have worked really well.
Others that I have purchased and converted to epub are mostly good.
The TOC is the most interesting bonus. If it is present the script seems to haul it in particularly well.

I am a little foggy about how the settings can be changed and have tended to run with the default.

Thanks for making this available.

--Tony

Jellby
07-12-2010, 01:13 PM
Welcome to MobileRead, tonhou

I'm glad you like the script, you may find the new GUI version (http://www.mobileread.com/forums/showthread.php?t=89689) easier to work with, at least you see the CSS stylesheets that are being used.

Heisenberg
08-01-2012, 08:27 AM
Hi there,

First, I want to thank you for publishing your great script. It is the best solution for converting ePUB to PDF that I have found on the web (and I searched for a long time).
Now I use your script for the following task:
I export a DokuWiki to ePUB and then convert it with your script. It works fine except for one problem:
When I click the footnotes that were generated for external links, I get an error message:
"epub2pdf/OEBPS/footnotes.html': No such file or directory"

errors from console:
prince: OEBPS/title.html:3: error: Misplaced DOCTYPE declaration
prince: OEBPS/title.html:4: error: htmlParseStartTag: misplaced <html> tag
prince: OEBPS/footnotes.html:167: error: EntityRef: expecting ';'
prince: OEBPS/footnotes.html:167: error: EntityRef: expecting ';'
prince: OEBPS/footnotes.html:239: error: EntityRef: expecting ';'
prince: OEBPS/footnotes.html:239: error: EntityRef: expecting ';'
prince: OEBPS/footnotes.html:240: error: EntityRef: expecting ';'
prince: OEBPS/footnotes.html:240: error: EntityRef: expecting ';'
prince: OEBPS/footnotes.html: error: could not load input file


my config css looks like this:
@font-face {
font-family: serif;
src: local("Georgia")
}

@font-face {
font-family: sans-serif;
src: local("Verdana")
}

@font-face {
font-family: monospace;
src: local("Courier New")
}

@page {
size: 21cm 29.7cm;
margin: 20mm 20mm 20mm 20mm;
@top-left {
font-size: 60%;
font-style: italic;
border-bottom: solid thin black;
margin-bottom: 1mm;
content: string(booktitle);
}
@top-center {
font-size: 60%;
font-style: italic;
border-bottom: solid thin black;
margin-bottom: 1mm;
content: string(chaptertitle);
}
@top-right {
font-size: 50%;
border-bottom: solid thin black;
margin-bottom: 1mm;
content: counter(page) "/" counter(pages);
}
}

@page:first {
margin: 20mm 20mm 20mm 20mm;
@top-left {
border-width: 0;
margin: 0;
content: normal;
}
@top-center {
border-width: 0;
margin: 0;
content: normal;
}
@top-right {
border-width: 0;
margin: 0;
content: normal;
}
}

body {
font-size: 8.0pt;
font-family: serif;
text-align: justify;
hyphens: auto;
prince-image-resolution: auto;
}

img{
max-width: 100%;
}

h1, h2, h3, h4, h5, h6 {
hyphens: none;
}

:lang(it), :lang(es) {
hyphenate-before: 3;
hyphenate-after: 3;
}


I'm not very familiar with Prince, so if someone could help me I would really appreciate it!

Thanks and kind Regards!

Jellby
08-01-2012, 10:40 AM
It looks like there's a problem with the HTML source. Can you post the epub file? Have you tried validating the epub with epubcheck or Sigil?

Heisenberg
08-05-2012, 03:07 PM
Hi Jellby,
I cannot post the whole epub because it contains top secret information ( ;) ), but I exported a single page to show you the problem.
You can get the epub here: https://www.dropbox.com/s/c5epp1owz4p6is6/2012_august_5_08-47-48.epub
And here is what I get from the script: https://www.dropbox.com/s/tqk47wa039d8iye/testOut.pdf
The errors are the same as I mentioned before.

Thanks for your efforts!

Jellby
08-06-2012, 04:38 AM
The file has a few errors. I suggest you use Sigil to fix it and validate it.

First, image files are missing from the ePub (I see they are only referenced in the CSS, probably in unused styles, though). Second, the title.html has two DTD lines, and they are incorrect. The .html files must be XHTML (which means they must have closing tags, have all tags in lowercase, etc.). Then the problem with footnotes.html is that you have & instead of &amp;
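The "EntityRef: expecting ';'" errors are what an XML parser reports when it meets a bare & (typically in a URL query string). As a rough sketch, and not something epub2pdf does itself, a small Python helper could escape only the ampersands that do not already start an entity reference. The function name escape_bare_ampersands and the sample markup are made up for illustration.

```python
import re

# An ampersand that is NOT followed by a named or numeric entity reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(markup):
    """Replace bare '&' with '&amp;', leaving existing entities untouched."""
    return BARE_AMP.sub("&amp;", markup)


sample = '<a href="http://example.com/?a=1&b=2">Q&amp;A</a>'
print(escape_bare_ampersands(sample))
```

This only addresses the ampersand problem; the duplicate DOCTYPE lines and other XHTML issues still need a proper validator such as epubcheck or Sigil.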

roger64
08-06-2012, 06:23 AM
Thanks to your previous post, I discovered your nice and clever tool/script and GUI (better late than never). :)

As my EPUBs are the product of odt files tweaked with Sigil, it could really be very useful to enjoy justified text, at long last without losing any Sigil-produced enhancements.

I use v 1.2 and tried to convert one of my EPUBs. I have two questions:

1. - I think I have all the dependencies installed on my Linux system, but I get this (see screenshot). It reports that it cannot find a file; where can I tell it the right path?

2. - All my EPUBs use two style sheets, which are of course correctly listed in the manifest. Do I have to do anything special?

Jellby
08-06-2012, 06:39 AM
1. - I think I have all the dependencies installed on my Linux system, but I get this (see screenshot). It reports that it cannot find a file; where can I tell it the right path?

That looks like a problem with Prince, not with the script/GUI... Anyway, maybe you can install libssl0.9.8, apparently I have both in my system:

/usr/lib32/libssl.so.0.9.8
/usr/lib32/libssl.so.1.0.0

2. - All my EPUBs use two style sheets, which are of course correctly listed in the manifest. Do I have to do anything special?

No, all stylesheets referenced by the XHTML files should be automatically used (you can view them in the View CSS tab). To these stylesheets included in the book, two additional ones are added. One is the "Default CSS" which is ~/.epub2pdf/default.css, this is intended for settings that you want to apply to all your epub2pdf conversions, like page size, fonts, etc. The other is the "Additional CSS" which is an ad-hoc stylesheet intended for a specific book without having to modify the book itself. I don't remember the priority order now, you may have to use "!important" to override some styles.

roger64
08-06-2012, 08:42 PM
Note: The installation problems described above were resolved by using the static package instead of the dynamic one.

Conclusion: Wonderful software!!!

I can now have a PDF with embedded fonts, dropcaps, full cover page and a justified text. The total weight of the file went up from 490 k to 920 k which is quite OK.

No work to do. As long as your original EPUB uses styles, there is no problem.

Thank you very much!! :2thumbsup Congratulations to Jellby and Prince.

I attach this six-inch (9x12) PDF as proof. The original EPUB was published on MR a few days ago.

@Jellby
EDIT: I tried to produce a 9.7-inch PDF (that is, 14.8cm x 19.7cm). I did get a page of that size, but the image and text were not scaled up beyond 9x12. I surely did something wrong; can you produce one for me as a demo (and explain)?
The original EPUB is here (http://www.mobileread.com/forums/showpost.php?p=2171953&postcount=1).

Jellby
08-07-2012, 04:41 AM
How about this? I just used this default.css:

@page {
size: 14.8cm 19.7cm;
margin: 5mm;
}

By the way, you have a mix of straight and curly apostrophes in the text.
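As a quick check of the page sizes mentioned in this exchange, the diagonal of a page given in centimetres can be converted to inches with a couple of lines of Python. The helper name diagonal_inches is only for illustration.

```python
import math


def diagonal_inches(width_cm, height_cm):
    """Diagonal of a width x height page (in cm), expressed in inches."""
    return math.hypot(width_cm, height_cm) / 2.54


# 14.8cm x 19.7cm is the size used in the @page rule above; 9x12 is the smaller PDF.
print(round(diagonal_inches(14.8, 19.7), 1))
print(round(diagonal_inches(9, 12), 1))
```

The first size comes out at roughly 9.7 inches and the second at just under 6 inches, which matches the figures quoted in the posts.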

roger64
08-07-2012, 08:40 AM
@Jellby

Thanks a lot for your precious help. One reader reported to me that he could not read my EPUBs on his brand new ONYX M92. I think that this way it is possible to offer him a 9.7-inch PDF without having to give up embedded fonts, dropcaps and other niceties. And for his trouble, he gets justified text on top of it...

I'll fix these disordered apostrophes. Thank you also for spotting this. C'est une honte.

BTW, I also wonder whether the PDF produced with Prince and CSS could allow the use of thin spaces. When you display ragged text, as in common EPUBs, it may seem superfluous to bother about thin spaces. When you have nicely justified, hyphenated text..., then maybe it's time to think seriously about it. :)

roger64
08-09-2012, 02:58 AM
There is no doubt that this software (Jellby's + Prince) makes it possible to produce PDF files of excellent quality seamlessly from a source EPUB. I know it because, with a lot of Jellby's help, I succeeded. (http://www.mobileread.com/forums/showpost.php?p=2178199&postcount=173)

Beyond the limits?

There is a human tendency to always try to push the limits a little further. Now that I enjoy justified and hyphenated text, I think about "optically justified text (http://help.adobe.com/en_US/illustrator/cs/using/WS714a382cdf7d304e7e07d0100196cbc5f-63aba.html)". Would this be possible using Prince? What also about inserting thin spaces? This can be done using emulation. Is there a nicer way to do it using Prince?

Jellby
08-09-2012, 05:13 AM
I think about "optically justified text (http://help.adobe.com/en_US/illustrator/cs/using/WS714a382cdf7d304e7e07d0100196cbc5f-63aba.html)". Would this be possible using Prince?

That would be a feature request for Prince... and it's already in the roadmap (http://www.princexml.com/roadmap/).

What also about inserting thin spaces? This can be done using emulation. Is there a nicer way to do it using Prince?

Thin spaces can be used all right. I guess your problem is that you don't use thin spaces in the epub, because some readers/fonts do not support them, and you want to use them in Prince instead...

Depending on how you faked the thin spaces in the epub, you could either preprocess the xhtml files before feeding them to Prince, or set some transformations on-the-fly. For example, if you have <span class="thinsp"> </span>, maybe this works:

span.thinsp { content: "?" }

where ? is either the thin space UTF-8 character or its code in whatever format understood by CSS.
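The other option mentioned above, preprocessing the xhtml files before feeding them to Prince, could look roughly like this in Python. It is only a sketch: it assumes the placeholder markup is exactly <span class="thinsp"> </span> as in the example, and the function name insert_thin_spaces is made up.

```python
import re

THIN_SPACE = "\u2009"  # U+2009 THIN SPACE

# Matches the placeholder span from the example, with any whitespace inside.
THINSP_SPAN = re.compile(r'<span class="thinsp">\s*</span>')


def insert_thin_spaces(xhtml):
    """Replace each placeholder span with a real thin-space character."""
    return THINSP_SPAN.sub(THIN_SPACE, xhtml)


sample = '<p>12<span class="thinsp"> </span>km</p>'
print(repr(insert_thin_spaces(sample)))
```

A regular expression is enough here because the placeholder is a fixed string; markup with extra attributes or nested tags would need a real XML parser.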

Trouhel
08-09-2012, 05:46 AM
What you call "optically justified text" is but a feature in a set of typographic techniques. It's on Prince's roadmap, but they still have quite a way to go before doing high-quality typography. This said, it certainly is a state of the art software on other matters, but not on that particular point.

So, why not go with what already exists and is a long-proven solution: LaTeX (using pdfTeX as engine) does what it calls "microtypography" (cf. Wikipedia on the subject, as a start). I don't know how far it goes with using TrueType and OpenType fonts natively (the SIL XeTeX engine cannot use the microtype package): that's why ConTeXt (using the LuaTeX engine) is to me, besides being much more user friendly, one of the best solutions (integration of LuaTeX into LaTeX is an ongoing process, but, from what I've seen so far, it does not look ready yet for non-expert end users).

Of course, InDesign does that, and a lot more, but it's a different ball game.

With LaTeX+microtype or ConTeXt, you'll get protrusion (which manages hanging punctuation) and expansion (using Hermann Zapf's HZ algorithm, as InDesign does), and actually some more goodies with LaTeX, such as kerning adjustment, etc. You can read the ConTeXt manual on typography, from p.33, or the documentation for the LaTeX microtype package, if you want to know more.

Now, to be effective, microtypography must be done on a font face by font face basis, because settings that will work for one may lead to a disaster with another. It might even have to be done on a face+script basis, when using different scripts.

So, it always comes back to the same story when it comes to printing quality, in the free open source world that is, and it spells TeX, there's no way around. Now, the choice of one or the other macro-package and PDF engine must be carefully made depending on your own needs (fonts, scripts, typographic features are some of the points that must be given attention).

Don't listen to the "high learning curve" talk and give it a try: getting started is easier than it looks at first glance.

roger64
08-09-2012, 05:54 AM
EDIT: My mistake. This code is not good. Among other errors, it creates breakable spaces.
I suppressed it. Sorry for this.

I don't find it very useful to try to fake thin spaces in the EPUB:
- it really clutters the code with tons of spans
- the usual EPUB text display is ragged (not justified), so it would be a little like feeding caviar to a crocodile.

But of course, the nice "Prince" PDF display makes me reconsider this question.

If I did it the way shown above, what rule should I write instead of content: "?"

@Trouhel: sorry, I saw your post too late. I'll reply later.

Jellby
08-09-2012, 05:56 AM
So, why not go with what already exists and is a long time proven solution: LaTeX (using pdfTeX as engine)

Sure, that's best, but converting an arbitrary ePub to LaTeX or similar is far from trivial.

roger64
08-09-2012, 06:12 AM
My wish is very limited on this field because I am not a professional. If only I was able to adapt TNR and Libertine regular fonts, I would feel satisfied.

But really, I feel a little afraid about Latex and other brothers. A small EPUB on one side, a cathedral on the other...

Heisenberg
08-10-2012, 07:15 AM
The file has a few errors. I suggest you use Sigil to fix it and validate it.

First, image files are missing from the ePub (I see they are only referenced in the CSS, probably in unused styles, though). Second, the title.html has two DTD lines, and they are incorrect. The .html files must be XHTML (which means they must have closing tags, have all tags in lowercase, etc.). Then the problem with footnotes.html is that you have & instead of &amp;

Ok the only error which matters to me is the one with the footnotes.
I don't see missing images or other major problems?!
But I don't understand what you mean by
Then the problem with footnotes.html is that you have & instead of &amp;
I can't find any & without amp in the footnotes.html?! Or was that meant the other way around?

Thanks & Regards

Heisenberg
08-10-2012, 07:30 AM
ok, now I see the problem I think. I validated the epub with sigil and the footnotes issue is fixed, but now the images are missing :rolleyes:
don't know what to do to fix the problem :blink:

Jellby
08-10-2012, 07:45 AM
ok, now I see the problem I think. I validated the epub with sigil and the footnotes issue is fixed, but now the images are missing :rolleyes:
don't know what to do to fix the problem :blink:

Since this is unrelated to the epub->pdf conversion, I suggest you create a new thread either here in the ePub forum or in the Sigil one. Your problem with the images was that there were many references in the css file, just remove the unused styles, or add the images.

Heisenberg
08-10-2012, 08:24 AM
Since this is unrelated to the epub->pdf conversion, I suggest you create a new thread either here in the ePub forum or in the Sigil one. Your problem with the images was that there were many references in the css file, just remove the unused styles, or add the images.

Ok, thank you I will do so next week!
It's ridiculous; all I wanted to do was export a DokuWiki to PDF. If I had known that this would be so complicated I would have given this task to someone else :D

Thanks & Regards

Heisenberg
08-13-2012, 10:29 AM
Since this is unrelated to the epub->pdf conversion, I suggest you create a new thread either here in the ePub forum or in the Sigil one. Your problem with the images was that there were many references in the css file, just remove the unused styles, or add the images.

Hi there,

I fixed the problem and I wanted to inform you about it :)
The problem was that, after validating the epub with Sigil, Sigil had URL-encoded the links to the image files! So the links were changed from something like "../Images/privat@images.jpeg" to "../Images/privat%40images.jpeg", which Prince cannot find later...
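The encoding described here is ordinary URL percent-encoding, where @ becomes %40. A short Python illustration with the standard library, using the example path from the post (the helper name decode_href is made up):

```python
from urllib.parse import quote, unquote


def decode_href(href):
    """Turn a percent-encoded link back into the plain file path."""
    return unquote(href)


original = "../Images/privat@images.jpeg"
encoded = quote(original)   # '/' is left alone by default, '@' is encoded

print(encoded)
print(decode_href(encoded))
```

Decoding the href before looking for the file on disk (or renaming the files so that they need no encoding) avoids the mismatch.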

Thanks for your help, my task is completed :thumbsup:

Darr247
03-27-2013, 01:46 AM
Works great on linux anyway. Has anyone tried it with Cygwin, though? I have my doubts.

Prince does not show up in Cygwin's package manager... haven't taken the time to try installing it manually, yet.

Actually, I've never tried installing anything into Cygwin without using its manager, so I don't even know if the generic linux gz files will work.

PageLab
04-22-2013, 01:11 PM
Hi folks,

I'm getting an error message when running the script. It generates a zero KB PDF file. It happened with every ePUB file I tried. I changed file permissions, but it didn't work. I'm on a Mac.

Does anyone have any clue about it? Any help is appreciated. Here's the error message:


sh epub2pdf.sh test.epub test.pdf
readlink: illegal option -- f
usage: readlink [-n] [file ...]
readlink: illegal option -- f
usage: readlink [-n] [file ...]
readlink: illegal option -- f
usage: readlink [-n] [file ...]
Cannot read file test.epub

Jellby
04-23-2013, 08:51 AM
Well, evidently your readlink version does not support the -f option.
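For anyone porting the script to Python, as suggested earlier in the thread, os.path.realpath does roughly what readlink -f is generally used for: it returns an absolute path with symbolic links resolved, and it is available on Mac as well. A minimal sketch follows; the helper name canonical_path and the temporary files are only for demonstration.

```python
import os
import tempfile


def canonical_path(path):
    """Absolute path with symlinks resolved (comparable to `readlink -f`)."""
    return os.path.realpath(path)


# Demonstration with a throwaway file and a symlink pointing to it.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test.epub")
    open(target, "w").close()
    link = os.path.join(tmp, "link.epub")
    os.symlink(target, link)
    print(canonical_path(link) == canonical_path(target))
```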

Anyway, this script has not been updated or even looked at for a very long time. Maybe you could try the GUI version, if you can get Python and PyQt for your system.

PageLab
04-23-2013, 09:08 AM
Thanks Jellby, I'll try the GUI version.

Jellby
12-26-2013, 07:48 AM
There's a calibre plugin version now!

stevelitt
12-26-2013, 10:55 AM
There's a calibre plugin version now!

Now all we need is an ePub to LaTeX converter, or an Xhtml to LaTeX converter, both of which need to respect and pass through styles rather than trying to render them, and I can write all my books in Xhtml and render them to ePub, PDF or Print.

SteveT

pressmatters
10-21-2014, 02:16 PM
How do you even load an epub to convert in the GUI version? I can't for the life of me work out how it works.

Jellby
10-21-2014, 03:37 PM
Do you mean this version? I'll answer there (where you should have asked).