epub2pdf script, now with GUI!


Jellby
07-06-2010, 03:57 PM
So... someone told me that I should create a GUI version of my epub2pdf.sh script, and after some time I bit the bullet and quickly learned some Python and Qt. This is a first working version. It not only generates PDF files from ePUB (if you have prince installed), but also creates thumbnails for use in the Cybooks (if you have convert installed). I think the interface is quite self-explanatory, at least for someone who knows more or less what the program is meant to do.

I'm sure there are many things that can be improved or added, and I'm open to suggestions. Something I'd like to do, but don't know how, is to have the output from prince redirected to a pop-up window (currently, the whole GUI just freezes while it runs). I'd appreciate any help with this.
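
A minimal sketch of one way to avoid the freeze: run the external command in a worker thread and pass its output to the GUI one line at a time. This is not taken from the script. `stream_command` and `run_in_background` are hypothetical helpers, only the standard library is used, and the Python interpreter stands in for the real prince command line.

```python
import queue
import subprocess
import sys
import threading


def stream_command(cmd, lines):
    """Run cmd and put each line of its output on the `lines` queue.

    A final None marks the end of the output.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        universal_newlines=True,
    )
    for line in proc.stdout:
        lines.put(line.rstrip("\n"))
    proc.stdout.close()
    proc.wait()
    lines.put(None)


def run_in_background(cmd):
    """Start cmd in a worker thread; return the queue and the thread."""
    lines = queue.Queue()
    worker = threading.Thread(target=stream_command, args=(cmd, lines))
    worker.daemon = True
    worker.start()
    return lines, worker


if __name__ == "__main__":
    # Stand-in for the real prince command line (an assumption for the demo).
    demo = [sys.executable, "-c", "print('page 1'); print('page 2')"]
    lines, worker = run_in_background(demo)
    while True:
        line = lines.get()
        if line is None:
            break
        print(line)
```

In a Qt program the window would probably poll the queue from a timer rather than block on `lines.get()`; Qt's own QProcess class is another option worth looking at.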

Umm... as I said, it's a python script with PyQt4, so I guess both of them are needed. I have them in my linux box, and it works here, but I know nothing about MacOS or Windows.

Command line version
Calibre plugin version

EDIT (v 1.1):

Fixed detection of Book CSS
Included default CSS
Added tab to view CSS files in the ePUB
The generated PDF now includes correct title and author (or so I hope)

EDIT (v 1.2):

Fixed some issues in Windows
Fixed PDF metadata
Show only image files for possible covers
Show also HTML files in CSS viewer

EDIT (v 1.3):

Decode percent-encoded filenames
Fixed file path computation for non-UNIX filesystems (again)
Fixed metadata setting (truncate modified file on writing)
Fixed treatment of empty creators
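
As an illustration of the percent-decoding fix in v 1.3: hrefs in an ePUB manifest may be percent-encoded, while the names stored in the ZIP are not. The sketch below assumes Python 3 (the script itself was written for Python 2, where the function lives in `urllib`), and `decode_href` is a hypothetical name, not one from the script.

```python
from urllib.parse import unquote


def decode_href(href):
    """Turn a percent-encoded manifest href into a plain file name."""
    return unquote(href)


if __name__ == "__main__":
    print(decode_href("OPS/chapter%201.xhtml"))
```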

EDIT (v 1.4):

There's apparently a bug in Prince that means links containing a directory separator do not work correctly in Windows, unless the paths on the command line are passed in unix/http style. I hope it works fine with this version.
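
A small sketch of what "unix/http-style paths" could look like in code, assuming Python 3's `pathlib`. The helper name `to_forward_slashes` is made up for this example, and this is not necessarily how version 1.4 does it.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Rewrite a Windows path so that it uses "/" as the separator."""
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    print(to_forward_slashes(r"C:\Utils\epub2pdf\OPS\ch1.xhtml"))
```

`PureWindowsPath` only manipulates the string, so the conversion behaves the same on any operating system.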

nyrath
07-06-2010, 05:08 PM
Yes, you will have to install PyQt as well as Python. It is available here:
http://www.riverbankcomputing.com/software/pyqt/download

There is an automated installer for Windows; I fear that Mac OS users will have to read the install instructions.

zelda_pinwheel
07-06-2010, 07:54 PM
wow. impressive and brilliant. thank you jellby. :)

frabjous
07-09-2010, 01:24 AM
Thanks for the new GUI version. Works for me.

I have one quick suggestion. The old version of the script came packaged with a default extra CSS you could use and modify to your preferences. I think it would be nice if there were something similar here. Perhaps if you clicked the CSS tab, you'd have something filled in for you by default, which you could modify if you wished: or perhaps, if that's too hands-on, something like that where it is all commented out but you could uncomment suggested options as you wished.

I don't think you can expect to provide a full GUI that lets people who know no CSS at all choose among all the CSS options there are. But especially when it comes to Prince-specific CSS like setting the page sizes and hyphenation options, etc., it would be nice to see a sample. Defaulting to letter-sized paper with very small margins isn't likely to help anyone.

Further along the same lines, you might even show the user the CSS files already contained in the ePub and let them modify them too directly, though I can imagine that would be difficult to implement.

Still, it's a great script as is. Thanks again!

Jellby
07-09-2010, 12:52 PM
I have one quick suggestion. The old version of the script came packaged with a default extra CSS you could use and modify to your preferences.

I didn't include it here, but if you have a file named "default.css" in the $HOME/.epub2pdf directory, it will be read and used (it should appear as another tab in the GUI). I'll think about how to make this more obvious and user-friendly.

Further along the same lines, you might even show the user the CSS files already contained in the ePub and let them modify them too directly, though I can imagine that would be difficult to implement.

Yes, that would be useful too, especially for examining the class names used in the ePUB... I'll think about it too, probably a read-only view of *.css files in the ePUB would be enough.

Jellby
07-12-2010, 02:10 PM
I've updated the script with some improvements:


The generated PDF now has correct metadata.
If no $HOME/.epub2pdf/default.css exists, the "Default CSS" tab shows some default code
It's now possible to view (but not edit) all *.css files included in the ePUB.
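
The default.css lookup described above could be sketched roughly like this. `load_default_css` and `FALLBACK_CSS` are hypothetical names, and the sample `@page` rule is only an assumption for illustration, not the default code shipped with the script.

```python
import os

# Sample rules only: an assumption for illustration, not the script's defaults.
FALLBACK_CSS = """\
@page {
    size: A5;
    margin: 15mm;
}
"""


def load_default_css(home=None):
    """Return the contents of <home>/.epub2pdf/default.css, or a fallback."""
    if home is None:
        home = os.path.expanduser("~")
    path = os.path.join(home, ".epub2pdf", "default.css")
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except OSError:
        # No readable user file: fall back to the built-in sample.
        return FALLBACK_CSS


if __name__ == "__main__":
    print(load_default_css())
```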

mmq
07-17-2010, 11:01 AM
I keep getting "NameError: global name 'meta' is not defined" on line 205 as an error. Anybody else?

My guess:

change line 205 from
if meta.getAttribute("name") == "author":
to
if child.getAttribute("name") == "author":

Seems to produce the right output (with author and title correctly set).

Thanks a ton for epub2pdf.sh and the new GUI!

Jellby
07-17-2010, 12:27 PM
My guess:

change line 205 from
if meta.getAttribute("name") == "author":
to
if child.getAttribute("name") == "author":

Good guess! It was one of the many bugs in the script :) That's only a problem if the first XHTML file has an author in the head, which is not always the case.
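
For anyone curious, the corrected loop amounts to something like the sketch below. Only the `child.getAttribute("name") == "author"` test comes from the fix above; the function name, the sample document and the use of `minidom` are assumptions, not the script's actual code.

```python
from xml.dom import minidom

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Example</title>
<meta name="author" content="Jane Doe"/>
</head><body/></html>"""


def find_author(xhtml):
    """Return the content of <meta name="author">, or None if absent."""
    doc = minidom.parseString(xhtml)
    for child in doc.getElementsByTagName("meta"):
        # The loop variable is `child`; using an undefined name such as
        # `meta` here is what raised the NameError.
        if child.getAttribute("name") == "author":
            return child.getAttribute("content")
    return None


if __name__ == "__main__":
    print(find_author(SAMPLE))
```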

Valloric
07-25-2010, 07:23 AM
Great work Jellby! Very nice indeed. :)

But you won't get many Windows users without an installer. For Windows, you should freeze the python script into an exe and bundle the dependencies into the installer.

Hey I'm perfectly satisfied with the bash script which I use on my Ubuntu box, I'm just saying the average Windows user needs a bit more hand-holding. :)

Jellby
07-25-2010, 07:31 AM
And I'm not saying you're wrong. But I don't even have a Windows computer to test it, so that's something I can't do at the moment :)

nyrath
07-26-2010, 03:32 PM
Yes, one can freeze the python script into a Windows exe by using a program called Py2Exe, but this is a complicated process not for the faint of heart. (the corresponding program for a Macintosh is Py2App)

kovidgoyal
07-26-2010, 03:42 PM
Yeah, one of the biggest annoyances (from the perspective of a linux developer) is how painful it is to make "installers" for other platforms. I remember I started out with py2exe/py2app for calibre. But they proved insufficient as calibre grew and I ended up writing my own replacements for them.

viktorz
09-08-2010, 04:16 PM
Hello, and thanks for the script.

I have successfully run it under Windows 7, although I had to change a couple of lines to fix some Windows-specific issues (like backslash paths and file permissions).

But I have a problem with links (like endnotes). It looks like Prince resolved relative URLs in the source like "notes.html#note_1" to absolute paths of temporary files like "c:\users\blabla\temp\blabla\notes.html#note_1", which obviously don't work in the resulting PDF. Is it a common problem? Or is it just me? Or my particular book? Or my Windows 7? Can we somehow tell Prince to make all links internal?

Jellby
09-09-2010, 11:41 AM
I have successfully run it under Windows 7, although I had to change a couple of lines to fix some Windows-specific issues (like backslash paths and file permissions).

Argh. It seems I missed the path separation issue in one place, and I overlooked this piece of information about the NamedTemporaryFile procedure:

"Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later)."

I think I've fixed it now; could you try the new 1.2 version? You may have to modify the "prince" filename again, or include it in your system's path.
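
One common way around the NamedTemporaryFile limitation quoted above is to create the file with `delete=False`, close it, and only then hand its name to another program. This is a sketch under that assumption, not necessarily the fix that went into 1.2; `write_temp_css` is a hypothetical helper.

```python
import os
import tempfile


def write_temp_css(css_text):
    """Write css_text to a temporary file and return its path.

    The file is closed before the path is returned, so another program
    can open it by name on Windows as well as on Unix.
    """
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".css", delete=False, encoding="utf-8"
    )
    try:
        handle.write(css_text)
    finally:
        handle.close()
    return handle.name


if __name__ == "__main__":
    path = write_temp_css("h1 { color: black; }\n")
    try:
        with open(path, encoding="utf-8") as reopened:
            print(reopened.read())
    finally:
        os.remove(path)  # delete=False means cleanup is our job
```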

But I have a problem with links (like endnotes). It looks like Prince resolved relative URLs in the source like "notes.html#note_1" to absolute paths of temporary files like "c:\users\blabla\temp\blabla\notes.html#note_1", which obviously don't work in the resulting PDF. Is it a common problem? Or is it just me? Or my particular book? Or my Windows 7? Can we somehow tell Prince to make all links internal?

It doesn't happen to me, intra-ePUB links are transformed into intra-PDF links. Is the "notes.html" file included in the ePUB (and in the <spine>)? If it isn't, it's not passed to Prince, and it's not included in the PDF. If it is, and you see the contents in the PDF, but it's only the links that are not working: can I see a sample ePUB?

viktorz
09-09-2010, 04:11 PM
I think I've fixed it now, could you try the new 1.2 version?

No, it is not quite right yet. Please see file attached, specifically the method "file" that I have changed. See, I just replaced the slash? That makes it work, otherwise reading from zip fails because of the wrong slash:
Traceback (most recent call last):
  File "C:\Utils\epub2pdf\epubutils_2.py", line 493, in loadFile
    self.fileLoaded(unicode(filename))
  File "C:\Utils\epub2pdf\epubutils_2.py", line 497, in fileLoaded
    self.epub = ePUB(filename)
  File "C:\Utils\epub2pdf\epubutils_2.py", line 66, in __init__
    self.readMetadata()
  File "C:\Utils\epub2pdf\epubutils_2.py", line 110, in readMetadata
    pages = int(math.ceil(zfile.getinfo(file).compress_size/1024.0))
  File "C:\Python27\lib\zipfile.py", line 818, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named u'OPS\\\\cover.xhtml' in the archive"

viktorz
09-09-2010, 04:17 PM
It doesn't happen to me, intra-ePUB links are transformed into intra-PDF links.

I think I have figured it out. The problem was two-fold.

First, apparently there is a problem when prince gets absolute paths to the files on the command line. So I have modified the script to pass file names only, and changed the working directory for prince in the Popen call. Again, see the attachment in the previous post. That produces working links - sometimes.
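
A rough sketch of that approach: start the child process with `cwd` set to the unpacked ePUB directory, so the file arguments can stay relative. `run_in_directory` is a hypothetical helper, and the demo runs the Python interpreter rather than prince; the real command line would carry the relative file names instead.

```python
import subprocess
import sys


def run_in_directory(cmd, workdir):
    """Run cmd with workdir as its current directory.

    Returns the exit code and the combined stdout/stderr text.
    """
    proc = subprocess.Popen(
        cmd,
        cwd=workdir,  # relative file arguments are resolved from here
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return proc.returncode, output


if __name__ == "__main__":
    # Stand-in command (an assumption): list the working directory.
    demo = [sys.executable, "-c", "import os; print(sorted(os.listdir('.')))"]
    code, output = run_in_directory(demo, ".")
    print(code, output)
```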

Now there is another problem when the link happens to refer to an empty span. It depends on the book, obviously. I think it's a known bug in Prince.

Jellby
09-10-2010, 09:55 AM
No, it is not quite right yet. Please see file attached, specifically the method "file" that I have changed. See, I just replaced the slash? That makes it work, otherwise reading from zip fails because of the wrong slash:

Hmm... I guess I had read it backwards. If I now understand it correctly, the problem is I was generating the pathnames with the OS-specific separator ("\" for windows), but zipfile's getinfo() wants them with "/" always? What I don't quite understand then is why prince works when you feed it the filenames with "/".

Could you run the attached file (load an ePUB and convert it to PDF) and copy whatever it outputs to the terminal? I'd like to see what's happening exactly to the file paths, or maybe you can explain it :)

Regarding your other changes, I've incorporated them, but haven't uploaded it yet.

viktorz
09-11-2010, 01:52 AM
Hmm... I guess I had read it backwards. If I now understand it correctly, the problem is I was generating the pathnames with the OS-specific separator ("\" for windows), but zipfile's getinfo() wants them with "/" always? What I don't quite understand then is why prince works when you feed it the filenames with "/".

I guess Prince is smart enough to take care of slashes. As command-line parameters, those file paths come into Prince as strings, right? And Prince is free to do with those strings whatever it wants. And if Prince is, like, "hey, those strings are supposed to be file paths, but I am under Windows now, so I'd better make sure all slashes are right ones... I mean wrong ones..." - that would explain it, right?

Could you run the attached file (load an ePUB and convert it to PDF) and copy whatever it outputs to the terminal? I'd like to see what's happening exactly to the file paths, or maybe you can explain it :)

Here you are:
OPS\cover.xhtml
OPS\ch1.xhtml
OPS\ch2.xhtml
OPS\ch3.xhtml
OPS\ch4.xhtml
OPS\ch5.xhtml
OPS\ch6.xhtml
OPS\ch7.xhtml
OPS\ch8.xhtml
OPS\ch9.xhtml
OPS\ch10.xhtml
OPS\ch11.xhtml
OPS\ch12.xhtml
OPS\ch13.xhtml
OPS\ch14.xhtml
OPS\ch15.xhtml
OPS\ch16.xhtml
OPS\ch17.xhtml
OPS\ch18.xhtml
OPS\ch19.xhtml
OPS\ch20.xhtml
OPS\ch21.xhtml
OPS\ch22.xhtml
OPS\ch23.xhtml
OPS\ch24.xhtml
OPS\ch25.xhtml
OPS\ch26.xhtml
OPS\ch27.xhtml
OPS\ch28.xhtml
OPS\ch29.xhtml
OPS\ch30.xhtml
OPS\ch31.xhtml
OPS\ch32.xhtml
OPS\ch33.xhtml
['C:\\Program Files\\Prince\\Engine\\bin\\prince.exe', '-v', '-s',
 'c:\\users\\myname\\appdata\\local\\temp\\tmpxt9o1 b\\tmp6gghkl',
 '-o', 'C:\\Utils\\epub2pdf\\epub2pdf.pdf',
 u'OPS/cover.xhtml', u'OPS/ch1.xhtml', u'OPS/ch2.xhtml', u'OPS/ch3.xhtml',
 u'OPS/ch4.xhtml', u'OPS/ch5.xhtml', u'OPS/ch6.xhtml', u'OPS/ch7.xhtml',
 u'OPS/ch8.xhtml', u'OPS/ch9.xhtml', u'OPS/ch10.xhtml', u'OPS/ch11.xhtml',
 u'OPS/ch12.xhtml', u'OPS/ch13.xhtml', u'OPS/ch14.xhtml', u'OPS/ch15.xhtml',
 u'OPS/ch16.xhtml', u'OPS/ch17.xhtml', u'OPS/ch18.xhtml', u'OPS/ch19.xhtml',
 u'OPS/ch20.xhtml', u'OPS/ch21.xhtml', u'OPS/ch22.xhtml', u'OPS/ch23.xhtml',
 u'OPS/ch24.xhtml', u'OPS/ch25.xhtml', u'OPS/ch26.xhtml', u'OPS/ch27.xhtml',
 u'OPS/ch28.xhtml', u'OPS/ch29.xhtml', u'OPS/ch30.xhtml', u'OPS/ch31.xhtml',
 u'OPS/ch32.xhtml', u'OPS/ch33.xhtml']

Jellby
09-11-2010, 05:13 AM
OK, so apparently the only source of pernicious "\" is the os.path.join, when used to read files from the ZIP. Then the best solution is probably not using os.path.join, but simply adding a hard-coded "/" when needed. I've done this in a few places. For the actual prince run, even though it seems it's not needed, I've replaced "/" with the OS-specific path separator, just to be "cleaner".
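
To illustrate the point about ZIP member names: they always use "/", so joining with `posixpath` (or a hard-coded "/") gives the same result on every OS, whereas `os.path.join` produces "\" on Windows. The helper names below are hypothetical and the archive is built in memory just for the demo.

```python
import io
import posixpath
import zipfile


def member_name(opf_dir, href):
    """Join an OPF directory and an href with "/", whatever the OS."""
    return posixpath.normpath(posixpath.join(opf_dir, href))


def member_size(archive, opf_dir, href):
    """Return the uncompressed size of a file inside the archive."""
    return archive.getinfo(member_name(opf_dir, href)).file_size


if __name__ == "__main__":
    # Build a tiny in-memory archive standing in for an ePUB.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("OPS/cover.xhtml", "<html/>")
    with zipfile.ZipFile(buf) as archive:
        print(member_name("OPS", "cover.xhtml"))
        print(member_size(archive, "OPS", "cover.xhtml"))
```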

Please try this and tell me how it works.

viktorz
09-11-2010, 05:53 AM
Works OK

tonhou
10-20-2010, 08:58 PM
I have a question concerning the table of contents generated.
At present my reader (Jetbook) can only read one level of TOC (no sub-levels), otherwise the reader crashes. The epub2pdf generator does a superb job of creating a TOC, but I would like to limit it to the top level.
Is this possible? What do I need to do?

Thanks.
--Tony

Jellby
10-21-2010, 08:01 AM
The epub2pdf generator does a superb job of creating a TOC, but I would like to limit it to the top level.
Is this possible? What do I need to do?

Do you mean the "bookmarks" that are created for the PDF, and which you can display or hide in Acrobat for instance? Those are created by Prince; have a look here (http://www.princexml.com/doc/7.0/pdf-bookmarks/). You could add "h3 { prince-bookmark-level: none; }" or similar to the additional CSS, depending on how the specific ePUB is coded.

If you mean an inline TOC, which appears just as a normal page (or many pages) in the PDF, then that's hard-coded in the ePUB, and you'd have to edit it.

tonhou
10-23-2010, 01:35 AM
Thank you for the reply.
Yes I believe I meant the bookmarks that are created as a popup TOC in my reader (and as a side bar in Linux pdf readers).
I have tried your suggestion and it did bring some changes, though I am still a bit bewildered as to how to get just the top level eg:
Chapter 1 --> Headings --> Sub-headings
Chapter 2 --> Headings --> Sub-headings
Chapter 3 --> Headings --> Sub-headings
etc

I only want Chapter 1, Chapter 2, Chapter 3 and none of the other nested headings.
I shall keep experimenting and see if I can happen on to it.

Thanks again.
--Tony

Jellby
10-23-2010, 05:22 AM
You'd have to take a look at the code (which you can do by selecting the html files in the drop-down list next to "CSS files" in epubutils.py) and see what is used for Chapter 1 (maybe <h1>Chapter 1</h1>), what for Headings (maybe <h2>Headings</h2>) and what for Sub-headings (maybe <h3>Sub-heading</h3>). Then you should modify the CSS rules passed to prince accordingly, by adding "prince-bookmark-level: none;" to the elements you don't want in the TOC.
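
If it helps, the extra rules can be generated rather than typed by hand. This is only a sketch: `hide_from_bookmarks` is a made-up helper, and the h2/h3 selectors are guesses that have to be replaced with whatever the specific ePUB actually uses.

```python
def hide_from_bookmarks(selectors):
    """Build CSS that keeps the given selectors out of the PDF bookmarks."""
    return "".join(
        "%s { prince-bookmark-level: none; }\n" % selector
        for selector in selectors
    )


if __name__ == "__main__":
    # Assumed markup: chapters in <h1>, everything below in <h2>/<h3>.
    print(hide_from_bookmarks(["h2", "h3"]))
```

The resulting text would go into the additional CSS, leaving only the chapter-level headings as bookmarks.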

insomniac-
03-20-2011, 03:53 PM
Hi people,
maybe this thread is too old, but here it is. I needed to convert from ePUB to PDF in decent quality, and I landed on this forum. The tool written by Jellby works fine and has a nice GUI, but I had trouble converting a number of ePUB files. So I modified the code (it required the Author field to be non-empty), and also added some code to remove the annoying PrinceXML annotation on the first page of the PDF. I also created a python package (by adding a setup.py script and splitting the code into library and start script). If Jellby or anyone is interested, let me know and I'll publish the code.

Bye :)
insomniac

insomniac-
03-20-2011, 03:57 PM
So I modified the code (it required the Author field to be non-empty), and also added some code to remove the annoying PrinceXML annotation on the first page of the PDF.
Forgot to mention, for the annotation stripping it requires the pyPDF package. If it's not available on the system, it will silently skip this operation.

insomniac

Jellby
03-21-2011, 03:40 PM
I would like to see the modifications at least, and incorporate them in my version if you don't mind :)

Except for removing the Prince logo. It's good if you can do it yourself (and I have something similar too), but since the Prince people offer a free version and expect those wanting the "full" version (without the logo) to pay, I think it's rather unfair to them to offer such a tool pre-packaged.

insomniac-
03-22-2011, 08:16 AM
Excluding the logo removal, the project didn't change much, except for a re-packaging and a try..except on the author name. I'm sending you a PM with the link to the sources, feel free to take only what you need and include in your project :-)

Cheers,
insomniac
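
For readers wondering what handling an empty author might look like, here is a minimal sketch. It is not the code from either version: `read_creator`, the "Unknown" fallback and the sample OPF are all assumptions made for the example.

```python
from xml.dom import minidom

DC_NS = "http://purl.org/dc/elements/1.1/"

SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Example</dc:title>
<dc:creator></dc:creator>
</metadata></package>"""


def read_creator(opf_xml, default="Unknown"):
    """Return the first non-empty dc:creator, or `default` if there is none."""
    doc = minidom.parseString(opf_xml)
    for node in doc.getElementsByTagNameNS(DC_NS, "creator"):
        text = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        ).strip()
        if text:
            return text
    return default


if __name__ == "__main__":
    print(read_creator(SAMPLE_OPF))
```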

Jon Hurst
12-20-2011, 07:54 AM
Jellby - I've been looking for this for ages. Works great! I was just about to start down a route of XSLT and FO - you've probably saved me about a fortnight's work. Thanks.

Jellby
01-18-2013, 02:35 PM
There was a problem with links in Windows; I think it's fixed in version 1.4.

Jellby
12-26-2013, 08:51 AM
There's a calibre plugin version now!

Jellby
10-21-2014, 04:38 PM
Responding to a question in another thread: to load an epub use the menu item "Load", or simply pass the filename as an argument if using the command line.