ePub to TeX to PDF

02-27-2010, 04:34 AM
I assume I probably just need a link to a previous thread, but in my (admittedly lazy) search, I didn't see one...

Given the argument raging in another thread on format and rendering, I thought I'd like to try taking an epub (no DRM), running it through TeX to create a PDF that would look superb on a PRS600 (a fairly standard 6" screen, 600x800 pixels resolution, 167DPI). My problem is that I really wouldn't know where to start, or even what features of typography are likely to be 'nice' - kerning and ligatures pop up, I'm sure they are probably important, and I do kinda know what they are, but they do always sound more like medieval sports injuries than typography, to me.

For me, superb would be: a reasonably small font, very small font, paragraph indentation, portrait format, and the ability to try left justification, full justification with white-space expansion, or full justification with hyphenation (sorry, I'm not really up on the correct terms for those last two, but I think the semi-description makes sense...).

A bit of an assumption - there are probably people who already have a set of scripts to do this, for their values of superb rather than mine. I have no issue hacking the script to adjust these values. I'd just like to try it out, and if I prefer the output to the epub, then to be able to just run all my epubs through this. Hey, as long as I maintain the source, I'm going to be able to reformat if (hehehe, when) I change to a device with different screen attributes.

Technical info: I'm on an OSX 10.6 Mac, have TeX 3.141592, reasonably comfortable with shell scripting, able to hack a bit of perl, python and similar, or at least know how to read enough documentation to get there. (I just realised, TeX is the software where the version number tends towards pi, isn't it?).

Not even sure if this is the appropriate area of the forums to post this request.

Many thanks in advance for any help, suggestions.

02-27-2010, 04:53 AM
If your goal is getting a PDF, you might try my epub2pdf script. It does not use TeX, but another program called Prince, which can create PDFs directly from (X)HTML, with similar typographic features to TeX. Since you are on OSX and comfortable with shell scripting, it should be easy for you to use it ;)

02-27-2010, 05:29 AM
Jellby - thanks for that, just updating things now, and I'll give it a go.

02-27-2010, 09:42 AM
Jellby's script is quite good, from what I've seen. I found that Prince gives slightly more inconsistent results than pdfLaTeX does, and there are some things I know how to do in LaTeX PDFs that I don't personally know how to do with Prince (footnotes come to mind), but that may be my Prince-ignorance, and it may be that it doesn't really matter with source converted from ePub.

The ease of conversion is definitely a plus for using Prince at the moment.

But something using (La)TeX would also be appreciated.

Ahi was working on a script like this -- let's see if I can find the thread.

LaTeX source Packages and Autogeneration

I'm not sure how far he got with his pacify script. I haven't seen him post recently.

I suppose a script would have to do the following:
1. Unzip the ePub and read the content.opf.
2. Convert any image files or other resources in there to something pdfTeX can handle.
3. Convert the individual xhtml files to TeX source.
4. Put the pieces back together in a way matching the toc.ncx file in the ePub.
5. Pass the appropriate metadata from content.opf to the hyperref package.
6. Pass the user's specified font, font size and page choices to the appropriate LaTeX packages (e.g., geometry, and perhaps fontspec for XeLaTeX).
7. Run (Xe)(pdf)LaTeX on the results to create the PDF.

The trickiest parts would be (3) and (4).
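Steps (1) and (5) are the easy end of the list. As a rough sketch only, assuming Python and its standard library (the names `read_epub` and `hyperref_setup` are made up for illustration, not taken from any existing script): the ePub is a zip file, META-INF/container.xml says where the OPF file lives, and the OPF holds the metadata, the manifest and the spine. Note that the spine is what defines the reading order; toc.ncx adds the navigation labels on top of it.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the ePub container and the OPF package file.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub(source):
    """Return (metadata dict, spine paths in reading order) for an ePub.

    `source` is a file name or a file-like object. Hypothetical helper.
    """
    with zipfile.ZipFile(source) as zf:
        # container.xml points at the OPF file (often called content.opf).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))

    meta = {
        "title": opf.findtext(".//dc:title", default="", namespaces=NS),
        "author": opf.findtext(".//dc:creator", default="", namespaces=NS),
    }
    # The manifest maps ids to hrefs; hrefs are relative to the OPF file.
    manifest = {
        item.get("id"): item.get("href")
        for item in opf.iterfind(".//opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)
    spine = [
        posixpath.join(base, manifest[ref.get("idref")])
        for ref in opf.iterfind(".//opf:spine/opf:itemref", NS)
    ]
    return meta, spine


def hyperref_setup(meta):
    """Build a \\hypersetup line for the preamble.

    Assumes title and author contain no LaTeX special characters;
    a real script would have to escape them first.
    """
    return "\\hypersetup{pdftitle={%s}, pdfauthor={%s}}" % (
        meta["title"], meta["author"])
```

Calling `read_epub("book.epub")` on a DRM-free file would give the metadata plus the list of XHTML files to convert in step (3), in the order they should appear.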

For (3), most likely the easiest thing would be to take one of the existing open source html2tex converters out there (several are listed here), and modify it as needed. I've been meaning to look into this further, but haven't had time. One thing I have used is the command-line conversion tools from AbiWord, which do produce LaTeX output. There's also the typehtml package, which will directly typeset HTML input but is limited to HTML2 and some HTML3. Unfortunately, I'm not sure anything can handle XHTML that goes significantly beyond HTML at this point.
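To give a feel for what step (3) involves, here is a deliberately tiny sketch, assuming Python's standard `html.parser` module. It is not one of the converters mentioned above; `TinyConverter`, `TAGS` and `xhtml_to_latex` are hypothetical names, and the tag mapping is only a starting guess. It handles paragraphs, emphasis, bold and two heading levels, escapes LaTeX special characters, and silently drops everything else (images, links, tables, CSS).

```python
import re
from html.parser import HTMLParser

# Characters with special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Tag -> (text emitted at the opening tag, text emitted at the closing tag).
# \chapter* assumes the book or report class; adjust to taste.
TAGS = {
    "p": ("", "\n\n"),
    "em": (r"\emph{", "}"), "i": (r"\emph{", "}"),
    "strong": (r"\textbf{", "}"), "b": (r"\textbf{", "}"),
    "h1": (r"\chapter*{", "}\n\n"),
    "h2": (r"\section*{", "}\n\n"),
}


def escape(text):
    """Escape LaTeX special characters one at a time (no double escaping)."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


class TinyConverter(HTMLParser):
    """Collects LaTeX fragments for the content inside <body>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in TAGS:
            self.out.append(TAGS[tag][0])

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body and tag in TAGS:
            self.out.append(TAGS[tag][1])

    def handle_data(self, data):
        if self.in_body:
            # Collapse runs of whitespace, as an HTML renderer would.
            self.out.append(escape(re.sub(r"\s+", " ", data)))


def xhtml_to_latex(xhtml):
    parser = TinyConverter()
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.out)
```

Anything beyond simple running text (footnotes, nested lists, styled spans) is where the real work starts, which is why adapting an existing converter is probably still the better route.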

(4) is tricky especially if you want it done well, but how tricky it would be might depend on how consistent the ePub source is. It might just be a matter of creating a wrapper document that includes \include commands to insert the parts generated by converting the individual XHTML chunks. Another easy way would be pdfpages, but that would kill hyperlinks, and complicate the ToC-creation process.
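The wrapper-document idea, combined with the page settings from step (6), might look something like the sketch below. It assumes Python, the standard book class and the geometry package; `wrapper_document` and its parameters are hypothetical, and the defaults are simply the 600x800 pixel, 167 DPI screen from the first post, which works out to roughly 3.59in by 4.79in.

```python
def page_size_inches(width_px, height_px, dpi):
    """Physical page size for a screen of the given pixel size and density."""
    return width_px / dpi, height_px / dpi


def wrapper_document(chapters, width_px=600, height_px=800, dpi=167,
                     margin_in=0.1, font_pt=10):
    """Return LaTeX source for a wrapper that pulls in one file per chapter.

    `chapters` are file names without the .tex extension, in reading order.
    The standard classes only accept 10, 11 or 12 for `font_pt`.
    """
    width_in, height_in = page_size_inches(width_px, height_px, dpi)
    lines = [
        r"\documentclass[%dpt]{book}" % font_pt,
        r"\usepackage[paperwidth=%.2fin,paperheight=%.2fin,margin=%.2fin]{geometry}"
        % (width_in, height_in, margin_in),
        r"\usepackage{hyperref}",
        r"\begin{document}",
    ]
    # \include starts each part on a new page; \input would not.
    lines += [r"\include{%s}" % name for name in chapters]
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

For example, `wrapper_document(["ch1", "ch2"])` produces a preamble with `paperwidth=3.59in,paperheight=4.79in` followed by one `\include` line per chapter. Changing device later would then just mean passing different pixel and DPI values and rerunning LaTeX.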

This is the kind of thing I'd work on if I had limitless free time, but alas, realistically, as a parent and a person with a job unrelated to this stuff, I don't foresee it happening. Theoretically, however, it shouldn't be too difficult, and jellby's script is certainly good enough (quite good actually) in the meantime.

Anyway, any progress on this end would be appreciated.

02-27-2010, 12:08 PM
Frabjous - thanks for the input. I think I may start looking into this. After all, how hard can it be...

02-27-2010, 10:44 PM
Definitely only one way to find out! Keep us posted!