View Full Version : ePub to PDF, passing through TeX


Trouhel
08-09-2012, 10:36 AM
@roger64: 1) I'm not a professional when it comes to publishing either (though I do have a fairly wide background in computing): I'm just learning step by step, from resources on the Web. 2) Then you might try writer2latex or writer4latex someday (I never did, as I don't have LaTeX installed). No Libertine without XeTeX, though. There's a "legacy" package that can use Biolinum, and a TeX Gyre Termes font built from a Times New Roman clone (because of licensing issues). But there's a bunch of pretty good fonts available; see the "LaTeX Font Catalogue" website.

@Jellby: True. But with some restrictions, mostly on CSS, it becomes a fair bit easier. And you get a print-quality typesetting engine, where Prince, as far as I can see, doesn't give you much more than "export as PDF" in Word or OO. It all depends on what you're making the PDF for.

roger64
08-10-2012, 02:28 AM
@Trouhel

Thanks for the information and advice.

As a Linux user, my workflow begins with OpenOffice. I already use writer2xhtml to convert to EPUB (writer2xhtml is part of writer2latex) and Sigil to complete it (drop caps and so on). I recently tried to use "whatever" but failed to install it. My whole current process is very basic, and the PDFs Prince produces, once you understand it, are equally basic.

My wish would be to learn how to produce ebooks with a few advanced microtypography features (extrusion), thin spaces, ligature fonts... In the world of ebooks, these features are seen as luxury (professional) items.

But I am frightened of taking on LaTeX on my own just for this purpose. Since my wish concerns only ebooks (no magazines or sites) in EPUB or PDF format, I would gladly reconsider the process above if there were also a plain and direct way to do this kind of thing.

Trouhel
08-10-2012, 05:58 AM
To keep it short (the details would be very technical), there's no ready-made solution to your wishes yet. Writer2latex + XeTeX is the closest I can think of, but I gave it a look and (given my basic knowledge of LaTeX: I'm really into ConTeXt, which goes a different way) it definitely looks like it requires getting into the LaTeX code it produces, even if only slightly, before you can feed it to LaTeX to get a PDF. It would not do everything either:

extrusion (do you mean "protrusion"? Let's stick to Adobe's nicely coined "optical margins"): OK
ligatures: that's the OTF/TTF font's job: OK
thin space: I assume not, if it's not available in the fonts (usually they lack the "quart de cadratin", if I remember correctly, needed for French). Not sure about workarounds such as using a non-breaking space + a smaller font (you can certainly find an answer to that on the Web)
drop caps: I have no idea

Stick to Prince for the time being (there is a free, open-source, very well conceived equivalent called WeasyPrint, but it doesn't look that easy to install). I'm overly critical because I believe typography deserves the best treatment, but it's state-of-the-art on everything else.

There seems to be quite a lot going on around open-source electronic publishing, so one can hope a solution might come out in the not-too-distant future. And I'm pretty sure it will have to resort, one way or another, to TeX.

Jellby
08-10-2012, 07:23 AM
thin space: I assume not, if it's not available in the fonts (usually they lack the "quart de cadratin", if I remember correctly, needed for French). Not sure about workarounds such as using a non-breaking space + a smaller font (you can certainly find an answer to that on the Web)

In LaTeX thin spaces are dealt with automatically with the babel package. You don't have to actually write the spaces before ";", "?", etc., they are automatically inserted (see [thread=19978], for example).
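
Outside TeX, the same idea can be approximated by preprocessing the plain text. What follows is only a minimal Python sketch, assuming French rules and U+202F (NARROW NO-BREAK SPACE) as the thin space. The function name is made up for illustration, this is not how babel does it internally, and the colon is deliberately left out. Whether the result displays well still depends on the font having that glyph.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE, used here as the thin space

# A word character, then any run of ordinary / no-break / narrow spaces,
# then one of ; ! ?  The lookbehind keeps sequences such as "?!" together.
_HIGH_PUNCT = re.compile(r"(?<=\w)[ \u00a0\u202f]*([;!?])")


def add_thin_spaces(text: str) -> str:
    """Put exactly one thin no-break space before ; ! ? in plain text.

    Assumption: `text` is plain text, not markup. Run on HTML, this would
    also hit the ';' that ends entities such as '&amp;'.
    """
    return _HIGH_PUNCT.sub(NNBSP + r"\1", text)


if __name__ == "__main__":
    # ascii() makes the otherwise invisible space show up as \u202f
    print(ascii(add_thin_spaces("Quoi ? Oui; non!")))
```

The function normalises spaces that are already there, so it can be run more than once on the same text without piling up extra spaces.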

there is a free, Open-Source, very well conceived, equivalent called WeasyPrint

Thanks for the pointer. I'll keep an eye on it.

Trouhel
08-10-2012, 10:05 AM
In LaTeX thin spaces are dealt with automatically with the babel package. You don't have to actually write the spaces before ";", "?", etc., they are automatically inserted (see [thread=19978], for example).


Did you do that with pdfTeX or XeTeX?

The last time I used LaTeX was some 20 years ago, so I definitely won't be categorical on the matter, but:

My answer to roger64 was specifically about XeTeX, which replaces babel with polyglossia. Whether a) babel and XeTeX are compatible, and b) polyglossia does thin spaces the way babel does, are both unclear to me. Some say yes, others no... and I'm not going to install the whole stuff to give it a try.

Now, if you're only doing Latin script, we can probably agree that one will get better results with pdfTeX than with anything else, provided you're ready to go with the available fonts (there's enough for various tastes, but not mine).

Jellby
08-10-2012, 10:26 AM
Did you do that with pdfTeX or XeTeX?

It was with pdfTeX (pdfeTeX, actually, I believe). I have no experience with XeTeX or ConTeXt.

Trouhel
08-11-2012, 12:55 PM
@roger64:

Posting on a (not so) sunny Saturday afternoon before I forget about it.

If you're interested, you can send me (use the address given at the end of the doc; I'll acknowledge receipt), as a zip or tarball, both the .odt and the "saved as HTML" version (as it might vary from one OO version to another) of some part of one of your books, with PNGs or JPGs if any, but no fonts (if I can do any better, I'll use different ones).

I'll run that through the latest uploaded "whatever" on Windows and then send you back the PDF, together with the .odt if there was anything I had to modify to get it to work. You could then get an idea of how it differs from Prince output (a rough one, because I still have to work on tuning, especially for microtypography). Maybe we'll even discover there are things that Prince does better.

Trouhel
08-11-2012, 12:58 PM
Correction to the previous post: use the address at the end of the French doc (it is faulty in the English one).

Trouhel
08-11-2012, 01:06 PM
One more correction (sorry for that, it's Saturday and I'm really busy with something else): start your email subject with "[whatever]"; if not, it's going to be filtered.

roger64
08-11-2012, 02:28 PM
@Trouhel

Thanks very much for your interest. Here is what you asked for with some comments:
(deleted link)

It's not really easy to get in touch with you... :)

Trouhel
08-13-2012, 01:46 PM
@roger64

The PDFs (I also made an A4 one, as I felt stuff such as "optical margins" doesn't make much difference) have been ready since very early Sunday, but my graphics card died in the meantime. I bundled them with the ConTeXt source file, so this may give you a rough idea of the kind of work involved in doing it by hand. There are differences from your version though: I don't do cover pages or drop caps. Give me a return address at whatever point ebooks arobase orange point fr (this box too will go soon, see below) and I'll send it to you.

An interesting experience, though (and the end of "whatever": I deleted the repository as soon as my computer went back to work). I do have a few stumbling blocks, the biggest one being: what if Open Office feeds me some HTML I cannot make sense of? Which is what happened. Somewhere in your text, what should read <p>* *<br>*</p> is rendered <p></span>* *<br></p>. Not unlawful, maybe. But I cannot simply drop an unopened closing tag without being sure it's not actually hiding something else, or even a bug. Open Office HTML is admittedly not so good, but it usually follows some logic: this one has no obvious reason, it's the first time I've ever seen such a thing, and there was no way I could get rid of it (even deleting and redoing the paragraph in OO led to the same result). I ended up deleting the unfortunate paragraph.
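
A stray closing tag like the </span> above can at least be located automatically before conversion. This is a minimal Python sketch using the standard library's html.parser. The class and function names are hypothetical, it is not part of "whatever", and the list of void tags is an assumption that may need extending. It only reports the tag and where it sits; deciding what it was hiding remains a manual job.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (assumed list, extend as needed)
VOID = {"br", "hr", "img", "meta", "link", "input", "col", "area", "base"}


class StrayCloseFinder(HTMLParser):
    """Records closing tags that have no matching opening tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.stray = []      # (tag, line, column) for each unopened close

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag in self.open_tags:
            # Pop back to the most recent matching opener; this tolerates
            # unclosed inner tags such as <p> without </p>.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.stray.append((tag, *self.getpos()))


def find_stray_closing_tags(html):
    finder = StrayCloseFinder()
    finder.feed(html)
    finder.close()
    return finder.stray


if __name__ == "__main__":
    for tag, line, col in find_stray_closing_tags("<p></span>* *<br></p>"):
        print(f"unopened </{tag}> at line {line}, column {col}")
```

HTMLParser lowercases tag names, so the uppercase tags Open Office writes are handled the same way.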

The next stumbling block is font families. It's not always that easy to gather all the faces of a family together, and I had to make a choice. Going the OO way sometimes ends with, say, two families of two faces each instead of a single family of four faces. As I was more concerned with correct HTML giving correct results, I decided that when you want to do bold, <B> should work instead of having to do <FONT FACE="... Bold">. The consequence is that font families as I see them do not, in some cases, have the same names Open Office decides they have (though it is perfectly correct in its own way), which doesn't mean the fonts themselves are wrong either. I ran into that because I had to use a substitute family for Times New Roman, since I don't have that one. As a workaround, I had to modify the fonts, which I had already had to convert to TTF for use with OO.
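
The "one family of four faces" view can be illustrated by folding style words at the end of a face name back into the family. This is only a rough Python sketch under that naming assumption, with made-up function names and an "Example Serif" family invented for the demo. It is not what "whatever" did, and real fonts often need their internal name tables checked instead.

```python
# Style words that may trail a face name. "Roman" is deliberately absent,
# otherwise "Times New Roman" would lose part of its family name.
STYLE_WORDS = {"regular", "bold", "italic", "oblique"}


def split_face_name(face_name):
    """Split e.g. 'Times New Roman Bold Italic' into (family, bold, italic)."""
    words = face_name.split()
    styles = []
    # Peel style words off the end, always keeping at least one family word
    while len(words) > 1 and words[-1].lower() in STYLE_WORDS:
        styles.append(words.pop().lower())
    bold = "bold" in styles
    italic = "italic" in styles or "oblique" in styles
    return " ".join(words), bold, italic


def group_by_family(face_names):
    """Map family -> {(bold, italic): original face name}."""
    families = {}
    for name in face_names:
        family, bold, italic = split_face_name(name)
        families.setdefault(family, {})[(bold, italic)] = name
    return families


if __name__ == "__main__":
    faces = ["Example Serif", "Example Serif Bold",
             "Example Serif Italic", "Example Serif Bold Italic"]
    for family, members in group_by_family(faces).items():
        print(family, "->", len(members), "faces")
```

With such a table, <B> maps to the (True, False) entry of the current family instead of requiring a separate <FONT FACE="... Bold">.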

So, given that I also ran into one or two "whatever" bugs (the kind that can be fixed on the spot)... well, "whatever" is gone. I've given up trying to convince anyone that OO->TeX can be done, and that the way I settled on doing it works.

Jellby
08-14-2012, 05:31 AM
I've split these posts from the Prince thread, since they are not about the script, but about a different workflow. I hope you don't mind.