Old 02-27-2010, 09:42 AM   #4
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Jellby's script is quite good, from what I've seen. I found that Prince gives slightly more inconsistent results than pdfLaTeX does, and there are some things I know how to do in LaTeX PDFs that I don't personally know how to do with Prince (footnotes come to mind), but that may be my Prince-ignorance, and it may be that it doesn't really matter with source converted from ePub.

The ease of conversion is definitely a plus for using Prince at the moment.

But something using (La)TeX would also be appreciated.

Ahi was working on a script like this -- let's see if I can find the thread.

LaTeX source Packages and Autogeneration

I'm not sure how far he got with his pacify script. I haven't seen him post recently.

I suppose a script would have to do the following:
1. Unzip the ePub and read the content.opf.
2. Convert any image files or other resources in there to something pdfTeX can handle.
3. Convert the individual xhtml files to TeX source.
4. Put the pieces back together in a way matching the toc.ncx file in the ePub.
5. Pass the appropriate metadata from content.opf to the hyperref package.
6. Pass the user's specified font, font size and page choices to the appropriate LaTeX packages (e.g., geometry, and perhaps fontspec for XeLaTeX).
7. Run (Xe)(pdf)LaTeX on the results to create the PDF.
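
To make step (1) a bit more concrete, here is a minimal Python sketch, not anyone's existing script. It assumes a standard ePub 2 layout where META-INF/container.xml points at the package file, and the function name read_epub_package is made up for illustration. Note that it takes the reading order from the spine in content.opf, which is a simplification; it does not look at toc.ncx at all.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the ePub container and package (OPF) files.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_package(epub):
    """Return (metadata, spine_paths) for an ePub path or file-like object.

    Hypothetical helper: only title and first creator are extracted.
    """
    with zipfile.ZipFile(epub) as zf:
        # container.xml names the package file (often content.opf).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    metadata = {
        "title": package.findtext("opf:metadata/dc:title", default="", namespaces=NS),
        "author": package.findtext("opf:metadata/dc:creator", default="", namespaces=NS),
    }
    # The manifest maps item ids to hrefs; the spine lists ids in reading order.
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
    spine = [
        posixpath.join(base, hrefs[ref.get("idref")])
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]
    return metadata, spine
```

The returned metadata would feed step (5), and the spine paths are the XHTML files that step (3) has to convert.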

The trickiest parts would be (3) and (4).

For (3), the easiest thing would most likely be to take one of the existing open source html2tex converters out there (several are listed here) and modify it as needed. I've been meaning to look into this further, but haven't had time. One thing I have used is the command-line conversion tools from AbiWord, which do produce LaTeX output. There's also the typehtml package, which will directly typeset HTML input but is limited to HTML2 and some HTML3. Unfortunately, I'm not sure anything can handle XHTML that goes significantly beyond HTML at this point.
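
Just to show the shape of the problem in (3), here is a toy converter in Python using only the standard library's html.parser. It is not one of the existing html2tex tools and is nowhere near complete: the tag table, the class name XhtmlToTex, and the choice of \chapter and \section (which assumes a book-like document class) are all my own assumptions. Anything not in the table is silently dropped, keeping only its text.

```python
import re
from html.parser import HTMLParser

# LaTeX special characters and their escaped forms.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

# Tag -> (text emitted at the start tag, text emitted at the end tag).
# This mapping is an illustrative assumption, not a complete one.
TAG_MAP = {
    "h1": (r"\chapter{", "}\n\n"),
    "h2": (r"\section{", "}\n\n"),
    "p": ("", "\n\n"),
    "em": (r"\emph{", "}"),
    "i": (r"\emph{", "}"),
    "strong": (r"\textbf{", "}"),
    "b": (r"\textbf{", "}"),
}


def tex_escape(text):
    """Escape characters that have a special meaning in LaTeX."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


class XhtmlToTex(HTMLParser):
    """Collect LaTeX output for the contents of <body> only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in TAG_MAP:
            self.out.append(TAG_MAP[tag][0])

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body and tag in TAG_MAP:
            self.out.append(TAG_MAP[tag][1])

    def handle_data(self, data):
        if self.in_body:
            # Collapse runs of whitespace to one space, then escape.
            self.out.append(tex_escape(re.sub(r"\s+", " ", data)))


def xhtml_to_tex(source):
    """Convert an XHTML string to a LaTeX fragment (very lossy)."""
    parser = XhtmlToTex()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Images, links, lists, tables and CSS styling are exactly the parts this ignores, which is where a real converter would need most of its work.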

(4) is tricky especially if you want it done well, but how tricky it would be might depend on how consistent the ePub source is. It might just be a matter of creating a wrapper document that includes \include commands to insert the parts generated by converting the individual XHTML chunks. Another easy way would be pdfpages, but that would kill hyperlinks, and complicate the ToC-creation process.
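
The wrapper-document idea could look roughly like the Python sketch below, which also covers the hyperref and geometry parts of steps (5) and (6). The function name build_wrapper, the book class, and the default page dimensions are assumptions for illustration only, and the title and author are assumed to be already escaped for LaTeX.

```python
def build_wrapper(title, author, chunks, fontsize="11pt",
                  paper="paperwidth=90mm,paperheight=120mm,margin=5mm"):
    """Return LaTeX source for a wrapper that includes each chunk in order.

    `chunks` holds file names without the .tex extension.
    The default page size is a made-up small-screen example.
    """
    lines = [
        rf"\documentclass[{fontsize}]{{book}}",
        rf"\usepackage[{paper}]{{geometry}}",
        r"\usepackage{hyperref}",
        # PDF metadata taken from content.opf goes to hyperref here.
        rf"\hypersetup{{pdftitle={{{title}}}, pdfauthor={{{author}}}}}",
        r"\begin{document}",
        r"\tableofcontents",
    ]
    # \include takes a name without extension and starts a new page each time.
    lines += [rf"\include{{{name}}}" for name in chunks]
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Since \include forces a page break before each file, \input might be the better choice if the ePub splits its XHTML in the middle of chapters.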

This is the kind of thing I'd work on if I had limitless free time, but alas, realistically, as a parent and a person with a job unrelated to this stuff, I don't foresee it happening. Theoretically, however, it shouldn't be too difficult, and jellby's script is certainly good enough (quite good actually) in the meantime.

Anyway, any progress on this end would be appreciated.

Last edited by frabjous; 02-28-2010 at 12:31 AM.