#1 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
Prince XML for creating mobile reader-sized PDFs?
I've started to experiment with Prince XML, a cross-platform (Windows, Mac, Linux, FreeBSD, etc.) command-line tool, for creating reader-sized PDFs from HTML or XML source.
It's free to download and run for personal use, though it does add a small watermark on the first page.
My road there has been the following. Those of you who followed the "PDF is not an ebook format" thread know that some of us who still think PDF is an ebook format have been disappointed with the typographic quality delivered by the renderers for most standard reflowable formats like ePub and mobi. In particular, these tend not to support such things as end-of-line hyphenation, ligatures, font kerning, widow and orphan control, embedding font subsets, etc. But at the same time we all must admit that the ability to change the font size and page properties is a desirable feature. This led us to discuss what options there might be for the best of both worlds.
At least presently, it seems that the best-looking ebooks are those generated by something like (pdf)LaTeX, made especially for the size of the device in question and the font size preferred by the user. Some of us are still looking into the possibility of automating the process of generating appropriately sized PDFs from LaTeX code, as in this thread. One stumbling block is that LaTeX uses its own mark-up language, whereas most other ebook formats are HTML or XML based, and while conversion is certainly possible, it's unclear whether any converters right now work well enough that the resulting code wouldn't have to be manually checked and corrected.
In researching the use of LaTeX for creating ebooks, I discovered that Feedbooks used to do something like this with LaTeX, but has switched to Prince XML -- so I decided to experiment with it myself. Some interesting features:
The results are interesting, so far. I don't think they are as good as LaTeX, but it may just be that I'm less familiar with Prince. Still, the possibility of more easily incorporating it within a conversion script (for example, one that converts from ePub) makes exploring it worthwhile, in my opinion; their website even gives instructions on how to call it from within various programming/scripting languages. I'd be very interested in hearing about anyone else's experience with it, or opinion about its prospects. My own experiments are just beginning, but I'll post some initial results in the next post.
Last edited by frabjous; 09-12-2009 at 01:35 AM. |
#2 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
All right, just to give some initial results... I've been working on making Bertrand Russell's Introduction to Mathematical Philosophy (a public domain title) available in several different formats. To this end, I first generated HTML source, which I used as a master document for creating .mobi and .ePub files.
I posted these screenshots in the "PDF is not an ebook format" thread, in this post (which also had some additional screenshots), but to repeat: here's what my first attempt at a mobi looked like as a screenshot on a Kindle:
[Screenshot: mobi version on a Kindle]
Things to note: it looks like crap generally, and the overlining (which you'll see below) doesn't even work, since mobi doesn't support it. (I've since had to modify the original notation there... I hate mobi files.)
ePub is an improvement, since you can at least overline. Here's a screenshot from ADE of the ePub version:
[Screenshot: ePub version in ADE]
This looks better than on my Sony, where I can't get full justification, but it still looks pretty bad. The variables are not in true italics but in a slanted roman, and hence run into the Sheffer strokes next to them. The Sheffer strokes are not properly spaced.
I took my HTML source and converted it to TeX, and can now make PDFs of various sizes from that source. Here's what it now looks like if I use 12pt Bitstream Charter for the font and size it for my reader:
[Screenshot: LaTeX-generated PDF, 12pt Bitstream Charter]
Very nice: hyphenation, kerning, proper mathematical spacing, and a number of other improvements that are hard to list in full. Nevertheless, converting it to LaTeX was a fair amount of work that I couldn't have fully scripted.
However, with Prince XML, I could have gotten pretty decent results just by sticking a few things into the CSS of the HTML, in particular, adding just:
Code:
@page { size: 90mm 120mm; margin: 2mm 2mm 2mm 2mm }
body { hyphens: auto; font-family: Charis SIL }
The result after running Prince, for the same page of the above book, looks like this:
[Screenshot: Prince XML output, same page]
Here's one page earlier, so you can see what a page of just text looks like:
[Screenshot: Prince XML output, text-only page]
This isn't as good as the LaTeX, I'll admit, but I haven't really put 1/100th as much work into it. I could probably do the Sheffer stroke spacing better with some MathML, and with the right code I might even be able to show the original pagination in the margins, as in the LaTeX versions (it has a lot of interesting options). The line spacing gap created by the footnote marker is definitely unsightly, but again, maybe this could be fixed with some tweaked CSS.
Still, it's much better than the original ePub (at least as displayed by ADE or on my Sony), and infinitely better than the .mobi. We've got hyphenation, kerning, a nicer-looking font, true italics, justification that will work even on my Sony, etc. Changing the font or font size would just be a matter of making one minor change to the CSS before running Prince. Indeed, some of this would be easier to automate than with LaTeX.
If we could get a script working with it to extract the (X)HTML from an ePub and convert it via Prince, I wonder if I'd ever use ePubs on my reader again... I may even begin working on such a script, perhaps even with a GUI, despite my very limited programming skills. Then again, I may not have enough free time to do anything of the sort.
Last edited by frabjous; 09-12-2009 at 01:23 AM. |
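To illustrate the "one minor change to the CSS before running Prince" idea from the post above, here is a minimal shell sketch. The helper name `make_page_css`, the file names `reader.css` and `book.html`, and the 2mm margin default are assumptions made for illustration; only the `-s` and `-o` options of `prince` are taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Prince): print a small user stylesheet
# for a given page size and font size.
# Usage: make_page_css WIDTH HEIGHT FONT_SIZE [FONT_FAMILY]
make_page_css() {
  width="$1"
  height="$2"
  font_size="$3"
  font_family="${4:-serif}"   # fall back to a generic serif if no font is given
  cat <<EOF
@page { size: $width $height; margin: 2mm }
body { hyphens: auto; font-size: $font_size; font-family: $font_family }
EOF
}

# Example (assumed file names): regenerate the stylesheet, then rerun Prince.
# make_page_css 90mm 120mm 12pt "Charis SIL" > reader.css
# prince book.html -s reader.css -o book.pdf
```

Changing the font or the page size then only means calling the helper with different arguments and running Prince again on the same source.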
#3 |
frumious Bandersnatch
Posts: 7,570
Karma: 20150435
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Prince XML looks quite interesting, especially if they're responsive in introducing new features (a pity it isn't open source).
Quote:
|
|
#4 |
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
|
|
#5 |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
My lack of faith in XML based solutions (or at least proprietary ones) comes primarily from my expectation that sooner or later (knowing me, rather sooner) I'll come across something they haven't thought of... and I'll either have to choose another tool, or try to hack it using the features/tools they already have available, which will probably (given unintended usage) look like crap.
Not to beat a dead horse... but Hungarian Runes written in boustrophedon *is* of genuine and ongoing relevance to me. So is Hanzi with good quality typography. So is weird stuff like being able to put arbitrary accents or subaccents on just about any character, whether Latin, Cyrillic, or even Hanzi.
These things are just off the top of my head... and I suspect all three of them make PrinceXML a non-starter for me. Not to mention something far simpler that I suspect they have no proper support for: Hungarian hyphenation, including properly hyphenating doubled digraph consonants, and having some way to differentiate genuine doubled digraphs from a non-doubled digraph merely sitting beside the "wrong" single letter. E.g.:
boccsal -> bocs-csal
bérccsoport -> bérc-cso-port
sasszárny -> sas-szárny
hosszú -> hosz-szú
et cetera
Or am I underestimating the tool?
- Ahi
Last edited by ahi; 09-12-2009 at 10:16 AM. |
#6 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
No worries, Ahi... I haven't given up on contributing to a robust LaTeX-for-eBook automation script; I'm just exploring Prince as something that might fill a gap in the meantime, especially since it might not take much effort to get a pretty decent script for it together.
Jellby, thanks for recommending your ePub script -- it did actually cross my mind. I'll take a closer look when I get a chance. I'm very busy with other things right now, unfortunately... which is tough when you're excited about "fun" projects like these.
Another interesting thing I noticed is that Prince 7.0beta automatically uses ligatures. Neat. |
#7 |
frumious Bandersnatch
Posts: 7,570
Karma: 20150435
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
I've tried Prince XML and it's quite easy and powerful, at least at first sight. As a test, I've done the conversion of The Picture of Dorian Gray, and the result is attached.
All that was needed was this user CSS, pdf_output.css:
Code:
@page {
  size: 9cm 12cm;
  margin: 5mm 1mm 1mm 1mm;
  @top-left {
    border-bottom: solid 0.2pt #000;
    margin-bottom: 1mm;
    content: "";
  }
  @top-center {
    font-size: 60%;
    font-style: italic;
    border-bottom: solid 0.2pt #000;
    margin-bottom: 1mm;
    content: string(chaptitle);
  }
  @top-right {
    font-size: 50%;
    border-bottom: solid 0.2pt #000;
    margin-bottom: 1mm;
    content: counter(page) "/" counter(pages);
  }
}
/* no running header on the first page or on title pages */
@page:first {
  margin: 1mm 1mm 1mm 1mm;
  @top-left { border-width: 0; margin: 0; content: normal; }
  @top-center { border-width: 0; margin: 0; content: normal; }
  @top-right { border-width: 0; margin: 0; content: normal; }
}
@page title {
  margin: 1mm 1mm 1mm 1mm;
  @top-left { border-width: 0; margin: 0; content: normal; }
  @top-center { border-width: 0; margin: 0; content: normal; }
  @top-right { border-width: 0; margin: 0; content: normal; }
}
/* specific code for this image */
@page cover {
  size: 7.98cm 12cm;
  margin: 0 -2.01cm; /* make the virtual width 12cm */
}
body {
  font-size: 9.9pt;
  font-family: serif;
  text-align: justify;
  prince-image-resolution: 166dpi;
  hyphens: auto;
  prince-text-replace: " – " "—" /* replace spaced en-dashes with em-dashes */
                       "st" "s\FEFFt"; /* disable st ligatures */
}
body.cover { page: cover; }
div.header { string-set: chaptitle content(); }
div.title, div.edition { page: title; }
div.edition { float: bottom; }
p.logo { display: none; }
div.toc a { text-decoration: none; }
div.toc a::after { content: leader('. ') target-counter(attr(href), page); }
h1 { prince-bookmark-level: 1 }
And then, in the directory with the uncompressed ePUB files, I ran:
Code:
prince OEBPS/Cover.xhtml OEBPS/Title.xhtml OEBPS/Contents.xhtml OEBPS/Preface.xhtml OEBPS/Chapter-*.xhtml -s pdf_output.css -o test.pdf
Now I have to separate the "universal" stuff from the settings and classes particular to this ebook or to my coding style. I'd like to place the latter in a separate CSS file in the ePUB, and maybe use some metadata container for it; a converter could then use this to convert the ePUB to PDF automatically...
By the way, the logo watermark on the first page is quite easy to remove if you output the PDF with --no-compress (the PDF can later be compressed with pdftk).
Last edited by Jellby; 09-13-2009 at 04:42 PM. |
#8 |
Resident Curmudgeon
Posts: 80,655
Karma: 150249619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
#9 |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
|
#10 |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Quote:
Obviously it has its benefits, and Feedbooks definitely shows its value very well. - Ahi |
|
#11 |
frumious Bandersnatch
Posts: 7,570
Karma: 20150435
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
OK, here it is, the first version of epub2pdf, a bash script for converting ePUB books to PDF. Since it's a bash script, it needs bash (doh!): Linux users will have it easy, for MacOS users it will probably be similar, but Windows users will have to install Cygwin for the moment (although it should be easy to translate the script to Windows...).
I've tried it with some ePUBs of my own, as well as some generated by Calibre and some uploaded by Zelda and Abecedary, and it seems to work great! These are the usage notes:
Code:
epub2pdf.sh [options] input.epub output.pdf
Where the options are:
  -s "style.css"   Use "style.css" as stylesheet instead of the default ~/.epub2pdf/default.css
  -v               Verbose output
  -h               Show this help
I added a feature to use a book-specific stylesheet if found. This stylesheet should be included in the .epub and referenced thus:
1.- Include a .css with rules and selectors for Prince XML. These are not going to be used in the normal ePUB rendering, only when processing with Prince XML, so you can use everything supported by Prince XML (use !important to override the standard css rules).
2.- As with every file you include in the epub, there must be an entry in the <manifest> (in the .opf file).
3.- Add a <meta name="prince-style" content="XXXXX"> to the <metadata> block of the .opf file, where "XXXXX" is the id of the above .css file.
That's all: epub2pdf will use this .css file included in the .epub in addition to the default.css or whatever you use. As an example, I'm updating The Picture of Dorian Gray upload.
Please, try it and tell me what you think!
EDIT: Script updated to version 2.0 (now it uses XMLStarlet to process the metadata and "pdf-style" has been changed to "prince-style").
EDIT: Now updated to version 3.0
EDIT: The script is now available here.
Last edited by Jellby; 11-22-2009 at 07:52 AM. |
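The three steps above (a stylesheet in the package, a `<manifest>` entry, and a `prince-style` `<meta>` pointing at that entry's id) could be resolved roughly as follows. This is only a sketch and not the epub2pdf code: the function name `prince_style_href` is made up, and it parses the .opf with plain `tr`/`grep`/`sed` instead of XMLStarlet, so it assumes double-quoted attributes and unprefixed `<meta>` and `<item>` element names.

```shell
#!/bin/sh
# Hypothetical helper: print the href of the stylesheet that the
# <meta name="prince-style" content="ID"> entry of an .opf file points to.
# Returns 1 if the .opf has no such entry.
prince_style_href() {
  opf="$1"
  # Put every tag on its own line so grep/sed can treat tags one at a time.
  flat=$(tr '\n' ' ' < "$opf" | tr '>' '\n')
  # Step 3: read the manifest id stored in the prince-style meta element.
  id=$(printf '%s\n' "$flat" | grep '<meta ' | grep -F 'name="prince-style"' \
       | sed 's/.*content="\([^"]*\)".*/\1/' | head -n 1)
  [ -n "$id" ] || return 1
  # Step 2: look up the manifest item with that id and print its href,
  # which is relative to the directory containing the .opf file.
  printf '%s\n' "$flat" | grep '<item ' | grep -F " id=\"$id\"" \
    | sed 's/.*href="\([^"]*\)".*/\1/' | head -n 1
}
```

A converter could pass the resulting file to Prince with an extra `-s` option alongside the default stylesheet; a real XML parser would be more robust than this text matching.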
#12 |
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
|
Quote:
(Can't check it until later today... so forgive the question, if it is a dumb one.) - Ahi |
|
#13 |
frumious Bandersnatch
Posts: 7,570
Karma: 20150435
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Sort of. The good thing about Prince XML is that it works on standard XHTML files, so nothing has to be changed in the source ePUB. All the script actually does is uncompress the .epub file and call Prince XML on all the files in the spine, in the right order. The formatting is done through .css files.
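The "call Prince XML on the spine files in order" step might look something like the sketch below. It is an illustration under stated assumptions, not the actual epub2pdf script: `spine_files` is a hypothetical name, the .opf is parsed with `tr`/`grep`/`sed` (double-quoted attributes and unprefixed `<item>`/`<itemref>` names assumed), and the paths in the commented example (`book.epub`, `OEBPS/content.opf`) are placeholders, since the real .opf location is given by META-INF/container.xml.

```shell
#!/bin/sh
# Hypothetical helper: print the content documents of an ePUB in reading
# order, one per line, by resolving each spine <itemref idref="..."> against
# the manifest <item id="..." href="...">.
spine_files() {
  opf="$1"
  # Put every tag on its own line so grep/sed can treat tags one at a time.
  flat=$(tr '\n' ' ' < "$opf" | tr '>' '\n')
  printf '%s\n' "$flat" | grep '<itemref ' \
    | sed 's/.*idref="\([^"]*\)".*/\1/' \
    | while read -r id; do
        # hrefs are relative to the directory that contains the .opf file
        printf '%s\n' "$flat" | grep '<item ' | grep -F " id=\"$id\"" \
          | sed 's/.*href="\([^"]*\)".*/\1/'
      done
}

# Example (assumed paths; breaks on file names containing spaces):
# unzip -q book.epub -d book
# cd book/OEBPS && prince $(spine_files content.opf) -s pdf_output.css -o ../../book.pdf
```

The order comes from the spine rather than the manifest, which is what keeps the chapters in reading order even when the manifest lists them differently.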
|
#14 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
Wow, great job, Jellby. I had begun playing around with your epub-read script as a starting place, but I had a feeling you'd beat me to the punch.
Ahi, I haven't studied the script in too much detail, but it looks simpler still. It just extracts the (X)HTML source of the ePub, reads the contents of its spine and table of contents, and then processes those files in that order (you can include multiple files in a single PDF with Prince), adding only a CSS file that controls the page layout and some defaults (fonts, hyphenation pattern, etc.). (EDIT: oops... didn't see Jellby's reply...)
This seems to work well. Some notes though:
1. Don't know what Linux distro you're using, but dos2unix does not come standard on Ubuntu Jaunty; fixed by installing the tofrodos package. (Actually I did that earlier for your other script.)
2. Right now, if the CSS of the ePub chooses a different font/font size/justification setting, etc., it overrides the settings in default.css; this is perhaps as it should be, but a setting that would make default.css override these would be great. (This would be tougher to code, and perhaps dangerous in certain circumstances, depending on its aggression level...)
3. Defaulting to a 9.9pt font seems a little small...
Some things that would be nice:
Last edited by frabjous; 09-14-2009 at 12:53 PM. |
#15 |
frumious Bandersnatch
Posts: 7,570
Karma: 20150435
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
For the moment, let's see if the introduction of this <meta name="pdf-style"> has any acceptance... |