on-the-fly epub creation


ilovejedd
04-14-2009, 03:59 PM
Need some (major) help... :help:

I have a modified PHP script (based on FLAG (http://www.mobileread.com/forums/showthread.php?t=26055)) that dynamically creates a Stanza catalog for my favorite FanFiction.Net categories. It basically allows me to browse FanFiction.Net in real-time and creates epub-format ebooks on the fly using Calibre for reading in Stanza iPhone. The script currently runs on my Windows PC running XAMPP.

I have a shared Linux hosting account on 1and1 and I wish to upload the script there. My current dilemma is the epub creation part. I'm researching which methods I can use to create epub files with utilities that are either already installed or user-installable on the shared account. The Linux host has Perl, Python and PHP installed and the operating system is CentOS, iirc.

Options I'm considering:

1. Install Calibre on Linux Host
   Pros: no changes to the PHP script required
   Cons: no idea how to do this or if it's even feasible

2. BookGlutton API
   Pros: seems like this might be the easiest to implement
   Cons: dependent on another website; don't know how I'm supposed to handle the post request :sweatdrop

3. DocBook+XSLT
   Pros: seems like the dependencies should already be installed or are user-installable (no admin rights required)
   Cons: don't know a thing about docbook :o; don't know a thing about xslt :o

4. Code my own PHP script to create epubs
   Pros: highly customizable
   Cons: I might be able to finish this in 2 years if I'm lucky :smack:


Right now, I'm thinking using the BookGlutton API might be the best option for me (unless, of course, it's possible to install Calibre or at least html2epub on a shared host). I'm just not sure how I'm supposed to handle the post requests via PHP. Currently, I have an epub.php script that calls html2epub and returns the epub file. I guess I could modify this to send a post request to BookGlutton instead. I just don't know how, particularly the part where you upload the html file.
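For the "upload the html file via POST" part, here is a minimal sketch of a multipart/form-data upload in plain PHP, with no extensions needed beyond allow_url_fopen. The endpoint URL and the form field name ('file') are placeholders, not BookGlutton's real API; those would have to come from the service's own documentation.

```php
<?php
// Build a multipart/form-data body containing one uploaded HTML file.
// $field is the form field name the remote service expects (assumed here).
function build_multipart(string $field, string $filename, string $content, string $boundary): string
{
    return "--{$boundary}\r\n"
         . "Content-Disposition: form-data; name=\"{$field}\"; filename=\"{$filename}\"\r\n"
         . "Content-Type: text/html\r\n\r\n"
         . $content . "\r\n"
         . "--{$boundary}--\r\n";
}

// POST the file at $path to $url and return the response body (false on failure).
function post_html_file(string $url, string $path, string $field = 'file')
{
    $boundary = '----epub' . md5(uniqid('', true));
    $body = build_multipart($field, basename($path), file_get_contents($path), $boundary);

    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => "Content-Type: multipart/form-data; boundary={$boundary}\r\n"
                   . 'Content-Length: ' . strlen($body) . "\r\n",
        'content' => $body,
        'timeout' => 30,
    ]]);

    return file_get_contents($url, false, $context);
}

// Usage (hypothetical URL):
// $epub_data = post_html_file('http://example.com/convert', 'story.html');
```

If the host has the cURL extension enabled, the same request can be made with curl_setopt() and CURLOPT_POSTFIELDS instead; the stream version above is just the one with the fewest requirements.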

Anyway, not really looking for a discussion on the merits of the different methods. Just asking for help on the how. If you know of another way to do this (preferably something even an inexperienced coder can do), please post it here.

Thanks!

kovidgoyal
04-14-2009, 04:09 PM
The Calibre binary installer is (almost) fully self-contained, so you should be able to install it on a shared host.

ilovejedd
04-14-2009, 04:59 PM
Thanks! That's good to know.

I don't have secure shell/terminal access to the shared host. Can I just extract the tarball on my home computer and upload via ftp? What does calibre_postinstall do? The binary installer seems to call it at the end. Is it necessary to run it?

Again, thank you very, very much! :)

kovidgoyal
04-14-2009, 05:01 PM
No, you should be able to run it without running postinstall (postinstall just sets up integration with the host OS, which you don't need if all you want to do is conversions). I don't know if FTP will preserve execute permissions on the files in the tarball, though.
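If FTP does drop the execute bits and there is no shell access, one possible workaround is to restore them from a PHP script run on the host. This is only a sketch: the directory path is a placeholder, and it bluntly marks every file under it as 0755 rather than picking out the actual binaries and launcher scripts.

```php
<?php
// Recursively set mode 0755 on every regular file below $dir.
// Returns the number of files whose mode was changed successfully.
function make_executable(string $dir): int
{
    $count = 0;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && chmod($file->getPathname(), 0755)) {
            $count++;
        }
    }
    return $count;
}

// Usage (hypothetical upload location):
// echo make_executable('/path/to/uploaded/calibre') . " files updated\n";
```

Many FTP clients can also change permissions themselves (a "file permissions" or chmod option), which may be simpler when the host allows it.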

DigitalFeonix
04-14-2009, 09:35 PM
I created my own script similar to FLAG some time ago to create .oeb files for my EB-1150 from stories on portkey.org and fanfiction.net. I have updated it to output .epub using a class to zip up the data. If you have customized FLAG, this is pretty easy to integrate as an output method.

Usage as follows:


$tstamp = time(); // timestamp for zip entries
$epub = new ZipCreate();

$prev_encoding = $epub->ztype;
$epub->ztype = 'store';
$epub->add_file('application/epub+zip', 'mimetype', $tstamp);
$epub->ztype = $prev_encoding;

// add container
$epub->add_file($container, 'META-INF/container.xml', $tstamp);

// add opf
$epub->add_file($opf, 'OEBPS/content.opf', $tstamp);

// add toc
$epub->add_file($toc, 'OEBPS/toc.ncx', $tstamp);

// add your xhtml and CSS and pictures and fonts here

// finish it up and download
$output_file = $epub->build_zip();
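// replace characters that would break the quoted filename in the header below
$output_name = str_replace(array('"', '/', '\\'), '_', $story['title']) . '.epub';
$output_mime = 'application/epub+zip';

// send the epub MIME type declared above rather than a generic download type
header('Content-Type: ' . $output_mime);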
header('Content-Length: '. strlen($output_file));
header('Content-Disposition: attachment; filename="' . $output_name . '"');
header('Content-Transfer-Encoding: binary');

echo $output_file;
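The ZipCreate class itself isn't shown in the thread. For hosts where PHP's zip extension is available, roughly the same packaging can be sketched with the built-in ZipArchive class instead (a swapped-in technique, not the class used above). The key points are the same: the mimetype entry goes in first and is stored uncompressed. The function name and arguments below are made up for illustration, and setCompressionName() needs PHP 7.0 or later.

```php
<?php
// Write an epub to $path. $files maps archive names to file contents,
// e.g. 'META-INF/container.xml' => $container.
function build_epub(string $path, array $files): bool
{
    $zip = new ZipArchive();
    if ($zip->open($path, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        return false;
    }

    // mimetype must be the first entry and must not be compressed
    $zip->addFromString('mimetype', 'application/epub+zip');
    $zip->setCompressionName('mimetype', ZipArchive::CM_STORE);

    foreach ($files as $name => $data) {
        $zip->addFromString($name, $data);
    }

    return $zip->close();
}

// Usage, with the variables from the script above:
// build_epub('/tmp/story.epub', array(
//     'META-INF/container.xml' => $container,
//     'OEBPS/content.opf'      => $opf,
//     'OEBPS/toc.ncx'          => $toc,
// ));
```

Unlike build_zip(), this writes to a file rather than returning a string, so the download step would read the file back (for example with readfile()) after sending the headers.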

ilovejedd
04-14-2009, 10:45 PM
I created my own script similar to FLAG some time ago to create .oeb files for my EB-1150 from stories on portkey.org and fanfiction.net. I have updated it to output .epub using a class to zip up the data. If you have customized FLAG, this is pretty easy to integrate as an output method.
Thanks! That looks pretty cool. If installing Calibre doesn't pan out, I might work with this. However, I didn't see any code for opf and ncx creation and those parts, I'm still trying to figure out how to handle. I tried reading the IDPF spec, but ADHD kicked in before anything could sink in. :o

nrapallo
04-14-2009, 11:26 PM
I created my own script similar to FLAG some time ago to create .oeb files for my EB-1150 from stories on portkey.org and fanfiction.net.

Wow, that would make a great Impserve (http://www.mobileread.com/forums/showthread.php?t=28363) plugin... but in reverse, that is, .epub to .imp and served up to the EBW1150. :chinscratch:

p.s. care to share your .oeb script? Inquiring minds would like to know... :snicker:

DigitalFeonix
04-15-2009, 11:34 AM
Essentially the script takes three arguments: a site, a story id and an output format. I slurp the whole story into an associative array - using htmlpurifier as it's brought in - and output using the desired format.

For epub the .opf is created using this

/******************************************************************************
CONTENT.OPF
******************************************************************************/

// create info for XML
$manifest = '';
$spine = '';

for ($i = 1; $i <= $story['chapter_count']; $i++)
{
    $id = sprintf('%03d', $i);

    $manifest .= ' <item id="chapter-' . $id . '" href="chapter-' . $id . '.xhtml" media-type="application/xhtml+xml"/>' . "\n";
    $spine .= ' <itemref idref="chapter-' . $id . '"/>' . "\n";
}

$story_title = $story['title'];

// add the OPF info
$opf = <<<EOM
<{$qm}xml version="1.0"{$qm}>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="bookid" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>{$story_title}</dc:title>
<dc:identifier id="bookid">urn:uuid:{$UID}</dc:identifier>
<dc:language>en</dc:language>
<dc:creator>{$story['author']}</dc:creator>
<dc:publisher>DigitalFeonix</dc:publisher>
<dc:rights>Public Domain</dc:rights>
<dc:subject>FanFiction</dc:subject>
</metadata>
<manifest>
<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
<item id="cover" href="cover.xhtml" media-type="application/xhtml+xml"/>
{$manifest}
<item id="backcover" href="backcover.xhtml" media-type="application/xhtml+xml"/>
</manifest>
<spine toc="ncx">
<itemref idref="cover"/>
{$spine}
<itemref idref="backcover"/>
</spine>
</package>

EOM;

$epub->add_file($opf, 'OEBPS/content.opf', $tstamp);
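The snippet uses two variables it doesn't define: $qm and $UID. From the way it appears in the XML declaration, $qm is presumably just a question mark kept out of the heredoc, and $UID is some unique id for the book. How the original script generates $UID isn't shown; one possible way, assuming PHP 7 or later for random_bytes(), is a random (version 4) UUID:

```php
<?php
// Generate a random version 4 UUID such as
// "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx" (y is one of 8, 9, a, b).
function uuid_v4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set variant to RFC 4122
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$qm  = '?';        // assumed value, inferred from <{$qm}xml ... {$qm}>
$UID = uuid_v4();  // assumed generator; any stable unique id would do
```

A random UUID changes on every run. If the same story should keep the same identifier across regenerations, deriving the id from the site and story id would be an alternative.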


and the toc is created using
/******************************************************************************
TOC.NCX
******************************************************************************/

// create info for XML
$navpoint = '';

for ($i = 1; $i <= $story['chapter_count']; $i++)
{
    $id = sprintf('%03d', $i);
    $iplus = $i + 1;

    $chapter_title = $story['chapters'][$i]['title'];

    $navpoint .= <<<EOM
<navPoint id="navpoint-{$iplus}" playOrder="{$iplus}">
<navLabel>
<text>{$chapter_title}</text>
</navLabel>
<content src="chapter-{$id}.xhtml"/>
</navPoint>

EOM;
}

$iplus = $i + 1;

//
$toc = <<<EOM
<{$qm}xml version="1.0" encoding="UTF-8" {$qm}>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
<head>
<meta name="dtb:uid" content="urn:uuid:{$UID}"/>
<meta name="dtb:depth" content="1"/>
<meta name="dtb:totalPageCount" content="0"/>
<meta name="dtb:maxPageNumber" content="0"/>
</head>
<docTitle>
<text>{$story_title}</text>
</docTitle>
<docAuthor>
<text>{$story['author']}</text>
</docAuthor>
<navMap>
<navPoint id="navpoint-1" playOrder="1">
<navLabel>
<text>Cover</text>
</navLabel>
<content src="cover.xhtml"/>
</navPoint>
{$navpoint}
<navPoint id="navpoint-{$iplus}" playOrder="{$iplus}">
<navLabel>
<text>Backcover</text>
</navLabel>
<content src="backcover.xhtml"/>
</navPoint>
</navMap>
</ncx>

EOM;

$epub->add_file($toc, 'OEBPS/toc.ncx', $tstamp);
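The usage example earlier also adds a $container string as META-INF/container.xml, but its contents aren't shown. That file is small and standard: it just tells the reader where the .opf lives. A sketch that builds it, with a made-up function name and the OEBPS/content.opf path used above as the default:

```php
<?php
// Return the contents of META-INF/container.xml pointing at the given OPF file.
function container_xml(string $opf_path = 'OEBPS/content.opf'): string
{
    $path = htmlspecialchars($opf_path, ENT_QUOTES, 'UTF-8');

    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">' . "\n"
         . ' <rootfiles>' . "\n"
         . ' <rootfile full-path="' . $path . '" media-type="application/oebps-package+xml"/>' . "\n"
         . ' </rootfiles>' . "\n"
         . '</container>' . "\n";
}

$container = container_xml();
```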

The .oeb script takes the same associative array and outputs the flat .oeb file with the mime wrapping, building the .opf part the same way. This .oeb script was intended to create files suitable for upload to the eBookwise personal content server.

nrapallo
04-15-2009, 12:01 PM
Essentially the script takes three arguments; a site, a story id and an output format. I slurp the whole story into an associative array - using htmlpurifier as it's brought in - and output using the desired format.

For epub the .opf is created using this

**snip**

The .oeb script takes the same associative array and outputs the flat .oeb file with the mime wrapping, building the .opf part the same way. This .oeb script was intended to create files suitable for upload to the eBookwise personal content server.

Good to know. I take it that the server side .php code would have to be installed on a personal server. Would this be easy to port to Python i.e. Impserve plugin? ;) :snicker:

I'm working on a Perl script, Epub2IMP.pl, that will convert any .epub to .imp after it is tweaked to accommodate some shortcomings of the ETI eBook Publisher software. It seems that any <img src> with a width=100% stretches the image without regard to the image's aspect ratio. Also, any CSS applied to a <div class=> doesn't appear to be honoured, so the content must be wrapped within a <p class=> </p> with the same CSS class reference.

To boot, within a .opf, even capitalized Dublin Core metadata elements, i.e. <dc:Title>, cause problems. My Perl script will do many text substitutions to alleviate these issues. Hopefully, ETI will improve their .epub support, especially since they co-authored many of the standards involved. :smack:

ilovejedd
04-15-2009, 01:40 PM
Thanks DigitalFeonix! Those scripts really help a lot. I'm still going to try to get Calibre working, but if it doesn't, I now have fallback #4, except you've done the job for me. :) Haven't been able to test anything, though, since I'm experiencing some weird issues with 1and1 mod_rewrite. The .htaccess file I use for my local XAMPP server doesn't want to work with 1and1 so I'm slowly trying to troubleshoot it. :(

If/when I get this working, I can start on making the covers look spiffy with ImageMagick. :)

@nrapallo
The PHP scripts don't look complicated at all, except for the ZipCreate class. That, though, I attribute to my lack of knowledge of the zip file structure. Seems like that's the only thing you really need to port to Python. The rest is basically just creating text files.

nrapallo
04-15-2009, 01:54 PM
@nrapallo
The PHP scripts don't look complicated at all, barring for the ZipCreate class. That, though, I attribute to my lack of knowledge of the zip file structure. Seems like that's the only thing you really need to port to Python. The rest is basically just creating text files.

Thanks for the heads up!

I'm still not proficient at coding in Python and really need my hand held... ;)

DigitalFeonix
04-15-2009, 04:45 PM
The zipcreate class was not written by me and is a little over my head. I did work with the author to make sure that epubs that it created worked within ADE and that the native zip utilities on both Mac and PC could open them.

I have looked at python as a possible language to pick up, but my scripts should be easy to port for someone with knowledge in both.

The scripts should be hosted wherever you are going to download or create the epubs from. There is nothing in them that most hosting companies would disallow (ie no exec() calls).

ilovejedd
04-16-2009, 02:15 PM
Tried Calibre last night (still haven't fixed mod_rewrite, though). Looks like some of the commands used in the html2epub shell script require admin access and we're only given user rights. I even changed permissions to 777 and still no go. Created a test.sh and that worked just fine with shell_exec(). I'll probably try the ZipCreate method tonight or at least attempt to create a wrapper for the whole epub creation process.

Thanks again for all the help!

kovidgoyal
04-16-2009, 02:30 PM
html2epub shouldn't require admin access. What errors do you get when running it?

JSWolf
04-16-2009, 04:19 PM
Did you check that the ePub works on a 505 or 700?

DigitalFeonix
04-16-2009, 04:37 PM
Did you check that the ePub works on a 505 or 700?

Since the 505 and 700 use mobile ADE, as long as it works in the desktop version it should work on the devices. (And getting it to work with the Mac OS X native BOMArchiver was required to get ADE on the Mac to open them.)

I used that reasoning until I actually got a 505 as a present and was able to test on an actual device.

It works on a 505, so should also work on a 700. I use my script frequently to create my bedtime reading material that I put on my own 505.

wallcraft
04-16-2009, 04:50 PM
Since the 505 and 700 use mobile ADE, as long as it works in the desktop version it should work on the devices.

The Mobile ADE has some extra restrictions, e.g. each XHTML file has to be below a certain size. Calibre has a PRS-505 mode which works around all the PRS-505's mobile ADE issues. I don't know if there is a complete list of these outside the Calibre source code. Note that Sony is using an old version of mobile ADE and some of the issues may be Sony specific (e.g. if Sony supported the soft hyphen character in its font, soft hyphens would presumably stop showing up as "?"). We won't know for sure until other devices come out with mobile ADE this summer.

ilovejedd
04-16-2009, 05:15 PM
html2epub shouldn't require admin access. What errors do you get when running it?
No idea what the errors are. I tried both:
echo shell_exec('sh html2epub');
echo shell_exec('html2epub');

I didn't get any output.

I tried doing:
echo shell_exec('ls');
echo shell_exec('sh test.sh'); //test.sh contains chdir to calibre directory and ls

Those worked just fine.
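shell_exec() only returns what the command writes to standard output, so a command that fails will usually look like "no output". A small sketch for seeing the actual error: redirect stderr into stdout and capture the exit status with exec(). The helper name is made up, and it assumes exec() is not disabled on the host.

```php
<?php
// Run a shell command and return its exit status together with
// everything it printed, including error messages (stderr).
function run_with_errors(string $command): array
{
    $lines = array();
    $status = 0;
    exec($command . ' 2>&1', $lines, $status);

    return array('status' => $status, 'output' => implode("\n", $lines));
}

// Usage:
// $result = run_with_errors('html2epub');
// echo 'exit status: ' . $result['status'] . "\n" . $result['output'];
```

A status of 127 from the shell generally means the command was not found on the PATH, and 126 that it was found but could not be executed, which would point at a permissions problem rather than at html2epub itself.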

It's worth noting that the server is using CentOS, I believe, and on a thread in the Calibre forums, a guy had to install it from source to get it working.

JSWolf
04-16-2009, 06:53 PM
The Mobile ADE has some extra restrictions, e.g. each XHTML file has to be below a certain size. Calibre has a PRS-505 mode which works around all the PRS-505's mobile ADE issues. I don't know if there is a complete list of these outside the Calibre source code. Note that Sony is using an old version of mobile ADE and some of the issues may be Sony specific (e.g. if Sony supported the soft hyphen character in its font, soft hyphens would presumably stop showing up as "?"). We won't know for sure until other devices come out with mobile ADE this summer.
I believe the limit is 300k. Not sure though if that is 300k compressed or uncompressed.

DigitalFeonix
04-16-2009, 08:36 PM
I was aware of the 300k limit on mobile ADE (and that's uncompressed AFAIK). I do not know of any other show stoppers from the desktop to the mobile versions. That's why I put the "should" in there. Until I had a physical reader, I only had the desktop to go by.

I've always split my books up by chapter anyway, so each chapter has its own xhtml file. Rarely do those even go past 16k, let alone 300k. With no pictures and no embedded fonts, these are barebones epubs.
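For sources where a single chapter could get close to the limit, the same idea can be pushed one level down. A rough sketch, assuming the chapter is already available as a list of paragraph strings and taking the 300k figure mentioned above as the default limit (nobody in the thread is sure of the exact number): group paragraphs into chunks that each stay under the limit, then write one xhtml file per chunk.

```php
<?php
// Group paragraphs into chunks whose combined size stays within $limit bytes.
// A single paragraph larger than $limit still ends up in a chunk of its own.
function split_paragraphs(array $paragraphs, int $limit = 300 * 1024): array
{
    $chunks = array();
    $current = '';

    foreach ($paragraphs as $p) {
        if ($current !== '' && strlen($current) + strlen($p) > $limit) {
            $chunks[] = $current;
            $current = '';
        }
        $current .= $p;
    }
    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}
```

The xhtml wrapper around each chunk adds a little to the file size, so a limit somewhat below the real one leaves some headroom.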

Every time I've had an epub break on the reader it broke on the desktop too. Usually illegal characters causing problems. Just tweaked the script to cleanse that bit of data too, reran and it was fine.
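The actual cleansing code isn't shown, so the following is only one way it could look: strip the characters that XML 1.0 does not allow at all (mostly stray control characters), then escape the markup characters. It is meant for plain text values such as the title, author and chapter titles that get dropped into the .opf and .ncx, not for chapter bodies that already contain markup.

```php
<?php
// Make a plain-text value safe to embed in XML element content or attributes.
function xml_text(string $text): string
{
    // Keep only characters permitted by XML 1.0; everything else is dropped.
    // preg_replace() returns null if $text is not valid UTF-8.
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );

    return htmlspecialchars($clean ?? '', ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Usage:
// $story_title = xml_text($story['title']);
```

If the values have already been escaped on the way in (for example by htmlpurifier), running them through this again would double-escape the ampersands, so it should be applied at exactly one point in the pipeline.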

My only complaint with my flow is that both the desktop and mobile versions of ADE have that stupid page number along the right side that can't be turned off.