It might be of interest to those here that I have just uploaded the first official release of EBook-Tools to CPAN. It includes a command-line tool ("ebook") for unpacking, modifying, and repacking e-books, plus documented Perl modules whose object classes contain all of the actual functionality, so it should be fairly simple to extend.
The CPAN entry for it is at:
The bug-tracking system for it is at:
Latest version is 0.4.4, released 2009.04.01 (no joke!):
* split_metadata now writes split components into the directory where
the source file is located instead of the current working
directory. The old behaviour could cause failure when running as
* gen_opf() now accepts a 'mediatype' argument to override
autodetection of the MIME type of the 'textfile' argument.
* The opffile argument in gen_opf() was not being set correctly
* unpack_ereader now forces the appropriate MIME type instead of
letting it be autodetected. This fixes text/plain being incorrectly
set on HTML output on Windows systems.
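The idea behind the new 'mediatype' argument is that an explicit value wins, and only otherwise is the type guessed from the file name. It can be sketched in Python as below. This is not EBook::Tools code: resolve_mediatype() is a hypothetical helper, and using Python's standard mimetypes module with an application/octet-stream fallback is an assumption for illustration only.

```python
import mimetypes
from typing import Optional


def resolve_mediatype(textfile: str, mediatype: Optional[str] = None) -> str:
    """Hypothetical helper: an explicit override wins over autodetection."""
    if mediatype:
        return mediatype
    # Autodetect from the file extension only; the file is never opened.
    guessed, _encoding = mimetypes.guess_type(textfile)
    # Assumed fallback: a generic binary type rather than silently text/plain.
    return guessed or "application/octet-stream"
```

Platform-dependent guessing is exactly what an explicit override avoids. Python's own mimetypes module, for example, also consults the Windows registry when it initializes.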
0.4.1 - 0.4.2: minor bugfixes only
* IMP support!
* It is now possible to unpack unencrypted IMP files both into .RES
directories and into HTML files. Encrypted IMP files can still
be unpacked into .RES directories.
* .RES directories can be repacked into IMP files.
* IMP metadata can be edited in-place
* LZSS compression and decompression is now available as a general
library component, though this may be split out into a separate
module in the future.
* Thanks go to Nick Rapallo for assistance with this feature set,
and Jeffrey Kraus-yao for most of the original
* Mobipocket files with EXTH headers but no EXTH records now unpack
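For readers unfamiliar with LZSS: it is an LZ77 variant in which a flag bit marks each token as either a literal byte or a (distance, length) reference back into a sliding window. References shorter than a minimum length are sent as literals instead, because they would cost more than they save. Below is a minimal Python sketch of the technique. The 4096-byte window, the 3 to 18 byte match lengths, the flag-bit polarity, and the byte layout are all assumptions chosen for clarity. They are not claimed to match the LZSS variant used in IMP files or by EBook::Tools.

```python
WINDOW = 4096    # assumed max back-reference distance (fits in 12 bits)
MIN_MATCH = 3    # assumed: shorter matches are emitted as literals
MAX_MATCH = 18   # assumed: 4-bit length field stores length - MIN_MATCH


def lzss_compress(data: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        flag_pos = len(out)
        out.append(0)  # placeholder flag byte covering the next 8 tokens
        flags = 0
        for bit in range(8):
            if i >= n:
                break
            best_len, best_dist = 0, 0
            # Brute-force search of the window for the longest match.
            for j in range(max(0, i - WINDOW), i):
                length = 0
                while (length < MAX_MATCH and i + length < n
                       and data[j + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
            if best_len >= MIN_MATCH:
                # Reference: 12-bit (distance - 1), 4-bit (length - MIN_MATCH).
                d = best_dist - 1
                out.append(d >> 4)
                out.append(((d & 0x0F) << 4) | (best_len - MIN_MATCH))
                i += best_len
            else:
                flags |= 1 << bit  # flag bit 1 = literal byte follows
                out.append(data[i])
                i += 1
        out[flag_pos] = flags
    return bytes(out)


def lzss_decompress(blob: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(blob):
        flags = blob[pos]
        pos += 1
        for bit in range(8):
            if pos >= len(blob):
                break  # the last group may hold fewer than 8 tokens
            if flags & (1 << bit):
                out.append(blob[pos])
                pos += 1
            else:
                d = (blob[pos] << 4) | (blob[pos + 1] >> 4)
                length = (blob[pos + 1] & 0x0F) + MIN_MATCH
                pos += 2
                src = len(out) - (d + 1)
                # Copy byte by byte so overlapping references (runs) work.
                for k in range(length):
                    out.append(out[src + k])
    return bytes(out)
```

The brute-force match search keeps the example short but scales poorly. Practical encoders index the window with hash chains or a search tree.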
Library and Syntax Changes:
* Some of the input and output options in the 'ebook' command-line
tool have been standardized to '--input' or '-i' and '--output' or
'-o'. Check the documentation for exact syntax.
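As a generic illustration of the paired long/short option convention, a hypothetical Python tool using the standard argparse module might declare its options like this. It does not show the ebook tool's actual syntax, which its documentation covers, and the program name is made up.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical tool: every option has one short and one long spelling.
    parser = argparse.ArgumentParser(prog="ebook-sketch")
    parser.add_argument("-i", "--input", required=True,
                        help="source file to read")
    parser.add_argument("-o", "--output",
                        help="destination file or directory")
    return parser
```

Both spellings land in the same attribute, so "-i book.prc" and "--input book.prc" are interchangeable.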
* EBook::Tools::Unpack::usedir() has been moved into EBook::Tools as
a procedure, not a method.
* The known UID check in EBook::Tools::search_knownuids() has been
factored out into the twigelt_is_knownuid() twig search procedure.
This bypasses a lot of 'undefined value' warning spew from XML::Twig
and has the added advantage of removing a loop
It does, however, slightly change the search behaviour --
previously, the highest priority known UID in the array was
selected if multiple known UID identifiers were found. Now, the
first dc:identifier matching any known good UID is used instead.
It's possible to reclaim the old behaviour by sorting the returned
array, but on reflection, it is probably better to let the order of
identifiers in the user's file determine the package ID by default.
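The behaviour change can be illustrated with a small Python sketch using xml.etree.ElementTree. Here is_known_uid() plays the role of a per-element predicate like twigelt_is_knownuid(). The scheme list, its priority order, and the sample metadata are hypothetical, and this is not the module's Perl implementation.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
IDENTIFIER = f"{{{DC}}}identifier"
# Hypothetical known-UID schemes, highest priority first.
KNOWN_SCHEMES = ["UUID", "ISBN", "ASIN"]


def scheme_of(elem: ET.Element) -> str:
    return elem.get(f"{{{OPF}}}scheme", "").upper()


def is_known_uid(elem: ET.Element) -> bool:
    """Per-element test, in the spirit of a twig search predicate."""
    return elem.tag == IDENTIFIER and scheme_of(elem) in KNOWN_SCHEMES


def first_known_uid(metadata: ET.Element) -> Optional[ET.Element]:
    """New behaviour: the first matching identifier in file order wins."""
    return next((e for e in metadata.iter() if is_known_uid(e)), None)


def best_known_uid(metadata: ET.Element) -> Optional[ET.Element]:
    """Old behaviour: sort matches by scheme priority, take the top one."""
    matches = [e for e in metadata.iter() if is_known_uid(e)]
    matches.sort(key=lambda e: KNOWN_SCHEMES.index(scheme_of(e)))
    return matches[0] if matches else None


SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:identifier opf:scheme="ISBN">0000000000</dc:identifier>
  <dc:identifier opf:scheme="UUID">urn:uuid:placeholder</dc:identifier>
</metadata>"""

meta = ET.fromstring(SAMPLE)
print(first_known_uid(meta).text)  # the ISBN entry comes first in the file
print(best_known_uid(meta).text)   # UUID has the higher (hypothetical) priority
```

The only difference between the two functions is the sort, which is why sorting the returned array is enough to recover the old selection.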
Although there's some overlap with Calibre and MobiPerl (and there will probably be more as development continues), I tried to focus on the things those projects don't do, or don't yet do well. Its most important feature is that it cleans up and standardizes the most hideously mangled OPF data I could find and lines it up neatly in either the OEB 1.2 or the OPF 2.0 standard format (and can easily convert between the two). Of particular note to Mobipocket users, the unpacking tool also retains substantially more metadata in the unpacked OPF file than either Calibre or MobiPerl (at least as of the last time I checked them). Unpacking a (non-DRM) Mobipocket file, adding a new HTML file to it containing links to other HTML files, and then packing the entire thing up as an EPub is as simple as:
ebook unpack MyBook.prc
ebook add newfile.html
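The OEB 1.2 to OPF 2.0 direction of that conversion mostly comes down to namespaces and spelling. OEB 1.x package files commonly place Dublin Core elements in the DC 1.0 namespace inside a dc-metadata wrapper, with capitalized names (dc:Title) and unqualified attributes such as scheme, role and file-as. OPF 2.0 instead uses the DC 1.1 namespace, lowercase names (dc:title) and opf:-qualified attributes. Below is a minimal Python sketch of just that mapping, using xml.etree.ElementTree. The function oeb12_to_opf20_metadata() is hypothetical, and the sketch ignores x-metadata, meta elements and validation. It is not how EBook::Tools implements the conversion.

```python
import xml.etree.ElementTree as ET

DC10 = "http://purl.org/dc/elements/1.0/"  # Dublin Core namespace in OEB 1.x
DC11 = "http://purl.org/dc/elements/1.1/"  # Dublin Core namespace in OPF 2.0
OPF = "http://www.idpf.org/2007/opf"


def oeb12_to_opf20_metadata(metadata: ET.Element) -> ET.Element:
    """Copy DC 1.0 elements into a new OPF 2.0 style metadata element."""
    new = ET.Element(f"{{{OPF}}}metadata")
    for elem in metadata.iter():
        if not elem.tag.startswith(f"{{{DC10}}}"):
            continue  # not a DC 1.0 element (e.g. dc-metadata, x-metadata, meta)
        local = elem.tag.split("}", 1)[1].lower()  # Title -> title
        copy = ET.SubElement(new, f"{{{DC11}}}{local}")
        copy.text = elem.text
        for key, value in elem.attrib.items():
            # 'id' and already-namespaced attributes are kept as they are;
            # plain ones such as scheme, role and file-as become opf:-qualified.
            if key == "id" or key.startswith("{"):
                copy.set(key, value)
            else:
                copy.set(f"{{{OPF}}}{key}", value)
    return new


SAMPLE = """<metadata><dc-metadata xmlns:dc="http://purl.org/dc/elements/1.0/">
  <dc:Title>Example Title</dc:Title>
  <dc:Creator role="aut" file-as="Doe, Jane">Jane Doe</dc:Creator>
</dc-metadata></metadata>"""

ET.register_namespace("dc", DC11)
ET.register_namespace("opf", OPF)
converted = oeb12_to_opf20_metadata(ET.fromstring(SAMPLE))
print(ET.tostring(converted, encoding="unicode"))
```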
Current downsides: NCX generation is really simplistic, the license is GPL2 (so code can't be shared between it and the GPL3 MobiPerl project), and there are probably bugs I don't know about.
Please feel free to bang on it and send me bug reports or feature requests either here, at the CPAN RT system, or by e-mail.
I hope this is useful to you.