10-27-2008, 12:48 PM   #1
AZed
EBook-Tools on CPAN

It might be of interest to those here that I have just uploaded the first official release of EBook-Tools to CPAN. It includes a command-line tool ("ebook") for unpacking, modifying, and repacking e-books, along with documented Perl modules containing the object classes where all of the functionality resides, so it should be fairly simple to extend.

The CPAN entry for it is at:

http://search.cpan.org/dist/EBook-Tools/

The bug-tracking system for it is at:

http://rt.cpan.org/Ticket/Create.html?Queue=EBook-Tools


Latest version is 0.4.4, released 2009.04.01 (no joke!):

Code:
0.4.4

 Bug Fixes:
  * split_metadata now writes split components into the directory where
    the source file is located instead of the current working
    directory.  The old behaviour could cause failure when running as
    CGI.


0.4.3

 New Features:
  * gen_opf() now accepts a 'mediatype' argument to override
    autodetection of the mime type of the 'textfile' argument.

 Bug Fixes:
  * The opffile argument in gen_opf() was not being set correctly.
  * unpack_ereader now forces the appropriate mime type instead of
    letting it be autodetected.  Fixes incorrect setting of text/plain
    on HTML output on Windows systems.

0.4.1 - 0.4.2: minor bugfixes only

0.4.0

New Features:
 * IMP support!
   * It is now possible to unpack unencrypted IMP files both into .RES
     directories and into HTML files.  Encrypted IMP files can still
     be unpacked into .RES directories.
   * .RES directories can be repacked into IMP files.
   * IMP metadata can be edited in-place.
   * LZSS compression and decompression is now available as a general
     library component, though this may be split out into a separate
     module in the future.
   * Thanks go to Nick Rapallo for assistance with this feature set,
     and Jeffrey Kraus-yao for most of the original
     reverse-engineering work.

Bug Fixes:
 * Mobipocket files with EXTH headers but no EXTH records now unpack
   correctly.

Library and Syntax Changes:
 * Some of the input and output options in the 'ebook' command-line
   tool have been standardized to '--input' or '-i' and '--output' or
   '-o'.  Check the documentation for exact syntax.
 * EBook::Tools::Unpack::usedir() has been moved into EBook::Tools as
   a procedure, not a method.
 * The known uid check in EBook::Tools::search_knownuids() has been
   factored out into the twigelt_is_knownuid() twig search procedure.
   This causes a lot of 'undefined value' warning spew from XML::Twig
   to be bypassed and has the added advantage of removing a loop.

   It does, however, slightly change the search behaviour --
   previously, the highest priority known UID in the array was
   selected if multiple known UID identifiers were found.  Now, the
   first dc:identifier matching any known good UID is used instead.
   It's possible to reclaim the old behaviour by sorting the returned
   array, but on afterthought, it is probably better to let the user
   file order determine the package id by default.
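
As a rough illustration of the search behaviour change described in the 0.4.0 notes, the two selection strategies can be sketched in plain shell. This is not the EBook-Tools code, which is Perl and works on XML::Twig elements. The function names and the contents of `KNOWN_UIDS` are made up for the example, and identifiers are reduced to simple strings.

```shell
#!/bin/sh
# Sketch only: KNOWN_UIDS, select_uid_old and select_uid_new are
# hypothetical names, not part of EBook-Tools.

# Assumed priority-ordered list of known UID names, highest priority first.
KNOWN_UIDS="GUID ISBN UID"

# Old behaviour: walk the known list in priority order and return the
# highest-priority known UID that appears anywhere among the arguments.
select_uid_old() {
    for known in $KNOWN_UIDS; do
        for id in "$@"; do
            if [ "$id" = "$known" ]; then
                echo "$id"
                return 0
            fi
        done
    done
    return 1
}

# New behaviour: walk the identifiers in file order and return the
# first one that matches any known UID.
select_uid_new() {
    for id in "$@"; do
        for known in $KNOWN_UIDS; do
            if [ "$id" = "$known" ]; then
                echo "$id"
                return 0
            fi
        done
    done
    return 1
}
```

Given identifiers in the file order `UID ISBN`, `select_uid_old UID ISBN` picks `ISBN` (higher in the assumed priority list), while `select_uid_new UID ISBN` picks `UID` (first in the file).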

Although there is some overlap with Calibre and MobiPerl (and there will probably be more as development continues), I tried to focus on the things that those projects either didn't do or didn't yet do well. Its main feature is that it cleans up and standardizes even the most hideously mangled OPF data I could find and lines it up neatly in either the OEB 1.2 or OPF 2.0 standard format (it can also easily convert between the two). Of particular note to Mobipocket users: the unpacking tool retains substantially more metadata in the unpacked OPF file than either Calibre or MobiPerl did, at least as of the last time I checked them.

Unpacking a (non-DRM) Mobipocket file, adding a new HTML file to it containing links to other HTML files, and then packing the entire thing up as an EPub is as simple as:

Code:
ebook unpack MyBook.prc
cd MyBook
ebook add newfile.html
ebook fix
ebook genepub
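
For anyone who wants to script that sequence, it can be wrapped in a small shell function. This is only a sketch: `prc_to_epub`, `run`, and the `DRY_RUN` variable are names invented for the example and are not part of EBook-Tools. It assumes, as in the commands above, that `ebook unpack MyBook.prc` creates a directory named `MyBook` and that the new HTML file is referenced from inside that directory.

```shell
#!/bin/sh
# Sketch only: run, prc_to_epub and DRY_RUN are illustrative names,
# not part of EBook-Tools.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: prc_to_epub BOOK.prc NEWFILE.html
prc_to_epub() {
    book=$1
    newfile=$2
    # Assumption: the unpack directory is the file name minus ".prc".
    dir=$(basename "$book" .prc)

    run ebook unpack "$book" || return 1

    # Use a subshell so the caller's working directory is left alone;
    # stop at the first step that fails.
    (
        run cd "$dir" &&
        run ebook add "$newfile" &&
        run ebook fix &&
        run ebook genepub
    )
}
```

Setting `DRY_RUN=1` before calling `prc_to_epub MyBook.prc newfile.html` prints the five commands instead of running them, which is a cheap way to check the sequence first.
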
Current downsides: NCX generation is really simplistic, the license is GPL2 (so code can't be shared between it and the GPL3 MobiPerl project), and there are probably bugs I don't know about.

Please feel free to bang on it and send me bug reports or feature requests either here, at the CPAN RT system, or by e-mail.

I hope this is useful to you.

Last edited by AZed; 04-01-2009 at 08:33 PM. Reason: v0.4.4 release