Old 10-27-2008, 12:48 PM   #1
AZed
Connoisseur
Posts: 57
Karma: 307
Join Date: Oct 2008
Device: PalmOS PDA
EBook-Tools on CPAN

It might be of interest to those here that I have just uploaded the first official release of EBook-Tools to CPAN. It includes a command-line tool ("ebook") for unpacking, modifying, and repacking e-books, and documented Perl modules containing object classes where all of the functionality resides, so it should be fairly simple to expand.

The CPAN entry for it is at:

http://search.cpan.org/dist/EBook-Tools/

The bug-tracking system for it is at:

http://rt.cpan.org/Ticket/Create.html?Queue=EBook-Tools


Latest version is 0.4.4, released 2009.04.01 (no joke!):

Code:
0.4.4

 Bug Fixes:
  * split_metadata now writes split components into the directory where
    the source file is located instead of the current working
    directory.  The old behaviour could cause failure when running as
    CGI.


0.4.3

 New Features:
  * gen_opf() now accepts a 'mediatype' argument to override
    autodetection of the mime type of the 'textfile' argument.

 Bug Fixes:
  * The opffile argument in gen_opf() was not being set correctly
  * unpack_ereader now forces the appropriate mime type instead of
    letting it be autodetected.  Fixes incorrect setting of text/plain
    on HTML output on Windows systems.

0.4.1 - 0.4.2: minor bugfixes only

0.4.0

New Features:
 * IMP support!
   * It is now possible to unpack unencrypted IMP files both into .RES
     directories and into HTML files.  Encrypted IMP files can still
     be unpacked into .RES directories.
   * .RES directories can be repacked into IMP files.
   * IMP metadata can be edited in-place
   * LZSS compression and decompression is now available as a general
     library component, though this may be split out into a separate
     module in the future.
   * Thanks go to Nick Rapallo for assistance with this feature set,
     and Jeffrey Kraus-yao for most of the original
     reverse-engineering work.

Bug Fixes:
 * Mobipocket files with EXTH headers but no EXTH records now unpack
   correctly.

Library and Syntax Changes:
 * Some of the input and output options in the 'ebook' command-line
   tool have been standardized to '--input' or '-i' and '--output' or
   '-o'.  Check the documentation for exact syntax.
 * EBook::Tools::Unpack::usedir() has been moved into EBook::Tools as
   a procedure, not a method.
 * The known uid check in EBook::Tools::search_knownuids() has been
   factored out into the twigelt_is_knownuid() twig search procedure.
   This causes a lot of 'undefined value' warning spew from XML::Twig
   to be bypassed and has the added advantage of removing a loop.

   It does, however, slightly change the search behaviour --
   previously, the highest priority known UID in the array was
   selected if multiple known UID identifiers were found.  Now, the
   first dc:identifier matching any known good UID is used instead.
   It's possible to reclaim the old behaviour by sorting the returned
   array, but on afterthought, it is probably better to let the user
   file order determine the package id by default.
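
To make the changed search behaviour described above concrete, here is a minimal Python sketch. It is not the module's Perl code: the `KNOWN_UIDS` list and all three function names are hypothetical, chosen only for illustration, and the identifiers are reduced to plain strings.

```python
# Hypothetical list of known-good UID names, highest priority first.
# The real list used by EBook::Tools is not reproduced here.
KNOWN_UIDS = ["GUID", "UUID", "FWID"]


def pick_by_priority(identifier_ids):
    """Old behaviour: the highest-priority known UID wins."""
    for uid in KNOWN_UIDS:
        if uid in identifier_ids:
            return uid
    return None


def pick_by_file_order(identifier_ids):
    """New behaviour: the first known identifier in file order wins."""
    for uid in identifier_ids:
        if uid in KNOWN_UIDS:
            return uid
    return None


def sort_by_priority(identifier_ids):
    """Recover the old ordering by sorting the known matches by priority."""
    known = [uid for uid in identifier_ids if uid in KNOWN_UIDS]
    return sorted(known, key=KNOWN_UIDS.index)
```

With identifiers appearing in the file as `["FWID", "other", "GUID"]`, the priority search picks `GUID` while the file-order search picks `FWID`; sorting the matches by priority puts `GUID` first again.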

Although there's some overlap with Calibre and MobiPerl (and there will probably be more as development continues), I tried to focus on the things that those projects didn't do, or didn't yet do optimally. Its main feature is that it cleans up and standardizes even the most hideously mangled OPF data I could find and neatly lines it up into either the OEB 1.2 or OPF 2.0 standard format (and it can easily convert between the two). Also of particular note to Mobipocket users: the unpacking tool retains substantially more metadata in the unpacked OPF file than either Calibre or MobiPerl (at least as of the last time I checked them).

Unpacking a (non-DRM) Mobipocket file, adding a new HTML file to it containing links to other HTML files, and then packing the entire thing up as an EPub is as simple as:

Code:
ebook unpack MyBook.prc
cd MyBook
ebook add newfile.html
ebook fix
ebook genepub
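
The same sequence could be scripted, for instance when processing many books. The following is a rough Python sketch that only wraps the four `ebook` subcommands shown above; the function names `build_steps` and `run_steps` are made up for this example, and it assumes that `ebook unpack` creates a directory named after the input file without its extension, as the `cd MyBook` step implies.

```python
import subprocess
from pathlib import Path


def build_steps(book, new_file):
    """Return (argv, cwd) pairs mirroring the shell session above."""
    book_path = Path(book)
    # Assumption: unpacking MyBook.prc produces a directory called MyBook.
    workdir = book_path.with_suffix("")
    return [
        (["ebook", "unpack", str(book_path)], None),
        (["ebook", "add", new_file], workdir),
        (["ebook", "fix"], workdir),
        (["ebook", "genepub"], workdir),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first failing command."""
    for argv, cwd in steps:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_steps(build_steps("MyBook.prc", "newfile.html"))` would then run the four commands in order, provided the `ebook` tool is on the PATH.
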
Current downsides: NCX generation is really simplistic, the license is GPL2 (so code can't be shared between it and the GPL3 MobiPerl project), and there are probably bugs I don't know about.

Please feel free to bang on it and send me bug reports or feature requests either here, at the CPAN RT system, or by e-mail.

I hope this is useful to you.

Last edited by AZed; 04-01-2009 at 08:33 PM. Reason: v0.4.4 release
Old 10-27-2008, 01:03 PM   #2
kovidgoyal
creator of calibre
Posts: 25,447
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
What metadata is missing from the OPF file generated by mobi2oeb?
Old 10-27-2008, 06:21 PM   #3
AZed
Quote:
Originally Posted by kovidgoyal
What metadata is missing from the OPF file generated by mobi2oeb?
As of 4.100, still pretty much all of the Mobipocket-specific elements, plus contributor, publication date, and type, it looks like. I ran a quick diff of the output from one of my unit tests against the output from mobi2oeb and came up with these elements, which are in my output but missing from mobi2oeb's completely:

Code:
      <dc:Contributor>Me Myself</dc:Contributor>
      <dc:Date event="publication">2008-10</dc:Date>
      <dc:Type>Testing Type</dc:Type>
      <Adult>yes</Adult>
      <DictionaryInLanguage>de-at</DictionaryInLanguage>
      <DictionaryOutLanguage>es-ar</DictionaryOutLanguage>
      <Review>&lt;P>Review line 1 — &lt;EM>emphasized&lt;/EM>&lt;/P> &lt;P>Review line 2 — &lt;STRONG>a bold move&lt;/STRONG>&lt;/P></Review>
      <SRP Currency="USD">1.23</SRP>
Also, your <dc:subject> entries were missing the BASICCode attribute and your <dc:language> entry was using a bare word and not an IANA language code.

On the plus side, you're generating an NCX file from the filepos information, which I'm not, yet. Mine is just an ultra-primitive one pointing to the spine elements. Also, mobi2oeb supports HuffDic and I don't, yet.

Update: Also, I just caught a bug in my testing procedure that also caught one of yours. When you put the ISBN into a dc:identifier element, you strip out hyphens. Mine doesn't, but it's got a different bug: it seems to be creating dc:identifier twice, and stupid me didn't even check the ISBN in my unit tests. I only just now caught it by eye, so it'll have to wait for the next version. :/

Last edited by AZed; 10-27-2008 at 06:28 PM. Reason: ISBN addendum
Old 10-27-2008, 06:33 PM   #4
tompe
Grand Sorcerer
Posts: 7,021
Karma: 3896796
Join Date: Oct 2007
Location: Linköping, Sweden
Device: Nexus 7, Nexus 5, iPad 2, Kindle PW
Quote:
Originally Posted by AZed
Current downsides: NCX generation is really simplistic, the license is GPL2 (so code can't be shared between it and the GPL3 MobiPerl project),
Why did you choose GPL2?

I chose GPL3 for MobiPerl because I think that the "TIVO clauses" are important. But I had not thought about possible incompatibilities between GPL2...
Old 10-27-2008, 07:06 PM   #5
AZed
Quote:
Originally Posted by tompe
Why did you choose GPL2?

I chose GPL3 for MobiPerl because I think that the "TIVO clauses" are important. But I had not thought about possible incompatibilities between GPL2...
Actually, the 'TIVO Clause' is one of my problems with it, along with the 'Hans Reiser Clause' and pretty much the entirety of Section 7.

My needs on a Free Software license are pretty simple: if someone else's project uses code from my project, I want to be able to use code from their project -- and just the code I choose, not additional code, legal stipulations, project name changes, or other restrictions. In addition, if someone is willing to send me all improvements (or even 'features' I don't like, as long as I get to choose) they make to my project (that I can then use anywhere on any hardware), I want them to be able to use my code on any hardware of their choosing -- including special-purpose hardware not modifiable by anyone but them.

GPL 3 breaks both requirements, and worse, breaks them in a completely unpredictable manner, because there is no way to tell in advance when a project might suddenly add a Section 7 clause with no obvious change to what license they are using. This is a damn shame, because I do like the patent handling improvements (though I know others that find those controversial as well).
Old 10-27-2008, 07:42 PM   #6
tompe
Well, I really think that if somebody uses my code and puts it in an e-book reader, then they should have to provide the tools for installing modified code. When I read GPL3 I thought it was much better than GPL2, since it covered the cases that GPL2 was intended to cover but did not formally cover.

Now it seems improbable that somebody would want to put MobiPerl code inside a device so I will think about this. I saw licensing formulated as "GPL2 or later" and that might be an alternative for me...
Old 10-27-2008, 08:30 PM   #7
kovidgoyal
Ah yes, the Mobipocket-specific tags. I don't intend mobi2oeb to be part of a mobi2mobi toolchain, which is why those elements are not included. As far as I know, hyphens are not required to parse ISBN numbers.
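
As a small illustration of the hyphen point, the sketch below validates an ISBN with or without hyphens using the standard ISBN-10 and ISBN-13 check-digit rules. It is plain Python written for this example, not code from Calibre or EBook-Tools, and the function names are made up.

```python
def normalize_isbn(raw):
    """Strip hyphens and spaces, keeping digits and a possible trailing X."""
    return raw.replace("-", "").replace(" ", "").upper()


def is_valid_isbn(raw):
    """Check the ISBN-10 or ISBN-13 check digit of a possibly hyphenated string."""
    isbn = normalize_isbn(raw)
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
        digits = [int(c) for c in isbn[:9]]
        digits.append(10 if isbn[9] == "X" else int(isbn[9]))
        # ISBN-10: weights 10 down to 1, total must be divisible by 11.
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: alternating weights 1 and 3, total must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn)) % 10 == 0
    return False
```

Both `0-306-40615-2` and `0306406152` validate identically here, so the hyphens carry no information needed for parsing; whether to keep them is purely a question of preserving the original form of the data.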
Old 10-27-2008, 08:49 PM   #8
AZed
Quote:
Originally Posted by tompe
Well, I really think that if somebody uses my code and puts it in an e-book reader, then they should have to provide the tools for installing modified code. When I read GPL3 I thought it was much better than GPL2, since it covered the cases that GPL2 was intended to cover but did not formally cover.

Now it seems improbable that somebody would want to put MobiPerl code inside a device so I will think about this. I saw licensing formulated as "GPL2 or later" and that might be an alternative for me...
This is unfortunately an ugly and complex issue, and doesn't really have anything to do with EBook-Tools specifically, so I've created another thread in the News and Commentary section to answer you:

http://www.mobileread.com/forums/showthread.php?t=31161
Old 10-27-2008, 08:55 PM   #9
AZed
Quote:
Originally Posted by kovidgoyal
Ah yes, the Mobipocket-specific tags. I don't intend mobi2oeb to be part of a mobi2mobi toolchain, which is why those elements are not included. As far as I know, hyphens are not required to parse ISBN numbers.
*nod* My intention for the project is to create a general-purpose e-book manipulation library, so I'm trying to retain as much data as possible exactly in the original form in which it was specified, except where multiple established standards come into conflict (e.g. my dates get converted from Mobipocket MM/DD/YYYY to W3C standard YYYY[-MM-DD] as specified by IDPF).
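
A date conversion of that kind might look roughly like the following Python sketch. This is not the EBook-Tools implementation (which is Perl); the function name is hypothetical, and it assumes the input is either exactly MM/DD/YYYY or something that should be passed through untouched.

```python
import re


def mobi_date_to_w3c(value):
    """Convert MM/DD/YYYY to YYYY-MM-DD; return any other value unchanged."""
    match = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", value.strip())
    if not match:
        # Not in the Mobipocket form (possibly already W3C); leave it alone.
        return value
    month, day, year = (int(group) for group in match.groups())
    if not (1 <= month <= 12 and 1 <= day <= 31):
        # Implausible month or day: keep the original rather than guess.
        return value
    return f"{year:04d}-{month:02d}-{day:02d}"
```

For example, `mobi_date_to_w3c("10/27/2008")` returns `"2008-10-27"`, while a value such as `"2008-10"` is returned as-is.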
Old 10-27-2008, 09:05 PM   #10
kovidgoyal
Quote:
Originally Posted by AZed
*nod* My intention for the project is to create a general-purpose e-book manipulation library, so I'm trying to retain as much data as possible exactly in the original form in which it was specified, except where multiple established standards come into conflict (e.g. my dates get converted from Mobipocket MM/DD/YYYY to W3C standard YYYY[-MM-DD] as specified by IDPF).
May I ask why you feel the need to write another general-purpose ebook manipulation library? In any case, I think there are already two projects using the name ebook-tools, so you should probably pick a more distinctive name for your project.
Old 10-27-2008, 11:09 PM   #11
AZed
Quote:
Originally Posted by kovidgoyal
May I ask why you feel the need to write another general-purpose ebook manipulation library? In any case, I think there are already two projects using the name ebook-tools, so you should probably pick a more distinctive name for your project.
It's a bit late to change, unfortunately. I RFC'd the namespace EBook::Tools to modules@perl.org and Perlmonks quite some time ago and nobody pointed me to any conflicting projects, and the name is now officially registered in the CPAN index. I ended up waiting a month or so to work out some final issues and functionality before my first main release, and still nobody mentioned anything at those two sites. I suppose I should have RFC'd here, but it didn't occur to me at the time, since I was thinking of it more as a Perl library than as an application project.

As to why, it's because no other code seems to do what I need, which is to be able to unpack my library into a standard format retaining as much metadata as possible, standardize the metadata such that it is consistently searchable, and repack it. So as long as I had to write my own tools, I figured I might as well make it a general library so that my code could be easily re-used, rather than a set of one-off scripts.

My next release will probably handle eReader unpacking, since that's the next personal itch I have to scratch, and I don't think anyone has done a one-stop eReader to OEB tool yet. After that, probably hooks for external generators.
Old 10-27-2008, 11:17 PM   #12
kovidgoyal
Oh well, the other two are pretty much dead and they weren't Perl projects to start with in any case, so hopefully there won't be naming conflicts. Why are you standardizing on OEB rather than EPUB? The reason I chose OEB was that the tools are meant to be chained with converters to some output format.
Old 10-28-2008, 01:41 AM   #13
AZed
Quote:
Originally Posted by kovidgoyal
Oh well, the other two are pretty much dead and they weren't Perl projects to start with in any case, so hopefully there won't be naming conflicts. Why are you standardizing on OEB rather than EPUB? The reason I chose OEB was that the tools are meant to be chained with converters to some output format.
I'm not quite sure what you're asking. I'm not standardizing on OEB for the metadata as much as I am on IDPF OPF with all known extensions, but the default is the OEB 1.2 OPF because that's what the Mobipocket Creator tool still demands. When mobigen deals with OPF 2.0, the default will change to that. Both are handled by the module and it's trivial to switch from one format to the other already. Does that answer your question? EPub is just a way of wrapping a container around the data (and EPub generation is, in fact, supported already).

I suppose another way of looking at it is that what I'm providing is an entire toolkit chain, from unpacking to modification to generation, so there has to be an OPF in the middle to edit somewhere in any case, which is effectively an OEB directory. If there's call for it, it wouldn't be too hard to add another command to the command-line tool for a single-line conversion from one format to another, with the unpacking and repacking happening silently in the background, but since I store all of my e-books in unpacked form for easy editing and examination, it was natural for me to think of it in terms of unpacking, then modification, then repacking.
Old 10-28-2008, 01:53 AM   #14
kovidgoyal
The reason I'm asking is that storing ebooks in unpacked form is good if you tend to edit them a lot. However, most users don't do that, so from a user perspective it's better to store them in EPUB so that you have a one-file-per-book storage paradigm.

And if you're expecting MOBI to evolve, I suspect you will be disappointed. It looks like MOBI is too open for Amazon.
Old 10-28-2008, 02:10 AM   #15
AZed
Quote:
Originally Posted by kovidgoyal
The reason I'm asking is that storing ebooks in unpacked form is good if you tend to edit them a lot. However, most users don't do that, so from a user perspective it's better to store them in EPUB so that you have a one-file-per-book storage paradigm.

And if you're expecting MOBI to evolve, I suspect you will be disappointed. It looks like MOBI is too open for Amazon.
Hm, I personally use a folder tree storage paradigm, but I can certainly add one-pass conversion to the next release. I'll add it to the TODO list.

As to Mobipocket, I received a certain amount of encouragement on their forum when I mentioned a related issue to them.