#1 |
Connoisseur
Posts: 57
Karma: 307
Join Date: Oct 2008
Device: PalmOS PDA
|
EBook-Tools on CPAN
It might be of interest to those here that I have just uploaded the first official release of EBook-Tools to CPAN. It includes a command-line tool ("ebook") for unpacking, modifying, and repacking e-books, and documented Perl modules containing object classes where all of the functionality resides, so it should be fairly simple to expand.
The CPAN entry for it is at: http://search.cpan.org/dist/EBook-Tools/
The bug-tracking system for it is at: http://rt.cpan.org/Ticket/Create.html?Queue=EBook-Tools
Latest version is 0.4.4, released 2009.04.01 (no joke!):
Code:
0.4.4
  Bug Fixes:
    * split_metadata now writes split components into the directory where the source file is located instead of the current working directory. The old behaviour could cause failure when running as CGI.
0.4.3
  New Features:
    * gen_opf() now accepts a 'mediatype' argument to override autodetection of the mime type of the 'textfile' argument.
  Bug Fixes:
    * The opffile argument in gen_opf() was not being set correctly
    * unpack_ereader now forces the appropriate mime type instead of letting it be autodetected. Fixes incorrect setting of text/plain on HTML output on Windows systems.
0.4.1 - 0.4.2: minor bugfixes only
0.4.0
  New Features:
    * IMP support!
      * It is now possible to unpack unencrypted IMP files both into .RES directories and into HTML files. Encrypted IMP files can still be unpacked into .RES directories.
      * .RES directories can be repacked into IMP files.
      * IMP metadata can be edited in-place
      * LZSS compression and decompression is now available as a general library component, though this may be split out into a separate module in the future.
      * Thanks go to Nick Rapallo for assistance with this feature set, and Jeffrey Kraus-yao for most of the original reverse-engineering work.
  Bug Fixes:
    * Mobipocket files with EXTH headers but no EXTH records now unpack correctly.
  Library and Syntax Changes:
    * Some of the input and output options in the 'ebook' command-line tool have been standardized to '--input' or '-i' and '--output' or '-o'. Check the documentation for exact syntax.
    * EBook::Tools::Unpack::usedir() has been moved into EBook::Tools as a procedure, not a method.
    * The known uid check in EBook::Tools::search_knownuids() has been factored out into the twigelt_is_knownuid() twig search procedure. This causes a lot of 'undefined value' warning spew from XML::Twig to be bypassed and has the added advantage of removing a loop. It does, however, slightly change the search behaviour -- previously, the highest priority known UID in the array was selected if multiple known UID identifiers were found. Now, the first dc:identifier matching any known good UID is used instead. It's possible to reclaim the old behaviour by sorting the returned array, but on reflection, it is probably better to let the user file order determine the package id by default.
Although there's some overlap with Calibre and MobiPerl (and there will probably be more as development continues), I tried to focus on the things that those projects didn't do or didn't do optimally yet.
Its main feature is that it cleans up and standardizes the most hideously mangled OPF data I could find and neatly lines it up into either the OEB 1.2 or OPF 2.0 standard formats (and can easily convert between the two).
Also of particular note to Mobipocket users is that the unpacking tool retains substantially more metadata in the unpacked OPF file than either Calibre or MobiPerl (at least as of the last time I checked them).
Unpacking a (non-DRM) Mobipocket file, adding a new HTML file to it containing links to other HTML files, and then packing the entire thing up as an EPub is as simple as:
Code:
ebook unpack MyBook.prc
cd MyBook
ebook add newfile.html
ebook fix
ebook genepub
Please feel free to bang on it and send me bug reports or feature requests either here, at the CPAN RT system, or by e-mail. I hope this is useful to you.
Last edited by AZed; 04-01-2009 at 08:33 PM. Reason: v0.4.4 release |
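The change in UID selection described in the 0.4.0 notes can be illustrated with a small sketch. This is Python rather than Perl, it is not taken from EBook-Tools, and the `KNOWN_UIDS` priority list and both function names are hypothetical stand-ins chosen only to show the difference between "first match in file order" and "highest priority match".

```python
# Hypothetical priority list, highest priority first. The real list of
# known UID names in EBook-Tools is not given in this thread.
KNOWN_UIDS = ["GUID", "UUID", "ISBN"]


def first_known_uid(identifier_ids, known_uids):
    """New behaviour: return the first id, in file order, that is a known UID."""
    known = {uid.lower() for uid in known_uids}
    for ident in identifier_ids:
        if ident.lower() in known:
            return ident
    return None


def highest_priority_uid(identifier_ids, known_uids):
    """Old behaviour: among all matches, return the one ranked highest."""
    rank = {uid.lower(): pos for pos, uid in enumerate(known_uids)}
    matches = [ident for ident in identifier_ids if ident.lower() in rank]
    return min(matches, key=lambda ident: rank[ident.lower()], default=None)


# ids of the dc:identifier elements in the order they appear in the OPF file
ids_in_file_order = ["ISBN", "custom", "GUID"]
print(first_known_uid(ids_in_file_order, KNOWN_UIDS))
print(highest_priority_uid(ids_in_file_order, KNOWN_UIDS))
```

With the sample input the two functions disagree, which is exactly the case the changelog entry warns about: file order now wins unless the caller sorts the matches by priority.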
#2 |
creator of calibre
Posts: 45,144
Karma: 27110892
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
What metadata is missing from the OPF file generated by mobi2oeb?
|
#3 | |
Connoisseur
|
Quote:
Code:
<dc:Contributor>Me Myself</dc:Contributor>
<dc:Date event="publication">2008-10</dc:Date>
<dc:Type>Testing Type</dc:Type>
<Adult>yes</Adult>
<DictionaryInLanguage>de-at</DictionaryInLanguage>
<DictionaryOutLanguage>es-ar</DictionaryOutLanguage>
<Review><P>Review line 1 — <EM>emphasized</EM></P>
<P>Review line 2 — <STRONG>a bold move</STRONG></P></Review>
<SRP Currency="USD">1.23</SRP>
On the plus side, you're generating an NCX file from the filepos information, which I'm not, yet. Mine is just an ultra-primitive one pointing to the spine elements. Also, mobi2oeb supports HuffDic and I don't, yet.
Update: Also, I just caught a bug in my testing procedure that also caught one of yours. When you put the ISBN into a dc:identifier element, you strip out hyphens. Mine doesn't, but it's got a different bug: it seems to be creating dc:identifier twice, and stupid me didn't even check the ISBN in my unit tests. I only just now caught it by eye, so it'll have to wait for the next version. :/
Last edited by AZed; 10-27-2008 at 06:28 PM. Reason: ISBN addendum |
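A check for the duplicated dc:identifier bug mentioned above could look roughly like the following. This is a minimal Python sketch, not code from EBook-Tools or mobi2oeb; it assumes OPF 2.0 style lowercase `dc:identifier` elements in the standard Dublin Core namespace, and the sample OPF fragment and its ISBN are made up for illustration. Hyphens are stripped only for comparison, so a hyphenated and an unhyphenated copy of the same ISBN count as duplicates.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DC_NS = "http://purl.org/dc/elements/1.1/"

# Made-up OPF fragment: the same ISBN appears twice, once with hyphens.
SAMPLE_OPF = """<package xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:identifier id="isbn">0-7410-1234-5</dc:identifier>
    <dc:identifier id="isbn2">0741012345</dc:identifier>
    <dc:identifier id="uid">example-uid</dc:identifier>
  </metadata>
</package>"""


def duplicate_identifiers(opf_xml):
    """Return identifier values that occur more than once, ignoring hyphens."""
    root = ET.fromstring(opf_xml)
    values = [
        (element.text or "").strip().replace("-", "")
        for element in root.iter(f"{{{DC_NS}}}identifier")
    ]
    counts = Counter(values)
    return sorted(value for value, n in counts.items() if value and n > 1)


print(duplicate_identifiers(SAMPLE_OPF))
```

A unit test built on something like this would have flagged the doubled element without anyone having to spot it by eye.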
|
#4 | |
Grand Sorcerer
Posts: 7,452
Karma: 7185064
Join Date: Oct 2007
Location: Linköpng, Sweden
Device: Kindle Voyage, Nexus 5, Kindle PW
|
Quote:
I chose GPL3 for MobiPerl because I think that the "TIVO clauses" are important. But I had not thought about possible incompatibilities between GPL2... |
|
#5 | |
Connoisseur
|
Quote:
My needs for a Free Software license are pretty simple: if someone else's project uses code from my project, I want to be able to use code from their project -- and just the code I choose, not additional code, legal stipulations, project name changes, or other restrictions. In addition, if someone is willing to send me all improvements (or even 'features' I don't like, as long as I get to choose) they make to my project (that I can then use anywhere on any hardware), I want them to be able to use my code on any hardware of their choosing -- including special-purpose hardware not modifiable by anyone but them. GPL 3 breaks both requirements, and worse, breaks them in a completely unpredictable manner, because there is no way to tell in advance when a project might suddenly add a Section 7 clause with no obvious change to what license they are using. This is a damn shame, because I do like the patent handling improvements (though I know others who find those controversial as well). |
|
#6 |
Grand Sorcerer
|
Well, I really think that if somebody uses my code and puts it in an e-book reader then they should have to provide the tools for installing modified code. When I read GPL3 I thought it was much better than GPL2 since it covered the cases that GPL2 was intended to cover but did not cover formally.
Now it seems improbable that somebody would want to put MobiPerl code inside a device so I will think about this. I saw licensing formulated as "GPL2 or later" and that might be an alternative for me... |
#7 | |
creator of calibre
|
Quote:
|
|
#8 | |
Connoisseur
|
Quote:
https://www.mobileread.com/forums/showthread.php?t=31161 |
|
#9 |
Connoisseur
|
*nod* My project intentions are to create a general-purpose e-book manipulation library, so I'm trying to retain as much data as possible exactly in the original form it was specified, except where multiple established standards come into conflict (e.g. my dates get converted from Mobipocket MM/DD/YYYY to W3C standard YYYY[-MM-DD] as specified by IDPF).
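The date conversion mentioned here can be sketched in a few lines. This is an illustrative Python version under stated assumptions, not the EBook-Tools implementation: the function name `mobi_date_to_w3c` is hypothetical, it only handles full MM/DD/YYYY input, and it returns None for anything else instead of guessing.

```python
from datetime import datetime


def mobi_date_to_w3c(text):
    """Convert 'MM/DD/YYYY' to W3C 'YYYY-MM-DD'; return None if it does not parse."""
    try:
        parsed = datetime.strptime(text.strip(), "%m/%d/%Y")
    except ValueError:
        # Not a valid MM/DD/YYYY date (wrong layout or impossible day/month).
        return None
    return parsed.date().isoformat()


print(mobi_date_to_w3c("10/27/2008"))
print(mobi_date_to_w3c("2008-10"))
```

Returning None for unparseable input leaves the caller free to keep the original string, which fits the stated goal of preserving data in its original form wherever possible.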
|
#10 | |
creator of calibre
|
Quote:
|
|
#11 | |
Connoisseur
|
Quote:
As to why, it's because no other code seems to do what I need, which is to be able to unpack my library into a standard format retaining as much metadata as possible, standardize the metadata such that it is consistently searchable, and repack it. So as long as I had to write my own tools, I figured I might as well make it a general library so that my code could be easily re-used, rather than a set of one-off scripts. My next release will probably handle eReader unpacking, since that's the next personal itch I have to scratch, and I don't think anyone has done a one-stop eReader to OEB tool yet. After that, probably hooks for external generators. |
|
#12 |
creator of calibre
|
Oh well, the other two are pretty much dead and they weren't Perl projects to start with in any case, so hopefully there won't be naming conflicts. Why are you standardizing on OEB rather than EPUB? The reason I chose OEB was that the tools are meant to be chained with converters to some output format.
|
#13 | |
Connoisseur
|
Quote:
I suppose another way of looking at it is that what I'm providing is an entire toolkit chain, from unpacking to modification to generation, so there has to be an OPF in the middle to edit somewhere in any case, which is effectively an OEB directory. If there's call for it, it wouldn't be too hard to add another command to the command-line tool for a single-line conversion from one format to another, with the unpacking and repacking happening silently in the background, but since I store all of my e-books in unpacked form for easy editing and examination, it was natural for me to think of it in terms of unpacking, then modification, then repacking. |
|
#14 |
creator of calibre
|
The reason I'm asking is that storing ebooks in unpacked form is good if you tend to edit them a lot. However, most users don't do that, so from a user perspective it's better to store them in EPUB so that you have a one-file-per-book storage paradigm.
And if you're expecting MOBI to evolve, I suspect you will be disappointed. It looks like MOBI is too open for Amazon. |
#15 | |
Connoisseur
|
Quote:
As to Mobipocket, I received a certain amount of encouragement on their forum when I mentioned a related issue to them. |
|