Explode and Implode an ePub?


wallcraft
09-10-2008, 11:17 AM
The ePub container is just a ZIP file with standard compression, except that the mimetype file must come first and must not be compressed.
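To illustrate those two rules, here is a minimal Python sketch. It assumes Python 3.9+ and only the standard-library zipfile module. check_container is a hypothetical helper, not part of any ePub tool. It reports whether the archive's first entry is named mimetype and whether that entry is stored rather than compressed. It checks nothing else about the ePub.

```python
import io
import zipfile

def check_container(epub_bytes: bytes) -> list[str]:
    """Hypothetical helper: report violations of the two container rules.

    Checks only that the first entry is 'mimetype' and that it is stored
    (ZIP_STORED), not compressed. Nothing else in the ePub is validated.
    """
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # infolist() follows the central directory, which is normally
        # the order in which the entries were added.
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        elif entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
    return problems


# Usage: build a tiny archive in memory and check it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # A bare ZipInfo defaults to ZIP_STORED, so mimetype is not compressed.
    zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip")
    zf.writestr("META-INF/container.xml", "<container/>",
                compress_type=zipfile.ZIP_DEFLATED)
print(check_container(buf.getvalue()))  # []
```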

So, to "explode" an .epub file, all you need is any command or utility that extracts files from a ZIP. "Imploding" it (is there a better term for this?) back into a .epub isn't quite as easy. Salty-horse provides a Unix command line "implode" procedure here (http://www.mobileread.com/forums/showpost.php?p=250481&postcount=41). Since it uses Info-ZIP (http://www.info-zip.org/), something similar should work from the Windows command line and on a Mac.
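For comparison, here is a rough Python sketch of both directions. It is not salty-horse's procedure. It assumes Python 3 and the standard-library zipfile module, and explode and implode are hypothetical names. explode simply extracts everything. implode writes mimetype first and stored, then deflates the remaining files in sorted path order. As far as I know, Python's zipfile writes no extra fields unless an entry needs ZIP64, so the result should be close to what zip -X produces.

```python
import os
import zipfile

def explode(epub_path: str, out_dir: str) -> None:
    """Unpack an .epub into out_dir; any ZIP extractor does the same job."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)

def implode(src_dir: str, epub_path: str) -> None:
    """Repack src_dir as an .epub: mimetype first and stored, the rest deflated.

    Assumes src_dir/mimetype exists and that epub_path is outside src_dir.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # A bare ZipInfo defaults to ZIP_STORED, so mimetype is not compressed.
        with open(os.path.join(src_dir, "mimetype"), "rb") as f:
            zf.writestr(zipfile.ZipInfo("mimetype"), f.read())
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                full = os.path.join(root, name)
                # ZIP entry names always use forward slashes.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written as the first entry
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Typical use: explode("Author_Title.epub", "work"), fix whatever is broken under work/, then implode("work", "Author_Title-fixed.epub").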

Is there a better 1-step, or GUI-based, method to do this?

Note that the contents of an ePub are fairly rigidly defined, and an explode/implode tool does nothing to check their validity (I guess the implode step could check that the mimetype file exists and has the right contents, or create one if there is none). As in salty-horse's example, it would be most useful when something is wrong with an existing DRM-free ePub and you want to fix it.
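The mimetype check suggested above needs only a few lines. Below is a minimal Python sketch. ensure_mimetype is a hypothetical helper name. The sketch assumes the strict reading that the file must contain exactly application/epub+zip in ASCII, with no trailing newline. It validates nothing else in the ePub.

```python
import os

EPUB_MIMETYPE = b"application/epub+zip"

def ensure_mimetype(src_dir: str) -> str:
    """Hypothetical pre-implode check for src_dir/mimetype.

    Returns "ok" if the file already holds exactly EPUB_MIMETYPE,
    "created" if it was missing, or "fixed" if it held anything else.
    Only this one file is checked; the rest of the ePub is not validated.
    """
    path = os.path.join(src_dir, "mimetype")
    if not os.path.exists(path):
        status = "created"
    else:
        with open(path, "rb") as f:
            if f.read() == EPUB_MIMETYPE:
                return "ok"
        status = "fixed"
    # Write the exact ASCII string, with no trailing newline.
    with open(path, "wb") as f:
        f.write(EPUB_MIMETYPE)
    return status
```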

Peter Sorotokin
09-10-2008, 04:40 PM
On Windows, just create a "compressed folder" (built-in functionality of Explorer) and add the mimetype file first, before everything else. It works out automatically that the file is too small to compress, so it stores it as is. Then add everything else.

The same may be true for other GUI Zip utilities.

kovidgoyal
09-10-2008, 07:09 PM
Do any epub reading systems actually insist on the mimetype file being first?

wallcraft
09-11-2008, 11:36 AM
"Do any epub reading systems actually insist on the mimetype file being first?" Probably not. In principle, the Unix file command should be able to identify the file as an ePub if mimetype is first, but under Fedora 9 all I get is "Zip archive". Even FBReader (which looks inside .pdb files for the file type) just keys on the .epub filename extension.
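What such an identification could look for is easy to state in bytes. Here is a hypothetical Python sketch of a magic-number style test. It is not the actual rule in the file command's magic database. It assumes the layout the OCF spec implies: the file starts with a ZIP local file header, the entry is named mimetype and has no extra field, and the stored data is application/epub+zip.

```python
import struct

def looks_like_epub(data: bytes) -> bool:
    """Hypothetical magic test on the first bytes of a file.

    ZIP local header layout: 4-byte signature, 22 bytes of fixed fields,
    2-byte name length (offset 26), 2-byte extra length (offset 28),
    then the name at offset 30 and the file data right after it.
    """
    if len(data) < 58 or data[:4] != b"PK\x03\x04":
        return False
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    return (
        name_len == 8 and extra_len == 0
        and data[30:38] == b"mimetype"
        and data[38:58] == b"application/epub+zip"  # stored, so readable as-is
    )
```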

For Linux, the following single command works:

zip -vur Author_Title.epub mimetype *

The "*" wildcard also matches mimetype, but it still ends up first in the archive and uncompressed. The "-u" does not seem necessary; it is only supposed to replace existing files in the archive with newer versions.

salty-horse
09-12-2008, 09:34 AM
Info-ZIP actually checks whether the "deflate" algorithm will reduce the file size. The mimetype file is too small for deflate to shrink it, so it is stored uncompressed.

It also removes duplicates from the file list (as in the case of "mimetype *") and adds the files in the order given.

All of those form a very nice trick. :)
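Both behaviours can be mimicked in a few lines of Python. This is a sketch that only reproduces what is described above. Info-ZIP's real internals may differ, and deflate_size, store_or_deflate and unique_in_order are hypothetical names. Raw deflate of the 20-byte media type comes out no smaller than the input, because there are no repeated substrings to exploit, so "store" wins. After shell expansion, "mimetype *" becomes a list in which mimetype appears twice.

```python
import zlib

def deflate_size(data: bytes) -> int:
    """Length of the raw deflate stream for data (no zlib header or
    checksum), which is what a deflated ZIP entry stores."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # wbits=-15: raw deflate
    return len(comp.compress(data) + comp.flush())

def store_or_deflate(data: bytes) -> str:
    """Keep deflate only if it actually makes the data smaller."""
    return "deflated" if deflate_size(data) < len(data) else "stored"

def unique_in_order(names: list[str]) -> list[str]:
    """Drop repeated names, keeping the first occurrence and the order."""
    seen: set[str] = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


print(store_or_deflate(b"application/epub+zip"))  # stored
print(unique_in_order(["mimetype", "META-INF", "OEBPS", "mimetype"]))
# ['mimetype', 'META-INF', 'OEBPS']
```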

HOWEVER (and I just figured this out), this creates a non-standard epub file. :(

The specification states:

http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm

The [mimetype] file MUST be neither compressed nor encrypted and there MUST NOT be an extra field in its ZIP header.

By default, zip adds "extra file attributes" (an extra field in each ZIP header). This prevents the text 'application/epub+zip' from starting at byte 38 of the epub file.
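The 38 comes from the layout of a ZIP local file header. There are 30 fixed bytes, then the file name (8 bytes for mimetype), then the extra field, then the file data. So the data starts at 30 + name length + extra length, which is 38 only when the extra field is empty. The Python sketch below reads the two length fields at offsets 26 and 28. first_entry_data_offset is a hypothetical helper. The 9-byte extra field in the usage lines is a made-up extended-timestamp example, not what Info-ZIP necessarily writes.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed part of a ZIP local file header

def first_entry_data_offset(data: bytes) -> int:
    """Offset of the first entry's data: 30 + name length + extra length."""
    if data[:4] != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    name_len, extra_len = struct.unpack_from("<HH", data, 26)
    return LOCAL_HEADER_SIZE + name_len + extra_len

def build(extra: bytes) -> bytes:
    """Zip a stored mimetype entry carrying the given extra field."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("mimetype")  # stored by default
        info.extra = extra
        zf.writestr(info, "application/epub+zip")
    return buf.getvalue()


# Hypothetical extended-timestamp field: id 0x5455, size 5, flags, mtime.
ut_field = struct.pack("<HHBI", 0x5455, 5, 1, 0)  # 9 bytes
print(first_entry_data_offset(build(b"")))        # 38 (30 + 8 + 0)
print(first_entry_data_offset(build(ut_field)))   # 47 (30 + 8 + 9)
```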

The original command in my post used the -X flag for the mimetype file. That flag strips the extra attributes from the headers, which is what the spec requires.

BTW, the commands were given to me by Hadrien Gardeur. I claim no credit.

salty-horse
09-12-2008, 09:47 AM
Doh!

zip -Xvur Author_Title.epub mimetype *

Yay! The extended file attributes store things like Unix user and group IDs. They're really meaningless when zipping and unzipping on different machines, so why not strip them from all files like this.
The zip command already reports each file as it adds it, and -u isn't needed when creating a new archive, so you can make this even shorter:

zip -Xr Author_Title.epub mimetype *

:D
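To confirm that zip -X really left every entry without an extra field, each entry's local header can be inspected directly. Below is a hypothetical Python helper that uses only the standard library and assumes Python 3.9+. It seeks to each entry's local header, reads the extra-field length at offset 28, and returns the value per file. The OCF requirement quoted earlier only covers mimetype. For the other entries, zero is simply the tidier result.

```python
import struct
import zipfile

def local_extra_lengths(epub_path: str) -> dict[str, int]:
    """Map each entry name to the extra-field length in its *local* header.

    After zip -X every value should be 0; the spec only demands it
    for mimetype.
    """
    result = {}
    with open(epub_path, "rb") as raw, zipfile.ZipFile(epub_path) as zf:
        for info in zf.infolist():
            raw.seek(info.header_offset)  # start of this entry's local header
            header = raw.read(30)
            if header[:4] != b"PK\x03\x04":
                raise ValueError(f"bad local header for {info.filename}")
            (extra_len,) = struct.unpack_from("<H", header, 28)
            result[info.filename] = extra_len
    return result
```

For example, local_extra_lengths("Author_Title.epub") should map every entry to 0 if the archive was built with -X.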