Does anyone know the Mobipocket compression?


slayda
03-25-2008, 06:08 PM
Does anyone know the Mobipocket compression algorithms, or are they proprietary? I'm interested in compression methods. If you don't know them, do you know where to find documentation on them?

wallcraft
03-25-2008, 06:52 PM
Mobipocket's highest level of compression isn't formally documented anywhere, but the Python programs mobi2oeb (http://www.mobileread.com/forums/showthread.php?t=20626) and mobihuff.py (http://darkreverser.wordpress.com/2008/02/13/new-blog/) decompress DRM-free MOBI books.

tompe
03-25-2008, 09:29 PM
And it would be very good if somebody took these programs and wrote a high-level specification from them ...

igorsk
03-26-2008, 07:49 AM
Python is not that hard to read :)

tompe
03-26-2008, 12:08 PM
No. The hard part is getting a mathematical description of the method. I would like to do a clean-room implementation, since I am a bit unsure of the legal status of the Python code and of any implementation based on it. It would also be much easier to re-implement it in Perl from a higher-level description.

kovidgoyal
03-26-2008, 02:55 PM
It's just Huffman coding, a public domain algorithm, with a few implementation wrinkles. Frankly, I don't see any difference between "a mathematical description" and a "source code description" since both use a language to describe an algorithm.

tompe
03-26-2008, 02:59 PM
The Python code is based on copyrighted code that has probably been accessed in a way that somebody could have legal opinions about. If I just copy the Python code, I am not sure the result can be released under the GPL. And I am very unsure whether the GPL notice in the Python code is correct.

An implementation written from a mathematical description, where I do not look at existing copyrighted code, is definitely unproblematic.

kovidgoyal
03-26-2008, 03:06 PM
You're assuming the Python code was written by looking at copyrighted code rather than by reverse engineering? On what basis? For a mathematical description, just google Huffman coding.
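
For anyone who wants something concrete to start from, here is a minimal sketch of plain textbook Huffman coding in Python. It is only the generic algorithm: it does not attempt to reproduce Mobipocket's on-disk format or any of its implementation wrinkles, and the function names (build_codes, encode, decode) are made up for illustration. Bits are kept as a string of "0"/"1" characters for readability rather than packed into bytes.

```python
import heapq
from collections import Counter


def build_codes(data: bytes) -> dict:
    """Return a prefix-free bit string for every byte value seen in data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data: bytes, codes: dict) -> str:
    """Concatenate the code of each byte into one bit string."""
    return "".join(codes[b] for b in data)


def decode(bits: str, codes: dict) -> bytes:
    """Walk the bit string, emitting a byte whenever a full code is matched."""
    lookup = {code: sym for sym, code in codes.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return bytes(out)


if __name__ == "__main__":
    text = b"abracadabra"
    codes = build_codes(text)
    bits = encode(text, codes)
    assert decode(bits, codes) == text
    print(len(bits), "bits instead of", 8 * len(text))
```

A real decompressor would also have to read the code table out of the book file and unpack bits from bytes; those are the format-specific parts this sketch leaves out.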

jonj
03-30-2010, 12:38 AM
Thanks for the good start on decompression. I am kind of new to these algorithms and to compression in general. My language of choice is C#. If I wanted to learn how to create a .mobi or .prc from original HTML/images/etc. in C#, does anyone here have advice on how to begin?