MobileRead Forums - View Single Post - Programmatically reading mobi EXTH header

KevinH · 02-18-2012, 11:07 AM

Hi,

Since there are many different tools that manipulate Mobi headers, I have put together a Python 2.7 program that works with Amazon/Mobi ebooks created with the very latest version of Kindlegen.

This program will dump all known and unknown fields and all EXTH metadata in each mobi header that is found in the ebook. This includes the latest KF8 dual mobi style books that Kindlegen now generates which have two separate headers and two EXTH metadata storage areas.
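For anyone who wants a feel for what such a dump involves, here is a minimal sketch of reading the EXTH records from the first Mobi header. It is not the DumpMobiHeader.py code and it is written for Python 3 rather than 2.7. The function name read_exth_records is made up for illustration, and the offsets are assumptions based on the commonly documented layout: PalmDB record count at byte 76, MOBI magic at byte 16 of record 0, EXTH flag 0x40 at byte 128. It does not handle the second header in KF8 dual mobi books.

```python
import struct

def read_exth_records(data):
    """Return [(exth_type, payload_bytes), ...] from the first MOBI header.

    Illustrative sketch only; offsets follow the commonly documented
    PalmDB/MOBI layout and are assumptions, not taken from DumpMobiHeader.py.
    """
    # PalmDB header: big-endian uint16 record count at offset 76,
    # then 8-byte record-info entries (offset, attributes+id) from offset 78.
    (num_records,) = struct.unpack_from(">H", data, 76)
    if num_records == 0:
        return []
    (rec0_start,) = struct.unpack_from(">L", data, 78)
    if num_records > 1:
        (rec0_end,) = struct.unpack_from(">L", data, 78 + 8)
    else:
        rec0_end = len(data)
    rec0 = data[rec0_start:rec0_end]

    # Record 0 = 16-byte PalmDOC header followed by the MOBI header.
    if rec0[16:20] != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    (mobi_len,) = struct.unpack_from(">L", rec0, 20)
    (exth_flags,) = struct.unpack_from(">L", rec0, 128)
    if not exth_flags & 0x40:
        return []  # flag bit says there is no EXTH block

    # The EXTH block starts right after the MOBI header.
    exth_start = 16 + mobi_len
    if rec0[exth_start:exth_start + 4] != b"EXTH":
        raise ValueError("EXTH flag set but EXTH magic not found")
    _exth_len, count = struct.unpack_from(">LL", rec0, exth_start + 4)

    records = []
    pos = exth_start + 12
    for _ in range(count):
        # Each record: uint32 type, uint32 total length (incl. these 8 bytes).
        rec_type, rec_len = struct.unpack_from(">LL", rec0, pos)
        if rec_len < 8:
            raise ValueError("corrupt EXTH record length")
        records.append((rec_type, rec0[pos + 8:pos + rec_len]))
        pos += rec_len
    return records
```

To try it, read the ebook with open(path, "rb").read() and pass the bytes to read_exth_records. Each tuple holds the numeric EXTH type and the raw payload; type 100, for example, is generally documented as the author field.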

To run the program simply do the following:

python ./DumpMobiHeader.py PATH_TO_YOUR_EBOOK

on Mac or Linux

or

python .\DumpMobiHeader.py PATH_TO_YOUR_EBOOK

when running cmd.exe under Windows.

It should work on both DRM and non-DRM Amazon/Mobi style ebooks that use the latest header layout (since the headers and metadata are not themselves encrypted). Please note that Amazon ties its DRM to many of the metadata fields (watermark, tts, etc) to prevent them from being changed. Also, some new metadata values are required for the ebook to be read properly. So be careful about exactly which metadata values you change or delete. You may end up breaking the ebook.

I wrote this to document all that is known about the format, based on other tools, our wiki on the Mobi format, and reverse engineering of the latest KF8 format mobis for the Mobi_Unpack program.

Even if you do not read/follow Python, the code itself documents what is known and should be easy enough to follow.

If anyone knows of *any* corrections or extensions please let me know so we can keep this program updated to help properly document the mobi format.

Hope this helps,

KevinH