MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 12-16-2011, 10:18 AM

Hi All,

Check out the following link to get copies of the new Amazon K8 (KF8) format files to play around and test with:

http://www.the-digital-reader.com/20...now-available/

I grabbed the Jerome.mobi and tried unpacking it via mobiunpack.py with all DEBUG turned on.

It seems that Amazon have simply combined two different mobi ebooks into one palm doc container.

The one at the top is simply the normal mobi, and mobiunpack works well on it, but it generates extra raw pieces. You can find all of these extra raw pieces written out as image*.raw files inside the images folder. These include the FONT and RESC sections, plus a copy of every later section, each in its own file, up to the end of the palmdoc. So by examining these extra image*.raw files in a text editor we can see what each section of the palmdoc contains.
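As a quicker alternative to opening each file in a text editor, here is a minimal sketch that lists every image*.raw file with its size and leading tag. The function names and the "at least four capital letters" rule for spotting a tag are my own assumptions for illustration, not anything mobiunpack does:

```python
from pathlib import Path


def leading_tag(data: bytes, width: int = 8) -> str:
    """Return the leading run of A-Z bytes as text, else the first 4 bytes as hex."""
    tag = bytearray()
    for b in data[:width]:
        if 65 <= b <= 90:  # ASCII 'A'..'Z'
            tag.append(b)
        else:
            break
    if len(tag) >= 4:  # assumption: real tags (FONT, RESC, FLIS...) are 4+ letters
        return tag.decode("ascii")
    return data[:4].hex()


def summarize_raw_files(folder):
    """List (file name, size in bytes, leading tag) for each image*.raw file."""
    rows = []
    for path in sorted(Path(folder).glob("image*.raw")):
        data = path.read_bytes()
        rows.append((path.name, len(data), leading_tag(data)))
    return rows
```

Calling `summarize_raw_files("images")` on the unpacked output folder should make the FONT, RESC and BOUNDARY pieces easy to spot; anything shown as hex is probably a real image or compressed data.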

Immediately after the normal mobi ebook (in the very next section) you can find a whole section that appears to contain nothing but the word "BOUNDARY", which seems to be the divider between the older .mobi file format and the new format.
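To find that divider directly, a rough sketch along these lines may help. It relies on the standard Palm database layout (big-endian record count at offset 76, then 8-byte record entries whose first 4 bytes are the data offset). The function names are hypothetical, and it assumes the section holds exactly the bytes BOUNDARY, which is only what the dump appears to show:

```python
import struct


def split_sections(data: bytes) -> list:
    """Split a Palm database file into its raw sections (records)."""
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))  # the last section runs to the end of the file
    return [data[offsets[i]:offsets[i + 1]] for i in range(count)]


def find_boundary(sections):
    """Return the index of the first section equal to b'BOUNDARY', or None."""
    for index, section in enumerate(sections):
        if section == b"BOUNDARY":  # assumption: the section is exactly this word
            return index
    return None
```

With the whole file read into `data`, `find_boundary(split_sections(data))` should give the section number of the divider, or None for an old-style mobi.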

It is followed by what looks like a new section 0 mobi header, and that is followed by all of the raw .xhtml, one piece per section, until the end (but unlike true image sections these have been compressed, so we will need to uncompress them to see what the new xhtml looks like). So the old format mobi is at the top of the palmdoc container, and immediately after the images and the FLIS and FCIS sections (the images appear to be shared by both versions of the ebook) you can see the pieces that make up the new format.
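For uncompressing those sections, here is a small sketch of the classic PalmDoc LZ77 decoder. This assumes the new part uses PalmDoc compression (type 2 in its section 0 header), which I have not confirmed; Huffdic-compressed books would need a different decoder, and any trailing bytes appended to a text record would have to be stripped first:

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decode PalmDoc (LZ77-style) compressed bytes."""
    out = bytearray()
    i = 0
    n = len(data)
    while i < n:
        c = data[i]
        i += 1
        if 1 <= c <= 8:
            # The next c bytes are copied through unchanged.
            out += data[i:i + c]
            i += c
        elif c < 0x80:
            # 0x00 and 0x09..0x7F are plain literals.
            out.append(c)
        elif c >= 0xC0:
            # A space followed by the character (c XOR 0x80).
            out.append(0x20)
            out.append(c ^ 0x80)
        else:
            # 0x80..0xBF: two-byte pair, 11-bit distance and 3-bit length.
            if i >= n:
                break  # truncated pair, stop quietly
            pair = (c << 8) | data[i]
            i += 1
            distance = (pair >> 3) & 0x07FF
            length = (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                break  # corrupt back-reference, stop quietly
            for _ in range(length):
                out.append(out[-distance])  # byte-wise so overlaps work
    return bytes(out)
```

Running each text section after the new section 0 through `palmdoc_decompress` and writing the results to a file should be enough to eyeball the new xhtml.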

So it appears we can look for things in the first mobi header that indicate that KF8 style data is included, and then parse those records using the new section 0, very much like we process the original mobi.
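I do not yet know which field in the first header flags the KF8 part, so as a stopgap this sketch splits the section list at the BOUNDARY section instead and then reads the basic fields of the new section 0. The field offsets are the usual PalmDoc/MOBI ones (16-byte PalmDoc header, then "MOBI"); the function names are made up for illustration, and the expectation that the new header reports file version 8 is an assumption to check against real files:

```python
import struct


def split_at_boundary(sections):
    """Return (old_part, new_part); new_part is empty if no BOUNDARY is found."""
    try:
        index = sections.index(b"BOUNDARY")
    except ValueError:
        return list(sections), []
    return list(sections[:index]), list(sections[index + 1:])


def read_mobi_header(section0: bytes):
    """Read a few fields from a section 0, or return None if it is not MOBI."""
    if len(section0) < 40 or section0[16:20] != b"MOBI":
        return None
    (compression,) = struct.unpack_from(">H", section0, 0)   # 1 none, 2 PalmDoc
    (text_records,) = struct.unpack_from(">H", section0, 8)  # number of text sections
    (header_length,) = struct.unpack_from(">L", section0, 20)
    (version,) = struct.unpack_from(">L", section0, 36)      # assumption: 8 for KF8
    return {
        "compression": compression,
        "text_records": text_records,
        "header_length": header_length,
        "version": version,
    }
```

The idea would be to call `read_mobi_header(new_part[0])` and then process `new_part` with the same code path used for the original mobi, treating `new_part[0]` as section 0.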

So does anyone want to take a shot at modifying the latest mobiunpack to unpack both versions of the ebook from these new K8 files?

Volunteers welcome!

Post #235: mobiunpack and the new K8 format. Last edited by KevinH; 12-16-2011 at 10:22 AM.