KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files - Page 20

DiapDealer · 02-02-2012, 03:49 PM

Been playing with the unpacker and not really running into any issues with my limited source material. But I'm curious about the splitter function:

The old MOBI-only portion produced by the splitter doesn't seem to work with the Kindle Previewer when emulating a Kindle Fire: "Open Error: error opening book.null." But the Previewer in Kindle Fire mode handles other old-style MOBIs just fine (as I'm sure the Kindle Fire does). Is that just a current limitation of the splitter program, or was it something that slipped through the cracks? I haven't had time to investigate the splitting routines myself, but will certainly do so.

Also, the Previewer will display the KF8-only file when emulating any of the older eInk devices (albeit with no formatting of any kind), yet the actual device itself gives a "can't open" error. That could just be a Previewer bug, I suppose.

KevinH · 02-02-2012, 05:21 PM

Hi DiapDealer,

Given how mobi_split.py was written, it was easy to modify it to remove all of the KF8-specific metadata elements when writing out the mobi7 file.

So I tried again after removing metadata 121 (Boundary), 125 (Count of resources), 129 (kindle:embed masthead/cover image), and 131 (unidentified count).
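For anyone following along, here is a rough sketch of what dropping those records could look like. It is not the code in mobi_split.py. It assumes the commonly documented EXTH layout (the 'EXTH' magic, a big-endian length and record count, then records of id, length, data, with the block padded to a multiple of 4 bytes), and the names strip_exth_records and KF8_ONLY_IDS are made up for illustration.

```python
import struct

# Record ids named in the post above; treating them as KF8-only is the
# poster's working hypothesis, not a documented fact.
KF8_ONLY_IDS = {121, 125, 129, 131}

def strip_exth_records(exth, drop_ids=KF8_ONLY_IDS):
    """Return a copy of an EXTH block without the records listed in drop_ids.

    Assumed layout: b'EXTH', length (>L), record count (>L), then for each
    record an id (>L), a length (>L, counting its own 8-byte header) and data.
    """
    if exth[:4] != b'EXTH':
        raise ValueError('not an EXTH block')
    _length, count = struct.unpack_from('>LL', exth, 4)
    pos = 12
    kept = []
    for _ in range(count):
        rec_id, rec_len = struct.unpack_from('>LL', exth, pos)
        if rec_len < 8:
            raise ValueError('corrupt EXTH record at offset %d' % pos)
        if rec_id not in drop_ids:
            kept.append(exth[pos:pos + rec_len])
        pos += rec_len
    body = b''.join(kept)
    total = 12 + len(body)
    # Assumption: the length field excludes the trailing padding.
    padding = b'\0' * (-total % 4)
    return b'EXTH' + struct.pack('>LL', total, len(kept)) + body + padding
```

Shrinking the EXTH block moves whatever follows it in record 0, so any offsets that point past it (the full title offset, for example) would presumably need adjusting as well.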

I then tried to open the mobi7-specific file in Kindle Previewer set for Fire, and it still barfs. I wonder if it assumes that a Creator Build number of 2 always means a dual file?

So I am not sure why this is happening, unless something hidden away in the FCIS, the mobi header, or some other section is telling the Previewer to assume this is a KF8 mobi.

Until we find and fix this, splitting is not perfect yet.

Kevin

KevinH · 02-02-2012, 07:24 PM

Hi,

I tried removing all metadata that was K8 specific. I tried removing all metadata that talked about creator software versions. I noticed that the FCIS sections were different between old mobis that Kindle Previewer in Fire mode showed well and these split versions, so I modified the software to create the old-style FCIS (the FLIS was the same). I even put back the font and resc files instead of zeroing them out, and still no luck!

Somehow, somewhere, there must be something in the mobi7 split file that confuses the Kindle Previewer when in Fire mode, but I really cannot see it.

I dumped the mobi7 header.dat and the header.dat from a version created with a mobi-to-mobi conversion in calibre and compared them. I simply cannot find any difference that looks like it would matter.
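A comparison like that can be scripted rather than eyeballed. The sketch below is only an illustration (the helper names diff_offsets and format_diffs are hypothetical). It lists every offset where two header dumps differ, so each differing field can be checked one at a time.

```python
def diff_offsets(a, b):
    """Return (offset, byte_a, byte_b) for each position where a and b differ.

    Only the common prefix is compared; a difference in length has to be
    checked separately.
    """
    a = bytearray(a)  # bytearray gives integer items on Python 2 and 3
    b = bytearray(b)
    diffs = []
    for i in range(min(len(a), len(b))):
        if a[i] != b[i]:
            diffs.append((i, a[i], b[i]))
    return diffs

def format_diffs(diffs):
    """Render the differences as 'offset: old != new' lines in hex."""
    return ['0x%04x: %02x != %02x' % d for d in diffs]
```

Many of the differences between two real headers will be harmless (lengths, offsets, record counts), so this narrows the candidates rather than pointing at the culprit directly.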

KevinH · 02-02-2012, 08:04 PM

Hi DiapDealer,

It seems the exth_flags value found in the mobi header at bytes 0x80 - 0x83 is the culprit. They seem to have expanded this to have new flags.

In all regular old mobi ebooks with an exth this value is 0x0050 (0x0040 is the bit that indicates there is an exth section).

All of the split mobi7 files have the value 0x1850 here, so it appears the extra bits in this flag field (0x1800 in this case) are being used for something.

If I do the following in mobi_split.py, just before datain_rec0 is written back near the end of the mobi7 stage:

# read the 4-byte exth_flags field (big-endian) and keep only the old-style bits
flgval, = struct.unpack_from('>L', datain_rec0, 0x80)
flgval = flgval & 0x0050
# pack with '>L' so the bytes go back in the same big-endian order they were read in
datain_rec0 = datain_rec0[:0x80] + struct.pack('>L',flgval) + datain_rec0[0x84:]

then when I fire it up in Kindle Previewer set for Fire, it shows up with no errors.

So we will probably have to figure out what those extra bits mean in that extended field in order to figure out how to deal with them properly. Something in there is convincing the Kindle Previewer that this is a dual mobi kf8 ebook.
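The same masking step as a standalone helper, in case it is easier to test outside of mobi_split.py. This is just a sketch: the function and constant names are hypothetical, and the offset and mask are the ones given in this post, not values taken from any specification.

```python
import struct

EXTH_FLAGS_OFFSET = 0x80     # offset in record 0, as reported in the post
CLASSIC_FLAGS_MASK = 0x0050  # bits seen in older mobi files, per the post

def clear_extended_exth_flags(rec0, mask=CLASSIC_FLAGS_MASK):
    """Return a copy of record 0 with only the masked exth_flags bits kept."""
    flags, = struct.unpack_from('>L', rec0, EXTH_FLAGS_OFFSET)
    # Read and write both use '>L': big-endian, always exactly 4 bytes.
    patched = struct.pack('>L', flags & mask)
    return rec0[:EXTH_FLAGS_OFFSET] + patched + rec0[EXTH_FLAGS_OFFSET + 4:]
```

Using '>L' for the pack matters: a bare 'L' uses native byte order and native size, which can write the bytes in the wrong order or change the length of the record.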

Kevin

dch · 02-05-2012, 04:58 PM

Noob here, trying to use this script to pull a pdf out of a .azw4 file. When I run in python 3.2, I get the error:
C:\Python32>python mobiunpack32.py POM.azw4
  File "mobiunpack32.py", line 323
    print "multiple values: metadata[%s]=%s'" % (name, metadata[name])
    ^
SyntaxError: invalid syntax

Where I have the mobiunpack32.py and POM.azw4 in the same directory as the python 3.2 installation. Any advice on what I should be doing?

DaleDe · 02-05-2012, 05:49 PM

To dch

A. Python must be below 3.0
B. I don't think it works for azw4 files and will not generate a PDF.
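To illustrate point A: the line in the traceback uses the Python 2 print statement, which Python 3 rejects at parse time. The sketch below is only an illustration of the syntax difference (report_multiple is a made-up name). Changing this one line would not make the whole script run on Python 3.

```python
def report_multiple(name, metadata):
    """Print and return the 'multiple values' message for one metadata key."""
    message = "multiple values: metadata[%s]=%s" % (name, metadata[name])
    # With a single parenthesised argument this line is accepted by both:
    # Python 2 reads it as a print statement, Python 3 as a print() call.
    print(message)
    return message
```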

dch · 02-05-2012, 05:57 PM

Quote:

Originally Posted by DaleDe

To dch

A. Python must be below 3.0
B. I don't think it works for azw4 files and will not generate a PDF.

Ahh, Thank you. Installed Python 2.7. It worked perfectly!!

DaleDe · 02-05-2012, 05:58 PM

Glad it worked for you and I learned a bit as well.

pdurrant · 02-06-2012, 05:16 AM

Quote:

Originally Posted by DaleDe

Glad it worked for you and I learned a bit as well.

Yes, I added the azw4/PDF support back in version 0.31.

There's currently a lot of development going on to get it to support the new KF8 format files, but by other people. When it settles down a bit, I'll see about updating the post near the beginning of this thread with the new version.

andreasmarc · 02-07-2012, 11:45 AM

I'm trying to figure out some details regarding the kf8 metadata support and find the decoder very useful in general, so thank you for that!
I'm having difficulty finding out whether some of the OPF attributes in the dc elements are carried over into the kf8 format. When generating a mobi from an opf file with the previewer and unpacking it with the decoder, it has lost the attributes:

<dc:creator opf:role="this" opf:file-as="that">…</dc:creator>

I thought that mobi supported these opf related schemes, and I get the warnings

Warning: Unknown metadata with id 129 found
Warning: Unknown metadata with id 131 found
Warning: Unknown metadata with id 125 found

from the unpacking process. Could it be that this information gets lost during unpacking?

Andreas

KevinH · 02-07-2012, 12:23 PM

Hi,

The original epub opf can and does store that metadata but that metadata gets converted to Mobi style metadata and stored inside the EXTH section of the mobi.

That is all we have to work with.

Try using the very latest version and turn on Debugging and then search in the copious amount of output for the word metadata and soon you will see a list of all of the metadata as key value pairs. The first set is from the mobi7 header and the second set is from the mobi8 header.

If you see some way to recapture the original metadata from the metadata stored inside the mobi, please let us know.
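To make the "key value pairs" point concrete, here is a small sketch of walking an EXTH block. It assumes the commonly documented layout ('EXTH', big-endian length and record count, then id/length/data records) and is not the unpacker's own code; iter_exth_records is a hypothetical name.

```python
import struct

def iter_exth_records(exth):
    """Yield (record_id, data_bytes) for each record in an EXTH block."""
    if exth[:4] != b'EXTH':
        raise ValueError('not an EXTH block')
    _length, count = struct.unpack_from('>LL', exth, 4)
    pos = 12
    for _ in range(count):
        rec_id, rec_len = struct.unpack_from('>LL', exth, pos)
        if rec_len < 8:
            raise ValueError('corrupt EXTH record at offset %d' % pos)
        # The record length counts its own 8-byte header.
        yield rec_id, exth[pos + 8:pos + rec_len]
        pos += rec_len
```

Each record is a numeric id plus a run of bytes, with no obvious slot for attributes such as opf:role or opf:file-as, which would be consistent with them not surviving the conversion.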

Quote:

Originally Posted by andreasmarc

I'm trying to figure out some details regarding the kf8 metadata support and find the decoder very useful in general, so thank you for that!
I'm having difficulty finding out whether some of the OPF attributes in the dc elements are carried over into the kf8 format. When generating a mobi from an opf file with the previewer and unpacking it with the decoder, it has lost the attributes:

<dc:creator opf:role="this" opf:file-as="that">…</dc:creator>

I thought that mobi supported these opf related schemes, and I get the warnings

Warning: Unknown metadata with id 129 found
Warning: Unknown metadata with id 131 found
Warning: Unknown metadata with id 125 found

from the unpacking process. Could it be that this information gets lost during unpacking?

Andreas

See above. No info is purposely lost during unpacking; it may in fact be lost in the kindlegen conversion process, and we can only unpack what is there.

Those unknown metadata values are extra pieces stored inside the mobi7 header that tell it how to process the later mobi8 ebook. They can safely be ignored.

KevinH

KevinH · 02-11-2012, 03:09 PM

Hi All,

Attached is an update to Mobi_Unpack_v0.45, a Python 2.x program which works with both older mobi and newer KF8 mobi formats. It includes a Graphical User Interface frontend.

Windows users need to fully install the free community edition of ActiveState ActivePython 2.7.x to ensure that the graphical interface toolkit is properly installed. For Linux users with Python 2.7 installed and users of Mac OS X 10.5 and later, it should work out of the box.

Changes since the previous version include:

- fixed a potential bug where the epub zip archive was not closed properly

- added new internal classes to make it easier to interface to internal calibre code

If you run into any problems or issues, please report them in this forum.

KevinH

Edited: Updated to the latest version 0.45

DiapDealer · 02-11-2012, 04:04 PM

I don't know if this was an issue before (I haven't had much time to devote to this), but the KF8-only portion created by the split feature no longer recognizes its own inline (html) TOC. Which means the Go To->ToC feature of the Kindles (or the KindlePreviewer) doesn't work. The html ToC is still physically there, of course... the Kindle just doesn't know it's a ToC. (Note: this would be related to the "toc" reference item in the guide portion of the OPF, not the NCX—which works as expected).

The mobi-only portion of the split feature doesn't seem to suffer from the same problem. I'll look, but the splitting code is all brand new to me. It might take me a bit to absorb what's going on. In the meantime, let me know if you'd like a sample book that exhibits the problem in KindlePreviewer after splitting.

KevinH · 02-11-2012, 04:37 PM

Hi DiapDealer,

Please e-mail me your problem test case. I split both Jerome.mobi and the Cherokee language testcase.mobi and then unpacked the mobi8 only version of each and in the guide section of the content.opf there is indeed the proper toc pointing to the right file.

Is there someplace other than the guide section of content.opf that needs to be properly set to allow it to find its toc?

For the KF8 part, all of the guide is stored in the oth_tbl and my tests show it properly there in both versions.

For the old mobi sections, the guide info is stored inside the rawml file and just needs to be copied to the right place in the content.opf

Thanks,

KevinH

KevinH · 02-11-2012, 05:04 PM

Hi,

Okay, I can recreate this with KindlePreviewer and Jerome.mobi.

The problem is that if I unpack Jerome.mobi and the mobi8-Jerome.mobi and compare the content.opf files and the contents of the oth_tbl (which is used to build the guide elements), they are identical (byte for byte).

So something inside the mobi8 only version must indicate the presence or absence of a toc that we do not know about. Perhaps one of the fields that we don't know in the mobi8 header or something in the damn FLIS, FCIS, or DATP.

This is going to be a real bugger to find.

KevinH
