I took a look at this just to see if the test sample had any interesting metadata that might help explain how it was created. The only interesting things I could see were the following:
Key: "Input_Source_Type_(534)"
Value: "kjw"
I think the above means the book was a Kindle ebook of some sort. Other values I have seen for this key are "epub" and "mobi". I am not sure what "kjw" really means, though.
Key: "547 (hex)"
Value: 0x496e4d656d6f7279 InMemory
Key: "548 (hex)"
Value: 0x496e4d656d6f7279 InMemory
Both of the above are new metadata values to me. The value is the ASCII encoding of the string "InMemory". I have no idea what it is a flag for.
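For anyone who wants to check the decoding, here is a minimal Python sketch that turns the raw 8-byte value back into text. The function name `hex_value_to_ascii` is only illustrative, and it assumes the value is stored big-endian, one byte per character:

```python
def hex_value_to_ascii(value: int, length: int = 8) -> str:
    """Decode an integer metadata value as big-endian ASCII bytes."""
    # 0x49 'I', 0x6e 'n', 0x4d 'M', 0x65 'e', 0x6d 'm', 0x6f 'o', 0x72 'r', 0x79 'y'
    return value.to_bytes(length, "big").decode("ascii")

print(hex_value_to_ascii(0x496e4d656d6f7279))  # InMemory
```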
Key: "kindlegen_Source-Target_(529)"
Value: "Source-Target:c1-c2 KT_Version:2.9 Build:1202-3f1a435"
Key: "Unknown_(526)"
Value: "kindletool2.9 Build:1202-3f1a435"
Based on the above, what is kindletool2.9 anyway? Is it the same tool used at KDP, an early version of kindlegen, or something else?
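To line the two records up, here is a small Python sketch. The helper `parse_build_fields` is hypothetical and assumes the values are always space-separated `Key:Value` tokens. It shows that records 529 and 526 carry the same build identifier, which suggests (but does not prove) that both were written by the same tool:

```python
def parse_build_fields(value: str) -> dict[str, str]:
    """Split a space-separated 'Key:Value' string into a dict.

    Tokens without a ':' (e.g. 'kindletool2.9') are skipped.
    """
    fields = {}
    for token in value.split():
        if ":" in token:
            key, _, val = token.partition(":")
            fields[key] = val
    return fields

src_target = parse_build_fields("Source-Target:c1-c2 KT_Version:2.9 Build:1202-3f1a435")
tool = parse_build_fields("kindletool2.9 Build:1202-3f1a435")
# Both records report the same build identifier
print(src_target["Build"] == tool["Build"])  # True
```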
BTW, judging from all of the span tags with font-related inline styles, this code really looks like it was converted from HTML 3.2 (old MOBI 7 or earlier), with inline styles replacing the old font-related tags. That is probably also why there were no targets for the NCX entries: the old MOBIs stripped those out and replaced them with plain file positions.
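As a rough way to quantify that impression, here is a hypothetical heuristic in Python. The names `FontSpanCounter` and `count_font_spans` and the list of "font-related" properties are assumptions, not anything KindleUnpack does. It counts how many span tags carry font-related inline styles:

```python
from html.parser import HTMLParser

# Assumed list of CSS properties that stand in for old HTML 3.2 <font>, <b>, <i> tags
FONT_PROPS = {"font-family", "font-size", "font-weight", "font-style", "color"}

class FontSpanCounter(HTMLParser):
    """Count <span> tags, and those whose inline style sets a font-related property."""

    def __init__(self):
        super().__init__()
        self.spans = 0
        self.font_spans = 0

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        self.spans += 1
        style = (dict(attrs).get("style") or "").lower()
        # Property names are the part before ':' in each ';'-separated declaration
        props = {decl.split(":", 1)[0].strip() for decl in style.split(";") if ":" in decl}
        if props & FONT_PROPS:
            self.font_spans += 1

def count_font_spans(markup: str) -> tuple[int, int]:
    """Return (total spans, spans with font-related inline styles)."""
    parser = FontSpanCounter()
    parser.feed(markup)
    parser.close()
    return parser.spans, parser.font_spans
```

A high ratio of font-styled spans to total spans would be consistent with the HTML 3.2 conversion theory, though it is only a hint, not proof.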
Anyway, it would be interesting to see whether other books with this problem have similar values for their metadata. If you have one of these books, you can run DumpMobiHeader_v018.py on it and see whether any of these metadata values can be properly interpreted. BTW, you can dump the metadata even if the book has DRM, since the metadata and header are stored in the clear.
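For the curious, the metadata survives DRM because it lives in record 0, which is not encrypted. Below is a minimal Python sketch of walking to the EXTH block where these key/value records are stored. It assumes the commonly documented PalmDB/MOBI layout: record count at offset 76, MOBI header length at offset 20 of record 0, and EXTH flag bit 0x40 at offset 0x80. The function name `read_exth` is hypothetical; this is not how DumpMobiHeader_v018.py is written:

```python
import struct

def read_exth(data: bytes) -> list[tuple[int, bytes]]:
    """Return (record type, payload) pairs from a MOBI/AZW file's EXTH block.

    Only record 0 (PalmDOC header + MOBI header + EXTH) is read.
    """
    # PalmDB header: number of records at offset 76, record list at offset 78
    (num_records,) = struct.unpack_from(">H", data, 76)
    if num_records == 0:
        raise ValueError("PalmDB file has no records")
    # First 4 bytes of the first record-list entry = file offset of record 0
    (rec0,) = struct.unpack_from(">L", data, 78)
    # Record 0: 16-byte PalmDOC header, then the MOBI header
    if data[rec0 + 16:rec0 + 20] != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    (mobi_len,) = struct.unpack_from(">L", data, rec0 + 20)
    (exth_flags,) = struct.unpack_from(">L", data, rec0 + 0x80)
    if not exth_flags & 0x40:
        return []  # bit 0x40 clear: no EXTH block
    pos = rec0 + 16 + mobi_len  # EXTH immediately follows the MOBI header
    if data[pos:pos + 4] != b"EXTH":
        raise ValueError("EXTH flag set but no EXTH block found")
    (count,) = struct.unpack_from(">L", data, pos + 8)
    pos += 12  # skip 'EXTH', EXTH length, record count
    records = []
    for _ in range(count):
        # rlen includes this record's own 8-byte type/length header
        rtype, rlen = struct.unpack_from(">LL", data, pos)
        records.append((rtype, data[pos + 8:pos + rlen]))
        pos += rlen
    return records
```

With a real book you would call something like `read_exth(open("book.azw3", "rb").read())` and look for types 526, 529, 534, 547 and 548. The sketch skips sanity checks a real tool needs, such as validating record lengths against the file size.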
I did look at the NCX index tags, and there were no new ones; the book just used the standard NCX tags we already knew about.
Interesting.
I will release a KindleUnpack update with DiapDealer's/Kovid's fix this weekend.
Thanks,
KevinH