Invalid TOC links in some azw3 files - Page 4

Notjohn · 05-01-2015, 05:13 AM

Quote:

Originally Posted by darryl

Kovid. Thanks for taking the time to explain.

There are some levels of explanation that leave my mind spinning hopelessly, unable to gain any traction. That happens fairly often on this forum.

@Hitch: It's actually now Step 7 in the publishing process where one downloads the converted "mobi" file.

KevinH · 05-07-2015, 04:38 PM

I took a look at this just to see if the test sample had any interesting metadata that might help explain how it was created. The only interesting things I could see were the following:

Key: "Input_Source_Type_(534)"
Value: "kjw"

I think the above means the book was a kindle ebook of some sort. Other values I have seen for this metadata value are "epub" and "mobi". I am not sure what "kjw" really means.

Key: "547 (hex)"
Value: 0x496e4d656d6f7279 InMemory

Key: "548 (hex)"
Value: 0x496e4d656d6f7279 InMemory

Both of the above are new metadata values to me. They are the ASCII bytes for the string "InMemory". I have no idea what that is a flag for.
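For anyone who wants to check the hex-to-text reading themselves, here is a minimal sketch. The helper name decode_exth_hex is made up for illustration and is not part of DumpMobiHeader or KindleUnpack; it just turns the dumped hex string back into text.

```python
def decode_exth_hex(hex_value, encoding="ascii"):
    """Decode a dumped hex string such as '0x496e...' into text.

    Hypothetical helper for illustration only.
    """
    # Drop the optional '0x' prefix before converting the hex digits to bytes.
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    return bytes.fromhex(digits).decode(encoding)


# The value reported for both record 547 and record 548 in this book.
print(decode_exth_hex("0x496e4d656d6f7279"))  # prints InMemory
```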

Key: "kindlegen_Source-Target_(529)"
Value: "Source-Target:c1-c2 KT_Version:2.9 Build:1202-3f1a435"

Key: "Unknown_(526)"
Value: "kindletool2.9 Build:1202-3f1a435"

And based on the above, what is kindletool2.9 anyway? Is it the same tool that is used at KDP, or is it an early version of kindlegen, or something else?

BTW, judging from all of the span tags with font-related inline styles, this code really looks like it was converted from html3.2 (old mobi 7 or earlier), with inline styles used to replace the old font-related tags. That is probably why there were no targets for the ncx entries, as the old mobis stripped those out and replaced them with just file positions.

Anyway, it would be interesting to see if other books with this problem had similar values for their metadata. If you have one of these books, you can run DumpMobiHeader_v018.py on it and see if any of these metadata values can be properly interpreted. BTW: You can dump the metadata even if the book has DRM since the metadata and header are stored in the clear.
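To give a rough idea of why these values can be read without touching the book text, here is a minimal sketch of walking an EXTH metadata block. It assumes the commonly documented layout (the magic "EXTH", a 4-byte length, a 4-byte record count, then records made of a 4-byte id, a 4-byte length that includes its own 8 header bytes, and the data, all big-endian). It is not how DumpMobiHeader is actually written, the function names are made up, and finding the EXTH block inside a real file is left out. The sample block is built by hand from values seen in this thread.

```python
import struct


def parse_exth(block):
    """Return a list of (record id, raw data) pairs from an EXTH block.

    Sketch under the layout assumptions stated above.
    """
    if block[:4] != b"EXTH":
        raise ValueError("not an EXTH block")
    _header_len, count = struct.unpack_from(">II", block, 4)
    records = []
    pos = 12  # first record starts right after magic, length and count
    for _ in range(count):
        rec_id, rec_len = struct.unpack_from(">II", block, pos)
        # rec_len counts the 8 bytes of id and length as well as the data.
        records.append((rec_id, block[pos + 8 : pos + rec_len]))
        pos += rec_len
    return records


def build_exth(records):
    """Build a synthetic EXTH block for testing (hypothetical helper)."""
    body = b"".join(
        struct.pack(">II", rec_id, len(data) + 8) + data for rec_id, data in records
    )
    return b"EXTH" + struct.pack(">II", 12 + len(body), len(records)) + body


sample = build_exth([(534, b"kjw"), (547, b"InMemory"), (548, b"InMemory")])
for rec_id, data in parse_exth(sample):
    print(rec_id, data)
```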

I did look at the ncx index tags and there were no new ones, they just used the standard ncx tags we already knew about.

Interesting.

I will release a KindleUnpack update with DiapDealer's/Kovid's fix this weekend.

Thanks,

KevinH

darryl · 05-08-2015, 08:10 AM

Kevin - I've attached a header dump of a similarly affected book called Fatal Flowers by Enes Smith, also an Amazon .azw3. Items of interest similar to those in your post include:

Key: "Input_Source_Type_(534)"
Value: "kpr"

Key: "Unknown_but_changes_with_file_name_only_(542) "
Value: "JkCn"

Key: "Unknown_but_changes_with_file_name_only_(542) "
Value: "JkCn"

Key: "kindlegen_Source-Target_(529)"
Value: "Source-Target:c1-c2 KT_Version:2.9 Build:0218-9566cc5"

Warning: Unknown metadata with id 547 found
Key: "547 (hex)"
Value: 0x49006e004d0065006d006f0072007900

Warning: Unknown metadata with id 548 found
Key: "548 (hex)"
Value: 0x496e4d656d6f7279

Key: "Unknown_(526)"
Value: "kindletool2.9 Build:0806-f6e82bf"

KevinH · 05-08-2015, 09:14 AM

Hi darryl,

Thanks for posting that. Very interesting. Your book seems to be using an earlier build of the same KindleTool 2.9 and has the same strange metadata values 547 and 548. The 547 metadata element appears to be using utf-16 encoding of the text, which looks like a bug that was fixed in a later build. Interestingly, both your ebook and mine have the strange "InMemory" value for those two new metadata values.

Your source is "kpr", which is probably a kindle prc or some other early mobi type (that's a pure WAG on my part). It is certain that neither was from an epub.
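A quick sketch of the utf-16 observation, for anyone who wants to verify it. The function name and the "every second byte is zero" test are my own simple heuristic for these two samples, not anything taken from KindleUnpack or the Kindle tools.

```python
def decode_metadata_value(hex_value):
    """Guess the encoding of a dumped value and decode it.

    Heuristic for illustration only: if every second byte is zero,
    treat the value as little-endian utf-16, otherwise as ascii.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    looks_utf16le = (
        len(raw) >= 2 and len(raw) % 2 == 0 and all(b == 0 for b in raw[1::2])
    )
    encoding = "utf-16-le" if looks_utf16le else "ascii"
    return encoding, raw.decode(encoding)


# Record 547 from the earlier build, then record 548 from the same book.
print(decode_metadata_value("0x49006e004d0065006d006f0072007900"))
print(decode_metadata_value("0x496e4d656d6f7279"))
```

Both calls give back the same text, "InMemory"; only the detected encoding differs.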

Does anyone know if KindleTool is an internal Amazon tool or not?

KevinH

eschwartz · 05-08-2015, 11:02 AM

Well, there is a kindletool, by NiLuJe -- its purpose is to decrypt and unpack Kindle software updates (or create them) and I am confident that that isn't going on here.

Aside from that, nothing. So it is almost definitely an internal Amazon tool.

darryl · 05-12-2015, 12:01 PM

The Saga continues. I wrote to Andreas Christensen, the author of Exodus, who was good enough to reply. The second edition of Exodus, which was our "test book", was written in Word and the Word file was uploaded to KDP. There was also a first edition published through BookBaby which does not appear to be relevant here. Andreas now mainly writes in Scrivener so it is likely that his later works will be much better formatted as EBooks.

This is interesting since the input source was shown as "kjw", which Kevin speculated quite reasonably was some type of Kindle EBook. I now wonder if the "w" stands for Word? In this case, and it is really clutching at straws, could the "r" in the "kpr" input format in the second problem book signify "rich text format"? Could these be some sort of intermediate file produced by KDP from Word or RTF before being input to Kindlegen? Perhaps even the mysterious Kindletool? It may be that we will never know the answer. If the problem has now been fixed in KDP we will not be able to reproduce it.

DiapDealer · 05-12-2015, 01:51 PM

Quote:

Originally Posted by darryl

In this case, and it is really clutching at straws, could the "r" in the "kpr" input format in the second problem book signify "rich text format"? Could these be some sort of intermediate file produced by KDP from Word or RTF before being input to Kindlegen? Perhaps even the mysterious Kindletool?

Makes as much sense as anything else to me.

Quote:

Originally Posted by darryl

It may be that we will never know the answer. If the problem has now been fixed in KDP we will not be able to reproduce it.

In all fairness ... there's technically not anything to "fix" in KDP (WRT this particular issue, anyway). Whatever processing is going on, it's producing a valid Kindlebook with a working internal NCX ToC--functional on all Kindle devices/apps. Problems only arose because KindleUnpack and calibre didn't (then) have a way of unbinary-ing them while retaining the functional ncx.

KevinH · 05-12-2015, 03:20 PM

Yes, and that occurred because there were no id (or name) attributes to serve as the target for the toc/ncx link. So all we know for sure is that whatever was input into the conversion was not a valid epub of any sort.

KindleUnpack run on azw3 files *always* assumes kindlegen/KDP was given a valid epub as input (which is what people normally do), but given other input (mobi 6, word files, invalid epub, etc) you would get a non-valid epub back out of KindleUnpack, which is exactly what happened.
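To make the missing-target problem concrete, here is a minimal sketch that checks whether the fragment in each toc link has a matching id or name attribute in the html. This is only an illustration of the symptom, not the fix that went into calibre or KindleUnpack; the class and function names, the sample markup and the file name text.html are all made up.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect every id and name attribute value found in the markup."""

    def __init__(self):
        super().__init__()
        self.targets = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("id", "name") and value:
                self.targets.add(value)


def missing_targets(html_text, toc_hrefs):
    """Return the toc links whose #fragment has no target in html_text."""
    collector = AnchorCollector()
    collector.feed(html_text)
    missing = []
    for href in toc_hrefs:
        _, sep, fragment = href.partition("#")
        if sep and fragment not in collector.targets:
            missing.append(href)
    return missing


# Made-up sample: chapter 1 has a target, chapter 2 only has styled spans.
page = '<p id="ch1">Chapter 1</p><p><span style="font-weight:bold">Chapter 2</span></p>'
print(missing_targets(page, ["text.html#ch1", "text.html#ch2"]))
```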

Luckily Kovid and DiapDealer were able to work around it to fix this type of inconsistency for Calibre and KindleUnpack.

On a side note: kpr could easily be a KindlePrintReplica (pdf) ebook. I wonder what would happen if kindlegen were given a .azw4 or a pdf as input with the proper opf entries?

KevinH

darryl · 05-12-2015, 08:11 PM

Quote:

Originally Posted by DiapDealer

In all fairness ... there's technically not anything to "fix" in KDP (WRT this particular issue, anyway). Whatever processing is going on, it's producing a valid Kindlebook with a working internal NCX ToC--functional on all Kindle devices/apps. Problems only arose because KindleUnpack and calibre didn't (then) have a way of unbinary-ing them while retaining the functional ncx.

Thanks. This is of course correct. So, there being nothing to "fix", there is no reason for them to fix it. So there is still hope for anyone curious enough to run a test case or two through KDP. At the moment we know that the original input to KDP for Exodus was from Word, and little else. I'm curious and will keep an eye on this thread in case there are further developments, but I can't think of any good reason beyond simple curiosity for anyone to waste further time now that this anomaly has been fixed in Calibre and KindleUnpack.

