MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 09-16-2011, 02:25 PM

Hi Kovid,

Quote:

Originally Posted by kovidgoyal

I just came across this thread. Some tips:
1) Do not use the INDX entries to build an NCX. INDX entries can have a maximum depth of two for books and 3 for periodicals. This is a limitation of the MOBI format. Instead parse the inline TOC, calculate the left indents and reconstruct the NCX from that. See code in mob/reader.py in calibre to do that.

I think the idea is to eventually use "mobiunpack.py" as a way for people to take mobis generated by KindleGen, unpack them while making as few changes as possible, allow the user to make whatever changes they want, and then pass the whole thing back through KindleGen to get back a mobi.

So I think the idea is to generate the NCX that is stored inside the mobi and pass it back in so that it gets regenerated in exactly the same way.

Hence the idea to read the internal NCX rather than parse the inline TOC to create one of our own.
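For comparison, here is a rough sketch of the indent-based approach suggested in the quote: given inline TOC entries with their left indents, nest each entry under the nearest preceding entry with a smaller indent. This is not calibre's code; the names TocNode and build_toc_tree, and the (indent, title, href) input shape, are made up for illustration, and it assumes the indents have already been extracted as non-negative numbers.

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    """Hypothetical node type for a reconstructed table of contents."""
    title: str
    href: str
    children: list = field(default_factory=list)


def build_toc_tree(entries):
    """Nest TOC entries by left indent.

    entries: iterable of (indent, title, href) tuples in document order.
    Assumes indent >= 0; a larger indent means a deeper level.
    """
    root = TocNode("root", "")
    # Stack of (indent, node). The sentinel indent of -1 keeps the root
    # below every real entry so it is never popped.
    stack = [(-1, root)]
    for indent, title, href in entries:
        node = TocNode(title, href)
        # Drop entries that are at the same level or deeper than this one.
        while stack[-1][0] >= indent:
            stack.pop()
        # What remains on top is the nearest less-indented entry: the parent.
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root


if __name__ == "__main__":
    toc = build_toc_tree([
        (0, "Part I", "part1.html"),
        (20, "Chapter 1", "ch1.html"),
        (40, "Section 1.1", "ch1.html#s1"),
        (20, "Chapter 2", "ch2.html"),
        (0, "Part II", "part2.html"),
    ])
    for part in toc.children:
        print(part.title, [c.title for c in part.children])
```

Unlike the INDX route, nothing here limits the depth, which is the point of the suggestion. The cost is that the result depends on how the inline TOC happens to be styled.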

Quote:

2) If you still want to decompile indx entries and are looking at calibre code to figure out index entries:

a) note that currently indexer.py does not generate depth 2 indx entries for books, primarily because I got tired of figuring out the TBS indexing for depth two book nodes.

I used your indexer.py code to verify what the tag values are and what they mean (parent, first_child, last_child, class, etc.). Our code already handles reading in depth 2 for ebooks (tested with books from KindleGen, etc.). But I have not tried it with a periodical at all.
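To make the parent / first_child / last_child relationship concrete, here is a minimal sketch of turning a flat list of decoded entries into a nested structure. The dict layout is purely an assumption for illustration (it is not how mobiunpack.py or calibre actually store entries): each entry has a "label", top-level entries have no "parent" key, and "first_child" / "last_child" are inclusive positions in the same list.

```python
def nest_entries(entries):
    """Nest flat index entries using first_child/last_child ranges.

    entries: list of dicts in a hypothetical layout:
      'label'       -> display text
      'parent'      -> position of the parent entry (absent for top level)
      'first_child' -> position of the first child (absent if no children)
      'last_child'  -> position of the last child, inclusive
    Assumes well-formed input: child ranges never point back at an
    ancestor, otherwise the recursion would not terminate.
    """
    def build(pos):
        entry = entries[pos]
        first = entry.get("first_child")
        last = entry.get("last_child")
        children = []
        if first is not None and last is not None:
            children = [build(i) for i in range(first, last + 1)]
        return {"label": entry["label"], "children": children}

    # Top-level entries are the ones that do not name a parent.
    top_level = [i for i, e in enumerate(entries) if e.get("parent") is None]
    return [build(i) for i in top_level]


if __name__ == "__main__":
    flat = [
        {"label": "Chapter 1", "first_child": 2, "last_child": 3},
        {"label": "Chapter 2"},
        {"label": "Section 1.1", "parent": 0},
        {"label": "Section 1.2", "parent": 0},
    ]
    for chapter in nest_entries(flat):
        print(chapter["label"], [c["label"] for c in chapter["children"]])
```

The same walk should work for a third level in a periodical if the deeper entries carry their own child ranges, but as said, that is untested.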

Quote:

b) you should look at the code in mobi/debug.py which is designed to decompile arbitrary MOBI files including the index and TBS information. You can run that code with calibre-debug --inspect-mobi filename.mobi

Great tool. Will do.

Thanks,

KevinH