MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 07-08-2014, 02:49 PM

Hi tkeo,

After looking at what you have done some more, I would like to adopt your idea of passing more information in filenames[] to make creating the opf easier.

So I propose replacing filenames[dir,filename] with something along the lines of:

fileinfo[key, dir, filename]

The "key" will be one of the following:
- skelid (skeleton number/partno converted to string) to match with RESC skelid
- "coverpage" - used when we create a coverpage
- None
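
To make the proposed layout concrete, here is a rough sketch of what a fileinfo list might look like. The directory names, file names, and the helper keys_for_spine are made up for illustration and are not taken from the actual code; I am also assuming that a key of None marks a file with no matching spine entry.

```python
# Hypothetical fileinfo list: each entry is [key, dir, filename].
# The directory and file names here are placeholders only.
fileinfo = [
    ["coverpage", "Text", "cover_page.xhtml"],  # coverpage we created ourselves
    [str(0), "Text", "part0000.xhtml"],         # skelid 0, converted to a string
    [str(1), "Text", "part0001.xhtml"],         # skelid 1, converted to a string
    [None, "Styles", "style0001.css"],          # assumed: no matching spine entry
]


def keys_for_spine(entries):
    """Return the keys of all entries that carry one (illustrative helper)."""
    return [key for key, _dir, _name in entries if key is not None]
```

Storing the skelid as a string means the same key can be used directly against dictionaries built from the RESC, without converting back and forth.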

This fileinfo will be passed to the opf code along with k8resc (the much simpler version I proposed). The spine and manifest will then be built "on the fly" as they were originally, with the key used to look up entries in spine_order, spine_idrefs, and spine_properties as we build them.

I have modified my mobi_k8resc.py version to add an "x_" prefix to the idrefs given in the RESC. I have also moved most of the RESC header and extraction processing out of kindleunpack.py into the new mobi_k8resc.py, and changed the resc returned to simply be the k8resc object (as it will have all of the other info you stored in the resc[] list).

The code in the opf then does the following:

For the metadata, we use what you have but teach it to grok the new k8resc extra metadata format instead.

For the manifest:

We use the imgnames, fileinfo, and used_map information as before, but now we look up the original idref in the k8resc.spine_idrefs dictionary as needed; otherwise we use our itemXXXXX style idrefs.
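
The idref lookup for the manifest could be sketched roughly as below. The function name manifest_id, the assumption that spine_idrefs is keyed by the fileinfo key, and the exact zero-padded "item%05d" format are my guesses for illustration, not what mobi_opf.py necessarily does.

```python
def manifest_id(key, index, spine_idrefs):
    """Pick the id for one manifest item (hypothetical helper).

    key          -- the key from the fileinfo entry (skelid string, "coverpage", or None)
    index        -- running item counter, used for the generated fallback id
    spine_idrefs -- dict from key to the original idref (assumed layout of
                    k8resc.spine_idrefs); pass {} when there is no RESC
    """
    if key is not None and key in spine_idrefs:
        # The RESC gave us an original idref for this part, so reuse it.
        return spine_idrefs[key]
    # Otherwise fall back to a generated itemXXXXX style id (format assumed).
    return "item%05d" % index
```

For example, manifest_id("0", 3, {"0": "x_chapter1"}) picks up the RESC idref, while manifest_id(None, 3, {}) falls back to the generated id.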

For the spine:

if k8resc exists and the length of k8resc.spine_order >= number of parts:

- we can use k8resc.spine_order to create the proper order, and get all idrefs and page properties for the spine from the k8resc object

else

- we build the spine in the order given by the fileinfo array which should match the k8proc.partInfo order as we always did previously.
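
The spine decision above can be sketched as follows. This is only an outline under stated assumptions: build_spine is a hypothetical name, SimpleNamespace stands in for the real k8resc object, the three spine attributes are assumed to be a list of keys plus two dicts keyed the same way, and "number of parts" is approximated by counting fileinfo entries with a non-None key.

```python
from types import SimpleNamespace  # stand-in for the real k8resc object


def build_spine(fileinfo, k8resc=None):
    """Return a list of (idref, properties) tuples for the spine (sketch only)."""
    # Assumed: entries with key None are not spine items.
    parts = [entry for entry in fileinfo if entry[0] is not None]

    if k8resc is not None and len(k8resc.spine_order) >= len(parts):
        # The RESC covers every part: use its order, idrefs and page properties.
        return [
            (k8resc.spine_idrefs.get(key), k8resc.spine_properties.get(key))
            for key in k8resc.spine_order
        ]

    # Otherwise keep the fileinfo order, as before, with generated idrefs.
    return [("item%05d" % i, None) for i, _entry in enumerate(parts)]


example_fileinfo = [
    ["0", "Text", "part0000.xhtml"],
    ["1", "Text", "part0001.xhtml"],
    [None, "Styles", "style0001.css"],
]
example_resc = SimpleNamespace(
    spine_order=["1", "0"],
    spine_idrefs={"0": "x_a", "1": "x_b"},
    spine_properties={"1": "page-spread-left"},
)
```

With example_resc the spine follows the RESC order; with k8resc=None, or a spine_order shorter than the number of parts, it falls back to the fileinfo order.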

How does that sound?

To better explain what I am thinking, I have thrown together some changes with a new mobi_k8resc.py and some associated changes in the opf code, but this has only been tested briefly for epub2! It will most likely die under epub3, but I think it illustrates the approach I have in mind. If you agree, we would then try to reduce the redundancy in this mobi_opf.py, fix it to work with epub3, and remove all the dependencies on mobi_taglist.py, since it should no longer be needed.

So please see KindleUnpack_v072x_test.zip that is attached.

This is not a public release!!!!!

This version is just meant to demonstrate the approach and ideas so we can decide how best to move forward.

Take care,

KevinH