MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 06-19-2014, 09:50 AM

Quote:

Originally Posted by tkeo

Hi Kevin,
I feel easier to understand what are extracted and built when calling of write() are placed in single function( or in single file). In addition, I prefer to avoid reading from written file during processing.

No, that just makes the code harder to modify, because everything must be collected and then written out in one code block. Let each object decide how to write itself out: you just pass in the data it needs to build itself properly and where you want it to go. This encapsulates all knowledge of the object in the object's own code.
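To illustrate the idea, here is a minimal sketch. The class names, the write() signature, and the Images/Text folder names are made up for this example and are not taken from the KindleUnpack source:

```python
import os


class ImageResource:
    """Hypothetical resource that knows how to write itself out."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # raw bytes of the image

    def write(self, outdir):
        # The object, not the caller, decides its subfolder and file mode.
        imgdir = os.path.join(outdir, "Images")
        os.makedirs(imgdir, exist_ok=True)
        path = os.path.join(imgdir, self.name)
        with open(path, "wb") as f:
            f.write(self.data)
        return path


class TextResource:
    """Hypothetical text part that also knows how to write itself out."""

    def __init__(self, name, text):
        self.name = name
        self.text = text  # already-decoded markup

    def write(self, outdir):
        textdir = os.path.join(outdir, "Text")
        os.makedirs(textdir, exist_ok=True)
        path = os.path.join(textdir, self.name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.text)
        return path


def write_all(resources, outdir):
    # The caller only says where things go; each object does the rest.
    return [r.write(outdir) for r in resources]
```

Adding a new kind of resource then means adding one class with its own write(), without touching a central block that writes everything.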

Quote:

Another, I am considering to make an option to converting to epub only and output no other files. Holding datalist and partlist might be helpful to do so, but not necessary if there are other ways.

We have no guarantee that a valid epub2 or epub3 was even passed in. If you give kindlegen a valid old mobi, it will create a horrible mobi8 version that looks like one giant file. So the output of KindleUnpack is guaranteed to be only epub-like, and serious editing might be needed. Having the actual epub structure fully unpacked allows people to easily edit and change things and pass the resulting content.opf right back into kindlegen until they get the results they want. So I don't think we need an option to only create an epub. It is easy enough to extract it from the output.

Quote:

I will modify it; however, I am not fully understand your intention. Does it mean adding a parameter to K8RESCProcessor?

Yes, pass in the k8proc object when you create the k8resc object and let it store it away. Then move most of the processing you are doing in the main KindleUnpack routine back into mobi_k8resc.py, to better encapsulate it and prevent the needless copying of data structures to create partslist[].
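Roughly what I mean, as a sketch only. K8ProcStub and RescSketch are stand-ins invented for this example; the real k8proc and K8RESCProcessor interfaces are different and larger:

```python
class K8ProcStub:
    """Stand-in for the k8proc object (hypothetical, much simplified)."""

    def __init__(self, parts):
        # parts: list of (filename, bytes) tuples already built elsewhere
        self.parts = parts


class RescSketch:
    """Stand-in for the k8resc object (hypothetical, much simplified)."""

    def __init__(self, resc_data, k8proc):
        self.resc_data = resc_data
        # Store a reference to k8proc; nothing is copied into a separate list.
        self.k8proc = k8proc

    def spine_filenames(self):
        # Processing that needs part information lives here and reads it
        # straight from the stored k8proc object.
        return [name for name, _ in self.k8proc.parts]
```

The main routine then only constructs the object and asks it for results, instead of building an intermediate list itself.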

Quote:

As you mentioned, storing actual data makes memory usage larger; therefore secno and offset are stored to extract from sect, except fonts.

But everything has already been extracted by that point. You seem to want to run a two-pass algorithm, where the first pass finds out what every section is for and where it should go, and a second pass does the extraction itself.

Why make two passes when one will do? Simply walk the sections once, intelligently, writing images and fonts out as you go along (they will all be needed, so no wasted space) and collecting any data that needs to be collected for later processing of ncx, nav, opf, xhtml, etc.
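A single-pass walk could look something like the sketch below. The function names, the image file naming pattern, and the use of a plain list of byte strings for the sections are assumptions for illustration; fonts would be handled the same way as images but are left out to keep it short:

```python
import os


def sniff_image_ext(data):
    """Return an extension for common image signatures, else None."""
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def walk_sections_once(sections, outdir):
    """Visit every section exactly once (hypothetical helper)."""
    imgdir = os.path.join(outdir, "Images")
    os.makedirs(imgdir, exist_ok=True)
    image_names = {}  # section number -> filename, kept for later opf work
    deferred = []     # (section number, data) for anything handled later
    for secno, data in enumerate(sections):
        ext = sniff_image_ext(data)
        if ext is not None:
            # Images are always needed, so write them out immediately.
            name = "image%05d.%s" % (secno, ext)
            with open(os.path.join(imgdir, name), "wb") as f:
                f.write(data)
            image_names[secno] = name
        else:
            # Everything else is only collected for later processing.
            deferred.append((secno, data))
    return image_names, deferred
```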

Quote:

Yes, it is needed to create the svg based cover page.

I found that exact image height and width extraction routine on the internet, but did not see any license information associated with it. Are you sure we have permission to incorporate it into our code?

Quote:

The number in image names are correspond to section numbers currently, so keeping the numbers might help to know where the image is come from; however, personally, I am not oppose to renaming (or write out) HD images to correspond to low resolution images.
It makes easer to switch the images in epub. I prefer to add an option to switch it by user.

So we add an option to use HDImage if available. If it is checked, we walk the CONT sections in order and replace the corresponding image file in the Images folder. No need for changes in the OPF or anything else.
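As a rough sketch of the replacement step only: the function below is hypothetical, and it assumes the HD payloads have already been pulled out in order, that the i-th HD entry corresponds to the i-th low-resolution image, and that both use the same image format. None of those assumptions is verified against the actual container layout:

```python
import os


def replace_with_hd(imgdir, image_names, hd_images, use_hd=True):
    """Overwrite low-res files with HD data where available (hypothetical).

    image_names: low-resolution filenames, in order
    hd_images:   HD image bytes or None, assumed to be in the same order
    """
    replaced = []
    if not use_hd:
        return replaced
    for name, hd in zip(image_names, hd_images):
        if hd is None:
            continue  # no HD version for this image, keep the low-res file
        # Same filename as before, so the OPF and xhtml need no changes.
        with open(os.path.join(imgdir, name), "wb") as f:
            f.write(hd)
        replaced.append(name)
    return replaced
```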

Also, please remove all of the flags you pass in for epub version support. It just means passing around flag bits that no one else will follow. The Tk GUI calls unpackBook directly and should not need to build up flags. Simply pass in one character-based option, --epub_version=, and use 2 or 3 or A as desired, defaulting to 2 if nothing is passed in. But please do not make a flags field; it just makes the code unreadable.
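Something along these lines, as a sketch. It uses argparse for brevity, which is not necessarily how the command line is parsed in KindleUnpack, and unpack_book_sketch is a hypothetical stand-in, not the real unpackBook signature:

```python
import argparse

EPUB_VERSIONS = ("2", "3", "A")


def build_parser():
    parser = argparse.ArgumentParser(description="epub version option sketch")
    parser.add_argument("infile")
    parser.add_argument("outdir")
    # One character-based option instead of a field of flag bits.
    parser.add_argument(
        "--epub_version",
        choices=EPUB_VERSIONS,
        default="2",
        help="epub version to generate: 2, 3 or A (default: 2)",
    )
    return parser


def unpack_book_sketch(infile, outdir, epub_version="2"):
    """Hypothetical entry point taking the version as a plain string."""
    if epub_version not in EPUB_VERSIONS:
        raise ValueError("epub_version must be one of 2, 3 or A")
    # A real implementation would unpack here; the sketch only echoes back.
    return {"infile": infile, "outdir": outdir, "epub_version": epub_version}
```

A GUI can then call unpack_book_sketch("book.mobi", "out", epub_version="3") directly, with a readable keyword argument and no flag bits to build up.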

Quote:

For conclusion, I am not stick to the code I have written, but please consider and give feedback to me about adding an option to convert to epub only though I do not plan to this immediately.
And I would like to hear considerations from others too.

I am not in favour of that. This should be a general unpack tool and should help people see and understand all of the new sections and headers as Amazon changes them. And Amazon has kept changing them and will probably continue to change them to add new features and section types.

Also, it is not a general epub creator, as we can only guarantee an epub-like structure and not a valid epub2 or even epub3, depending on just what was passed into kindlegen in the first place.

Take care,

KevinH