MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

KevinH · 07-07-2014, 10:41 PM

Hi tkeo,

Please hold off on both your g and h versions for now. Let's focus on the resc code changes and then work from there.

FWIW, I am a firm believer in KISS engineering, especially for software, so I always aim for minimal, simple code. I want to greatly simplify the K8RESCProcessor code, remove taglist, and make some other changes. The idea is to design something that preserves sequence/order as it parses the resc data and keeps only the information we actually need, rather than trying to keep the entire spine.

I have posted the equivalent parseRESC.py program (see attachment) to play with. Just pass it in any RESC file. It handles your test case just fine.

If you look at the main routine you will see how I would like to interface with the resc code. There is a spine_order list that keeps the keys (mainly skeleton file numbers) as strings, and these skelids are also used as keys into spine_idrefs and the other lookup tables. The extra metadata is extracted and ready to process as well.
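
To make the shape of that interface concrete, here is a rough sketch. It is not the attached parseRESC.py: the sample fragment, the parse_resc_sketch name, the "noskelN" fallback key, and the use of xml.etree are all assumptions for illustration, and real RESC data may not be well-formed enough for a strict XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified RESC-like fragment (real RESC data may differ).
SAMPLE = """<package>
<metadata><meta name="cover" content="img0"/></metadata>
<spine>
<itemref idref="cover_page" skelid="0"/>
<itemref idref="chapter1" skelid="1"/>
<itemref idref="extra_page"/>
</spine>
</package>"""


def parse_resc_sketch(xml_text):
    """Return (spine_order, spine_idrefs, extra_metadata) for a RESC-like string."""
    root = ET.fromstring(xml_text)
    spine_order = []    # keys, as strings, in document order
    spine_idrefs = {}   # key -> idref
    for n, itemref in enumerate(root.iter("itemref")):
        key = itemref.get("skelid")
        if key is None:
            # Assumed fallback for entries with no skeleton number.
            key = "noskel%d" % n
        spine_order.append(key)
        spine_idrefs[key] = itemref.get("idref")
    # Keep only the extra metadata, as (tag, attributes) pairs.
    extra_metadata = []
    metadata = root.find("metadata")
    if metadata is not None:
        extra_metadata = [(child.tag, dict(child.attrib)) for child in metadata]
    return spine_order, spine_idrefs, extra_metadata


spine_order, spine_idrefs, extra_metadata = parse_resc_sketch(SAMPLE)
for key in spine_order:
    print(key, spine_idrefs[key])
```

The caller only ever walks spine_order and looks things up by key, so the order of the spine is preserved without keeping the whole spine around.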

We can change the opf generation code to add all of the extra EXTH metadata (ASIN, etc.) inside a commented-out section, so that it is ignored by the resc parsing code if you pass the file back through kindlegen after unpacking.
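
As a rough illustration of the commented-out section, something along these lines would do. The exth_comment_block name, the BEGIN/END marker text, and the sample values are made up for this sketch and are not what KindleUnpack emits. The only hard rule is that an XML comment must not contain "--", so that is broken up before writing.

```python
import xml.etree.ElementTree as ET


def exth_comment_block(exth):
    """Wrap EXTH name/value pairs in a single XML comment (hypothetical layout)."""
    lines = ["<!-- BEGIN EXTH metadata (informational only)"]
    for name, value in exth.items():
        text = "%s: %s" % (name, value)
        # "--" is not allowed inside an XML comment, so split up any runs of dashes.
        while "--" in text:
            text = text.replace("--", "- -")
        lines.append(text)
    lines.append("END EXTH metadata -->")
    return "\n".join(lines)


# Placeholder values, not taken from a real book.
block = exth_comment_block({"ASIN": "B000000000", "note": "a---b"})
opf_metadata = "<metadata>\n" + block + "\n</metadata>"

# A standard XML parser drops the comment, so nothing inside it is seen as metadata.
root = ET.fromstring(opf_metadata)
print(len(root))
```

Anything that parses the opf as XML sees an empty metadata element here, while the values remain readable to a person looking at the file.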

Once we agree on that or a similar approach to parsing the resc, we can go back to figuring out the best way to simplify the opf generation code without off-loading code back to kindleunpack.py, and to bring it closer to what it was originally, without all of the setid, regular expression, and taglist code and the associated overhead.

So please grab the latest parseRESC.py.zip, look it over, and see if it does what we want in a minimal fashion. Please test it with your harder RESCXXXXX.dat files and let me know if you think it is missing any key information. Once we agree on how the resc code should work, we can move forward with reworking the opf generation code to take advantage of these changes.

Thanks,

KevinH