MobileRead Forums - View Single Post

siebert · 07-18-2011, 02:15 PM

Quote:

Originally Posted by KevinH

Hi Steffen,
I just wanted to say very nice job with your new mobiunpack.py version!

Thanks

Quote:

I diffed your speedup changes against the original and all looks great except for one thing: why did you remove the imghdr code that detects the proper image type, so that each file is created with the proper extension?

It's one of the speed optimizations. My image handling is generic: if the html references the image stored in section x, I don't have to look up the file, because its name is always 0000y.jpg, where y is calculated from x.
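A minimal sketch of that naming idea, for illustration only. The post does not say how y is derived from x, so the offset against a hypothetical `first_image_section` and the 1-based numbering below are assumptions, as is the function name `image_name`.

```python
def image_name(section_index, first_image_section):
    """Return the output file name for the image stored in a section.

    Assumption: images are numbered from 1, starting at the first image
    section. The real calculation in mobiunpack may differ.
    """
    y = section_index - first_image_section + 1
    # Always use the .jpg extension, so no file type detection is needed.
    return "%05d.jpg" % y


print(image_name(12, 10))  # 00003.jpg
```

Because the name depends only on the section number, the html can be rewritten without opening or inspecting any image file.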

Quote:

Are you using fake "image" files to store extra sections (non-html, non-image) from the original mobi file? Perhaps index information from the dictionaries?

No, only images are needed. In the mobiunpack version I've published, some non-image sections will be written as image files, but they won't be referenced by the html source, so it doesn't matter.

I have an improved version, not yet published, that detects and ignores these non-images, but that is for cosmetic reasons only.

Quote:

Also, it would be nice to gather all of the string concats and file writes into one function that takes the "big-file" flag and the new data and handles both cases, just to make the code look cleaner.

Feel free to provide an improved version.
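For anyone who wants to try, such a helper might look roughly like the sketch below. The class name `TextSink`, its methods and the `big_file` parameter are all hypothetical and do not come from mobiunpack.py. The sketch only shows one way to hide the "memory or disk" decision behind a single interface.

```python
import shutil
import tempfile


class TextSink:
    """Collect output pieces in memory or in a temporary disk file."""

    def __init__(self, big_file=False):
        self.big_file = big_file
        self._parts = []  # used only when big_file is False
        self._tmp = tempfile.TemporaryFile() if big_file else None

    def write(self, data):
        """Append one piece of bytes to the collected output."""
        if self.big_file:
            self._tmp.write(data)
        else:
            self._parts.append(data)

    def save(self, path):
        """Write everything collected so far to the file at path."""
        with open(path, "wb") as out:
            if self.big_file:
                self._tmp.seek(0)
                shutil.copyfileobj(self._tmp, out)
            else:
                out.write(b"".join(self._parts))
```

The calling code would then only call `sink.write(piece)` and would not need to check the flag at every concat or file write.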

Quote:

That said, I find it hard to think that even a 26 meg mobi file fills up memory in today's multi-GB machines.

First of all, the 20MB dictionary mobi file decompresses into 100MB of html text.

And thousands of strings have to be inserted all over this 100MB of html text, which means that for each insert all 100MB of data must be copied at least once.

Even decompressing the text is much faster if I append each block to a temporary disk file instead of handling everything in memory.
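To illustrate the copying problem: an in-memory insert such as `text = text[:pos] + data + text[pos:]` rebuilds the whole buffer every time. The sketch below shows one way to avoid that by writing the pieces in order to an output file, so the original text is passed over only once. It is not the code from mobiunpack.py. The function name `splice_to_file` and the list of (position, data) pairs are assumptions made for this example.

```python
import io


def splice_to_file(text, inserts, out):
    """Write text to out with every (position, data) insert spliced in.

    Positions are offsets into the original text. The text is walked
    once from start to end instead of being rebuilt for each insert.
    """
    last = 0
    for pos, data in sorted(inserts, key=lambda item: item[0]):
        out.write(text[last:pos])  # unchanged text up to the insert point
        out.write(data)            # the inserted string
        last = pos
    out.write(text[last:])         # remainder after the last insert


buf = io.BytesIO()
splice_to_file(b"abcdef", [(4, b"<Y>"), (2, b"<X>")], buf)
print(buf.getvalue())  # b'ab<X>cd<Y>ef'
```

Here `out` can be any writable binary file object, for example a temporary disk file instead of the `io.BytesIO` buffer used for the demonstration.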

Quote:

It might simply be that the string concatenation needs to be replaced with adding the string pieces to a list and then doing a "".join(list) at the end.

Maybe. I didn't test that, but I doubt it would be as fast. Feel free to do a test...
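A rough starting point for such a test could look like this. The block size and block count are arbitrary stand-ins for decompressed text blocks, not measurements from a real mobi file, and the function names are made up for this sketch. It compares only repeated concatenation with the list and join approach. It does not cover the temporary disk file variant, and the timings will depend on the Python version and the machine.

```python
import timeit


def concat(blocks):
    """Build the result by repeated concatenation."""
    result = b""
    for block in blocks:
        result += block
    return result


def join(blocks):
    """Build the result by collecting pieces and joining once."""
    parts = []
    for block in blocks:
        parts.append(block)
    return b"".join(parts)


# Assumed test data: 500 blocks of 4096 bytes each.
blocks = [b"x" * 4096] * 500

for func in (concat, join):
    seconds = timeit.timeit(lambda: func(blocks), number=3)
    print(func.__name__, round(seconds, 4))
```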

Ciao,
Steffen