Hi Steffen,
> Feel free to provide an improved version
Thanks. I helped write the original version you adapted, and my interest is in the additional code that converts the old mobi raw html into normal html for archival purposes. So having the extensions on the images is useful; I will add that back in.
> First of all the 20MB dictionary mobi file uncompresses into 100MB html text.
> And into this 100MB thousands of strings have to be inserted all over the html text, which means for each insert all the 100MB of data must be copied at least once.
Or, as I said, we could try keeping the text as a list of string segments, inserting the new pieces at the right positions via list insertion, and then doing a single join at the end to put it all together (rough sketch below).
If that works, I will rewrite it that way; if not, I will pull all of the pieces that write to a file (versus concatenating strings) out into a separate function, to clean the code up and make it more readable.
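Something along these lines, a minimal untested sketch (the function and parameter names here are made up, not the actual mobiunpack code). The point is that the big raw text is only sliced and joined once, instead of being copied on every insert:

    def insert_markers(rawtext, insertions):
        # insertions: list of (offset, string) pairs, offsets refer
        # to positions in the original rawtext
        insertions = sorted(insertions)
        segments = []
        prev = 0
        for pos, s in insertions:
            segments.append(rawtext[prev:pos])  # untouched slice
            segments.append(s)                  # inserted string
            prev = pos
        segments.append(rawtext[prev:])
        return ''.join(segments)                # one copy at the end

That should be roughly linear in the size of the text plus the inserted strings, rather than copying the whole 100MB once per insert.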
> Even the decompression of the compressed texts is much faster if I append each block to a temporary disk file instead of handling everything in memory.
Good point.
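For reference, something like this is what I have in mind if we go the disk route (again just a sketch under assumed names, not the current code), appending each decompressed record to a temporary file instead of growing one huge in-memory string:

    import tempfile

    def unpack_text_to_tempfile(records, decompress):
        # records: iterable of compressed text records
        # decompress: function returning the uncompressed bytes of one record
        tmp = tempfile.NamedTemporaryFile(delete=False, suffix='.rawml')
        for data in records:
            tmp.write(decompress(data))   # append each block as we go
        tmp.close()
        return tmp.name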
> But feel free to do a test...
Will do.
Thanks,
Kevin