Hi Steffen,
I made all of the code changes and created a FastConcat class that hides the hugeFile temp-file creation and string-list appending. It is simple to use and is built on the Python tempfile module.
fc = FastConcat(hugeFile)
...
fc.concat(data)
...
fc.getresult()
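
For reference, here is roughly what the class looks like (a trimmed-down sketch; the spill threshold and the _spill helper name are just illustrative details, not the exact code):

import tempfile

class FastConcat:
    """Collect string pieces; spill to a temp file once they grow large."""

    def __init__(self, hugeFile=True, threshold=1 << 20):
        self.hugeFile = hugeFile    # allow spilling to disk at all
        self.threshold = threshold  # bytes held in memory before spilling
        self.pieces = []            # in-memory list of string chunks
        self.size = 0
        self.tmp = None             # temp file handle, created lazily

    def concat(self, data):
        self.pieces.append(data)
        self.size += len(data)
        if self.hugeFile and self.size >= self.threshold:
            self._spill()

    def _spill(self):
        if self.tmp is None:
            self.tmp = tempfile.TemporaryFile(mode='w+')
        self.tmp.write(''.join(self.pieces))
        self.pieces = []
        self.size = 0

    def getresult(self):
        if self.tmp is None:
            return ''.join(self.pieces)
        self._spill()                # flush whatever is still in memory
        self.tmp.seek(0)
        return self.tmp.read()
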
That all seemed to work fine. Then I reverted your image file-name extension changes, and now I can see why you decided to ignore the extensions on images! ;-)
Your approach lets you update all image links with a single regular-expression substitution, which is much faster than doing one substitution per image. Something along these lines (a sketch with my own names, not the real code):
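
import re

# Illustrative only: one global substitution rewrites every image link
# at once (here, dropping the extension), instead of calling re.sub()
# or str.replace() once per image name.
def strip_image_extensions(html):
    return re.sub(r'(src="[^"]+)\.(?:gif|jpe?g)(")', r'\1\2', html,
                  flags=re.IGNORECASE)
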
I had one old dictionary to test with. Unfortunately it uses the older, unsupported inflection rules, but it still let me experiment, and it contains over 9000 GIFs and JPEGs.
Processing all of those image links one by one would indeed take a very long time.
So I will have to try something else to speed it up. When I get a workable solution, I will post it.