Old 05-27-2014, 03:07 PM   #57
KevinH
Sigil Developer
Hi,

Looking more closely, we could mmap the file. That gives us the equivalent of a mutable string, so we would avoid the problem of holding multiple copies of the data in memory at the same time.

Using mmap should keep memory usage quite close to the size of the original file, about 400 MB.
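
A minimal sketch of the mmap idea, assuming Python 3 and an edit that does not change the file's length. The function name replace_in_place and its arguments are made up for illustration and are not taken from any existing script:

```python
import mmap


def replace_in_place(path, old, new):
    """Overwrite every occurrence of `old` with `new` directly in the file.

    Hypothetical helper: `old` and `new` are bytes of equal length, so the
    file never has to grow or shrink and no second copy of the data is built.
    """
    if len(old) != len(new):
        raise ValueError("in-place edits need byte strings of equal length")
    count = 0
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as mm:
            pos = mm.find(old)
            while pos != -1:
                # Slice assignment writes straight into the mapped file.
                mm[pos:pos + len(new)] = new
                count += 1
                pos = mm.find(old, pos + len(new))
            mm.flush()
    return count
```

Something like replace_in_place("book.bin", b"foo", b"qux") would then patch the file without reading it into a Python string first ("book.bin" is just a placeholder name). Edits that change the length would need extra work, such as resizing the map and moving the tail of the data.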

Alternatively, we can use direct-access file I/O operations (seek, read, and write) to build the output file on the fly, reading the input in small chunks and writing it out as we go.
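
A rough sketch of the chunked approach, again assuming Python 3. The name copy_in_chunks, the transform callback, and the 1 MB default chunk size are all assumptions for illustration:

```python
def copy_in_chunks(src_path, dst_path, transform, start=0, chunk_size=1024 * 1024):
    """Stream src to dst one chunk at a time, applying `transform` to each chunk.

    Hypothetical helper: only one chunk is held in memory at any moment.
    `start` is a byte offset to seek to before reading.
    """
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(start)  # jump directly to the region of interest
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            out = transform(chunk)
            dst.write(out)
            total += len(out)
    return total
```

Note that a transform applied per chunk cannot see data that straddles a chunk boundary, so a real version would have to carry some overlap between chunks or split on known record boundaries.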

Either approach would eliminate the need to deal with the memory allocation and deallocation that comes with Python's immutable strings.

Do a Google search for "python mmap", or for "python random access files seek".

I personally think that mmap would be the fastest and easiest approach, with roughly 1x memory usage (i.e. around 400 MB for this file). The file I/O approach would have the smallest memory footprint but would be slower.

Let me know what you think.

KevinH