Hi,
For fun, I ran mobiunpack_v0.28.py on one of my dictionaries (file size 27,585,020 bytes) and timed it (wall-clock time captured with date in a shell script both before and after mobiunpack), then hard-coded hugeFile to False and re-ran.
With hugeFile set to True (uses file IO to temporary files):
Run   Start      Stop       Elapsed Time
1     12:25:21   12:26:39   1 minute 18 seconds
2     12:26:45   12:28:02   1 minute 17 seconds
With hugeFile set to False (uses lists of strings and "".join(strlist)):
Run   Start      Stop       Elapsed Time
1     12:29:18   12:30:32   1 minute 14 seconds
2     12:30:38   12:31:53   1 minute 15 seconds
It was as I expected. There is no "memory issue" when using lists of strings.
In most OSes, file IO has overhead: writes typically go to large in-memory buffers (buffered IO) and are not actually flushed to disk unless forced or until the file is closed. So any slight savings in memory use is offset by the disk overhead.
So it appears there is no real advantage to using temporary file IO over lists of strings and a final join.
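For anyone who wants to reproduce the comparison without the full dictionary, the two approaches can be sketched like this (a minimal benchmark of my own, not mobiunpack's actual code; the function names and chunk sizes are just for illustration):

```python
import tempfile
import time

def build_with_list(chunks):
    # Accumulate string chunks in a list and join once at the end.
    parts = []
    for c in chunks:
        parts.append(c)
    return "".join(parts)

def build_with_tempfile(chunks):
    # Write chunks to a temporary file, then read the result back.
    with tempfile.TemporaryFile(mode="w+") as f:
        for c in chunks:
            f.write(c)
        f.seek(0)
        return f.read()

if __name__ == "__main__":
    chunks = ["x" * 1024] * 10000  # roughly 10 MB of data

    t0 = time.perf_counter()
    a = build_with_list(chunks)
    t1 = time.perf_counter()
    b = build_with_tempfile(chunks)
    t2 = time.perf_counter()

    assert a == b  # both approaches produce the same result
    print("list+join: %.3fs  tempfile: %.3fs" % (t1 - t0, t2 - t1))
```

On my machine the two come out close, which matches the timings above: the tempfile writes mostly land in the OS buffer cache anyway, so the file-IO path buys nothing.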
Please try the same thing with your dictionaries and see if you get the same results. If so, we can probably remove the file IO approach, remove FactConcat, and just go with the string-list approach.
Thanks,
Kevin