Hi,
For fun, I ran mobiunpack_v0.28.py on one of my dictionaries (file size 27,585,020 bytes) and timed it (wall-clock time captured with date in a shell script both before and after mobiunpack), then hard-coded hugeFile to False and re-ran.
With hugeFile set to True (uses file IO to temporary files):
Run   Start      Stop       Elapsed Time
1     12:25:21   12:26:39   1 minute 18 seconds
2     12:26:45   12:28:02   1 minute 17 seconds
With hugeFile set to False (uses lists of strings and "".join(strlist)):
Run   Start      Stop       Elapsed Time
1     12:29:18   12:30:32   1 minute 14 seconds
2     12:30:38   12:31:53   1 minute 15 seconds
It was as I expected. There is no "memory issue" when using lists of strings.
In most OSes, file IO has overhead: writes typically go to large in-memory buffers (buffered IO) and are not actually flushed to disk unless forced or until the file is closed. So any slight savings in memory use is offset by the disk overhead.
So it appears there is no real advantage to using temporary file IO over lists of strings and a final join.
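For anyone who wants to reproduce the comparison without the full dictionary, the two approaches can be sketched like this (a minimal benchmark of my own, not mobiunpack's actual code; the function names and chunk sizes are just for illustration):

```python
import tempfile
import time

def build_with_list(chunks):
    # Accumulate string chunks in a list and join once at the end.
    parts = []
    for c in chunks:
        parts.append(c)
    return "".join(parts)

def build_with_tempfile(chunks):
    # Write chunks to a temporary file, then read the result back.
    with tempfile.TemporaryFile(mode="w+") as f:
        for c in chunks:
            f.write(c)
        f.seek(0)
        return f.read()

if __name__ == "__main__":
    chunks = ["x" * 1024] * 10000  # roughly 10 MB of data

    t0 = time.perf_counter()
    a = build_with_list(chunks)
    t1 = time.perf_counter()
    b = build_with_tempfile(chunks)
    t2 = time.perf_counter()

    assert a == b  # both approaches produce the same result
    print("list+join: %.3fs  tempfile: %.3fs" % (t1 - t0, t2 - t1))
```

On my machine the two come out close, which matches the timings above: the tempfile writes mostly land in the OS buffer cache anyway, so the file-IO path buys nothing.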
Please try the same thing with your dictionaries and see if you get the same results. If so, we can probably remove the file IO approach, remove FactConcat, and just go with the string-list approach.
Thanks,
Kevin