Quote:
Originally Posted by ekaser
Sorry, I'm not a Python expert (just starting on it, really; I'm a long-time C guy), so I can't help you with the Python specifics. But the old, tried-and-true method is to not read the whole thing in at once: read it in chunks, process each chunk, and when you get close to the end of the chunk, move the remainder up, refill the queue with the next chunk from the file, and keep going. Of course, that works better with some things than others, but I'd expect it to work reasonably well with .rtf text files, which are pretty linear beasts. You might have to keep a 'stack' of open blocks for text that's long since been flushed from the processing queue, so that you know what's pending when you reach the end of a block in the queue, but probably not.

If you make the processing queue sufficiently large (4M? 8M? 16M? 32M? Any of those would probably be plenty big and would avoid the "memory issues"), then you can update/refill the queue at opportune moments.

In managing the queue, you can either move the unused portion up and then refill from there to the end of the queue, or just keep pointers to the start and end of the unprocessed portion; refilling the queue then involves two reads, the first to fill the tail end of the queue and the second to fill the freed-up front end. If speed of processing is not an issue (which I don't think it is in your case), then move-and-fill is preferred, because it makes the rest of the code MUCH simpler. With "rotating pointers", you're constantly checking whether you've reached the end of the queue and whether the end pointer is greater or less than the start pointer and such. PITA.
I might have to resort to that... I'm just irked that I have to do any workaround for code that clearly works fine with smaller files, especially since even the larger files aren't actually exceeding (or even approaching) my hardware's memory limits.
But you are right. It shouldn't be too hard to do.
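The move-and-fill scheme described above can be sketched in a few lines of Python. This is only an illustration, not code from the thread: `process_in_chunks` and the `handle` callback are hypothetical names, and the handler is assumed to consume some prefix of the buffer and return how many bytes it processed.

```python
def process_in_chunks(path, handle, chunk_size=4 * 1024 * 1024):
    """Read `path` in large chunks, move-and-fill style.

    `handle(buf)` must process a prefix of `buf` and return the number of
    bytes it consumed (at least 1 for a non-empty buffer, so we always
    make progress).
    """
    with open(path, "rb") as f:
        buf = f.read(chunk_size)
        while buf:
            consumed = handle(buf)
            # Move-and-fill: shift the unprocessed tail to the front,
            # then top the queue back up from the file.
            tail = buf[consumed:]
            buf = tail + f.read(chunk_size - len(tail))
```

For example, a handler that only consumes up to the last complete line never has to worry about a record being split across a chunk boundary; the partial line is simply carried over into the next refill. That is exactly why move-and-fill keeps the rest of the code simple compared to a circular buffer with rotating pointers.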
- Ahi