Quote:
Originally Posted by ApK
Just because I'll do almost anything to avoid real work, I just created an email list with 1.2 million name/address entries.
There is a lot of repetition, but I did make random changes to groups of a hundred thousand or so at a time.
The raw text file is 47.8 MB. 7-Zip's Ultra level compression gets it down to UNDER 8 KILOBYTES. Pretty impressive. I'm going to spend a few minutes trying with one of my actual huge log files and see what happens.
Yeah - if you have 100,000 identical lines, they will essentially be replaced by one marker saying "repeat this 100,000 times". If you made your file 10 million lines long, it would still compress to roughly 8k. Artificial cases like that aren't a terribly good test, because they compress in a way that "real" data doesn't.
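If you want to see the effect yourself, here's a rough sketch using Python's standard `lzma` module. It writes .xz data with LZMA2, a close relative of the LZMA that 7-Zip uses at its Ultra setting. The sample line, the `repeated`/`varied` helpers and preset 6 are my own assumptions, not what ApK actually ran. Strictly speaking, a run of identical lines doesn't become one single "repeat" token. LZMA emits a chain of back-references, each at most 273 bytes long, and its adaptive coder stores those very cheaply. So the compressed size grows far more slowly than the input. Lines with random digits in them still cost at least a few bytes each.

```python
import lzma
import random

# Hypothetical sample entry; any fixed line behaves the same way.
LINE = b"Jane Doe,42 Example Road,Springfield,jane.doe@example.com\n"


def xz_size(data: bytes) -> int:
    """Bytes needed to store `data` as an .xz stream (LZMA2, preset 6 assumed)."""
    return len(lzma.compress(data, preset=6))


def repeated(n_lines: int) -> bytes:
    """n_lines copies of one line, like an unchanged block of the test list."""
    return LINE * n_lines


def varied(n_lines: int, seed: int = 0) -> bytes:
    """Lines that differ in random digits: a crude stand-in for 'real' data."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_lines):
        line = (
            f"user{rng.randrange(10**8):08d},{rng.randrange(1, 10_000)} Main St,"
            f"zip {rng.randrange(10**5):05d}\n"
        )
        out.append(line.encode())
    return b"".join(out)


if __name__ == "__main__":
    for name, data in [
        ("10,000 identical lines", repeated(10_000)),
        ("100,000 identical lines", repeated(100_000)),
        ("20,000 varied lines", varied(20_000)),
    ]:
        print(f"{name}: {len(data):>9,} bytes raw -> {xz_size(data):>7,} bytes compressed")
```

Ten times more identical lines should come out well under ten times bigger. The 20,000 varied lines should end up much larger than the 100,000 identical ones. Dictionary size barely matters for the repeated case, because each repeat only points back one line.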