Old 05-17-2012, 10:54 AM   #240
HarryT
eBook Enthusiast
Posts: 85,557
Karma: 93980341
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by ApK
Just because I'll do almost anything to avoid real work, I just created an email list with 1.2 million name/address entries.
There is a lot of repetition, but I did make random changes to groups of a hundred thousand or so at a time.

The raw text file is 47.8 MB. 7Zip's Ultra level compression gets it down to UNDER 8 KILOBYTES. Pretty impressive. I'm going to spend a few minutes trying with one of my actual huge log files and see what happens.
Yeah - if you have 100,000 identical lines, they will essentially be replaced by one marker saying "repeat this 100,000 times". If you made your file 10 million lines long, it would still compress down to next to nothing. Artificial cases like that aren't a terribly good test, because they compress in a way that "real" data doesn't.
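If you want to see the effect for yourself, here is a minimal sketch using Python's standard lzma module. It is the same family of algorithm that 7-Zip uses for .7z archives, but it is not 7-Zip itself and won't match its Ultra setting byte for byte. The sample address line and the line counts are made up for illustration; random bytes stand in for data with no repetition at all, which is the opposite extreme from the test file.

```python
import lzma
import os


def compressed_size(data: bytes) -> int:
    """Return the number of bytes left after LZMA compression."""
    return len(lzma.compress(data, preset=9))


# Hypothetical list entry, repeated to mimic the artificial test file.
line = b"John Smith, 12 Example Street, Anytown\n"

repetitive = line * 10_000               # the same line over and over
longer = line * 100_000                  # ten times as many copies
random_data = os.urandom(len(repetitive))  # no repetition to exploit

for name, data in [
    ("repetitive", repetitive),
    ("ten times longer", longer),
    ("random", random_data),
]:
    print(f"{name}: {len(data)} bytes -> {compressed_size(data)} bytes")
```

The repeated file should shrink to a tiny fraction of its size, and making it ten times longer should add very little, while the random data should barely shrink at all. Real log files and mailing lists land somewhere in between.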