MobileRead Forums - View Single Post - Ruling: IP address does not identify person

HarryT · 05-17-2012, 08:15 AM

Quote:

Originally Posted by JoeD

It's very, very possible with text. I won't pretend to know the algorithms in detail, but if you google Huffman encoding, that's one of the ways they can compress so heavily.

The numbers I used are real. I made a fresh copy of my Apache log, 5.8MB in size (yes, I rounded to 6MB in my OP). After gzip -9 compression it was 180KB, or 121KB if bzip2'd instead.

That implies massive redundancy, as I said in my previous post: 5.8MB down to 180KB is a factor of about 30. If you compress a typical piece of "real world" text (e.g. a book), you'll typically get a compression factor of only 2 or 3.
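
For anyone who wants to try this at home, here is a rough Python sketch of the comparison. It is only an illustration under assumptions: the log lines are made up by a hypothetical `make_fake_log` helper (they are not JoeD's actual log), and `zlib` at level 9 stands in for `gzip -9`, since both use the DEFLATE algorithm (LZ77 plus Huffman coding). Random bytes are included as a baseline with no redundancy at all.

```python
import bz2
import random
import zlib


def compression_ratio(data: bytes, compress) -> float:
    """Return original size divided by compressed size."""
    return len(data) / len(compress(data))


def make_fake_log(lines: int = 5000, seed: int = 0) -> bytes:
    """Build a synthetic, Apache-style access log (hypothetical data)."""
    rng = random.Random(seed)
    ips = [f"192.168.0.{i}" for i in range(1, 21)]
    paths = ["/index.html", "/forum/showthread.php",
             "/images/logo.png", "/favicon.ico"]
    agent = "Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0"
    out = []
    for n in range(lines):
        ts = f"17/May/2012:08:{(n // 60) % 60:02d}:{n % 60:02d} +0000"
        out.append(
            f'{rng.choice(ips)} - - [{ts}] "GET {rng.choice(paths)} HTTP/1.1" '
            f'200 {rng.randint(200, 9000)} "-" "{agent}"'
        )
    return "\n".join(out).encode("ascii")


def deflate9(data: bytes) -> bytes:
    """DEFLATE at maximum level, roughly what gzip -9 does."""
    return zlib.compress(data, 9)


log = make_fake_log()

# Random bytes of the same length: essentially nothing to squeeze out.
noise_rng = random.Random(1)
noise = bytes(noise_rng.getrandbits(8) for _ in range(len(log)))

log_gzip = compression_ratio(log, deflate9)
log_bzip2 = compression_ratio(log, bz2.compress)
noise_gzip = compression_ratio(noise, deflate9)

print(f"fake log, deflate -9: {log_gzip:.1f}x")
print(f"fake log, bzip2:      {log_bzip2:.1f}x")
print(f"random bytes:         {noise_gzip:.2f}x")
```

To check the book figure, read any plain-text book into `data` with `open(path, "rb").read()` and pass it to `compression_ratio` in the same way.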