Quote:
Originally Posted by DMcCunney
Correct. They've stated that they couldn't simply use something like the Zip "Deflate" algorithm because they wanted to be able to start decompressing at any specified point in the file (like the dictionary definition for a particular word), and Zip always started at the beginning.
What they did was simply compress the source HTML in chunks, so that each chunk after compression fits inside a PDB record. They also added an index mapping uncompressed text positions to record numbers, so that only the necessary records need to be decompressed. None of this actually requires Huffman coding; it could be done with Deflate, LZW, LZMA, or whatever.
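To illustrate the idea (not the actual Mobipocket format), here is a minimal sketch in Python using zlib as the stand-in compressor: compress the text in fixed-size chunks, keep an index of each chunk's starting uncompressed offset, and on read decompress only the records that cover the requested range. The chunk size, function names, and index layout are all assumptions for illustration.

```python
import bisect
import zlib

CHUNK_SIZE = 4096  # uncompressed bytes per record (an assumption for this sketch)

def build_records(text: bytes):
    """Compress text in fixed-size chunks; return the compressed records plus
    an index mapping each record number to its starting uncompressed offset."""
    records = []
    index = []  # index[i] = uncompressed offset where record i begins
    for off in range(0, len(text), CHUNK_SIZE):
        index.append(off)
        records.append(zlib.compress(text[off:off + CHUNK_SIZE]))
    return records, index

def read_at(records, index, pos: int, length: int) -> bytes:
    """Decompress only the records covering [pos, pos + length)."""
    first = bisect.bisect_right(index, pos) - 1  # record containing pos
    start = pos - index[first]                   # offset within that record
    out = b""
    rec = first
    while rec < len(records) and len(out) < start + length:
        out += zlib.decompress(records[rec])
        rec += 1
    return out[start:start + length]
```

With a fixed uncompressed chunk size the index could be replaced by `pos // CHUNK_SIZE`, but an explicit offset table also handles variable-size chunks, which is what makes random access into the middle of the file cheap.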