Old 02-12-2013, 10:19 PM   #13
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
I want to clarify a few points I made.

When I refer to the loss of formatting, I'm looking at the starting file size. A file with more information will typically be larger than a file with less information. Think of comparing a Blu-ray to a VHS tape: the content may be the same, but there is a huge difference in quality and in the amount of information stored. A smaller starting file will also naturally produce a smaller compressed file than a larger one. So comparing the given formats by output size alone is not an apples-to-apples comparison.

This test really should look at the overall compression ratio, that is, the percentage by which each file shrinks from its original size. Any other comparison isn't really valid.
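
Here's a rough Python sketch of what I mean by the ratio. The file names are just placeholders; the point is that you compare the percentage reduction, not the absolute output size.

```python
import gzip
import os

def compression_ratio(path: str) -> float:
    """Percentage by which gzip shrinks the file at the given path."""
    with open(path, "rb") as f:
        data = f.read()
    compressed = gzip.compress(data)
    return 100.0 * (1 - len(compressed) / len(data))

# Hypothetical file names; what matters is comparing the ratios,
# not the absolute compressed sizes.
for book in ("sample.epub", "sample.mobi", "sample.txt"):
    if os.path.exists(book):
        print(book, f"{compression_ratio(book):.1f}% smaller")
```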

My binary format comment has a few facets. You also need to keep in mind that some formats are already compressed. Compressing them again yields far less reduction than compressing the same data in uncompressed form would. Compression (typically) looks for repeated patterns; compressing once removes most of those patterns, so each subsequent pass gains less, until the compressor can find no more patterns and cannot reduce the size at all.
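
You can see this for yourself with a quick sketch like the following, which just compresses some repetitive text twice with gzip and compares the sizes:

```python
import gzip

# Highly repetitive text compresses very well on the first pass.
original = b"the quick brown fox jumps over the lazy dog\n" * 1000
once = gzip.compress(original)
twice = gzip.compress(once)

print(len(original), len(once), len(twice))
# The second pass barely shrinks the data -- it may even grow it slightly --
# because the first pass already removed the repeated patterns.
```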

Which leads to the issue of binary formats and testing only with the gzip (gz) format. That is only one compression format. It works great for text and is an all-around good choice, but there are other compression formats that handle binary data better than gzip does (and others that are better in general, but that's beside the point). You're only looking at one compression format, and while one ebook format happens, by its nature, to compress very well with gzip, you can hardly say that ebook format has the best compression. A compression format that works better with binary data could compress some of the other formats better than gzip can. I don't mean by producing smaller files, but by producing a better compression ratio for the given files.
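
A fair test would run each file through several compressors and compare the ratios. A rough sketch, again with placeholder file names, using the compressors in the Python standard library:

```python
import bz2
import gzip
import lzma

def ratios(data: bytes) -> dict:
    """Compressed size as a fraction of the original, per format."""
    return {
        "gzip": len(gzip.compress(data)) / len(data),
        "bzip2": len(bz2.compress(data)) / len(data),
        "xz/lzma": len(lzma.compress(data)) / len(data),
    }

# Hypothetical inputs: a plain-text book vs. an already-binary ebook file.
for name in ("book.txt", "book.mobi"):
    with open(name, "rb") as f:
        print(name, ratios(f.read()))
```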

Finally, there is a difference between a compression format and a compression algorithm. gzip is a compression format, not a compression algorithm; gzip uses the deflate compression algorithm, which just so happens to be one of the algorithms used by the zip format (the main one, and the one required by the epub standard). So a gzip file and a zip file, even when they use the same algorithm, will end up with different sizes because they have different header and structural components. To truly compare, you need to take this into account. But it gets complicated when you look at formats like TCR, which is at once a compression format, an ebook format, and an algorithm.
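
To illustrate the format-vs-algorithm point, here's a sketch that compresses the same made-up data three ways: as a raw deflate stream, inside a gzip container, and inside a zip archive. The deflate work is the same; the containers add different amounts of overhead.

```python
import gzip
import io
import zipfile
import zlib

data = b"Some sample ebook text.\n" * 500

# Raw deflate stream: the algorithm by itself, no container around it.
deflater = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_deflate = deflater.compress(data) + deflater.flush()

# The same algorithm wrapped in the gzip container format.
as_gzip = gzip.compress(data, compresslevel=9)

# The same algorithm wrapped in a zip archive with a single entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                     compresslevel=9) as zf:
    zf.writestr("book.txt", data)
as_zip = buf.getvalue()

# Roughly the same deflate payload inside; different header and
# structural overhead outside.
print(len(raw_deflate), len(as_gzip), len(as_zip))
```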

So really, all that's been shown in this test is that the smallest file format, which is known to compress well with deflate, ends up giving the smallest compressed file size, and that larger files, with more data, in a format that does not compress as well with deflate yield a larger compressed file size.