08-04-2024, 03:12 PM   #15
JSWolf
Resident Curmudgeon
Posts: 79,980
Karma: 147448039
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Eric Muller
IIRC, it's actually 1024 bytes of the uncompressed HTML file that make up a page. The idea is to be able to compute the number of pages solely from the zip headers, without having to look inside (i.e. uncompress) the HTML files, and consequently to be able to navigate to a given page by opening only the corresponding HTML file. Zip headers indicate the uncompressed size, so there is no reason to use the compressed size; in fact, once you go "inside" an HTML file, you need uncompressed bytes. This approximation works very well; after all, two paper editions of the same text can have very different page counts, so there is no single truth (if you have to make it up; of course EPUB also supports explicit page numbers, to match a given paper edition). Even the byte vs. character distinction is not much of a problem in practice (although you can fool it by using NCRs in UTF-32 to get 32 bytes per character: "&#x20AC;" is 8 characters per NCR at 4 bytes per character!)

Kindle took a different and more complicated approach. In particular, it accounts for images better. Also, remember that the notion of a page comes into play when paying authors, so it needs to be more accurate; and Amazon sits between practically every author and every reader, so it can afford to spend some time determining page numbers.
Some say it's 1024 compressed characters and you say 1024 uncompressed characters. Do you have a link to show if it's compressed or uncompressed? I cannot seem to find anything.
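For what it's worth, the uncompressed version described in the quote is easy to sketch. This is only an illustration, not a confirmed implementation of any reader: the 1024-byte figure comes from the quoted post, while the function name `estimate_pages`, the extension-based file filter, and rounding up per file are my own assumptions.

```python
import math
import zipfile

BYTES_PER_PAGE = 1024  # figure from the quoted post, not from a spec


def estimate_pages(epub, bytes_per_page=BYTES_PER_PAGE):
    """Estimate pages from the zip directory only; nothing is decompressed.

    Returns (total_pages, pages_per_file).
    """
    per_file = {}
    with zipfile.ZipFile(epub) as zf:
        for info in zf.infolist():
            # Assumption: content documents are picked out by extension.
            # A real reader would follow the OPF spine order instead.
            if info.filename.lower().endswith((".html", ".xhtml", ".htm")):
                # file_size is the uncompressed size recorded in the zip
                # headers; compress_size would be the compressed size.
                # Assumption: a partial page counts as a whole page.
                per_file[info.filename] = math.ceil(info.file_size / bytes_per_page)
    return sum(per_file.values()), per_file


# The NCR trick from the quote: 8 characters at 4 bytes each in UTF-32.
ncr_bytes = len("&#x20AC;".encode("utf-32-le"))
```

Calling `estimate_pages("book.epub")` (a placeholder path) only reads the zip directory, which is the point of the scheme: swapping `file_size` for `compress_size` would give the compressed variant, but those bytes would not line up with positions inside the opened HTML file.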