MobileRead Forums

user_none (Sigil & calibre developer) · 02-19-2011, 12:09 PM

The non-technical answer: the more accurate option does a better job of keeping the distance between pages consistent with a paperback book.

The technical answer: the less accurate option reads the total uncompressed size of the HTML contained within the MOBI file and divides it into pages of 2300 bytes each. This is very fast because the uncompressed length is stored in the MOBI header and just needs to be read. The problem is that it counts non-visible markup and doesn't take short paragraphs into account. Every 2300 bytes is a page no matter how much, if any, of it is visible, so some pages end up longer or shorter than others.
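A minimal Python sketch of the fast estimate, assuming the standard Palm database (PDB) layout used by MOBI files: the record list starts at byte 78, each 8-byte entry begins with a big-endian record offset, and record 0 opens with the PalmDOC header, whose bytes 4-7 hold the uncompressed text length. The function names are hypothetical, and rounding the last partial block up to a full page is an assumption.

```python
import math
import struct

BYTES_PER_PAGE = 2300  # page size in bytes, as quoted in the post

def uncompressed_text_length(data: bytes) -> int:
    """Return the uncompressed text length stored in record 0 of a MOBI file.

    Assumes the standard PDB layout: the record list starts at byte 78, each
    8-byte entry begins with a big-endian 4-byte record offset, and record 0
    starts with the PalmDOC header, whose bytes 4-7 hold the text length.
    """
    (record0_offset,) = struct.unpack_from(">I", data, 78)
    (text_length,) = struct.unpack_from(">I", data, record0_offset + 4)
    return text_length

def fast_page_count(data: bytes) -> int:
    # Nothing is decompressed: markup bytes count exactly like visible text.
    # Rounding the final partial block up to a whole page is an assumption.
    return math.ceil(uncompressed_text_length(data) / BYTES_PER_PAGE)
```

For example, a book whose header reports 230,000 bytes of text gets 230000 / 2300 = 100 pages, whether those bytes are prose or tags and attributes. Only two small integers are read, which is why this method is fast.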

The more accurate parser decompresses the entire MOBI document's HTML, then parses it and counts pages based on visible text, so pages end up closer to those of a paperback book. Its major drawback is that it has to decompress and parse. For some types of MOBI books (PalmDoc compressed) this roughly doubles the transfer time. For others (HUFF/CDIC compressed) it can take upwards of a few minutes per book.
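A rough Python sketch of the two steps. `palmdoc_decompress` follows the publicly documented PalmDoc LZ77 scheme (literal runs, two-byte back-references, space-plus-character pairs); it ignores the trailing bytes that many MOBI files append to each text record and does not handle HUFF/CDIC at all. The paging model in `visible_page_count` is hypothetical: wrapping each paragraph into lines of `chars_per_line` characters and grouping them `lines_per_page` to a page is an assumption made for illustration, not calibre's actual rule. The tag sets are also assumptions.

```python
import math
import textwrap
from html.parser import HTMLParser

def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress one PalmDoc (LZ77-style) text record."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if 0x01 <= c <= 0x08:
            # Copy the next c bytes verbatim.
            out += data[i:i + c]
            i += c
        elif c <= 0x7F:
            # 0x00 and 0x09-0x7F stand for themselves.
            out.append(c)
        elif c <= 0xBF:
            # Two-byte back-reference: 11-bit distance, 3-bit length (+3).
            pair = ((c << 8) | data[i]) & 0x3FFF
            i += 1
            distance, length = pair >> 3, (pair & 0x07) + 3
            for _ in range(length):
                # Copy byte by byte so overlapping references work.
                out.append(out[-distance])
        else:
            # 0xC0-0xFF: a space followed by the character c ^ 0x80.
            out += b" " + bytes([c ^ 0x80])
    return bytes(out)

HIDDEN_TAGS = {"head", "script", "style"}  # assumed never rendered
BLOCK_TAGS = {"p", "div", "br", "li", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}  # assumed to start new lines

class VisibleTextParser(HTMLParser):
    """Collects visible text, split into paragraphs at block-level tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs: list[str] = []
        self._current: list[str] = []
        self._hidden_depth = 0

    def _flush(self):
        # Collapse whitespace the way a renderer would.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self._hidden_depth = max(0, self._hidden_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._hidden_depth:
            self._current.append(data)

    def close(self):
        super().close()
        self._flush()

def visible_page_count(html: str, chars_per_line: int = 60,
                       lines_per_page: int = 30) -> int:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    # Each paragraph starts on a fresh line, so a short one still uses a full line.
    lines = sum(len(textwrap.wrap(p, chars_per_line)) for p in parser.paragraphs)
    return math.ceil(lines / lines_per_page)
```

In use, each PalmDoc text record would be decompressed, the results concatenated and decoded (the MOBI header records the text encoding), and the HTML passed to `visible_page_count`. Unlike the byte-based estimate, tags and attributes contribute nothing here, and a one-word paragraph still occupies a line. The cost is that every record must be decompressed and the whole document parsed.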
