Thanks for the helpful and informative reply, KevinH.
I'm writing a Linux command line tool to create .apnx page number files. From examining the Calibre apnx.py source code, I saw that the number of chars per page is calculated by dividing the Mobi text length header value by the number of pages (taken from the print edition of the book). It won't map perfectly of course, but that doesn't matter - the idea is to get a reasonable approximation.
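As a rough sketch of that calculation (Python, my own illustration rather than Calibre's actual code; `page_offsets` is a made-up name, and I'm assuming each page simply starts at an even fraction of the text length):

```python
def page_offsets(text_length, page_count):
    """Return one approximate starting text offset per print page."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Integer arithmetic keeps the offsets exact: page i starts at
    # i * (text_length / page_count), rounded down.
    return [i * text_length // page_count for i in range(page_count)]


print(page_offsets(1000, 4))  # [0, 250, 500, 750]
```

The offsets are positions in the text as counted by the text length header value, so they are only as good as the assumption that print pages are spread evenly through the book.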
I'm assuming that it does not matter whether the mobi file is compressed or not, and that the mapping of page positions will still be valid either way. Otherwise the Calibre APNX file generator would not work. Or am I missing something? Anyway, I'll find out when I test with both compressed and uncompressed mobi files (of the same book).
As far as I can tell from the Calibre header code, to find the start of the mobi header all I need to do is seek 78 bytes into the file and read a 4-byte big-endian integer. That value is the offset from the beginning of the file to the start of the mobi file's header. By seeking to that offset, skipping the first 4 bytes there ('compression_type' and 'fill0' in your header code above), and then reading the next 4 bytes as a big-endian integer, my code gets the same 'text_length' values that DumpMobiHeader.py gives me for the 3 mobi files I have tested so far. What I'd like to know is whether seeking 78 bytes to get the big-endian offset to the start of the header will work with ALL mobi files, or whether there are variations. Do you have any idea about that?
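For reference, here is roughly what those steps look like in Python. This is only a sketch of the approach described above: `read_text_length` is a name I made up, the fixed offset of 78 is exactly the assumption I'm asking about, and the data at the bottom is a synthetic stand-in rather than a real mobi file.

```python
import io
import struct


def read_text_length(stream):
    """Read 'text_length' from a binary stream positioned anywhere."""
    # Assumption under question: the header offset is always stored at byte 78.
    stream.seek(78)
    (header_offset,) = struct.unpack(">I", stream.read(4))
    # Skip 'compression_type' (2 bytes) and 'fill0' (2 bytes).
    stream.seek(header_offset + 4)
    (text_length,) = struct.unpack(">I", stream.read(4))
    return text_length


# Synthetic 94-byte example: 78 filler bytes, the offset (86), 4 more filler
# bytes, then compression_type, fill0 and text_length at offset 86.
fake = bytes(78) + struct.pack(">I", 86) + bytes(4) + struct.pack(">HHI", 2, 0, 123456)
print(read_text_length(io.BytesIO(fake)))  # 123456
```

With a real file the call would be `read_text_length(open(path, "rb"))`; a truncated file makes `struct.unpack` raise `struct.error`, which the sketch does not handle.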