08-23-2012, 11:05 PM   #44
KevinH
Sigil Developer
 
Posts: 7,630
Karma: 5433388
Join Date: Nov 2009
Device: many
inclusion of pagemap

Hi,

Thanks for posting LOREM2.epub. I ran it through Kindlegen 2.5 and found that the page map information from page-map.xml is somehow encoded (into position or byte offset info) and included in *both* the Mobi6 header and the Mobi8 header inside the mobi.

I had never actually seen that before: the SRCS offset and count were not typically set in the Mobi8 header. That makes sense, though, since the formats are different enough that the Mobi8 part would need its own page map information.
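To make it concrete where those two fields live, here is a minimal Python sketch (not the DumpMobiHeader code) that splits a Palm DB file into its sections and reads srcs_offset and srcs_count from a header section. It assumes the fields are big-endian 32-bit values at the section-relative offsets 0xe0 and 0xe4 shown in the dump below; split_sections and read_srcs are hypothetical names.

```python
import struct

PDB_HEADER_LEN = 78   # fixed-size Palm Database header
RECORD_ENTRY_LEN = 8  # per record: 4-byte data offset, 1-byte attributes, 3-byte unique id


def split_sections(pdb: bytes) -> list[bytes]:
    """Split a Palm Database file (.mobi/.azw3) into its raw sections."""
    (count,) = struct.unpack_from(">H", pdb, 76)  # number of records
    offsets = [
        struct.unpack_from(">L", pdb, PDB_HEADER_LEN + i * RECORD_ENTRY_LEN)[0]
        for i in range(count)
    ]
    offsets.append(len(pdb))  # the last section runs to the end of the file
    return [pdb[offsets[i]:offsets[i + 1]] for i in range(count)]


def read_srcs(header_section: bytes) -> tuple[int, int]:
    """Return (srcs_offset, srcs_count) from a MOBI header section.

    Offsets are relative to the start of the section, as in the dump;
    the values are assumed to be big-endian unsigned 32-bit integers.
    """
    if header_section[0x10:0x14] != b"MOBI":
        raise ValueError("section does not contain a MOBI header")
    return struct.unpack_from(">LL", header_section, 0xE0)
```

Run against LOREM2.mobi, read_srcs(sections[0]) ought to give (0x9, 0x2) and read_srcs(sections[12]) ought to give (0x11, 0x1), the same values the dump reports.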

Here is what the latest version of DumpMobiHeader_v010.py shows for the kindlegen-generated mobi (note the Section Map at the end as well):

kbhend$ python DumpMobiHeader_v010.py LOREM2.mobi
DumpMobiHeader v010
LOREM2.mobi .MOBI


First Header Dump from Section 0
Header Version is: 0x6
Header start position is: 0x0
Header Length is: 0xf8
Field: compression_type Offset: 0x000 Width: 2 Value: 0x02
Field: fill0 Offset: 0x002 Width: 2 Value: 0x00
Field: text_length Offset: 0x004 Width: 4 Value: 0x1796
Field: text_records Offset: 0x008 Width: 2 Value: 0x02
Field: max_section_size Offset: 0x00a Width: 2 Value: 0x1000
Field: crypto_type Offset: 0x00c Width: 2 Value: 0x00
Field: fill1 Offset: 0x00e Width: 2 Value: 0x00
Field: magic Offset: 0x010 Width: 4 Value: MOBI
Field: header_length Offset: 0x014 Width: 4 Value: 0x00f8
Field: type Offset: 0x018 Width: 4 Value: 0x0002
Field: codepage Offset: 0x01c Width: 4 Value: 0xfde9
Field: unique_id Offset: 0x020 Width: 4 Value: 0xaa53c38e
Field: version Offset: 0x024 Width: 4 Value: 0x0006
Field: metaorthindex Offset: 0x028 Width: 4 Value: 0xffffffff
Field: metainflindex Offset: 0x02c Width: 4 Value: 0xffffffff
Field: index_names Offset: 0x030 Width: 4 Value: 0xffffffff
Field: index_keys Offset: 0x034 Width: 4 Value: 0xffffffff
Field: extra_index0 Offset: 0x038 Width: 4 Value: 0xffffffff
Field: extra_index1 Offset: 0x03c Width: 4 Value: 0xffffffff
Field: extra_index2 Offset: 0x040 Width: 4 Value: 0xffffffff
Field: extra_index3 Offset: 0x044 Width: 4 Value: 0xffffffff
Field: extra_index4 Offset: 0x048 Width: 4 Value: 0xffffffff
Field: extra_index5 Offset: 0x04c Width: 4 Value: 0xffffffff
Field: first_nontext Offset: 0x050 Width: 4 Value: 0x0003
Field: title_offset Offset: 0x054 Width: 4 Value: 0x0238
Field: title_length Offset: 0x058 Width: 4 Value: 0x000a
Field: language_code Offset: 0x05c Width: 4 Value: 0x0009
Field: dict_in_lang Offset: 0x060 Width: 4 Value: 0x0000
Field: dict_out_lang Offset: 0x064 Width: 4 Value: 0x0000
Field: min_version Offset: 0x068 Width: 4 Value: 0x0006
Field: first_resc_offset Offset: 0x06c Width: 4 Value: 0x0006
Field: huff_offset Offset: 0x070 Width: 4 Value: 0x0000
Field: huff_num Offset: 0x074 Width: 4 Value: 0x0000
Field: huff_tbl_offset Offset: 0x078 Width: 4 Value: 0x0000
Field: huff_tbl_len Offset: 0x07c Width: 4 Value: 0x0000
Field: exth_flags Offset: 0x080 Width: 4 Value: 0x0858
Field: fill3_a Offset: 0x084 Width: 4 Value: 0x0000
Field: fill3_b Offset: 0x088 Width: 4 Value: 0x0000
Field: fill3_c Offset: 0x08c Width: 4 Value: 0x0000
Field: fill3_d Offset: 0x090 Width: 4 Value: 0x0000
Field: fill3_e Offset: 0x094 Width: 4 Value: 0x0000
Field: fill3_f Offset: 0x098 Width: 4 Value: 0x0000
Field: fill3_g Offset: 0x09c Width: 4 Value: 0x0000
Field: fill3_h Offset: 0x0a0 Width: 4 Value: 0x0000
Field: drm_offset Offset: 0x0a8 Width: 4 Value: 0xffffffff
Field: drm_count Offset: 0x0ac Width: 4 Value: 0x0000
Field: drm_size Offset: 0x0b0 Width: 4 Value: 0x0000
Field: drm_flags Offset: 0x0b4 Width: 4 Value: 0x0000
Field: fill4_a Offset: 0x0b8 Width: 4 Value: 0x0000
Field: fill4_b Offset: 0x0bc Width: 4 Value: 0x0000
Field: first_content Offset: 0x0c0 Width: 2 Value: 0x01
Field: last_content Offset: 0x0c2 Width: 2 Value: 0x06
Field: unknown0 Offset: 0x0c4 Width: 4 Value: 0x0001
Field: fcis_offset Offset: 0x0c8 Width: 4 Value: 0x0008
Field: fcis_count Offset: 0x0cc Width: 4 Value: 0x0001
Field: flis_offset Offset: 0x0d0 Width: 4 Value: 0x0007
Field: flis_count Offset: 0x0d4 Width: 4 Value: 0x0001
Field: unknown1 Offset: 0x0d8 Width: 4 Value: 0x0000
Field: unknown2 Offset: 0x0dc Width: 4 Value: 0x0000
Field: srcs_offset Offset: 0x0e0 Width: 4 Value: 0x0009
Field: srcs_count Offset: 0x0e4 Width: 4 Value: 0x0002
Field: unknown3 Offset: 0x0e8 Width: 4 Value: 0xffffffff
Field: unknown4 Offset: 0x0ec Width: 4 Value: 0xffffffff
Field: fill5 Offset: 0x0f0 Width: 2 Value: 0x00
Field: traildata_flags Offset: 0x0f2 Width: 2 Value: 0x03
Field: ncx_index Offset: 0x0f4 Width: 4 Value: 0x0003
Field: unknown5 Offset: 0x0f8 Width: 4 Value: 0xffffffff
Field: unknown6 Offset: 0x0fc Width: 4 Value: 0xffffffff
Field: datp_offset Offset: 0x100 Width: 4 Value: 0xffffffff
Field: unknown7 Offset: 0x104 Width: 4 Value: 0xffffffff
Extra Region Length: 0x0
EXTH Region Length: 0x2130
EXTH MetaData

Key: "Published"
Value: "2012-08-20"

Key: "Creator"
Value: "E X Ample"

Key: "Subject"
Value: "Sample Text"

Key: "Description"
Value: "Sample Text"

Key: "Language_(524)"
Value: "en"

Key: "TextDirection"
Value: "horizontal-lr"

Key: "K8(129)_Masthead/Cover_Image"
Value: "kindle:embed:0001"

Key: "K8(131)_Unidentified_Count"
Value: 0x0000

Key: "StartOffset"
Value: 0x027b

Key: "Font Signature (hex)"
Value: 0x0100000000000000000000000000008000000000000000000000000000000000bef4edec

Key: "Creator Software"
Value: 0x00ca

Key: "Creator Major Version"
Value: 0x0002

Key: "Creator Minor Version"
Value: 0x0005

Key: "Kindlegen_BuildRev_Number"
Value: "0626-3a91e28"

Key: "Creator Build Number"
Value: 0x0000

Key: "K8(125)_Count_of_Resources_Fonts_Images"
Value: 0x0001

Key: "K8(121)_Boundary_Section"
Value: 0x000c


Mobi Ebook uses the new dual mobi/KF8 file format

Second Header Dump from Section 12
Header Version is: 0x8
Header start position is: 0xc
Header Length is: 0xf8
Field: compression_type Offset: 0x000 Width: 2 Value: 0x02
Field: fill0 Offset: 0x002 Width: 2 Value: 0x00
Field: text_length Offset: 0x004 Width: 4 Value: 0x19df
Field: text_records Offset: 0x008 Width: 2 Value: 0x02
Field: max_section_size Offset: 0x00a Width: 2 Value: 0x1000
Field: crypto_type Offset: 0x00c Width: 2 Value: 0x00
Field: fill1 Offset: 0x00e Width: 2 Value: 0x00
Field: magic Offset: 0x010 Width: 4 Value: MOBI
Field: header_length Offset: 0x014 Width: 4 Value: 0x00f8
Field: type Offset: 0x018 Width: 4 Value: 0x0002
Field: codepage Offset: 0x01c Width: 4 Value: 0xfde9
Field: unique_id Offset: 0x020 Width: 4 Value: 0xaa53c38e
Field: version Offset: 0x024 Width: 4 Value: 0x0008
Field: metaorthindex Offset: 0x028 Width: 4 Value: 0x0004
Field: metainflindex Offset: 0x02c Width: 4 Value: 0xffffffff
Field: index_names Offset: 0x030 Width: 4 Value: 0xffffffff
Field: index_keys Offset: 0x034 Width: 4 Value: 0xffffffff
Field: extra_index0 Offset: 0x038 Width: 4 Value: 0xffffffff
Field: extra_index1 Offset: 0x03c Width: 4 Value: 0xffffffff
Field: extra_index2 Offset: 0x040 Width: 4 Value: 0xffffffff
Field: extra_index3 Offset: 0x044 Width: 4 Value: 0xffffffff
Field: extra_index4 Offset: 0x048 Width: 4 Value: 0xffffffff
Field: extra_index5 Offset: 0x04c Width: 4 Value: 0xffffffff
Field: first_nontext Offset: 0x050 Width: 4 Value: 0x0004
Field: title_offset Offset: 0x054 Width: 4 Value: 0x0238
Field: title_length Offset: 0x058 Width: 4 Value: 0x000a
Field: language_code Offset: 0x05c Width: 4 Value: 0x0009
Field: dict_in_lang Offset: 0x060 Width: 4 Value: 0x0000
Field: dict_out_lang Offset: 0x064 Width: 4 Value: 0x0000
Field: min_version Offset: 0x068 Width: 4 Value: 0x0008
Field: first_resc_offset Offset: 0x06c Width: 4 Value: 0x000f
Field: huff_offset Offset: 0x070 Width: 4 Value: 0x0000
Field: huff_num Offset: 0x074 Width: 4 Value: 0x0000
Field: huff_tbl_offset Offset: 0x078 Width: 4 Value: 0x0000
Field: huff_tbl_len Offset: 0x07c Width: 4 Value: 0x0000
Field: exth_flags Offset: 0x080 Width: 4 Value: 0x0058
Field: fill3_a Offset: 0x084 Width: 4 Value: 0x0000
Field: fill3_b Offset: 0x088 Width: 4 Value: 0x0000
Field: fill3_c Offset: 0x08c Width: 4 Value: 0x0000
Field: fill3_d Offset: 0x090 Width: 4 Value: 0x0000
Field: fill3_e Offset: 0x094 Width: 4 Value: 0x0000
Field: fill3_f Offset: 0x098 Width: 4 Value: 0x0000
Field: fill3_g Offset: 0x09c Width: 4 Value: 0x0000
Field: fill3_h Offset: 0x0a0 Width: 4 Value: 0x0000
Field: unknown0 Offset: 0x0a4 Width: 4 Value: 0xffffffff
Field: drm_offset Offset: 0x0a8 Width: 4 Value: 0xffffffff
Field: drm_count Offset: 0x0ac Width: 4 Value: 0x0000
Field: drm_size Offset: 0x0b0 Width: 4 Value: 0x0000
Field: drm_flags Offset: 0x0b4 Width: 4 Value: 0x0000
Field: fill4_a Offset: 0x0b8 Width: 4 Value: 0x0000
Field: fill4_b Offset: 0x0bc Width: 4 Value: 0x0000
Field: fdst_offset Offset: 0x0c0 Width: 4 Value: 0x1000e
Field: fdst_flow_count Offset: 0x0c4 Width: 4 Value: 0x0001
Field: fcis_offset Offset: 0x0c8 Width: 4 Value: 0x0010
Field: fcis_count Offset: 0x0cc Width: 4 Value: 0x0001
Field: flis_offset Offset: 0x0d0 Width: 4 Value: 0x000f
Field: flis_count Offset: 0x0d4 Width: 4 Value: 0x0001
Field: unknown1 Offset: 0x0d8 Width: 4 Value: 0x0000
Field: unknown2 Offset: 0x0dc Width: 4 Value: 0x0000
Field: srcs_offset Offset: 0x0e0 Width: 4 Value: 0x0011
Field: srcs_count Offset: 0x0e4 Width: 4 Value: 0x0001
Field: unknown3 Offset: 0x0e8 Width: 4 Value: 0xffffffff
Field: unknown4 Offset: 0x0ec Width: 4 Value: 0xffffffff
Field: fill5 Offset: 0x0f0 Width: 2 Value: 0x00
Field: traildata_flags Offset: 0x0f2 Width: 2 Value: 0x03
Field: ncx_index Offset: 0x0f4 Width: 4 Value: 0x000c
Field: fragment_index Offset: 0x0f8 Width: 4 Value: 0x0004
Field: skeleton_index Offset: 0x0fc Width: 4 Value: 0x0007
Field: datp_offset Offset: 0x100 Width: 4 Value: 0x0012
Field: guide_index Offset: 0x104 Width: 4 Value: 0x0009
Extra Region Length: 0x0
EXTH Region Length: 0x213c
EXTH MetaData

Key: "Published"
Value: "2012-08-20"

Key: "Creator"
Value: "E X Ample"

Key: "Subject"
Value: "Sample Text"

Key: "Description"
Value: "Sample Text"

Key: "Language_(524)"
Value: "en"

Key: "TextDirection"
Value: "horizontal-lr"

Key: "K8(129)_Masthead/Cover_Image"
Value: "kindle:embed:0001"

Key: "K8(131)_Unidentified_Count"
Value: 0x0000

Key: "StartOffset"
Value: 0x027b

Key: "StartOffset"
Value: 0x0314

Key: "Font Signature (hex)"
Value: 0x0100000000000000000000000000008000000000000000000000000000000000bebcaff0

Key: "Creator Software"
Value: 0x00ca

Key: "Creator Major Version"
Value: 0x0002

Key: "Creator Minor Version"
Value: 0x0005

Key: "Kindlegen_BuildRev_Number"
Value: "0626-3a91e28"

Key: "Creator Build Number"
Value: 0x0000

Key: "K8(125)_Count_of_Resources_Fonts_Images"
Value: 0x0000

Map of Palm DB Sections
Dec - Hex : Description
---- - ---- -----------
0000 - 0000: HEADER 6
0001 - 0001: Text Record 0
0002 - 0002: Text Record 1
0003 - 0003: NCX Index 0
0004 - 0004: NCX Index 1
0005 - 0005: NCX Index CNX
0006 - 0006: RESC
0007 - 0007: FLIS
0008 - 0008: FCIS
0009 - 0009: Source Archive 0
0010 - 000a: Source Archive 1
0011 - 000b: BOUNDARY
0012 - 000c: HEADER 8
0013 - 000d: Text Record 0
0014 - 000e: Text Record 1
0015 - 000f: 0000
0016 - 0010: Fragment Index 0
0017 - 0011: Fragment Index 1
0018 - 0012: Fragment Index CNX
0019 - 0013: Skeleton Index 0
0020 - 0014: Skeleton Index_Index 1
0021 - 0015: Guide Index 0
0022 - 0016: Guide Index 1
0023 - 0017: Guide Index CNX
0024 - 0018: NCX Index 0
0025 - 0019: NCX Index 1
0026 - 001a: NCX Index CNX
0027 - 001b: FLIS
0028 - 001c: FCIS
0029 - 001d: Source Archive 0
0030 - 001e: DATP
0031 - 001f: EOF_RECORD
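The Mobi8 header values only line up with the section map if they are read relative to the KF8 boundary section (EXTH 121, "K8(121)_Boundary_Section" = 0x000c), whereas the Mobi6 values (ncx 3, FLIS 7, FCIS 8, SRCS 9) are already absolute section numbers. A small Python sketch, using only numbers copied from the dump above, makes the conversion explicit; to_absolute is a hypothetical helper name.

```python
# Values copied from the Mobi8 header dump above.
K8_BOUNDARY = 0x000C  # EXTH 121: the HEADER 8 section is Palm DB section 12

kf8_fields = {
    "fragment_index": 0x0004,
    "skeleton_index": 0x0007,
    "guide_index": 0x0009,
    "ncx_index": 0x000C,
    "flis_offset": 0x000F,
    "fcis_offset": 0x0010,
    "srcs_offset": 0x0011,
    "datp_offset": 0x0012,
}


def to_absolute(relative: int, boundary: int = K8_BOUNDARY) -> int:
    """Convert a KF8-header section index into an absolute Palm DB section number."""
    if relative == 0xFFFFFFFF:  # field not used, leave the sentinel alone
        return relative
    return boundary + relative


for name, rel in kf8_fields.items():
    print(f"{name:15s} 0x{rel:04x} -> section {to_absolute(rel)}")
```

Each result lands on the matching entry in the section map: for example srcs_offset 0x11 becomes section 29 (Source Archive 0) and datp_offset 0x12 becomes section 30 (DATP).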


So this is interesting indeed. We now need to figure out how the page-map.xml tag entries are converted and stored in the new kindlegen page map that lives in the PAGE sections in both the Mobi6 and Mobi8 parts of the .mobi file.
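On the input side, a minimal sketch of pulling the entries out of page-map.xml, assuming the usual Adobe-style layout of page elements carrying name and href attributes. The sample contents, paths and namespace below are made up for illustration; the parser matches on the local element name, so it does not depend on the namespace.

```python
import xml.etree.ElementTree as ET


def read_page_map(xml_text: str) -> list[tuple[str, str]]:
    """Return (page name, href) pairs, in document order, from page-map.xml text."""
    root = ET.fromstring(xml_text)
    entries = []
    for elem in root.iter():
        # Strip any "{namespace}" prefix before comparing the tag name.
        if elem.tag.rsplit("}", 1)[-1] == "page":
            entries.append((elem.get("name"), elem.get("href")))
    return entries


# Hypothetical sample, not taken from LOREM2.epub.
SAMPLE = """<page-map xmlns="http://www.idpf.org/2007/opf">
  <page name="1" href="Text/chapter1.xhtml#page1"/>
  <page name="2" href="Text/chapter1.xhtml#page2"/>
</page-map>"""

print(read_page_map(SAMPLE))
# [('1', 'Text/chapter1.xhtml#page1'), ('2', 'Text/chapter1.xhtml#page2')]
```

Each href would then have to be resolved to a position in the Mobi6 text and, separately, in the Mobi8 text, which is presumably why both parts carry their own copy.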

Once we grok that, we should be able to add support for unpacking that information in Mobi_Unpack, and then figure out a way for Calibre to generate that page information as well for its joint KF8 and .azw3 files.
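As a first step toward that unpacking support, one could simply look at the leading bytes of the sections the SRCS fields point to (9 and 10 in the Mobi6 part, 29 in the Mobi8 part). The sketch below assumes, without confirmation from the dump, that these records begin with a 4-byte ASCII magic such as SRCS or PAGE; tag_sections is a hypothetical helper that takes the raw sections as a list of bytes.

```python
def tag_sections(
    sections: list[bytes], magics: tuple[bytes, ...] = (b"SRCS", b"PAGE")
) -> dict[int, str]:
    """Map section number -> magic for every section whose first 4 bytes are in `magics`."""
    found = {}
    for i, data in enumerate(sections):
        head = data[:4]
        if head in magics:
            found[i] = head.decode("ascii")
    return found


# Toy input: only sections 1 and 2 start with one of the assumed magics.
print(tag_sections([b"BOOKMOBI", b"SRCS\x00\x00", b"PAGE\x00\x01", b"FLIS"]))
# {1: 'SRCS', 2: 'PAGE'}
```

Comparing the result for sections 9, 10 and 29 would show whether the dumper's "Source Archive" labels and the PAGE sections are in fact the same records.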

Thanks for pointing this out.

Kevin