12-06-2013, 10:55 AM | #106 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
More information on Decoding Page Info stored in PAGE sections in SRCS
Hi All,
I have spent time reading the calibre source on apnx, the pages in the wiki, usernone's work, and what dilo_sec discovered in this thread: https://www.mobileread.com/forums/sho...5&postcount=45 After spending some more time on it, I think I now understand the page-map information more fully, especially when a document uses more than one page numbering scheme. Here is my analysis for the record, in case anyone else is interested: Code:
Actual epub page-map.xml

<page-map xmlns="http://www.idpf.org/2007/opf">
  <page name="i" href="chapter_01.html#page_i"/>
  <page name="ii" href="chapter_01.html#page_ii"/>
  <page name="1" href="chapter_01.html#page_1"/>
  <page name="2" href="chapter_01.html#page_2"/>
  <page name="3" href="chapter_01.html#page_3"/>
  <page name="4" href="chapter_01.html#page_4"/>
  <page name="5" href="chapter_01.html#page_5"/>
  <page name="A-1" href="chapter_01.html#page_A1"/>
  <page name="A-2" href="chapter_01.html#page_A2"/>
  <page name="I-1" href="chapter_01.html#page_I1"/>
</page-map>

Kindlegen PAGE map info stored at the front of the SRCS section for both Mobi 7 and Mobi 8 parts.

Below is the information from the Mobi 8 (KF8) PAGE information:

PAGE^@^@^@^H^@^A^@^A^@^@^@*^@^@^@^^{
   "fileRevisionId" : "1"
}
^@^A^@n^@
^@^P{
   "description" : "PageMap from source by kindlegen",
   "pageMap" : "(1,r,1),(3,a,1),(8,c,A-1|A-2|I-1)"
}
^C\236^F^L^H{
\257^L\304^Oi^Q\327^S]^V^_^X\201

Here is the Hex Representation of this Page info from the Mobi 8 part

87654321  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789abcdef
00000000: 5041 4745 0000 0008 0001 0001 0000 002a  PAGE...........*
00000010: 0000 001e 7b0a 2020 2022 6669 6c65 5265  ....{.   "fileRe
00000020: 7669 7369 6f6e 4964 2220 3a20 2231 220a  visionId" : "1".
00000030: 7d0a 0001 006e 000a 0010 7b0a 2020 2022  }....n....{.   "
00000040: 6465 7363 7269 7074 696f 6e22 203a 2022  description" : "
00000050: 5061 6765 4d61 7020 6672 6f6d 2073 6f75  PageMap from sou
00000060: 7263 6520 6279 206b 696e 646c 6567 656e  rce by kindlegen
00000070: 222c 0a20 2020 2270 6167 654d 6170 2220  ",.   "pageMap" 
00000080: 3a20 2228 312c 722c 3129 2c28 332c 612c  : "(1,r,1),(3,a,
00000090: 3129 2c28 382c 632c 412d 317c 412d 327c  1),(8,c,A-1|A-2|
000000a0: 492d 3129 220a 7d0a 039e 060c 087b 0aaf  I-1)".}......{..
000000b0: 0cc4 0f69 11d7 135d 161f 1881            ...i...]....
Analysis
---------
00000000 - 0000000f  Section header PAGE
00000010 - 00000011  0
00000012 - 00000013  30: Length of rev string in bytes (Big Endian Half Word)
00000014 - 00000031  The rev string itself:
                     {
                        "fileRevisionId" : "1"
                     }
00000032 - 00000033  1: Always 1?
00000034 - 00000035  110: Length of PageMap in bytes (Big Endian Half Word)
00000036 - 00000037  10: Number of Page names (Big Endian Half Word)
00000038 - 00000039  16: Number of bits used in offsets to page href destination -
                     typically this is 32 (0x20), but my example was small enough
                     that Kindlegen used only 16 bit offsets
0000003A - 000000A7  PageMap showing a tuple for each numbering scheme used in the
                     document, with the following format:
                       (entry_number, numbering_scheme, values)
                     where:
                       - entry_number is which entry in page-map.xml (starting with 1)
                       - numbering_scheme is c - character, r - roman, a - arabic
                       - values is the starting page number for the "r" and "a" schemes,
                         otherwise it is a pipe-separated ("|") list of page names
                     {
                        "description" : "PageMap from source by kindlegen",
                        "pageMap" : "(1,r,1),(3,a,1),(8,c,A-1|A-2|I-1)"
                     }
000000A8 - 000000BB  Table of 16 bit offsets (see above for bit widths) into the
                     assembled text (Big Endian Half Words - 16 bits, or
                     Big Endian Words - 32 bits)
                       0x039e - offset in bytes to page i anchor
                       0x060c - offset in bytes to page ii anchor
                       0x087b - offset in bytes to page 1 anchor
                       0x0aaf - offset in bytes to page 2 anchor
                       0x0cc4 - offset in bytes to page 3 anchor
                       0x0f69 - offset in bytes to page 4 anchor
                       0x11d7 - offset in bytes to page 5 anchor
                       0x135d - offset in bytes to page A-1 anchor
                       0x161f - offset in bytes to page A-2 anchor
                       0x1881 - offset in bytes to page I-1 anchor

More importantly, I think we could modify KindleUnpack to recreate the page-map.xml from Kindlegen-generated joint mobis that have PAGE sections, or alternatively create the page-map.xml from the APNX file and the AZW3 if someone had access to both.
Would any of this functionality be of interest to anyone in Kindlestrip or KindleUnpack?
Thanks,
KevinH |
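For anyone who wants to experiment, here is a minimal Python sketch of the decoding described in the analysis above. It is not KindleUnpack code: the function names are made up for illustration, the bytes are simply the hex dump from this post, and it assumes that roman page names are lowercase and that a page name never contains a ")" character.

```python
import json
import re
import struct

# The example KF8 PAGE data, copied from the hex dump in this post.
PAGE = bytes.fromhex("".join("""
5041 4745 0000 0008 0001 0001 0000 002a 0000 001e 7b0a 2020 2022 6669 6c65 5265
7669 7369 6f6e 4964 2220 3a20 2231 220a 7d0a 0001 006e 000a 0010 7b0a 2020 2022
6465 7363 7269 7074 696f 6e22 203a 2022 5061 6765 4d61 7020 6672 6f6d 2073 6f75
7263 6520 6279 206b 696e 646c 6567 656e 222c 0a20 2020 2270 6167 654d 6170 2220
3a20 2228 312c 722c 3129 2c28 332c 612c 3129 2c28 382c 632c 412d 317c 412d 327c
492d 3129 220a 7d0a 039e 060c 087b 0aaf 0cc4 0f69 11d7 135d 161f 1881
""".split()))

ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
         (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]


def parse_page_section(data):
    """Return (revision_json, page_map_json, offsets) from PAGE data (hypothetical helper)."""
    if data[:4] != b"PAGE":
        raise ValueError("not a PAGE section")
    # 0x10: zero half word, 0x12: length of the revision string
    _zero, rev_len = struct.unpack_from(">HH", data, 0x10)
    pos = 0x14
    revision = data[pos:pos + rev_len].decode("utf-8")
    pos += rev_len
    # "always 1?" value, PageMap length, number of page names, offset bit width
    _one, map_len, page_count, bits = struct.unpack_from(">HHHH", data, pos)
    pos += 8
    page_map = data[pos:pos + map_len].decode("utf-8")
    pos += map_len
    code = {16: "H", 32: "L"}[bits]  # only the two widths mentioned in the analysis
    offsets = list(struct.unpack_from(">%d%s" % (page_count, code), data, pos))
    return revision, page_map, offsets


def to_roman(number):
    """Lowercase roman numeral (assumption: kindlegen's 'r' scheme means i, ii, iii...)."""
    out = ""
    for value, numeral in ROMAN:
        while number >= value:
            out += numeral
            number -= value
    return out


def expand_page_names(page_map_json, page_count):
    """Turn "(1,r,1),(3,a,1),(8,c,A-1|A-2|I-1)" into one name per page entry."""
    schemes = re.findall(r"\((\d+),([rac]),([^)]*)\)", json.loads(page_map_json)["pageMap"])
    names = []
    for i, (start, scheme, value) in enumerate(schemes):
        first = int(start)
        # a scheme runs until the entry before the next scheme, or to the last page
        last = int(schemes[i + 1][0]) - 1 if i + 1 < len(schemes) else page_count
        if scheme == "c":
            names.extend(value.split("|"))
        else:
            for number in range(int(value), int(value) + last - first + 1):
                names.append(to_roman(number) if scheme == "r" else str(number))
    return names


revision, page_map, offsets = parse_page_section(PAGE)
for name, offset in zip(expand_page_names(page_map, len(offsets)), offsets):
    print("page %-4s at text offset 0x%04x" % (name, offset))
```

Pairing each name with its offset like this is the first half of rebuilding page-map.xml; turning a byte offset into a file name and anchor would still need the rest of the unpacking machinery.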
12-06-2013, 11:05 AM | #107 | |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
Hi,
I am not quite sure what you mean by "HTML TOC". In Kindle Mobis the toc info is mapped into a set of toc Index records, which are pointed to by the Mobi header offset fields as follows.
From the DumpMobiHeader output for the KF8 (Mobi 8) part of a joint mobi (similar things can be done with the older Mobi 7 part), the field you want is:

Field: ncx_index  Offset: 0x0f4  Width: 4  Value: 0x000c

This says the NCX exists at an offset of 12 sections from the header section, which for the Mobi 8 part in this case began at section 12, so the NCX section should be found at section 24. As you can see from the section map, it exists there.

0012 - 000c: HEADER 8
0013 - 000d: Text Record 0
0014 - 000e: Text Record 1
0015 - 000f: 0000
0016 - 0010: Fragment Index 0
0017 - 0011: Fragment Index 1
0018 - 0012: Fragment Index CNX
0019 - 0013: Skeleton Index 0
0020 - 0014: Skeleton Index_Index 1
0021 - 0015: Guide Index 0
0022 - 0016: Guide Index 1
0023 - 0017: Guide Index CNX
0024 - 0018: NCX Index 0
0025 - 0019: NCX Index 1
0026 - 001a: NCX Index CNX

That said, as far as I know, a pure HTML TOC, if included, is simply inlined into the text records stored inside the palm database. There is no way to tell if it exists without dumping the text information (i.e. basically unpacking it using KindleUnpack) and looking in the text near the beginning or end to see whether html code is used to provide table of contents info.
Quote:
KevinH |
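The section arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper name is made up, it assumes the 0x0f4 offset is measured from the start of the KF8 header section, and it assumes an all-ones value (0xFFFFFFFF) means "no NCX index".

```python
import struct

NCX_INDEX_OFFSET = 0x0F4   # field offset quoted from the DumpMobiHeader output
NO_INDEX = 0xFFFFFFFF      # assumption: marker for "this index does not exist"


def ncx_section_number(header_record, header_section):
    """Absolute section number of the first NCX index record, or None."""
    (ncx_index,) = struct.unpack_from(">L", header_record, NCX_INDEX_OFFSET)
    if ncx_index == NO_INDEX:
        return None
    # ncx_index counts sections relative to the header section of this part
    return header_section + ncx_index


# Fake header record holding only the value from the example (0x000c).
record = bytearray(0x100)
struct.pack_into(">L", record, NCX_INDEX_OFFSET, 0x000C)
print(ncx_section_number(bytes(record), 12))  # header at section 12 -> 24
```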
12-06-2013, 02:01 PM | #108 |
Groupie
Posts: 195
Karma: 42216
Join Date: Oct 2013
Location: Poland
Device: Kindles: KOA1, KV
|
A mobi can contain two distinct ToCs: an NCX TOC and an "HTML" TOC. Kindle Previewer calls the first one "NCX" and the second one "Table of Contents". See the attachment.
I know how to detect the "NCX" by digging into the header dump (grep "NCX Index"), but I don't know how to detect the "Table of Contents". As far as I understand your answer, finding it requires unpacking the entire mobi and looking in content.opf for <reference type="toc" title="Table of Contents" href="example.html#filepos657133" />. Sad… I thought it would be possible with a simple grep of the header dump… I'm writing a bash batch script of sorts that checks mobi quality: it runs quickly through hundreds of mobis looking for missing covers, missing NCX and missing ToC, which are critically important for a good book experience, so unpacking the entire mobi slows the process down dramatically. |
12-06-2013, 03:03 PM | #109 | |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
Hi,
So your HTML toc is stored in a text section, but it has a guide element with type toc to specify its location. As far as I know (someone please correct me if I am wrong), in older Mobis the guide is stored inside the first text sections, at the start. Newer KF8 files have a separate guide index section, but you would have to decode it to see whether it contains an entry for toc, and reading kindle index records is not trivial.
Perhaps something is encoded in the flags info in the header, but I am not sure whether any of those bits indicate the presence of an html toc. Maybe DiapDealer, pdurrant, or hitch know the answer to that.
I suppose that, at least for KF8 mobis, a grep of the guide CNX records could easily be done to see if "toc" exists or not. For older Mobis, extracting or decompressing the first text section and grepping for guide and toc might work.
Sorry I can't be more help.
KevinH
Quote:
|
|
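The "grep the guide records" idea from the post above could be tried without a full unpack, roughly as sketched below. The helper names are made up. The only format knowledge used is the Palm database record list (record count at byte 76, then 8-byte entries starting at byte 78, each beginning with a 4-byte offset). A plain substring search can give false positives, so treat a hit as a hint rather than proof.

```python
import struct


def read_sections(data):
    """Split a Palm database (the container of a .mobi file) into its raw sections."""
    (count,) = struct.unpack_from(">H", data, 76)
    starts = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    ends = starts[1:] + [len(data)]  # each section runs up to the start of the next
    return [data[start:end] for start, end in zip(starts, ends)]


def sections_containing(data, needle, first=0, last=None):
    """Section numbers in [first, last) whose raw bytes contain `needle`."""
    sections = read_sections(data)
    return [number
            for number, section in enumerate(sections[first:last], start=first)
            if needle in section]


# Usage sketch: "book.mobi" is a placeholder, and 21..23 are the guide index
# sections from the example section map earlier in the thread, not fixed numbers.
# with open("book.mobi", "rb") as handle:
#     print(sections_containing(handle.read(), b"toc", first=21, last=24))
```

This only looks at uncompressed index sections; for older Mobis the first text record would have to be decompressed before searching.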
12-06-2013, 03:38 PM | #110 |
Groupie
Posts: 195
Karma: 42216
Join Date: Oct 2013
Location: Poland
Device: Kindles: KOA1, KV
|
12-06-2013, 04:01 PM | #111 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
12-07-2013, 11:52 AM | #112 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
|
12-07-2013, 02:21 PM | #113 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
12-30-2013, 07:54 AM | #114 | |
Groupie
Posts: 195
Karma: 42216
Join Date: Oct 2013
Location: Poland
Device: Kindles: KOA1, KV
|
Quote:
|
|
12-31-2013, 04:12 PM | #115 |
Sigil Developer
Posts: 7,506
Karma: 5433350
Join Date: Nov 2009
Device: many
|
Hi,
No one has reported any bugs with this version, so please assume it is final. I will try to track down the Applescript container and get it to work with this version in the near future when I have more time available. KevinH |
03-26-2014, 12:57 PM | #116 | |
Enthusiast
Posts: 28
Karma: 12500
Join Date: Mar 2014
Device: Kindle Paperwhite gen 2
|
Quote:
Could you convert the Kindlestrip 1.36 py file into Kindlestrip1.36.app, so that it runs as a stand-alone application on OSX like version 1.35 did? I really need that. How do I convert a py file into an app? Please help me build this. Thanks a lot |
|
03-26-2014, 01:38 PM | #117 |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
Just replace the kindlestrip.py inside the .app with the new kindlestrip.py.
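If you would rather script the swap than dig through the bundle in Finder, something like this Python sketch should do. It is only an illustration: the function name and the example paths are made up, and because I am not assuming where the script lives inside the .app, it simply searches the whole bundle folder.

```python
import os
import shutil


def replace_script_in_bundle(bundle_dir, new_script, name="kindlestrip.py"):
    """Overwrite every copy of `name` found inside the bundle; return the replaced paths."""
    replaced = []
    for root, _dirs, files in os.walk(bundle_dir):
        if name in files:
            target = os.path.join(root, name)
            shutil.copyfile(new_script, target)  # copies the file contents over the old script
            replaced.append(target)
    return replaced


# Usage sketch (both paths are placeholders):
# print(replace_script_in_bundle("KindleStrip.app", "kindlestrip.py"))
```

An empty result means no kindlestrip.py was found, in which case the bundle is laid out differently than expected.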
|
03-26-2014, 01:47 PM | #118 |
Enthusiast
Posts: 28
Karma: 12500
Join Date: Mar 2014
Device: Kindle Paperwhite gen 2
|
Thanks a lot! I got your point!
|
04-24-2014, 12:29 PM | #119 |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
I've found out by chance that Amazon has added a new KindleGen parameter that will prevent KindleGen from adding the source files: -dont_append_source
I've only tested this with the latest Windows version (V2.9 build 0731-890adc2), though. When this parameter is specified, KindleGen will display the following message: Code:
Info:I9018:option: -donotaddsource: Source files will not be added |
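For batch use, the parameter can be passed from a script like any other argument. A minimal Python sketch, assuming kindlegen is on the PATH and using a made-up input name (book.opf); the helper names are hypothetical:

```python
import subprocess


def kindlegen_command(source, kindlegen="kindlegen"):
    """Build the argument list for a KindleGen run that skips embedding the sources."""
    return [kindlegen, source, "-dont_append_source"]


def build_without_source(source, kindlegen="kindlegen"):
    """Run KindleGen and hand back its exit code; interpreting it is left to the caller."""
    return subprocess.run(kindlegen_command(source, kindlegen)).returncode


# Usage sketch ("book.opf" is a placeholder):
# exit_code = build_without_source("book.opf")
```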
04-24-2014, 12:36 PM | #120 |
KCC Co-Author
Posts: 845
Karma: 765434
Join Date: Mar 2013
Location: Poland
Device: Kindle Oasis 2
|
Nice find. Very nice find. Could you shed some light on how you found it?
EDIT: Build 2.9 0523-9bd8a95 has it too.
EDIT 2: The Linux and OSX versions have it too.
EDIT 3: Ahh, it is documented in Amazon Kindle Publishing Guidelines 2014.1.1.
Last edited by AcidWeb; 04-24-2014 at 12:45 PM. |