MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

Doitsu · 08-30-2014, 07:00 PM

Hi KevinH,

Unfortunately, changing

Code:

if hordt2 is not None:

to

Code:

if hordt2 is not None and len(text) > 0:

didn't have the desired effect. The French dictionary still failed to unpack, but this time KindleUnpack displayed a longer error message:

Code:

Error: Dictionary contains multiple inflection index sections, which is not yet supported
Parsing inflIndexData
ocnt 0, oentries 0, op1 0, op2 0, otagx 0
parsed INDX header:
len C0 nul1 2 type 1 gen 0 start EDA0 count 726 code FFFFFFFF lng FFFFFFFF total 0 ordt 0 ligt 0 nligt 0 nctoc 0
{'count': 1830, 'nctoc': 0, 'code': 4294967295L, 'nul1': 2, 'len': 192, 'ligt': 0, 'start': 60832, 'nligt': 0, 'ordt': 0, 'lng': 4294967295L, 'total': 0, 'type': 1, 'gen': 0} None None
inflectionTagTable: [(5, 1, 3, 0), (26, 1, 12, 0), (27, 1, 48, 0), (0, 0, 0, 1)]
Parsing metaOrthIndex
ocnt 1, oentries 177, op1 1124, op2 1308, otagx 192
parsed INDX header:
len C8 nul1 0 type 0 gen 2 start 3E4 count 3D code FDEA lng 40C total 21DCA ordt 0 ligt 0 nligt 0 nctoc 0
{'count': 61, 'nctoc': 0, 'code': 65002, 'nul1': 0, 'len': 200, 'ligt': 0, 'start': 996, 'nligt': 0, 'ordt': 0, 'lng': 1036, 'total': 138698, 'type': 0, 'gen': 2} (0, 0, 0, 21, 65, 74, 95, 7, 94, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 26, 29, 32, 37, 40, 43, 45, 48, 51, 54, 56, 57, 59, 68, 70, 73, 76, 82, 83, 85, 88, 90, 93, 21, 26, 29, 32, 37, 40, 43, 45, 48, 51, 54, 56, 57, 59, 65, 68, 70, 73, 74, 76, 82, 83, 85, 88, 90, 93, 21, 0, 82, 37, 37, 0, 65, 21, 37, 48, 48, 29, 65, 21, 21, 82, 65, 37, 21, 21, 48, 59, 37, 65, 0, 0, 90, 82) (0, 37, 95, 97, 111, 115, 12354, 32, 12353, 12355, 12356, 12357, 12358, 12359, 12360, 12361, 12362, 12363, 12364, 12365, 12366, 12367, 12368, 12369, 12370, 12371, 12372, 12373, 12374, 12375, 12376, 12377, 12378, 12379, 12380, 12381, 12382, 12383, 12384, 12385, 12386, 12387, 12388, 12389, 12390, 12391, 12392, 12393, 12394, 12395, 12396, 12397, 12398, 12399, 12400, 12401, 12402, 12403, 12404, 12405, 12406, 12407, 12408, 12409, 12410, 12411, 12412, 12413, 12414, 12415, 12416, 12417, 12418, 12419, 12420, 12421, 12422, 12423, 12424, 12425, 12426, 12427, 12428, 12429, 12430, 12431, 12432, 12433, 12434, 12435, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 112, 113, 114, 116, 117, 118, 119, 120, 121, 122, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 224, 45, 251, 233, 232, 39, 244, 226, 234, 239, 238, 231, 2, 4, 227, 252, 246, 235, 228, 225, 237, 241, 201, 243, 40, 41, 255, 249)
orthIndexCount is 61
orthTagTable: [(1, 1, 1, 0), (2, 1, 2, 0), (42, 1, 4, 0), (8, 2, 8, 0), (0, 0, 0, 1)]
Read dictionary index data
Parsing dictionary index data 12200
ocnt 0, oentries 0, op1 0, op2 0, otagx 0
parsed INDX header:
len C0 nul1 0 type 1 gen 0 start E82C count 9D8 code FFFFFFFF lng FFFFFFFF total 0 ordt 0 ligt 0 nligt 0 nctoc 0
{'count': 2520, 'nctoc': 0, 'code': 4294967295L, 'nul1': 0, 'len': 192, 'ligt': 0, 'start': 59436, 'nligt': 0, 'ordt': 0, 'lng': 4294967295L, 'total': 0, 'type': 1, 'gen': 0} None None
Error: unpack requires a string argument of length 0

Error: Unpacking Failed

(I've got similar longer messages for many of my other test files.)
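For what it's worth, the final "unpack requires a string argument of length 0" line looks like a message from Python's struct module, which raises struct.error when the data length doesn't match the size implied by the format string. Here's a minimal sketch of one way that can happen, assuming a format whose repeat count works out to zero. The format string and the try_unpack helper are made up for illustration and aren't taken from the KindleUnpack source. The exact wording of the message also differs between Python versions.

```python
import struct


def try_unpack(fmt, data):
    """Return the unpacked tuple, or the struct.error text as a string."""
    try:
        return struct.unpack(fmt, data)
    except struct.error as exc:
        return "Error: %s" % exc


# Hypothetical format: a repeat count of 0 gives a format that expects 0 bytes.
zero_fmt = ">%dH" % 0

# An empty buffer matches a zero-size format and unpacks to an empty tuple.
empty_ok = try_unpack(zero_fmt, b"")

# A non-empty buffer does not match the zero-size format, so struct.error is raised.
mismatch = try_unpack(zero_fmt, b"\x00\x01")

print(struct.calcsize(zero_fmt), empty_ok, mismatch)
```

If that's what's going on here, the count used to build the format is probably zero while the data being unpacked is not empty, but that's only a guess based on the message.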

I've created a small German-English proof-of-concept test dictionary for you, based on the latest recommendations in the Kindle Publishing Guidelines. It compiled with both Mobigen and Kindlegen without error messages. Since you'll have access to the actual source files, this will hopefully make it easier for you to reverse-engineer the binary files.
The .zip file contains the source files and three binaries, generated without compression (c0), with standard compression (c1) and with maximum compression (c2), respectively.

BTW, I believe one reason the Swedish dictionary decompiled fine is that I used a redundant inflection syntax, wrapping each inflected form in its own idx:infl element. For example, I used:

Code:

<idx:entry>
	<b><idx:orth>positiv
	<idx:infl><idx:iform value="positivs"/></idx:infl>
	<idx:infl><idx:iform value="positivet"/></idx:infl>
	<idx:infl><idx:iform value="positivets"/></idx:infl>
	<idx:infl><idx:iform value="positiven"/></idx:infl>
	<idx:infl><idx:iform value="positivens"/></idx:infl>
	<idx:infl><idx:iform value="positiv-"/></idx:infl>
	</idx:orth> </b> 
	<i>subst</i> <br/>
	absolute
</idx:entry>

instead of the syntax recommended in the guidelines:

Code:

<idx:entry>
	<b><idx:orth>positiv
	<idx:infl>
		<idx:iform value="positivs"/>
		<idx:iform value="positivet"/>
		<idx:iform value="positivets"/>
		<idx:iform value="positiven"/>
		<idx:iform value="positivens"/>
		<idx:iform value="positiv-"/>
	</idx:infl>
	</idx:orth> </b> 
	<i>subst</i> <br/>
	absolute
</idx:entry>
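
In case anyone wants to convert sources from the redundant form to the recommended one, here's a rough Python sketch. The name merge_infl_groups is made up for this example. It uses plain regular expressions rather than an XML parser (the idx: prefix isn't declared in fragments like the ones above), and it assumes idx:infl elements are never nested and carry no attributes. Treat it as a starting point rather than a tested tool.

```python
import re

# One whole <idx:infl>...</idx:infl> group, including its indentation and line end.
INFL_RE = re.compile(r"[ \t]*<idx:infl>(.*?)</idx:infl>[ \t]*\n?", re.S)
# A single self-closing <idx:iform .../> tag.
IFORM_RE = re.compile(r"<idx:iform\b[^>]*/>")


def merge_infl_groups(entry, indent="\t"):
    """Collapse several <idx:infl> groups in one entry into a single group."""
    matches = list(INFL_RE.finditer(entry))
    if len(matches) < 2:
        return entry  # nothing to merge

    # Gather every iform tag, in document order.
    iforms = [tag for m in matches for tag in IFORM_RE.findall(m.group(1))]
    merged = (
        indent + "<idx:infl>\n"
        + "".join(indent * 2 + tag + "\n" for tag in iforms)
        + indent + "</idx:infl>\n"
    )

    # Put the merged group where the first group was and drop the others.
    out, pos = [], 0
    for i, m in enumerate(matches):
        out.append(entry[pos:m.start()])
        if i == 0:
            out.append(merged)
        pos = m.end()
    out.append(entry[pos:])
    return "".join(out)


sample = (
    "<idx:entry>\n"
    "\t<b><idx:orth>positiv\n"
    '\t<idx:infl><idx:iform value="positivs"/></idx:infl>\n'
    '\t<idx:infl><idx:iform value="positivet"/></idx:infl>\n'
    "\t</idx:orth> </b>\n"
    "</idx:entry>\n"
)
converted = merge_infl_groups(sample)
print(converted)
```

Running it a second time on the converted entry leaves it unchanged, since an entry with a single idx:infl group is returned as-is.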