MobileRead Forums - View Single Post - KindleUnpack (MobiUnpack): Extracts text, images and metadata from Kindle/Mobi files

Doitsu · 09-01-2014, 06:42 AM

Hi KevinH,

Quote:

Originally Posted by KevinH

Please verify that the code immediately after the piece we have been working on

tagMap = getTagMap(controlByteCount, tagTable, data, startPos+1+textLength, endPos)

You were correct; I had messed up the indentation. After correcting it, the default French and English monolingual dictionaries also unpacked fine, albeit without any inflections.

In case you want to have another look at dictionaries with multiple inflection groups, I've created another test dictionary that contains two entries with two inflection groups and two entries with one inflection group.

This test file decompiled fine, which surprised me a bit, since I had expected to get an "Error: Dictionary contains multiple inflection index sections, which is not yet supported" message for my test file.

I'm wondering what kind of dictionary syntax actually triggers this error message.

Since the OP that started this part of the thread reported issues with Asian characters, I also tested the updated mobi_dict.py version with a Japanese test dictionary.

Unfortunately, your updated version seems to have problems with non-Latin characters.

For example, the original entry definition was:

Code:

<idx:entry name="japanese" scriptable="yes">
	<idx:orth>猫
		<idx:infl>
			<idx:iform value="貓"/>
			<idx:iform value="ねこ"/>
			<idx:iform value="ネコ"/>
		</idx:infl>
	</idx:orth><br/>
	chat (m)
</idx:entry>

and the reverse-engineered version looked like this:

Code:

<idx:entry scriptable="yes">
	<idx:orth value="s+">
		<idx:infl>
		</idx:infl>
	</idx:orth>猫<br/> 
	chat (m) 
</idx:entry>
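To make the difference between the two snippets explicit, here is a minimal sketch that lists the inflected forms lost in the round trip. The helper `iform_values` is hypothetical and not part of KindleUnpack; it uses a regular expression rather than an XML parser because the `idx:` prefix is not declared in these fragments.

```python
import re

# Hypothetical helper (not part of KindleUnpack): collect every idx:iform value.
def iform_values(markup):
    return re.findall(r'<idx:iform\b[^>]*\bvalue="([^"]*)"', markup)

# Fragments copied from the two entries above.
ORIGINAL = '''<idx:orth>猫
<idx:infl>
<idx:iform value="貓"/>
<idx:iform value="ねこ"/>
<idx:iform value="ネコ"/>
</idx:infl>
</idx:orth>'''

DECOMPILED = '''<idx:orth value="s+">
<idx:infl>
</idx:infl>
</idx:orth>猫'''

# Forms present in the source entry but absent after unpacking.
kept = iform_values(DECOMPILED)
missing = [v for v in iform_values(ORIGINAL) if v not in kept]
print(missing)
```

For this entry all three forms (貓, ねこ, ネコ) are reported as missing, and the headword 猫 has moved out of idx:orth.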

The relevant part of the error log was:

Spoiler:

A similar problem also occurred with a Greek-English dictionary.

KindleUnpack reported: "Error: Dictionary contains multiple inflection index sections, which is not yet supported" and wrote garbage characters into the idx:orth values throughout the file. For example:

Code:

<idx:orth value="しえ-À-¹-¼-*-»-µ-¹-±">
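One possible reading of this garbage, offered as an assumption rather than a confirmed diagnosis: most of these characters are what Greek letters turn into when only the low byte of each code point is kept. The sketch below uses a hypothetical helper, `low_byte_garble`, and an illustrative letter sequence. Neither is taken from mobi_dict.py or from the dictionary itself.

```python
# Hypothetical helper: keep only the low byte of each code point and
# read that byte as a Latin-1 character.
def low_byte_garble(text):
    return "".join(chr(ord(ch) & 0xFF) for ch in text)

# Illustrative Greek letters only (pi, iota, mu, lambda, epsilon, iota, alpha).
greek_letters = "\u03c0\u03b9\u03bc\u03bb\u03b5\u03b9\u03b1"

# Join with "-" to mimic the separator seen in the garbled value.
print("-".join(low_byte_garble(greek_letters)))
```

Under this assumption π, ι, μ, λ, ε, ι, α become À, ¹, ¼, », µ, ¹, ±, which matches most of the value shown above. It does not explain the leading "しえ" or the "*".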