Old 09-01-2014, 06:42 AM   #960
Doitsu
Hi KevinH,

Quote:
Originally Posted by KevinH
Please verify that the code immediately after the piece we have been working on

tagMap = getTagMap(controlByteCount, tagTable, data, startPos+1+textLength, endPos)
You were correct; I had messed up the indentation. After I corrected it, the default French and English monolingual dictionaries also unpacked fine, although without any inflections.

In case you want to have another look at dictionaries with multiple inflection groups, I've created another test dictionary that contains two entries with two inflection groups and two entries with one inflection group.

This test file decompiled fine, which surprised me a bit, since I had expected to get an "Error: Dictionary contains multiple inflection index sections, which is not yet supported" message for it.

I'm wondering what kind of dictionary syntax actually triggers this error message.

Since the OP that started this part of the thread reported issues with Asian characters, I also tested the updated mobi_dict.py version with a Japanese test dictionary.

Unfortunately, your updated version seems to have problems with non-Latin characters.

For example, the original entry definition was:

Code:
<idx:entry name="japanese" scriptable="yes">
	<idx:orth>猫
		<idx:infl>
			<idx:iform value="貓"/>
			<idx:iform value="ねこ"/>
			<idx:iform value="ネコ"/>
		</idx:infl>
	</idx:orth><br/>
	chat (m)
</idx:entry>
and the reverse-engineered version looked like this:

Code:
<idx:entry scriptable="yes">
	<idx:orth value="s+">
		<idx:infl>
		</idx:infl>
	</idx:orth>猫<br/> 
	chat (m) 
</idx:entry>
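A possible explanation for the odd `s+` value, which I haven't checked against the actual mobi_dict.py code: the two characters `s` and `+` are the bytes 0x73 0x2B, and U+732B is 猫. That would fit a headword stored as UTF-16BE in the index and then read as single-byte text. The sketch below (Python 3, with a hypothetical helper name of my own, not a KindleUnpack function) reproduces the effect, and also the `ŽÊ` that appears in the log for a second entry whose headword bytes are e8 bb 8a (車):

```python
# Hypothetical check, not part of KindleUnpack: encode a headword as
# UTF-16BE and then misread the raw bytes with a single-byte codec.
def misread_utf16(headword, codec="cp1252"):
    raw = headword.encode("utf-16-be")   # e.g. U+732B -> bytes 0x73 0x2B
    return raw.decode(codec)             # each byte becomes one character

print(misread_utf16("猫"))   # s+
print(misread_utf16("車"))   # ŽÊ
```

If this guess is right, decoding the orth bytes as UTF-16BE instead of a single-byte codec might recover the original headword, but that is an assumption on my part.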
The relevant part of the error log was:

Spoiler:
Code:
Read dictionary index data
Parsing dictionary index data 3
ocnt 0, oentries 0, op1 0, op2 0, otagx 0
parsed INDX header:
len C0 nul1 0 type 1 gen 0 start D0 count 2 code FFFFFFFF lng FFFFFFFF total 0 ordt 0 ligt 0 nligt 0 nctoc 0
{'count': 2, 'nctoc': 0, 'code': 4294967295L, 'nul1': 0, 'len': 192, 'ligt': 0, 'start': 208, 'nligt': 0, 'ordt': 0, 'lng': 4294967295L, 'total': 0, 'type': 1, 'gen': 0} None None
0x03: s+ 01 e8 b2 93 04 e7 8c ab ç s
Error: Delete operation of inflection rule failed
0x03: s+ 01 e3 81 ad e3 81 93 04 e7 8c ab ç s
Error: Delete operation of inflection rule failed
0x03: s+ 01 e3 83 8d e3 82 b3 04 e7 8c ab ç s
Error: Delete operation of inflection rule failed
0x03: ŽÊ 01 e3 81 8f e3 82 8b e3 81 be 04 e8 bb 8a è Â
Error: Delete operation of inflection rule failed
0x03: ŽÊ 01 e3 82 af e3 83 ab e3 83 9e 04 e8 bb 8a è Â
Error: Delete operation of inflection rule failed
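For what it's worth, the hex bytes in the log look like valid UTF-8. Here is a minimal sketch (Python 3) that splits each rule at bytes below 0x20 and decodes the rest as UTF-8. Treating bytes below 0x20 as control bytes is my assumption from reading the log, and `split_rule` is a hypothetical helper, not something from mobi_dict.py:

```python
# Hypothetical helper for reading the log above; not part of KindleUnpack.
# Assumption: bytes below 0x20 are control bytes, everything else is UTF-8 text.
def split_rule(hex_bytes):
    """Return a list of (control_byte, decoded_text) pairs for one rule."""
    data = bytes.fromhex(hex_bytes)
    parts = []
    for b in data:
        if b < 0x20:
            parts.append([b, bytearray()])   # start a new segment
        elif parts:
            parts[-1][1].append(b)           # collect text bytes
    return [(ctrl, bytes(buf).decode("utf-8")) for ctrl, buf in parts]

rules = [
    "01 e8 b2 93 04 e7 8c ab",
    "01 e3 81 ad e3 81 93 04 e7 8c ab",
    "01 e3 83 8d e3 82 b3 04 e7 8c ab",
    "01 e3 81 8f e3 82 8b e3 81 be 04 e8 bb 8a",
    "01 e3 82 af e3 83 ab e3 83 9e 04 e8 bb 8a",
]
for rule in rules:
    print(split_rule(rule))
```

The first three rules decode to the iform values 貓, ねこ and ネコ after the 01 byte and to the headword 猫 after the 04 byte. So the rule data itself seems intact, and the failure may come from comparing these UTF-8 bytes against a headword that was read in a different encoding.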


A similar problem also occurred with a Greek-English dictionary.

KindleUnpack reported "Error: Dictionary contains multiple inflection index sections, which is not yet supported" and wrote garbage characters into the idx:orth value attributes throughout the file. For example:
Code:
<idx:orth value="しえ-À-¹-¼-*-»-µ-¹-±">
Attached Files
File Type: zip SampleDict2.zip (164.6 KB, 176 views)

Last edited by Doitsu; 09-01-2014 at 06:45 AM.