Hi lglgaigogo,
I am really not good at the dictionary stuff, and I am not its original author. I did, however, make a change to mobi_dict.py to try to capture and read the ORDT table info. Although I really have no idea what the markup for dictionaries is supposed to look like, the idx:orth info now looks like it is being deciphered properly.
Will you please download the attached mobi_dict.py.zip, unzip it, and use it to replace its namesake in KindleUnpack_v073/lib/?
Then try to unpack your dictionary, see if there is any improvement, and let me know what else, if anything, remains to be fixed.
Thanks,
KevinH
Quote:
Originally Posted by lglgaigogo
Thank you for paying attention to my issue. I am now trying to understand the non-western character encoding pattern.
Thank you.
So far, I have figured out:
1. Every character has a 2-byte index.
2. For western letters, it should be like 00 XX; for example, 'a' is 00 03 and 'b' is 00 64. Look the value up in the ORDT table:
ORDT[3*2+1] is 'a'
ORDT[64*2+1] is 'b'
3. For non-western letters, it should be like XX XX; for example, '潘' is 6F 58, and in Python:
Code:
print(u"\u6F58")  # prints exactly the character '潘'
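
The three observations above can be put together in a minimal sketch. It assumes the ORDT is a flat table of 2-byte big-endian entries, which is consistent with ORDT[3*2+1] being 'a' but is not confirmed. build_toy_ordt and decode_index_bytes are hypothetical helpers written for this illustration (Python 3); they are not part of mobi_dict.py, and the toy table only contains the 'a' entry.

```python
import struct


def build_toy_ordt(mapping, size=256):
    """Build a hypothetical ORDT: `size` two-byte big-endian entries."""
    table = bytearray(size * 2)
    for index, char in mapping.items():
        # Entry `index` occupies bytes index*2 and index*2+1.
        struct.pack_into(">H", table, index * 2, ord(char))
    return bytes(table)


def decode_index_bytes(data, ordt):
    """Decode 2-byte big-endian indexes following the pattern described above."""
    chars = []
    for (value,) in struct.iter_unpack(">H", data):
        if value < 0x100:
            # High byte is 00: the low byte selects an ORDT entry, and
            # ordt[value * 2 + 1] is the low byte of that entry.
            chars.append(chr(ordt[value * 2 + 1]))
        else:
            # Otherwise, treat the two bytes as the Unicode code point itself.
            chars.append(chr(value))
    return "".join(chars)


# Toy table: index 0x03 maps to 'a', as in the example above.
toy_ordt = build_toy_ordt({0x03: "a"})

# 00 03 -> ORDT lookup, 6F 58 -> direct code point.
decoded = decode_index_bytes(bytes([0x00, 0x03, 0x6F, 0x58]), toy_ordt)
print(decoded)
```

With a real dictionary, the ORDT bytes would come from the file rather than from build_toy_ordt, and the real table layout may differ from this assumption.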