Old 08-26-2014, 04:29 PM   #948
KevinH
Sigil Developer
Hi lglgaigogo,

I am really not good with the dictionary code, and I am not its original author. I did, however, make a change to mobi_dict.py to try to capture and read the ORDT table info. Although I have no real idea what the markup for dictionaries is supposed to look like, the idxrth info now appears to be deciphered properly.

Please download the attached mobi_dict.py.zip, unzip it, and use the resulting mobi_dict.py to replace the file of the same name in KindleUnpack_v073/lib/.

Then try unpacking your dictionary again, see whether there is any improvement, and let me know what else, if anything, remains to be fixed.

Thanks,

KevinH

Quote:
Originally Posted by lglgaigogo
Thank you for paying attention to my issue. I am now trying to understand the encoding pattern for non-Western characters.
Thank you.

So far, I have figured out:

1. Every character has a 2-byte index.
2. For Western letters the index has the form 00 XX. For example, 'a' is 00 03 and 'b' is 00 64, and each is looked up in the ORDT table:
ORDT[3*2+1] is 'a'
ORDT[64*2+1] is 'b'
3. For non-Western letters the index has the form XX XX. For example, '潘' is 6F 58, and in Python:
Code:
 print(u"\u6F58")  # is exactly the character '潘'; runs on Python 2 and Python 3.3+
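
The three observations quoted above can be put together in a short Python 3 sketch. This is only an illustration of the pattern as described, not the code in mobi_dict.py: the function name decode_index_key is hypothetical, the ORDT table is assumed to be a flat byte string of 2-byte entries with the character in the second byte (inferred from ORDT[n*2+1]), the "64" in the example is assumed to be hexadecimal, and the toy table below is made up just so the example runs.

```python
def decode_index_key(raw: bytes, ordt: bytes) -> str:
    """Decode a run of 2-byte character indexes (hypothetical helper).

    Assumptions, taken from the pattern described in the quote:
      * every character is stored as a 2-byte index;
      * 00 XX means "look up byte ORDT[XX*2+1]";
      * any other pair XX XX is the Unicode code point itself.
    """
    chars = []
    # Walk the data two bytes at a time; a trailing odd byte is ignored.
    for i in range(0, len(raw) - 1, 2):
        hi, lo = raw[i], raw[i + 1]
        if hi == 0:
            # Western letter: the index points into the ORDT table.
            chars.append(chr(ordt[lo * 2 + 1]))
        else:
            # Non-Western letter: the two bytes are the code point.
            chars.append(chr((hi << 8) | lo))
    return "".join(chars)


# Toy ORDT table (invented for this example): 256 two-byte entries,
# with index 0x03 -> 'a' and index 0x64 -> 'b'.
toy_ordt = bytearray(2 * 256)
toy_ordt[0x03 * 2 + 1] = ord("a")
toy_ordt[0x64 * 2 + 1] = ord("b")
toy_ordt = bytes(toy_ordt)

sample = bytes([0x00, 0x03, 0x00, 0x64, 0x6F, 0x58])
print(decode_index_key(sample, toy_ordt))  # prints: ab潘
```

A real dictionary would supply its own ORDT table, and the 00 XX test may not be the right way to tell the two cases apart for every file, so treat this as a starting point for experiments only.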
Attached Files
File Type: zip mobi_dict.py.zip (3.4 KB, 174 views)