igorsk: thanks a lot for sharing what you discovered. I just took a quick look and tried dumping a pre-installed dictionary and a home-made one. The pre-installed dictionary is indeed encoded in Unicode, while the home-made one, built with BBeB Dict Studio, is encoded in Shift JIS. (I thought BBeB Dict Studio accepted Shift JIS for the input CSV and converted it to Unicode, but apparently that's not the case.)
It seems you have gone far in understanding the MSD format (how did you manage that?). So is your goal, as porkupan said, "only" to fix the failing indexing of BBeB Dict Studio, or is it to make a complete MSD builder?
OK, I'll take a deeper look at bbeb_dic.py now.
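For anyone who wants to repeat the encoding check on a dumped entry, here is a minimal sketch. It is not taken from bbeb_dic.py: the function name `candidate_encodings` is made up for illustration, and treating "Unicode" as UTF-16LE is an assumption, since the exact Unicode encoding is not stated above.

```python
# Hypothetical helper, not part of bbeb_dic.py.
# Assumption: "Unicode" in the dictionary means UTF-16LE.
CANDIDATES = ("utf-16-le", "shift_jis")


def candidate_encodings(raw: bytes) -> list[str]:
    """Return the candidate encodings under which raw decodes without error."""
    ok = []
    for enc in CANDIDATES:
        try:
            raw.decode(enc)  # strict decoding: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


if __name__ == "__main__":
    sample = "辞書".encode("shift_jis")
    print(candidate_encodings(sample))
```

A clean decode only shows that an encoding is possible, not that it is the right one. Short byte strings often decode under both, so it is worth printing the decoded text and checking it by eye.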