Originally Posted by tshering
Today I ran into a problem. I made a French dictionary. To my disappointment, no word with an accented vowel in its first two letters showed up. The reason was evidently the encoding of the filenames inside the zip file. This surprised me, since my Japanese dictionaries work perfectly, so why shouldn't the French one? Anyhow, I tried zipping the files under Ubuntu rather than under Windows, and the dictionary worked fine.
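The culprit is a strange default behavior of 7-zip under Windows (or maybe I should rather say the culprit was me not being aware of it). 7-zip encodes in utf-8 only those filenames that contain characters not supported by the local codepage. That would explain the difference: the French accented vowels are presumably covered by my local codepage, so those names were not stored as utf-8, while the Japanese names are not covered and were stored as utf-8 anyway. Therefore, under Windows, one has to enforce utf-8 encoding: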
7z a -tzip -mcu=on dicthtml.zip *.html words
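To check an existing archive before copying it to the device, something like the following Python sketch might help. It is not part of any dictionary tool; the helper names `is_suspect` and `suspect_names` are made up for illustration. It relies on the zip format's general-purpose flag bit 11 (0x800), which marks a filename as utf-8, and lists entries whose names contain non-ASCII characters without that flag set.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # bit 11 of the general-purpose flags: filename is UTF-8


def is_suspect(info):
    """Return True if the entry name is non-ASCII but not flagged as UTF-8.

    Hypothetical helper: such names were stored in some legacy codepage.
    """
    has_non_ascii = any(ord(ch) > 127 for ch in info.filename)
    return has_non_ascii and not (info.flag_bits & UTF8_FLAG)


def suspect_names(zip_source):
    """List entry names in the archive that were likely not stored as UTF-8.

    zip_source may be a path or a file-like object. Names without the
    UTF-8 flag are decoded by zipfile with a fallback codepage, so they
    may look garbled when printed.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return [info.filename for info in zf.infolist() if is_suspect(info)]
```

Calling `suspect_names("dicthtml.zip")` should return an empty list for an archive created with `-mcu=on`. Any names it does return were stored without the utf-8 flag and are the likely candidates for the missing words.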