Quote:
Originally Posted by hansel
Aren't there a LOT of those glyphs? How would such an IME work? Is there a table that translates such combinations to utf8? That would probably be easy to implement...
Roughly speaking, there may be over 60 thousand Chinese glyphs, but only about 5000 are commonly used. Due to political and historical factors, there are two (only two) main variants: traditional and simplified Chinese glyphs. The former is used in Taiwan, Hong Kong, and Macau. The latter is used in Mainland China and Singapore.
People have created various IMEs (more than 10) to input Chinese text on a standard keyboard, and there are IMEs for mobile phones as well. There are even open-source IMEs on Linux: XCIN and GCIN. Strictly speaking, XCIN and GCIN are IME engines, and each IME can be plugged into either of them by providing its translation table. I think MiniPad can imitate them. The only difference is that MiniPad would provide an application-level IME, while GCIN and XCIN provide a system-wide one.
Those translation tables may not be human-readable. I will try to find a table in CSV format. If you need any further explanation, just ask. I am glad to help.
Added:
Take Zhuyin, the most common IME in Taiwan, as an example. The first attached image is a soft keyboard for the Zhuyin IME. Each key represents one Zhuyin symbol (shown in the lower right corner of the key). Typing "J", then "I", and then "3" inputs the traditional Chinese glyph "我". This applies only to the Zhuyin IME.
With the Cangjie IME, the combination for the same glyph is "H"+"Q"+"I". The second attached image is the soft keyboard for the Cangjie IME.
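To make the idea concrete, here is a rough sketch of what a per-IME lookup could look like. It is only an illustration, not how GCIN or MiniPad actually works: the names `TABLES` and `lookup` are made up, the tables contain only the two combinations mentioned above, and I am assuming the keys are matched case-insensitively.

```python
# Hypothetical per-IME translation tables. Only the two key
# combinations from this post are filled in; a real table would
# have thousands of entries.
TABLES = {
    "zhuyin": {"ji3": "我"},
    "cangjie": {"hqi": "我"},
}

def lookup(ime, keys):
    """Return the glyph for a key sequence, or None if there is no entry."""
    # Assumption: key sequences are stored in lower case.
    return TABLES[ime].get(keys.lower())

print(lookup("zhuyin", "JI3"))
print(lookup("cangjie", "HQI"))
```

The same keystrokes mean different things in different IMEs, so the table has to be selected first and the keys looked up second.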
Added 2:
I just found that the translation tables used by GCIN are all in UTF-8. They are easy to read. The attached "phone.zip" contains the Zhuyin IME table used by GCIN. Each line translates one glyph. The first part of a line is the key combination (one or more keys), and the second part is the target glyph. The two parts are separated by two spaces. If you have a Chinese font installed on your system, you can view the file without any problem.
In the file, the lines after "%keyname begin" list the keys on the keyboard and the basic symbol the IME assigns to each of them, and the lines after "%chardef begin" are the translations for each glyph.
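Reading such a table could be sketched roughly like this. Please treat it as a sketch under assumptions, not a reference parser: `parse_table` is a made-up name, the sample text is hand-written from the "J", "I", "3" example rather than copied from the attached file, and I am assuming each section is closed by a matching "%keyname end" or "%chardef end" line and that any other line starting with "%" can be skipped.

```python
from collections import defaultdict

def parse_table(text):
    """Parse table text into (keynames, chardefs).

    keynames maps a key to its basic symbol.
    chardefs maps a key combination to a list of glyphs, because
    one combination may be listed with more than one glyph.
    """
    keynames = {}
    chardefs = defaultdict(list)
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            parts = line.split()
            if len(parts) == 2 and parts[1] == "begin":
                section = parts[0][1:]   # "%chardef" -> "chardef"
            elif len(parts) == 2 and parts[1] == "end":
                section = None
            continue                     # skip every other % directive
        fields = line.split(None, 1)     # split on the first run of spaces
        if len(fields) != 2:
            continue
        keys, value = fields
        if section == "keyname":
            keynames[keys] = value
        elif section == "chardef":
            chardefs[keys].append(value)
    return keynames, dict(chardefs)

# Hand-written sample in the layout described above (not the real file).
sample = """%keyname begin
j  ㄨ
i  ㄛ
3  ˇ
%keyname end
%chardef begin
ji3  我
%chardef end
"""

keynames, chardefs = parse_table(sample)
print(chardefs.get("ji3"))
```

For a real table, the text would come from opening the file with encoding="utf-8". I kept a list of glyphs per combination so that an IME can show candidates when several glyphs share the same keys.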