08-03-2011, 11:21 PM   #2348
readingglasses
Zealot
Posts: 124
Karma: 9252
Join Date: Jul 2011
Device: (prospective) kobo touch
Quote:
Originally Posted by kartu
readingglasses

Let's turn it into some plan for PRS+.
Drawing on the screen to find similar looking chars, while certainly possible, is highly unlikely to be implemented.

1) Keyboard + usual dictionary lookup - will certainly be implemented
2) Keyboard where multiple chars produce single one + dictionary lookup - isn't much more work to do.

Doesn't 2) also cover Chinese? Keep in mind that different lookup methods are actually different dictionaries (by radical, by number of strokes, etc.).

PS
It's amazing how inefficient the hieroglyphic system is, yet it is still used. Kudos to the Koreans for dropping it.
Well, jumping from a request to devising a plan is kind of a leap for me. I'm currently "educating" myself on all the required skills/data. So I'll come up with some ideas but it might be a few more days.
"2" does cover Chinese, and it covers at least a couple of different input methods. But, as I'm sure you're aware, this isn't like simply combining characters in the Latin-based multinational "postfix" or "prefix" notation, where apostrophe + the letter 'e' = acute-accented e.
Again, I haven't looked at the code of the open source IMEs to see how they do it, but there is sure to be an intermediate stage of set selection based on some kind of indexing. The input methods that currently exist just don't seem specific enough to select an exact character; there always has to be a list. Even the phonetic tone notation isn't unique: there may be several characters with the pronunciation guo3, or any other sound.
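To make the "there always has to be a list" point concrete, here is a rough Python sketch of the two stages: a toned syllable selects a candidate set, then the user picks one entry. The table and the names `PINYIN_INDEX`, `candidates` and `pick` are made up for illustration; this is not how any particular IME does it.

```python
# Hypothetical sample data: a tiny pronunciation index, not a real IME table.
PINYIN_INDEX = {
    "guo3": ["果", "裹", "椁"],
    "guo2": ["国"],
    "ma3": ["马", "码"],
}


def candidates(syllable):
    """Stage 1: return every character sharing this toned syllable (may be empty)."""
    return PINYIN_INDEX.get(syllable, [])


def pick(syllable, choice):
    """Stage 2: the user picks one entry from the candidate list by position."""
    options = candidates(syllable)
    if not 0 <= choice < len(options):
        raise IndexError("no such candidate")
    return options[choice]
```

Typing guo3 alone never yields a single character here; `pick("guo3", 1)` only resolves once a position in the list is chosen.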
So you could have one massive set of characters with field entries (definition, pronunciation, component glyphs or radicals, even the etymology of the character, and anything else you can think of, applicable to any language, not just Chinese). A font has the characters (without the definitions, obviously). But a "dictionary" would be defined by how that massive list is indexed or sliced up. So a dictionary for lookup isn't enough by itself. Another kind of dictionary would already be running for the Chinese input. It wouldn't have definitions, but it would record, say, which kinds of strokes in which positions apply to some subset of the massive total list (the wubi method), or which component characters of a "Chinese keyboard" might add up to a set of possible entries, etc.
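As a minimal sketch of "one massive list, many dictionaries", assuming a handful of made-up records (the field names and the helper `build_index` are hypothetical), each index below is just a different slicing of the same master list:

```python
from collections import defaultdict

# Hypothetical master records; field names and sample values are illustrative only.
CHARACTERS = [
    {"char": "果", "pinyin": "guo3", "radical": "木", "strokes": 8, "definition": "fruit; result"},
    {"char": "林", "pinyin": "lin2", "radical": "木", "strokes": 8, "definition": "forest"},
    {"char": "国", "pinyin": "guo2", "radical": "囗", "strokes": 8, "definition": "country"},
    {"char": "好", "pinyin": "hao3", "radical": "女", "strokes": 6, "definition": "good"},
]


def build_index(records, field):
    """Group characters by one field; each resulting index acts as one 'dictionary'."""
    index = defaultdict(list)
    for rec in records:
        index[rec[field]].append(rec["char"])
    return dict(index)


# Input-side dictionaries: no definitions, only ways of narrowing the set.
BY_RADICAL = build_index(CHARACTERS, "radical")
BY_STROKES = build_index(CHARACTERS, "strokes")
BY_PINYIN = build_index(CHARACTERS, "pinyin")

# Lookup-side dictionary: character to definition.
DEFINITIONS = {rec["char"]: rec["definition"] for rec in CHARACTERS}
```

The input-side indexes return sets of characters, and only the last mapping carries definitions, which is the split between the two kinds of dictionary described above.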

This could work by following multi-node "trees" of an index, narrowing the set at each step, down to a final character (or set of characters).
It could be written either as a fairly complex "table of contents" or as "conditional statements" in XML or JavaScript (which seems to be what PRS+ uses).
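One way the tree idea might look, sketched in Python rather than the reader's own scripting language: each typed key follows one branch, and whatever sits at or below the node reached is the remaining candidate set. The key sequences in `CODES` are invented for the example and are NOT real Wubi or Cangjie codes; `build_trie` and `narrow` are hypothetical names.

```python
# Hypothetical key sequences; these are NOT real Wubi or Cangjie codes.
CODES = {
    "ab": ["果"],
    "abc": ["裹"],
    "ad": ["林", "森"],
}


def build_trie(codes):
    """Build a nested-dict tree with one level per typed key."""
    root = {"children": {}, "chars": []}
    for keys, chars in codes.items():
        node = root
        for key in keys:
            node = node["children"].setdefault(key, {"children": {}, "chars": []})
        node["chars"].extend(chars)
    return root


def narrow(trie, typed):
    """Follow one branch per typed key, then collect every character at or below that node."""
    node = trie
    for key in typed:
        node = node["children"].get(key)
        if node is None:
            return []
    found = []
    stack = [node]
    while stack:
        current = stack.pop()
        found.extend(current["chars"])
        stack.extend(current["children"].values())
    return sorted(found)


TRIE = build_trie(CODES)
```

With this sample, `narrow(TRIE, "a")` still leaves four candidates, `narrow(TRIE, "ab")` leaves two, and `narrow(TRIE, "abc")` leaves one, so the list shrinks with every key.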

... or, if the battery can provide the power, a running SQL daemon and database. It could either be started each time you use the Chinese dictionary (a slow startup, so maybe give the user an option to start and stop it as they wish, rather than each time the Chinese keyboard is invoked or the dictionary is opened) or run all the time (I have no idea whether it uses much memory once it's "loaded in memory" and passively waiting). Also, if memory constraints or file sizes for apps are an issue, I have no idea at this point which method is better.
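For the database route, a small sketch using Python's standard sqlite3 module. Note the swap: SQLite is an embedded library with no daemon, which is an assumption on my part and not what the paragraph above proposes, and I don't know whether anything like it can run on the reader. The table layout and sample rows are made up.

```python
import sqlite3

# In-memory database for the sketch; a real one would live in a file on the device.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chars ("
    "char TEXT PRIMARY KEY, pinyin TEXT, radical TEXT, strokes INTEGER, definition TEXT)"
)
# Each index is one way of slicing the same table.
conn.execute("CREATE INDEX idx_pinyin ON chars (pinyin)")
conn.execute("CREATE INDEX idx_radical_strokes ON chars (radical, strokes)")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO chars VALUES (?, ?, ?, ?, ?)",
    [
        ("果", "guo3", "木", 8, "fruit; result"),
        ("裹", "guo3", "衣", 14, "to wrap"),
        ("林", "lin2", "木", 8, "forest"),
    ],
)


def by_pinyin(syllable):
    """Candidate characters for one toned syllable."""
    rows = conn.execute(
        "SELECT char FROM chars WHERE pinyin = ? ORDER BY char", (syllable,)
    )
    return [row[0] for row in rows]


def by_radical(radical, strokes):
    """Candidate characters for a radical plus total stroke count."""
    rows = conn.execute(
        "SELECT char FROM chars WHERE radical = ? AND strokes = ? ORDER BY char",
        (radical, strokes),
    )
    return [row[0] for row in rows]
```

The same table answers both the input-method question (which characters match?) and, through the definition column, the lookup question.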

Chinese users of PRS+ (and enterprising Chinese language learners) would be comfortable using such a keyboard, and would likely then use the entire device as an electronic dictionary for any writing they encounter (in the street, at work, in entertainment). Still, if the dictionary is "meant especially for" documents on the device, I see little value in this method beyond being an exciting challenge to implement, since tap and automatic lookup are far more convenient on a touchscreen. But who says there should be limits on how a device is used?


Tap and automatic lookup could just order a set of, for example, Unicode code points, one per character, by line number. Then the process would be something like:

go to (Unicode code point of tapped character == dictionary line number).

In that case the displayed dictionary of definitions would still be a multi-level-contents ebook file.
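A rough Python sketch of the tap lookup, assuming a dictionary file with one "character, tab, definition" line per entry, sorted by code point. Because a small sample file has gaps, a binary search stands in for the direct "go to line" jump; with a complete file, the line number could be computed from the code point directly. The sample lines and the name `lookup` are made up.

```python
import bisect

# Hypothetical dictionary file: one "character<TAB>definition" line each,
# kept sorted by the character's Unicode code point.
DICTIONARY_LINES = sorted(
    ["果\tfruit; result", "林\tforest", "国\tcountry", "好\tgood"],
    key=lambda line: ord(line[0]),
)
# Parallel list of code points, so a line can be found without scanning the file.
CODE_POINTS = [ord(line[0]) for line in DICTIONARY_LINES]


def lookup(tapped):
    """Return the definition for a tapped character, or None if it has no line."""
    code = ord(tapped)
    pos = bisect.bisect_left(CODE_POINTS, code)
    if pos < len(CODE_POINTS) and CODE_POINTS[pos] == code:
        return DICTIONARY_LINES[pos].split("\t", 1)[1]
    return None
```

No candidate list is needed on this path: the tapped character already identifies itself, so `lookup("林")` goes straight to its line.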

Everything about an unordered (or mostly unordered) set of characters, as opposed to a total alphanumeric ordering from alpha to omega, means that set indexing, in more than one way, has to be dealt with somehow. We're way beyond the ASCII alphanumeric dilemma above, obviously.

It's a terribly inefficient thing from a programming point of view, and for writing by typing. But for good readers, the writing itself can be read much faster than an alphabetic system. I assume so, anyway, because I can only test it with the few characters I know.
The fancy writing is fun, too. And "sloppy" cursive writing by hand (not neat printing), before computers, was also faster than spelling out a word the way an alphabet does.

This would be portable to Japanese, and Japanese users would use it the same way. Even Koreans could use it for their rarer encounters with Chinese characters in older texts, names, the occasional newspaper, or a fancy sign.

It's funny to think that this might bring a Japanese dictionary to a Japanese product by Sony. It also brings up the confusing questions about open source hacking. But that's another discussion.

Last edited by readingglasses; 08-03-2011 at 11:31 PM.