MobileRead Forums - View Single Post

geek1011 · 09-24-2020, 12:33 PM

Quote:

Originally Posted by geek1011

Nevertheless, one unique (mis)use of the new prefix_exceptions trie I just came up with is for languages with too many prefixes for the zip (cf. the complaints about Chinese dictionaries in the dictutil thread). I could use prefix_exceptions to map multiple words with different prefixes into a single file. I'll see what I can do with an implementation of this when I have more time.

I looked into this a bit more, and found that this won't be possible.

prefix_exceptions is somewhat of a misnomer, since it doesn't actually make exceptions for prefixes. Instead, it should be called word_redirects, since it just changes the word being looked up to another if it matches exactly. The target file must still have a variant/word matching the new one, and the original file won't be looked in at all.
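To make that concrete, here's a rough Python model of the lookup as I understand it. This is only a sketch: prefix(), lookup(), and the dict layout are made-up names for illustration, and the prefix rule is simplified to "first two lowercased characters" rather than the real one.

```python
from typing import Dict, List


def prefix(word: str) -> str:
    # Simplified stand-in for the real prefix rule (assumption):
    # just the first two lowercased characters.
    return word.lower()[:2]


def lookup(word: str, redirects: Dict[str, str], files: Dict[str, List[dict]]) -> List[dict]:
    # An exact match in the redirect table replaces the search term entirely.
    term = redirects.get(word, word)
    # Only the file for the (possibly redirected) term is searched;
    # the file for the original word is never opened.
    candidates = files.get(prefix(term), [])
    # The target file must still contain a headword/variant equal to the new term.
    return [e for e in candidates if term == e["headword"] or term in e["variants"]]


# go.html holds the go/went entry; there is no we.html at all.
files = {
    "go": [{"headword": "go", "variants": ["went"], "definition": "to move"}],
}
redirects = {"went": "go"}  # equivalent to a "went\tgo" line

hits = lookup("went", redirects, files)
print([e["headword"] for e in hits])
```

Without the redirect, the same lookup would search the (nonexistent) "we" file and find nothing, which is the old v2 problem.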

This also means that prefix_exceptions already has a bug of its own, which is the inverse of the problem it was created to solve. Previously, with v2 dictionaries, variants with a different prefix than their headword wouldn't be found (I worked around this in dictutil by duplicating the entries). Now, if you have a headword with the same name as a redirected variant, it won't be found. For example, with the previous v2 behaviour, the entry for go/went would need to be duplicated into go.html and we.html, and you could also have another unique definition titled went in we.html. With the new v3 behaviour, you can just define go/went in go.html and add a redirect entry like "went\tgo". But this is where the new bug happens: if you had a second entry in we.html named "went" (remember that Kobo dictionaries support multiple entries for a word), it won't be found, since the word was redirected to "go". I can work around this bug by duplicating the headwords into the redirected files... which is just the counterpart to my previous workaround.

This bug could be fixed on Kobo's side by also looking up the original search term in addition to the redirected one.
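Here's a small Python sketch of the difference, using the go/went example. Again, this is a toy model and not Kobo's actual code: prefix(), lookup_current(), and lookup_both() are hypothetical names, and the prefix rule is simplified to the first two lowercased characters.

```python
from typing import Dict, List


def prefix(word: str) -> str:
    # Simplified stand-in for the real prefix rule (assumption).
    return word.lower()[:2]


def search_file(term: str, files: Dict[str, List[dict]]) -> List[dict]:
    # Search only the file that corresponds to the term's prefix.
    candidates = files.get(prefix(term), [])
    return [e for e in candidates if term == e["headword"] or term in e["variants"]]


def lookup_current(word: str, redirects: Dict[str, str], files: Dict[str, List[dict]]) -> List[dict]:
    # Behaviour described above: the redirected term replaces the original.
    return search_file(redirects.get(word, word), files)


def lookup_both(word: str, redirects: Dict[str, str], files: Dict[str, List[dict]]) -> List[dict]:
    # Suggested fix: search for the redirected term AND the original term.
    results: List[dict] = []
    for term in (redirects.get(word, word), word):
        for entry in search_file(term, files):
            if entry not in results:  # avoid listing the same entry twice
                results.append(entry)
    return results


files = {
    "go": [{"headword": "go", "variants": ["went"], "definition": "to move"}],
    "we": [{"headword": "went", "variants": [], "definition": "a separate entry"}],
}
redirects = {"went": "go"}

print([e["headword"] for e in lookup_current("went", redirects, files)])
print([e["headword"] for e in lookup_both("went", redirects, files)])
```

With lookup_current, the separate "went" entry in the "we" file is never reached; with lookup_both, both entries come back.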

Once Kobo releases the new dictionaries, I'm going to see if they have this bug (note: the current v2 dictionaries have the old bug with variant prefixes which are different than the headword prefix).

@davidfor, do you know anything about this behaviour/bug?

Edit: I just thought of a way to make the Chinese dictionary hack possible even with the behaviour I discovered. I can hash each entry as a hidden variant, then add redirects for the real headwords to that hash (essentially limiting the possible prefixes to combinations of 2 ASCII alphanumeric characters).
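A rough Python sketch of the idea, to show why it bounds the number of files. Everything here is an assumption for illustration: hidden_key(), build(), and lookup() are made-up names, I'm using a truncated SHA-1 hex digest of the headword as the hash, and variants and hash collisions are ignored.

```python
import hashlib
from typing import Dict, List, Tuple


def hidden_key(headword: str) -> str:
    # Hypothetical choice of hash: a truncated hex digest, so the first two
    # characters are always in 0-9a-f regardless of the headword's script.
    return hashlib.sha1(headword.encode("utf-8")).hexdigest()[:16]


def build(entries: List[dict]) -> Tuple[Dict[str, str], Dict[str, List[dict]]]:
    redirects: Dict[str, str] = {}
    files: Dict[str, List[dict]] = {}
    for entry in entries:
        key = hidden_key(entry["headword"])
        # Redirect the real headword to its hash...
        redirects[entry["headword"]] = key
        # ...and store the entry with the hash as a hidden variant,
        # in the file named after the hash's two-character prefix.
        stored = dict(entry, variants=list(entry["variants"]) + [key])
        files.setdefault(key[:2], []).append(stored)
    return redirects, files


def lookup(word: str, redirects: Dict[str, str], files: Dict[str, List[dict]]) -> List[dict]:
    # Same redirect-then-search model as described in the post.
    term = redirects.get(word, word)
    candidates = files.get(term[:2], [])
    return [e for e in candidates if term == e["headword"] or term in e["variants"]]


entries = [
    {"headword": "你好", "variants": [], "definition": "hello"},
    {"headword": "你们", "variants": [], "definition": "you (plural)"},
    {"headword": "谢谢", "variants": [], "definition": "thank you"},
]
redirects, files = build(entries)

print(sorted(files))  # file prefixes are hex pairs, not Chinese characters
print([e["definition"] for e in lookup("谢谢", redirects, files)])
```

With a hex digest there can be at most 256 distinct two-character prefixes, no matter how many different first characters the real headwords have.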