10-08-2021, 07:57 AM | #1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2020
Device: Kindle Touch 4
|
Help with dictionary lookup feature for non-latin scripts (cyrillic)
Hi all,
I'm working on a conversion of a large monolingual Russian dictionary for use on my old Kindle Touch, using the old Mobipocket Creator. So far the HTML is happy, conversion runs smoothly and everything displays well. However, the index lookup function doesn't work, which obviously I'd like to fix so I can actually use the thing.
Yes, there are many Russian dictionaries already out there for Kindle, but they seem to almost universally lack detailed stress and inflection information (I am less interested in working inflection tags than in actually having this information in a visible form).
From looking through older posts on this forum I can see that this lack of lookup functionality used to be a big issue for non-Latin languages, but I cannot find anything directly addressing my issue. I have actually been using several Russian dictionaries from the net without lookup problems (my KT runs the final version of the firmware available for that model). The searches seem to employ some sort of transliteration. Does anyone know if this would be coming from the device firmware or the compiler (most dictionaries I've seen are compiled using dsl2mobi)? Unpacking with kindleunpack has so far not given me any new clues as to what I might be missing if it's a compiler-side feature; there are certainly no transliterations in any of the entries. I can't work out why it works for some dictionaries and not for others.
I'll share some of the tags I've used in case this is of relevance. I previously converted a monolingual Danish dictionary (using Kindle Previewer that time) with a fully working lookup. In that I used the following in the head section:
Code:
<reference title="Look Up Word" type="Find" onclick="index_search('', 'Alphabetical lookup', '', 'none')"/>
Code:
<idx:entry name="headword" scriptable="yes" spell="yes" id="1">
Code:
<reference title="Look Up Word" type="Find" onclick="()"/>
[...]
<idx:entry name="headword" scriptable="yes" id="1">
Cheers
10-08-2021, 10:08 AM | #2 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
If you upgrade your Kindle Touch 4 to at least a Paperwhite 2 and register it, you can download the following Russian dictionaries for free:
• ABBYY Lingvo Большой Русско-Английский Словарь (Russian->English)
• ABBYY Lingvo Большой Англо-Русский Словарь (English->Russian)
• ABBYY Lingvo Большой Толковый Словарь Русского Языка (Russian<->Russian)
If you really want to convert your file, have a look at the sample dictionary source file in the .zip file. (It only contains inflections for слово and книга.) To generate the dictionary, use the following command line:
Code:
kindlegen.exe russian_dict.epub -dont_append_source
Last edited by Doitsu; 10-08-2021 at 05:33 PM.
10-08-2021, 05:44 PM | #3 |
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2020
Device: Kindle Touch 4
|
Thank you for your quick response, Doitsu. As I mentioned in my initial post, I was hoping to gain some insight into the problem itself.
Incidentally, I did actually buy a Paperwhite 4 at the start of the year, partly for the Russian dictionaries. Unfortunately these only indicate stress in the basic form (headword), as you can see. That is, they do not show the minimum information necessary to determine shifting stress, or indeed ambiguous or unusual inflections, for newly encountered words. It is true that nominal inflections are occasionally given in the illustrative examples, but this is by no means systematic. Verbal inflections are rare in the examples and usually give only one of an aspectual pair (which are grouped together for the most part).
Of all the dictionaries I have tested, the otherwise very good Smirnitsky Ru-En dictionary showed the most inflection information, but still no stress. One version I found online of the Ru-En Lingvo Universal dictionary showed decent stress information, but only patchy inflection (obviously more targeted at Russian speakers learning English)... Unfortunately this renders these dictionaries less useful for my purpose than even Wiktionary (which in any case I do not have in a conversion-ready format).
Happily, I already found a file for the Малый академический словарь showing both pieces of information and covering a sufficiently wide range of vocabulary. It was a breeze to format that into conversion-ready HTML, and everything works apart from the lookup problem on the KT (it works fine on the PW; see below for why that doesn't help). To clarify, the lookup shows a list of words (in Cyrillic as opposed to the expected Latin transliteration), but they do not change with further input, although the list does vary with the initial input letter (though not in any obvious way). At the end of the day I actually vastly prefer the UI experience on the KT, funnily enough (word lookup, highlighting, pop-up menus).
Especially irritating is the PW lookup, which has a short delay after the initial keystroke in which the keyboard momentarily disappears, meaning a search for "заниматься" will be entered as "зниматься" or even "зиматься". If it weren't for the higher DPI enabling me to now read fullscreen PDFs, I would have sold the device again and just kept the KT. As my KT does not handle PDFs well, it has become my dictionary device, which is why it also does not help me that my file works fine with the PW lookup.
For the reasons outlined above and in my initial post, I'd like to get the KT lookup to work on my own dictionary. I was hoping to shed some light on why the lookup fails for my user-generated dictionary when it clearly works for many other user-generated dictionaries on the same device. I know it can be done; I'd like to understand how. I was curious whether anyone had ideas for a more elegant solution than my transliteration idea, or is that really how it's done?
Cheers,
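For readers who want to experiment with the transliteration idea mentioned in this post, here is a minimal Python sketch of producing a Latin lookup key from a stressed Cyrillic headword. The mapping table is an assumption chosen for illustration; the thread does not document which scheme the Kindle Touch firmware or dsl2mobi actually uses, so the table may need adjusting. The names `TRANSLIT`, `strip_stress` and `transliterate` are hypothetical.

```python
import unicodedata

# Assumed, illustrative Cyrillic-to-Latin table. The scheme really used by the
# device or by dsl2mobi is not stated in the thread.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def strip_stress(word: str) -> str:
    """Remove combining acute accents (U+0301) used as stress marks."""
    decomposed = unicodedata.normalize("NFD", word)
    without_accent = decomposed.replace("\u0301", "")
    # Recompose so that letters such as й and ё become single characters again.
    return unicodedata.normalize("NFC", without_accent)


def transliterate(word: str) -> str:
    """Return a lowercase Latin key; unknown characters pass through unchanged."""
    plain = strip_stress(word).lower()
    return "".join(TRANSLIT.get(ch, ch) for ch in plain)


if __name__ == "__main__":
    print(transliterate("сня\u0301ться"))
```

With this particular table the stressed headword сня́ться maps to snyatsya, the same Latin form that appears in the markup later in the thread.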
10-09-2021, 02:37 PM | #4 | |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Have you already looked at it and tested it? |
|
10-14-2021, 07:02 AM | #5 | |
Junior Member
Posts: 3
Karma: 10
Join Date: Sep 2020
Device: Kindle Touch 4
|
Quote:
Having more or less solved the problem at this point, I'll summarise the findings in case they are of interest to anyone out there. The following approaches showed varying levels of success:
• Compiling with only the Cyrillic forms works great on the PW with the Russian keyboard enabled, but obviously can't be accessed via keyboard lookup on a stock KT.
• Compiling with fully transliterated index forms works on the KT, but then I can't use the more convenient Cyrillic keyboard on the PW. It's an edge case for me admittedly, but after all the effort, I want it to be as forward compatible as possible.
• Finally, coming back to my idea of the dual-form index (trying a Cyrillic headword with transliterations in inflection tags): the index would just default to Cyrillic even with Latin key input, and I just couldn't get it working... well, until I decided to throw in the towel and use the dsl2mobi script to see what it spat out, since I could see they were essentially going for the same approach (plus, it generates the inflection tags for free!).
It turns out the opposite order, using a transliterated headword, does work. And the key piece of the puzzle seems to be using the Cyrillic headword as the unique entry id. Here is my original approach with dual scripts, where only the Cyrillic index works:
Code:
<?xml version="1.0" encoding="utf-8"?>
<html xmlns:idx="www.mobipocket.com" xmlns:mbp="www.mobipocket.com" xmlns:xlink="http://www.w3.org/1999/xlink">
<head>
  <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
<mbp:frameset>
  <idx:entry name="headword" scriptable="yes" id="66154">
    <idx:short><a id="66154"></a>
      <idx:orth value="сняться">
        <idx:infl>
          <idx:iform value="snyatsya">
        </idx:infl>
      </idx:orth>
      <b>сня́ться </b>
      <div>сниму́сь, сни́мешься; <i>прош.</i> сня́лся, -ла́сь, -ло́сь; <i>сов.</i> (<i>несов.</i> снима́ться). [...]</div>
    </idx:short>
  </idx:entry>
  <hr>
</mbp:frameset>
</body>
</html>
Code:
<?xml version="1.0" encoding="utf-8"?>
<html xmlns:idx="www.mobipocket.com" xmlns:mbp="www.mobipocket.com" xmlns:xlink="http://www.w3.org/1999/xlink">
<head>
  <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
</head>
<body>
<mbp:frameset>
  <a name="#сняться"/>
  <idx:entry name="headword" scriptable="yes">
    <idx:orth>
      <b>сня́ться</b>
    </idx:orth>
    <idx:orth value="snyatsya"/>
    <div>сниму́сь, сни́мешься; <i>прош.</i> сня́лся, -ла́сь, -ло́сь; <i>сов.</i> (<i>несов.</i> снима́ться). [...]</div>
  </idx:entry>
  <hr>
</mbp:frameset>
</body>
</html>
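For a large dictionary, entries in the layout of the second block could be generated by a script rather than by hand. The following is a rough Python sketch under stated assumptions: `build_entry` and its parameters are hypothetical names, the output simply mirrors the second block posted above, and nothing here verifies that the result compiles or that lookup works on any particular device.

```python
from html import escape


def build_entry(headword: str, stressed: str, translit: str, body_html: str) -> str:
    """Return one entry mirroring the second block in this post.

    headword  -- plain Cyrillic form, used in the anchor name
    stressed  -- displayed form with stress marks
    translit  -- Latin lookup form (assumed to be prepared by the caller)
    body_html -- definition markup, inserted unchanged
    """
    return (
        f'<a name="#{escape(headword, quote=True)}"/>\n'
        '<idx:entry name="headword" scriptable="yes">\n'
        '  <idx:orth>\n'
        f'    <b>{escape(stressed)}</b>\n'
        '  </idx:orth>\n'
        f'  <idx:orth value="{escape(translit, quote=True)}"/>\n'
        f'  <div>{body_html}</div>\n'
        '</idx:entry>\n'
        '<hr>\n'
    )


if __name__ == "__main__":
    print(build_entry("сняться", "сня\u0301ться", "snyatsya", "<i>сов.</i> [...]"))
```

Passing the transliteration in as an argument keeps the choice of romanisation scheme separate from the markup, which may help when testing what a given device accepts.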
Tags |
cyrillic, dictionary, kindle, lookup |