MobileRead Forums > E-Book Readers > Amazon Kindle > Kindle Developer's Corner
01-23-2013, 07:36 AM   #16
wakawaka
Junior Member
Posts: 2
Karma: 10214
Join Date: Jan 2013
Device: kindle
Quote:
Originally Posted by PoP
I found it through a link posted here. I think the dictionary has to be built with all the possible inflections of the declensions and conjugations for them to be searchable. AFAIK the Kindle does not look up "closest matches"... A closest or partial match would certainly be useful... searching *all* dictionaries too... but I'm afraid it would require a rewrite of the Kindle framework.
That dictionary is perfect, just what I was looking for, thanks! Somehow it does work with various declensions/conjugations, though I'm not exactly sure how: it looks like there's only a single entry per word. Other Ru-En dictionaries I've found so far, for example http://www.the-ebook.org/forum/viewt...=483630#483630, haven't worked with declensions/conjugations, so I'm interested in figuring out what the differences between the two are. Anyway, thanks again!
02-04-2013, 08:32 PM   #17
PoP
curly ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
Quote:
Originally Posted by wakawaka
[snip]... Somehow it does work with various declensions/conjugations, though I'm not exactly sure how, it looks like there's only a single entry per word... [snip]... interested to figure out what the differences... [snip]
Humm, me too! Maybe a more knowledgeable dictionary developer could shed some light on this?
02-05-2013, 09:22 PM   #18
PoP
perplexed

Quote:
Originally Posted by wakawaka
[snip]Other Ru-En dictionaries I've found so far, for example http://www.the-ebook.org/forum/viewt...=483630#483630, haven't worked with declensions/conjugations [snip]
Agreed: the Smirnitsky dictionary seems to have a single entry per definition, yet it still appears to resolve inflections... I couldn't download your previous problematic dictionary; the URL gives me
[Image: Clip1.jpg]
so I can't test further. Any chance of another public or PM link?

Last edited by PoP; 02-06-2013 at 03:02 PM. Reason: misspelling... bad in a dictionary thread
02-06-2013, 03:01 PM   #19
PoP
inflections

I read a bit more; here is a status update.

The AmazonKindlePublishingGuidelines.pdf available from the Kindle Publishing Programs describes in section 7 how to code inflections in dictionaries.

As I thought:

7.3 Inflections for Dictionaries
When building dictionaries, you may have multiple inflected forms of a single root word that should access the same entry. However, adding all of these inflected forms under the orthography (pronunciation) of a single entry leads to the generation of a large index, which negatively affects performance and user experience. Kindle has a disinflection engine that uses a set of rules for disinflecting any given word to its headword. The index then has only the headword to look up.
To generate the set of disinflection rules for the dictionary, the input must include some information about the inflections. There are two ways to provide this information: simplified inflection syntax and advanced inflection syntax.
7.3.1 Advanced inflection syntax
Inflections are handled by the inflection index, which is built into the dictionary based on the inflected forms which are tagged in the content using the <idx:infl> tag. Inflections are attached to the orthography of the entry. They must be specified inside of an <idx:orth> tag. If an entry has multiple orthographies, each must have its own inflections.
Example:
Code:
<idx:orth>record 
   <idx:infl inflgrp="noun"> 
      <idx:iform name="plural" value="records" /> 
   </idx:infl> 
   <idx:infl inflgrp="verb"> 
      <idx:iform name="present participle" value="recording" /> 
      <idx:iform name="past participle" value="recorded" /> 
      <idx:iform name="present 3ps" value="records" /> 
   </idx:infl> 
</idx:orth>
The inflgrp and name attributes are optional. The idx:infl, idx:iform, and value attributes are mandatory.
7.3.2 Simplified inflection syntax
For English dictionaries, simplified inflection syntax is a very simple way of giving information about the inflections. Previous versions of the file format supported using the infl attribute in either the <idx:orth> or the <idx:gramgrp> tag and specifying a comma-separated list of inflected forms. This syntax is now deprecated, as it is not as accurate when disinflecting, particularly for non-English languages.
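For a dictionary built from a word list, markup in the advanced syntax quoted above is easier to generate than to type by hand. The sketch below is a minimal illustration; `orth_with_inflections` is a hypothetical helper, not part of any Amazon or Mobipocket tool. It assumes the caller already has the inflected forms, and it always writes the optional `inflgrp` and `name` attributes.

```python
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape, quoteattr


def orth_with_inflections(headword: str, groups: Dict[str, List[Tuple[str, str]]]) -> str:
    """Build an <idx:orth> element in the advanced inflection syntax.

    groups maps an inflgrp label to (name, value) pairs; this hypothetical helper
    always emits the optional inflgrp and name attributes.
    """
    lines = [f"<idx:orth>{escape(headword)}"]
    for group, forms in groups.items():
        lines.append(f"   <idx:infl inflgrp={quoteattr(group)}>")
        for name, value in forms:
            lines.append(f"      <idx:iform name={quoteattr(name)} value={quoteattr(value)} />")
        lines.append("   </idx:infl>")
    lines.append("</idx:orth>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the guide's "record" example
    print(orth_with_inflections("record", {
        "noun": [("plural", "records")],
        "verb": [("present participle", "recording"),
                 ("past participle", "recorded"),
                 ("present 3ps", "records")],
    }))
```

For a Russian noun the groups would list the case forms instead, e.g. `{"noun": [("genitive plural", "фунтов")]}` for фунт.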

So the Smirnitsky dictionary must have these defined. I am attempting to decompile it so I can verify this by inspecting the source .opf.

So far, converting the .mobi to .htmlz with Calibre shows single entries, and converting the .mobi to .epub with Calibre never completes.

To be continued...

Last edited by PoP; 02-06-2013 at 04:43 PM. Reason: misspelling... bad in a dictionary thread
02-06-2013, 05:23 PM   #20
PoP
...Continued

I used Kindle Mobi Unpack to successfully extract the source .html from the .mobi. Yay!

Here is the search for the lampshade entry:
[Image: lampshade.jpg]

Here is the extracted HTML. Please note the value= field:
[Image: entry extracted.jpg]

Since the file is UTF-8 encoded, here are the escaped Unicode values \u0430\u0431\u0430\u0436\u0443\u0440 for абажур:
[Image: extracted entry Unicode.jpg]

According to the Kindle Publishing Guidelines document cited in my previous post, value= is the hidden label stored in the index, i.e. what the user enters in the search box to pop up the dictionary entry. Shown in hex, one sees that it matches the Unicode values:
[Image: extracted entry Hex.jpg]
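The \u escapes quoted above are Unicode code points, whereas a hex view of a UTF-8 file shows the encoded bytes, two per Cyrillic letter. A small sketch for producing both forms from a word, so they can be compared with what the unpacked HTML contains; the function names are illustrative only.

```python
def unicode_escapes(word: str) -> str:
    """Return one \\uXXXX escape per code point (fine for Cyrillic, which is in the BMP)."""
    return "".join(f"\\u{ord(ch):04x}" for ch in word)


def utf8_hex(word: str) -> str:
    """Return the raw UTF-8 bytes of word as space-separated hex, as a hex editor shows them."""
    return word.encode("utf-8").hex(" ")


if __name__ == "__main__":
    print(unicode_escapes("абажур"))  # \u0430\u0431\u0430\u0436\u0443\u0440
    print(utf8_hex("абажур"))         # d0 b0 d0 b1 d0 b0 d0 b6 d1 83 d1 80
```

If a hex view shows the raw file, expect the second form (for example U+0430 а is stored as the bytes D0 B0).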

Humm, all entries in the dictionary look alike, and I see no trace of HTML <idx:infl> inflection tags.

I am still puzzled.

To be continued...
02-08-2013, 02:02 PM   #21
PoP
Simple guess

... Continued

I read some more in "How to make dictionaries and indexes".

For example, take the search for фунтов, which pops up the фунт entry:
[Image: search.gif]

Following the logic of my previous post, фунтов, which is \u0444\u0443\u043D\u0442\u043E\u0432 in Unicode, must have a matching <idx:orth value="04440443043D0442043E0432₁₆"> tag pointing at the book position where фунт is defined.

I checked: this is not the case. There is no hidden value= entry for it.
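The manual check for a hidden value= entry can be scripted. This is a sketch under two assumptions: that the unpacked HTML stores hidden labels in value="..." attributes, and that a label may appear either as plain text or as concatenated 4-digit code-point hex, as in the notation used above. `has_value_entry` and `codepoint_hex` are hypothetical helper names.

```python
import re

# Matches value="..." attributes, e.g. in <idx:orth value="..."> or <idx:iform value="..." />
VALUE_ATTR = re.compile(r'value\s*=\s*"([^"]*)"')


def codepoint_hex(word: str) -> str:
    """Concatenate 4-digit uppercase hex code points (assumes BMP characters such as Cyrillic)."""
    return "".join(f"{ord(ch):04X}" for ch in word)


def has_value_entry(html: str, word: str) -> bool:
    """True if some value= attribute holds word, either as text or as code-point hex."""
    target_hex = codepoint_hex(word)
    for value in VALUE_ATTR.findall(html):
        if value == word or value.upper() == target_hex:
            return True
    return False
```

Feed it the unpacked HTML read with `encoding="utf-8"`; for this dictionary, the check described above suggests `has_value_entry(html, "фунтов")` would return False.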

Opening the dictionary itself and searching for фунт shows the *index* of the dictionary:
[Image: index of dictionary.gif]
and фунтов is not displayed.

So my reasonable (but unverified) guess at this point is that, when there is no perfect match, the search code in the framework returns the entry that sorts immediately before the searched term. As simple as that! This takes care of most inflections without the need to define them explicitly (via <idx:orth value=...> or <idx:infl...> tags).
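The guessed behaviour amounts to a predecessor search in a sorted index. A minimal sketch, assuming the index is a sorted list of headwords ordered by Unicode code point; the Kindle's real collation is unknown (by code point, ё sorts after я, unlike in the Russian alphabet), and `lookup` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Optional


def lookup(headwords: List[str], query: str) -> Optional[str]:
    """Exact match if present, otherwise the headword sorting immediately before query.

    headwords must be sorted; Python compares strings by code point, which is only
    an assumption about how the Kindle orders its index.
    """
    i = bisect_right(headwords, query)
    return headwords[i - 1] if i > 0 else None


if __name__ == "__main__":
    index = sorted(["абажур", "фунт", "фура", "шум"])
    print(lookup(index, "фунтов"))  # фунт
    print(lookup(index, "шёл"))     # шум
```

The second lookup shows the limit of the guess: an irregular form (шёл is a past tense of идти) lands on whatever unrelated headword happens to sort just before it.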

But that doesn't explain why wakawaka's other dictionary did not show inflections... I wish I could just inspect it.

Ah well, maybe I should try to decompile the Java code of the framework... It looks like a daunting task given the obfuscation and my limited Java background.

To be continued...
02-11-2013, 12:37 AM   #22
The_Dew2000
Junior Member
Posts: 3
Karma: 1000
Join Date: Feb 2013
Device: Kindle 3
Quote:
Originally Posted by PoP
2.2 Mobipocket "Creator software uses a powerful algorithm to build the inflection index ... : inflections are not stored as entries in the index, but are deduced from a set of rules, which are automatically generated based on the inflected forms contained in the publication."

Perhaps this is why there is "no trace of html <idx:infl> inflection tags"?
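To make the idea of rules "automatically generated based on the inflected forms" concrete, here is a toy suffix-rewrite scheme. It is an assumption for illustration only; the actual Mobipocket/Kindle rule format is not documented in this thread. Each (inflected form, headword) pair yields a (strip, add) rule, and a lookup applies every matching rule, keeping candidates that exist as headwords.

```python
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str]  # (suffix to strip from the inflected form, suffix to add back)


def derive_rule(form: str, headword: str) -> Rule:
    """Turn one (inflected form, headword) pair into a suffix-rewrite rule."""
    stem = 0
    for a, b in zip(form, headword):
        if a != b:
            break
        stem += 1
    return form[stem:], headword[stem:]


def disinflect(word: str, headwords: Set[str], rules: Iterable[Rule]) -> List[str]:
    """Apply every rule whose suffix matches and keep candidates that are real headwords."""
    found = {word} if word in headwords else set()
    for strip, add in rules:
        if word.endswith(strip) and len(word) > len(strip):
            candidate = word[: len(word) - len(strip)] + add
            if candidate in headwords:
                found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    pairs = [("фунтов", "фунт"), ("records", "record"), ("recording", "record")]
    rules = {derive_rule(form, head) for form, head in pairs}  # {("ов", ""), ("s", ""), ("ing", "")}
    headwords = {"фунт", "стол", "record", "play"}
    print(disinflect("столов", headwords, rules))   # ['стол']
    print(disinflect("playing", headwords, rules))  # ['play']
```

A rule learned from one word (фунтов → фунт) also resolves столов → стол, which is how a compact rule set could replace per-entry inflection data in the index.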

Quote:
Originally Posted by PoP
Opening the dictionary itself and searching for фунт shows the *index* of the dictionary:
[Image: index of dictionary.gif]
and фунтов is not displayed.
Perhaps the index sorts by Latin characters, only partially supporting non-Latin characters? It also seems that inflections are not present in the index.
02-11-2013, 06:37 AM   #23
baf
Evangelist
Posts: 404
Karma: 2200000
Join Date: May 2012
Device: kt
Quote:
Originally Posted by PoP
Humm, all entries in the dictionary are similar and I see no trace of html <idx:infl> inflection tags.

I am still puzzled
Kindle Unpack may not unpack dictionaries correctly. See this post.
On the other hand, I can confirm that using orth and infl tags works OK, as I built such a dictionary.
02-12-2013, 06:38 AM   #24
PoP
Quote:
Originally Posted by The_Dew2000
2.2 Mobipocket "Creator software uses a powerful algorithm to build the inflection index ... : inflections are not stored as entries in the index, but are deduced from a set of rules, which are automatically generated based on the inflected forms contained in the publication."

Perhaps this is why there is "no trace of html <idx:infl> inflection tags"?

Perhaps the index sorts by latin characters, partially supporting non-latin characters? It also seems that inflections are not present in the index.
I agree with you, The_Dew2000: it must be a set of rules, since the dictionary has no inflection tags.

Quote:
Originally Posted by baf
Kindle Unpack may not unpack dictionaries correctly. See this post.
On the other hand I can confirm that using orth and infl tags works ok, as I built such a dictionary.
Good to know, baf, but when I unpacked there was no error message. So unless Unpack skipped the inflections entirely and silently, I conclude none are present.

Thanks to you both for helping.
02-12-2013, 02:30 PM   #25
The_Dew2000
Quote:
Originally Posted by PoP
... search code in the framework returns the entry that sorts immediately prior to the searched term, if there is no perfect match ...
Only some searches for non-perfect matches work for me. In some cases the pertinent entry is omitted from the results list.

Have you searched for many non-perfect terms starting with different characters, PoP? Or perhaps I am doing something else wrong.
02-12-2013, 04:36 PM   #26
PoP
After wakawaka's "That dictionary is perfect" statement, I only tried a very limited set of Russian words that "seemed" to work. But I don't speak Russian, so I can't really tell.

Could you give some examples of searches that work and of searches that should have worked but did not?
02-12-2013, 08:15 PM   #27
The_Dew2000
It seems my problem cases are explained by my searching for an inflected form and for an improperly accented term.
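If the accent problem comes from stress marks (the combining acute accent U+0301 that many Russian dictionaries print over the stressed vowel), a query and a headword can look alike yet differ in code points. A sketch of normalizing them before comparison; whether the Kindle index itself keeps stress marks is not known from this thread, and `strip_stress` is a hypothetical name.

```python
import unicodedata

STRESS_MARK = "\u0301"  # COMBINING ACUTE ACCENT, used to mark stress in Russian dictionaries


def strip_stress(word: str) -> str:
    """Drop stress marks while keeping letters such as й and ё intact.

    NFD splits й into и + U+0306 and ё into е + U+0308; only U+0301 is removed,
    and NFC then recombines the remaining marks into the precomposed letters.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(STRESS_MARK, ""))


if __name__ == "__main__":
    print(strip_stress("абажу\u0301р") == "абажур")  # True
```

Removing every combining mark after NFD would be simpler, but it would also turn й into и and ё into е, which are distinct letters in Russian.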
02-13-2013, 03:49 AM   #28
baf
Quote:
Originally Posted by PoP
Good to know Baf, but when I unpacked there was no error message. So unless Unpack did not process inflections at all and decided to remain silent about it, I conclude none are present.
When I unpack the Smirnitsky dictionary with KindleUnpack, I get this error message. Maybe you used a different tool.
02-13-2013, 07:03 AM   #29
PoP
Good point. I had used the more recent Kindle Unpack Calibre plugin. Thanks for reporting this. Now I know not to trust it for mobi dictionaries.
Code:
H:\>mobi_unpack.py russian.mobi
MobiUnpack 0.34
  Copyright (c) 2009 Charles M. Hannum <root@ihack.net>
  With Additions by P. Durrant, K. Hendricks, S. Siebert, fandrieu and DiapDealer.
Unpacking Book...
Palm DB type:  BOOKMOBI
Mobi Version:  7
Codec:  utf-8
Title:  Smirnitsky (Ru-En)
Palmdoc compression

Processing Mobi format Ebook ...
Unpack raw markup language
Unpacking images, resources, fonts, etc
    extracting image:  image00001.gif
    extracting image:  image00002.jpeg
Warning: Section 7741 contains no image or an unknown image format
Info: Document contains orthographic index, handle as dictionary
Error: Dictionary contains multiple inflection index sections, which is not yet supported
orthIndexCount is 46
Read dictionary index data
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Write opf
Error: Cover Thumbnail image 2 was not recognized as a valid image
Completed

H:\
I shall resume my hunt for another decompilation tool. Any suggestions?
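Short of a decompiler, the MOBI header can at least tell whether a file declares orthographic and inflection indexes. This sketch assumes the header layout described on the MobileRead wiki: PalmDB type/creator at offsets 60 and 64, record 0's offset at 78, the MOBI header at record 0 + 16, and the orthographic and inflection index section numbers at record 0 + 40 and + 44, with 0xFFFFFFFF meaning absent. Treat those offsets as assumptions to verify; the sketch does not decode the index contents.

```python
import struct
import sys

NO_INDEX = 0xFFFFFFFF  # stored in an index field when that index is absent


def read_mobi_index_info(data: bytes) -> dict:
    """Return compression, encoding, version and the two dictionary index pointers."""
    # PalmDB header is 78 bytes; database type and creator sit at offsets 60 and 64
    db_type, creator = struct.unpack_from(">4s4s", data, 60)
    if db_type + creator != b"BOOKMOBI":
        raise ValueError("not a BOOKMOBI Palm database")
    # The record list follows the header; its first 4 bytes are record 0's file offset
    (rec0,) = struct.unpack_from(">I", data, 78)
    # Record 0 starts with a 16-byte PalmDOC header whose first field is compression
    (compression,) = struct.unpack_from(">H", data, rec0)
    # MOBI header: magic, length, type, encoding, unique ID, version, orth index, infl index
    magic, _length, _mobi_type, encoding, _uid, version, orth, infl = struct.unpack_from(
        ">4s7I", data, rec0 + 16
    )
    if magic != b"MOBI":
        raise ValueError("record 0 has no MOBI header")
    return {
        "compression": compression,  # 1 = none, 2 = PalmDOC, 17480 = HUFF/CDIC
        "encoding": encoding,  # 65001 = UTF-8, 1252 = CP1252
        "version": version,
        "orth_index": None if orth == NO_INDEX else orth,
        "infl_index": None if infl == NO_INDEX else infl,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(read_mobi_index_info(f.read()))
```

Run it with the dictionary file as its argument (e.g. russian.mobi); a non-None `infl_index` would indicate that the file does carry inflection data, even when an unpacker cannot render it as <idx:infl> tags.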

Last edited by PoP; 02-13-2013 at 07:26 AM.
02-13-2013, 08:21 AM   #30
baf
Quote:
Originally Posted by PoP
I shall resume my hunt for another decompilation tool. Any suggestion?
I didn't find any tool able to decompile this kind of index.
Why do you want to unpack it?

Tags
cyrillic, dictionary, kindle 4.1.0, russian



All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.