#1 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2024
Device: Kindle Oasis
|
Kindlegen gets stuck on second xhtml for dictionary creation
As the title suggests, kindlegen gets stuck after parsing the first xhtml (I have 90 because the dictionary is very big). Here's the stuck terminal screen:
Code:
.\kindlegen.exe .\dictionary.opf -c0 -verbose -locale en -o "russian dict.mobi"
*************************************************************
Amazon kindlegen(Windows) V2.9 build 1029-0897292
A command line e-book compiler
Copyright Amazon.com and its Affiliates 2014
*************************************************************
Info:I9005:option: -c0: No compression
Info:I9014:option: -verbose: Verbose output
Info(prcgen):I1047: Added metadata dc:Title "Russian-English dictionary (Open Russian)"
Info(prcgen):I1047: Added metadata BASICCode "REF008000"
Info(prcgen):I1047: Added metadata dc:Subject "Dictionaries"
Info(prcgen):I1002: Parsing files 0000091
Info(prcgen):I1003: Parsing file URL: dictionary_1.xhtml
Warning(index build):W15008: language not supported. Using default phonetics for spellchecker: english.
Warning(parser8):W26001: Index not supported for enhanced mobi.
Info(prcgen):I1003: Parsing file URL: dictionary_2.xhtml

If I run kindlegen with a single xhtml reference in the opf file, it works and does create a mobi (I also attached it), but of course it's incomplete. Unfortunately, if I try to create the dictionary from one complete xhtml file (125 MB), kindlegen just crashes, saying:
Code:
*************************************************************
A command line e-book compiler
Copyright Amazon.com and its Affiliates 2014
*************************************************************
Info:I9005:option: -c0: No compression
Info:I9014:option: -verbose: Verbose output
Info(prcgen):I1047: Added metadata dc:Title "Russian-English dictionary (Open Russian)"
Info(prcgen):I1047: Added metadata BASICCode "REF008000"
Info(prcgen):I1047: Added metadata dc:Subject "Dictionaries"
Info(prcgen):I1002: Parsing files 0000001
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

What can I do to make kindlegen create the dictionary out of multiple xhtml files? Many thanks. |
#2 |
Enthusiast
Posts: 49
Karma: 500000
Join Date: Oct 2011
Device: KINDLE 3
|
Kindlegen 2.9 is not stable (at least for me), please try again with Kindlegen 1.2.
|
#3 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2024
Device: Kindle Oasis
|
|
#4 | |
Enthusiast
Posts: 49
Karma: 500000
Join Date: Oct 2011
Device: KINDLE 3
|
Quote:
kindlegen1.2 |
|
#5 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2024
Device: Kindle Oasis
|
Omg, not only is it no longer stuck, it also works perfectly on the Kindle! (I had managed to create some dictionaries before, but only some words were picked up, seemingly at random.) Many thanks! I was getting a bit frustrated.
Could I use this same thread for another issue? For some reason, once my entries are built into a .mobi, the headword of the next entry appears at the end of each entry, which is a little annoying (see attached images). I checked, and the words are indeed the ones that come next in the xhtml files. For reference, this is how the entries look in the xhtml:
Code:
<idx:entry name="russian" scriptable="yes" spell="yes">
  <idx:orth>вре́мя<idx:infl>
    <idx:iform value="время"></idx:iform>
    <idx:iform value="времени"></idx:iform>
    <idx:iform value="времени"></idx:iform>
    <idx:iform value="время"></idx:iform>
    <idx:iform value="временем"></idx:iform>
    <idx:iform value="времени"></idx:iform>
    <idx:iform value="времена"></idx:iform>
    <idx:iform value="времён"></idx:iform>
    <idx:iform value="временам"></idx:iform>
    <idx:iform value="времена"></idx:iform>
    <idx:iform value="временами"></idx:iform>
    <idx:iform value="временах"></idx:iform>
  </idx:infl></idx:orth>
  <p>Sg.: <span>вре́мя <span>(<i>nom</i>)</span></span>, <span>вре́мени <span>(<i>gen</i>)</span></span>, <span>вре́мени <span>(<i>dat</i>)</span></span>, <span>вре́мя <span>(<i>acc</i>)</span></span>, <span>вре́менем <span>(<i>inst</i>)</span></span>, <span>вре́мени <span>(<i>prep</i>)</span></span></p>
  <p>Pl.: <span>времена́ <span>(<i>nom</i>)</span></span>, <span>времён <span>(<i>gen</i>)</span></span>, <span>времена́м <span>(<i>dat</i>)</span></span>, <span>времена́ <span>(<i>acc</i>)</span></span>, <span>времена́ми <span>(<i>inst</i>)</span></span>, <span>времена́х <span>(<i>prep</i>)</span></span></p>
  <p>1. time, times, tense, weather</p>
  <p>(Stem: времен-)</p>
  <p>Usage: в то вре́мя - at that time\nво вре́мя войны́ - during the war + genitive\nмно́го вре́мени - much of the time\nза вре́мя - while + genitive\nсо временем: over time\nв настоящее время: at present\nв последнее время: lately</p>
  <p><i>Я не зна́ю, будет ли у меня вре́мя.</i> | I don't know if I'll have time.</p>
  <p><i>Вре́мя всегда можно найти́.</i> | One can always find time.</p>
  <p><i>Э́то займёт у меня слишком много вре́мени, чтобы объясни́ть, почему э́то не будет рабо́тать.</i> | It would take me too much time to explain to you why it's not going to work.</p>
  <p><i>Зима́ — моё люби́мое вре́мя года.</i> | Winter is the season I like best.</p>
  <p><i>У меня уйдёт слишком много вре́мени на объясне́ние, почему э́то не срабо́тает.</i> | It would take me too much time to explain to you why it's not going to work.</p>
</idx:entry>
<idx:entry name="russian" scriptable="yes" spell="yes">
  <idx:orth>рука́<idx:infl>
    <idx:iform value="рука"></idx:iform>
    <idx:iform value="руки"></idx:iform>
    <idx:iform value="руке"></idx:iform>
    <idx:iform value="руку"></idx:iform>
    <idx:iform value="рукой"></idx:iform>
    <idx:iform value="руке"></idx:iform>
    <idx:iform value="руки"></idx:iform>
    <idx:iform value="рук"></idx:iform>
    <idx:iform value="рукам"></idx:iform>
    <idx:iform value="руки"></idx:iform>
    <idx:iform value="руками"></idx:iform>
    <idx:iform value="руках"></idx:iform>
  </idx:infl></idx:orth>
  <p>Sg.: <span>рука́ <span>(<i>nom</i>)</span></span>, <span>руки́ <span>(<i>gen</i>)</span></span>, <span>руке́ <span>(<i>dat</i>)</span></span>, <span>ру́ку <span>(<i>acc</i>)</span></span>, <span>руко́й <span>(<i>inst</i>)</span></span>, <span>руке́ <span>(<i>prep</i>)</span></span></p>
  <p>Pl.: <span>ру́ки <span>(<i>nom</i>)</span></span>, <span>ру́к <span>(<i>gen</i>)</span></span>, <span>рука́м <span>(<i>dat</i>)</span></span>, <span>ру́ки <span>(<i>acc</i>)</span></span>, <span>рука́ми <span>(<i>inst</i>)</span></span>, <span>рука́х <span>(<i>prep</i>)</span></span></p>
  <p>1. hand</p>
  <p>2. arm</p>
  <p>3. handwriting</p>
  <p><i>Надо ду́мать, что чте́ние бы́ло одною из его боле́зненных привы́чек, так как он с одина́ковою жа́дностью набра́сывался на всё, что попадало ему под руки.</i> | It must be supposed that reading was one of his morbid habits, as he fell upon anything that came into his hands with equal avidity.</p>
  <p><i>Я обжёг ру́ку утюго́м.</i> | I burned my hand with an iron.</p>
  <p><i>Я пойма́л её за ру́ку.</i> | I caught her by the hand.</p>
  <p><i>Она никак не могла́ наложи́ть на себя руки.</i> | It is impossible that she should have killed herself.</p>
  <p><i>То, что в мое́й руке́,—э́то окаменевшая раку́шка.</i> | What I have in my hand is a fossil seashell.</p>
</idx:entry>

I truly have no words, thanks for allowing me not to ditch the work of a few hours and days. |
#6 |
Enthusiast
Posts: 49
Karma: 500000
Join Date: Oct 2011
Device: KINDLE 3
|
1. Insert an <hr/> between entries, so that each entry is followed by a separator.
2. Set the ‘value’ attribute on each orth element. |
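For anyone generating the xhtml with a script, the two suggestions above could look roughly like this in Python. This is only a sketch: `build_entry` and its parameters are names made up for the example, and the attributes on `idx:entry` simply mirror the markup posted earlier in the thread.

```python
from xml.sax.saxutils import escape, quoteattr

ENTRY_TEMPLATE = (
    '<idx:entry name="russian" scriptable="yes" spell="yes">\n'
    "<idx:orth value={value}>{headword}<idx:infl>{iforms}</idx:infl></idx:orth>\n"
    "{body}\n"
    "</idx:entry>\n"
    "<hr/>\n"  # separator after every entry (suggestion 1)
)


def build_entry(headword, lookup_value, iforms, body_html):
    """Return one entry with an orth 'value' attribute, followed by <hr/>.

    headword     -- form shown to the reader (may carry stress marks)
    lookup_value -- plain form for the orth 'value' attribute (suggestion 2)
    iforms       -- inflected forms, one <idx:iform> each
    body_html    -- ready-made XHTML for the definition (hypothetical input)
    """
    iform_tags = "".join(
        "<idx:iform value={}></idx:iform>".format(quoteattr(form))
        for form in iforms
    )
    return ENTRY_TEMPLATE.format(
        value=quoteattr(lookup_value),  # quoteattr adds the surrounding quotes
        headword=escape(headword),
        iforms=iform_tags,
        body=body_html,
    )


if __name__ == "__main__":
    print(build_entry("рука\u0301", "рука", ["руки", "руке"], "<p>1. hand</p>"))
```

Whether kindlegen needs anything beyond these two changes is not something this sketch can confirm; it only shows where the `<hr/>` and the `value` attribute would go.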
#7 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2024
Device: Kindle Oasis
|
Many thanks, now that's solved.
Another issue has come up, however, so I'll ask about it if it's not too much of a hassle. It's about displaying all the entries for words that look the same without diacritics. See the images: in v5 only нёбо gets picked up, and in v4 only нёбо. Ideally, in v5 both would get picked up and I could scroll through them, as is usually possible when there are multiple similar words. (нёбо also has небо as an iform, because in books that diacritic is often not written.) The code is as shown in the images. I don't get why it doesn't show the typical 1/2 and the arrows. |
#8 |
Enthusiast
Posts: 49
Karma: 500000
Join Date: Oct 2011
Device: KINDLE 3
|
I'm wondering if a Kindle/Kindlegen treats 'ё' and 'è' as the letter 'e', and if that behavior would cause it to overwrite the second entry.
|
#9 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2024
Device: Kindle Oasis
|
After some testing I found out the issue and I think I solved it.
The key thing is the orth value.
If нёбо has <idx:orth …
If нёбо has <idx:orth …
If both have <idx:orth …
So I think it may be safe to assume that values in orth tags are to be written without any diacritics. |
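If the orth values are produced by a script, stripping the diacritics could be sketched like this in Python. It assumes that "diacritics" here means the stress accent and the dots of ё, and that й must stay as it is because it is a separate letter; the function name `plain_orth_value` is invented for the example.

```python
import unicodedata

# Combining grave and acute accents, used as stress marks in the entries.
STRESS_MARKS = {"\u0300", "\u0301"}


def plain_orth_value(headword):
    """Return the headword without stress marks and with ё folded to е."""
    # NFC keeps ё and й as single code points, so dropping the combining
    # accents afterwards cannot turn й into и by accident.
    composed = unicodedata.normalize("NFC", headword)
    without_stress = "".join(ch for ch in composed if ch not in STRESS_MARKS)
    # Assumption: ё is written as е in the lookup value, as observed above.
    return without_stress.replace("\u0451", "\u0435").replace("\u0401", "\u0415")


if __name__ == "__main__":
    for word in ("вре\u0301мя", "н\u0451бо", "мо\u0439"):
        print(word, "->", plain_orth_value(word))
```

The visible headword inside the orth element can keep its accents; only the `value` attribute would use the stripped form.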
#10 | |
Grand Sorcerer
Posts: 5,825
Karma: 104662271
Join Date: Apr 2011
Device: pb360
|
Quote:
|
|
#11 | |
Enthusiast
Posts: 49
Karma: 500000
Join Date: Oct 2011
Device: KINDLE 3
|
I found something informative in the Amazon docs.
https://kdp.amazon.com/en_US/help/to...HXJS944GL88DNV Quote:
|
|
#12 | |
Member
Posts: 10
Karma: 10
Join Date: Sep 2024
Device: Kindle Oasis
|
Quote:
After more testing I'm running into more issues. Apparently, an inflection is not picked up for some words, whereas the identical inflected form of another word is. For example, капли can be an inflection of both ка́пнуть and ка́пля, but on my Kindle only ка́пнуть gets picked up. That makes no sense, because in both cases it's just an iform, not even the orth. I tried deleting the orth values, but it changed nothing. The order in the xhtml files can't explain it either: ка́пнуть (picked up) is at 06 6696 (xhtml file, line) and ка́пля (not picked up) is at 01 7778. The same happens with есть, for example. It's a form of both быть and есть, and even removing the orth values doesn't change the fact that only есть is picked up, although есть (picked up) comes later in the files, at 00 955, whereas быть (not picked up) is at 00 113. Guess I'm stuck yet again. |
|
#13 |
Enthusiast
Posts: 49
Karma: 500000
Join Date: Oct 2011
Device: KINDLE 3
|
1. First, check whether your requirement is actually feasible. The way to do this is to confirm whether Amazon’s official dictionary (downloadable from the Kindle Store) supports this feature.
2. If Amazon’s official dictionary does support it, you can use a tool like mobiunpack to extract the dictionary and see how it implements the feature.
3. If even Amazon’s official dictionary doesn’t have this feature, it means Kindle doesn’t support it. A possible workaround is to avoid the inflection function and instead convert all inflected forms into orth entries. In other words, create one entry per inflection (copying the definition of the root word). Although this makes the source file much larger, most of the content is repetitive, so Kindle’s compression should handle it well and the final MOBI file size likely won’t increase too much. |
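The workaround in point 3 could be sketched in Python roughly as follows. The function name, its parameters, and the choice to show the root headword as the visible text of every copy are assumptions made for illustration, not something confirmed by the Amazon documentation.

```python
from xml.sax.saxutils import escape, quoteattr

ENTRY_TEMPLATE = (
    '<idx:entry name="russian" scriptable="yes" spell="yes">\n'
    "<idx:orth value={value}>{headword}</idx:orth>\n"
    "{body}\n"
    "</idx:entry>"
)


def expand_to_orth_entries(display_headword, forms, body_html):
    """Build one standalone entry per distinct form, repeating the definition.

    display_headword -- root word shown at the top of every copy
    forms            -- plain lookup forms (headword and inflections)
    body_html        -- ready-made XHTML for the definition
    """
    # dict.fromkeys drops repeated forms (a paradigm often lists the same
    # form for several cases) while keeping the original order.
    distinct_forms = list(dict.fromkeys(forms))
    return [
        ENTRY_TEMPLATE.format(
            value=quoteattr(form),
            headword=escape(display_headword),
            body=body_html,
        )
        for form in distinct_forms
    ]


if __name__ == "__main__":
    forms = ["время", "времени", "времени", "время", "временем"]
    for entry in expand_to_orth_entries("вре\u0301мя", forms, "<p>1. time</p>"):
        print(entry)
        print()
```

No `idx:infl` block is emitted here, since every form becomes its own lookup key.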
#14 |
Member
Posts: 10
Karma: 10
Join Date: Sep 2024
Device: Kindle Oasis
|
Thanks for pointing out the possible route.
I tested the official dictionaries and the same happens. The dictionary only picks up the entries that have есть as their orth value (there are 3 in the official RU-EN dictionary); it ignores the entry for быть, which contains an iform value="есть". So it seems safe to assume that Kindle gives priority to orth values over iforms: if an orth value is identical to an iform value, the entry containing the iform doesn't get picked up. I'll test option 3 when I have time; it seems like the only possible workaround. Considering each entry has between 10 and 20 iforms, this will turn the 76 MB mobi into a monster, so maybe it's not very feasible. |
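To see how many entries are affected by this priority rule before rebuilding anything, a small Python check along these lines might help. It assumes the dictionary data is available as a mapping from orth value to its list of iform values; that layout and the name `shadowed_iforms` are hypothetical.

```python
def shadowed_iforms(entries):
    """Find iforms that are also the orth value of a different entry.

    entries -- dict mapping an orth value to an iterable of its iform values
    Returns a dict: colliding form -> list of headwords whose entry would
    be hidden for that form, following the behaviour observed above.
    """
    orth_values = set(entries)
    shadowed = {}
    for orth, iforms in entries.items():
        for form in set(iforms):
            # A form listed under its own headword is not a collision.
            if form != orth and form in orth_values:
                shadowed.setdefault(form, []).append(orth)
    return shadowed


if __name__ == "__main__":
    sample = {
        "быть": ["есть", "был", "была"],
        "есть": ["ем", "ест", "есть"],
    }
    print(shadowed_iforms(sample))
```

With the sample data, the form есть is reported as hiding the entry for быть, which matches the case described in this post.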
#15 |
Enthusiast
Posts: 49
Karma: 500000
Join Date: Oct 2011
Device: KINDLE 3
|
1. You should apply “-c1” compression, which gives about a 50% compression ratio in an acceptable time, or “-c2” compression, which gives about a 30% compression ratio but takes much longer.
2. Only the duplicate iform needs to be extracted separately as orth; there’s no need to extract all of them. |
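One way to read point 2 is that only forms claimed by more than one headword (like капли above) need their own orth entries. A minimal Python sketch for finding those forms, using the same hypothetical orth-to-iforms mapping as input, could be:

```python
from collections import defaultdict


def shared_forms(entries):
    """Return the inflected forms that belong to more than one headword.

    entries -- dict mapping an orth value to an iterable of its iform values
    Returns a dict: form -> sorted list of headwords that list it.
    """
    owners = defaultdict(set)
    for orth, iforms in entries.items():
        for form in iforms:
            owners[form].add(orth)
    return {
        form: sorted(heads)
        for form, heads in owners.items()
        if len(heads) > 1
    }


if __name__ == "__main__":
    sample = {
        "капнуть": ["капли", "капнул"],
        "капля": ["капли", "капель"],
    }
    print(shared_forms(sample))
```

Only the forms returned here would then be pulled out into separate orth entries, which keeps the growth of the source files far smaller than expanding every inflection.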