Kindlegen gets stuck on second xhtml for dictionary creation

Musrar · 09-17-2025, 05:39 PM

As the title suggests, kindlegen gets stuck after parsing the first xhtml (I have 90 because the dictionary is very big). Here's the stuck terminal screen:

Code:

.\kindlegen.exe .\dictionary.opf -c0 -verbose -locale en -o "russian dict.mobi"

*************************************************************
 Amazon kindlegen(Windows) V2.9 build 1029-0897292
 A command line e-book compiler
 Copyright Amazon.com and its Affiliates 2014
*************************************************************

Info:I9005:option: -c0: No compression
Info:I9014:option: -verbose: Verbose output
Info(prcgen):I1047: Added metadata dc:Title        "Russian-English dictionary (Open Russian)"
Info(prcgen):I1047: Added metadata BASICCode       "REF008000"
Info(prcgen):I1047: Added metadata dc:Subject      "Dictionaries"
Info(prcgen):I1002: Parsing files  0000091
Info(prcgen):I1003: Parsing file     URL: dictionary_1.xhtml
Warning(index build):W15008: language not supported. Using default phonetics for spellchecker: english.
Warning(parser8):W26001: Index not supported for enhanced mobi.
Info(prcgen):I1003: Parsing file     URL: dictionary_2.xhtml

It just doesn't progress any further. I attached the opf file, and some xhtml files in a zip.

If I run kindlegen with a single xhtml reference in the opf file, it works and does create a mobi (I also attached it). But of course it's incomplete.

Unfortunately, if I try to create the dictionary from a single complete xhtml file (125 MB), kindlegen just explodes, saying:

Code:

*************************************************************
 A command line e-book compiler
 Copyright Amazon.com and its Affiliates 2014
*************************************************************

Info:I9005:option: -c0: No compression
Info:I9014:option: -verbose: Verbose output
Info(prcgen):I1047: Added metadata dc:Title        "Russian-English dictionary (Open Russian)"
Info(prcgen):I1047: Added metadata BASICCode       "REF008000"
Info(prcgen):I1047: Added metadata dc:Subject      "Dictionaries"
Info(prcgen):I1002: Parsing files  0000001

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

What can I do to make kindlegen create the dictionary out of multiple xhtml files?

Many thanks.

cdhigh · 09-18-2025, 09:19 AM

Kindlegen 2.9 is not stable (at least for me), please try again with Kindlegen 1.2.

Musrar · 09-18-2025, 09:21 AM

Quote:

Originally Posted by cdhigh

Kindlegen 2.9 is not stable (at least for me), please try again with Kindlegen 1.2.

Could you send me the file for kindlegen 1.2.? At the moment that's all I have available to use.

cdhigh · 09-18-2025, 11:03 AM

Quote:

Originally Posted by Musrar

Could you send me the file for kindlegen 1.2.? At the moment that's all I have available to use.

kindlegen1.2

Musrar · 09-18-2025, 11:48 AM

Quote:

Originally Posted by cdhigh

kindlegen1.2

Omg, now not only it isn't stuck, it also works perfectly in kindle!!!! (I managed to create some dictionaries before but only some words were picked up, at random). Many thanks! I was a bit frustrated

Could I use this same thread for another issue? For some reason, once my entries are built into a .mobi, the headword of the next entry appears at the end of each entry, which is a bit annoying (see attached images).

I checked and the words are indeed the ones that come afterwards in the xhtml files. For reference, that's how they look in the xhtml:

Code:

    <idx:entry name="russian" scriptable="yes" spell="yes">
        <idx:orth>вре́мя<idx:infl><idx:iform value="время"></idx:iform><idx:iform value="времени"></idx:iform><idx:iform value="времени"></idx:iform><idx:iform value="время"></idx:iform><idx:iform value="временем"></idx:iform><idx:iform value="времени"></idx:iform><idx:iform value="времена"></idx:iform><idx:iform value="времён"></idx:iform><idx:iform value="временам"></idx:iform><idx:iform value="времена"></idx:iform><idx:iform value="временами"></idx:iform><idx:iform value="временах"></idx:iform></idx:infl></idx:orth>
        <p>Sg.: <span>вре́мя <span>(<i>nom</i>)</span></span>, <span>вре́мени <span>(<i>gen</i>)</span></span>, <span>вре́мени <span>(<i>dat</i>)</span></span>, <span>вре́мя <span>(<i>acc</i>)</span></span>, <span>вре́менем <span>(<i>inst</i>)</span></span>, <span>вре́мени <span>(<i>prep</i>)</span></span></p>
        <p>Pl.: <span>времена́ <span>(<i>nom</i>)</span></span>, <span>времён <span>(<i>gen</i>)</span></span>, <span>времена́м <span>(<i>dat</i>)</span></span>, <span>времена́ <span>(<i>acc</i>)</span></span>, <span>времена́ми <span>(<i>inst</i>)</span></span>, <span>времена́х <span>(<i>prep</i>)</span></span></p>
        <p>1. time, times, tense, weather</p>
        <p>(Stem: времен-)</p>
        <p>Usage: в то вре́мя - at that time\nво вре́мя войны́ - during the war + genitive\nмно́го вре́мени - much of the time\nза вре́мя - while + genitive\nсо временем: over time\nв настоящее время: at present\nв последнее время: lately</p>
        <p><i>Я не зна́ю, будет ли у меня вре́мя.</i> | I don't know if I'll have time.</p>
        <p><i>Вре́мя всегда можно найти́.</i> | One can always find time.</p>
        <p><i>Э́то займёт у меня слишком много вре́мени, чтобы объясни́ть, почему э́то не будет рабо́тать.</i> | It would take me too much time to explain to you why it's not going to work.</p>
        <p><i>Зима́ — моё люби́мое вре́мя года.</i> | Winter is the season I like best.</p>
        <p><i>У меня уйдёт слишком много вре́мени на объясне́ние, почему э́то не срабо́тает.</i> | It would take me too much time to explain to you why it's not going to work.</p>
      </idx:entry>
      <idx:entry name="russian" scriptable="yes" spell="yes">
        <idx:orth>рука́<idx:infl><idx:iform value="рука"></idx:iform><idx:iform value="руки"></idx:iform><idx:iform value="руке"></idx:iform><idx:iform value="руку"></idx:iform><idx:iform value="рукой"></idx:iform><idx:iform value="руке"></idx:iform><idx:iform value="руки"></idx:iform><idx:iform value="рук"></idx:iform><idx:iform value="рукам"></idx:iform><idx:iform value="руки"></idx:iform><idx:iform value="руками"></idx:iform><idx:iform value="руках"></idx:iform></idx:infl></idx:orth>
        <p>Sg.: <span>рука́ <span>(<i>nom</i>)</span></span>, <span>руки́ <span>(<i>gen</i>)</span></span>, <span>руке́ <span>(<i>dat</i>)</span></span>, <span>ру́ку <span>(<i>acc</i>)</span></span>, <span>руко́й <span>(<i>inst</i>)</span></span>, <span>руке́ <span>(<i>prep</i>)</span></span></p>
        <p>Pl.: <span>ру́ки <span>(<i>nom</i>)</span></span>, <span>ру́к <span>(<i>gen</i>)</span></span>, <span>рука́м <span>(<i>dat</i>)</span></span>, <span>ру́ки <span>(<i>acc</i>)</span></span>, <span>рука́ми <span>(<i>inst</i>)</span></span>, <span>рука́х <span>(<i>prep</i>)</span></span></p>
        <p>1. hand</p>
        <p>2. arm</p>
        <p>3. handwriting</p>
        <p><i>Надо ду́мать, что чте́ние бы́ло одною из его боле́зненных привы́чек, так как он с одина́ковою жа́дностью набра́сывался на всё, что попадало ему под руки.</i> | It must be supposed that reading was one of his morbid habits, as he fell upon anything that came into his hands with equal avidity.</p>
        <p><i>Я обжёг ру́ку утюго́м.</i> | I burned my hand with an iron.</p>
        <p><i>Я пойма́л её за ру́ку.</i> | I caught her by the hand.</p>
        <p><i>Она никак не могла́ наложи́ть на себя руки.</i> | It is impossible that she should have killed herself.</p>
        <p><i>То, что в мое́й руке́,—э́то окаменевшая раку́шка.</i> | What I have in my hand is a fossil seashell.</p>
      </idx:entry>

I also attached the terminal log for the successful mobi.

I truly have no words, thanks for allowing me not to ditch the work of a few hours and days.

cdhigh · 09-18-2025, 01:04 PM

1. Insert a

Code:

<hr/>

tag between each entry.
2. Set the attribute ‘value’ on each orth element.
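For anyone generating the xhtml from a script, the two changes above might look roughly like this. It is a minimal Python sketch, not taken from the attached files: `render_entry` and `render_entries` are hypothetical helper names, inflections are left out for brevity, and the `value` strings are assumed to be the plain headwords.

```python
from html import escape


def render_entry(headword: str, value: str, body_html: str) -> str:
    """Render one idx:entry with an explicit value attribute on idx:orth."""
    return (
        '<idx:entry name="russian" scriptable="yes" spell="yes">\n'
        f'  <idx:orth value="{escape(value, quote=True)}">{escape(headword)}</idx:orth>\n'
        f'  {body_html}\n'
        '</idx:entry>'
    )


def render_entries(entries) -> str:
    """Join rendered entries, placing an <hr/> between consecutive entries."""
    return "\n<hr/>\n".join(render_entry(h, v, b) for h, v, b in entries)


# Headwords carry the stress mark (U+0301); the value attribute does not.
sample = render_entries([
    ("вре\u0301мя", "время", "<p>1. time, times, tense, weather</p>"),
    ("рука\u0301", "рука", "<p>1. hand</p>"),
])
print(sample)
```

Joining with the separator, rather than appending `<hr/>` to every entry, keeps a stray rule from trailing the last entry.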

Musrar · 09-18-2025, 02:05 PM

Many thanks, now that's solved.

Another issue, however, arose, so I will ask it if it's not too much of a hassle.

It's about displaying all the entries for words that look the same without diacritics. See the images: in v5 only нёбо gets picked up, and in v4 only нёбо. Ideally, in v5 both would get picked up and I could scroll through them, as is usually possible with multiple similar words. (нёбо also has небо as an iform because in books that diacritic is often not written.)

The codes are as in the image.

I don't get why it doesn't show the typical 1/2 and the arrows.

cdhigh · 09-18-2025, 03:38 PM

I'm wondering if a Kindle/Kindlegen treats 'ё' and 'è' as the letter 'e', and if that behavior would cause it to overwrite the second entry.

Musrar · 09-18-2025, 04:32 PM

After some testing I found out the issue and I think I solved it.
The key thing is the orth value.
If нёбо has <idx:orth value="нёбо"> and небо has <idx:orth value="не́бо">, only нёбо gets picked up.
If нёбо has <idx:orth value="нёбо"> and небо has <idx:orth value="небо"> only небо gets picked up.
If both have <idx:orth value="небо">, both get picked up and display 1/2 with the arrows.
So I think it may be safe to assume that values in orth tags are to be written without any diacritics.
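If the orth values are produced by a script, the diacritic-free form could be derived along these lines. This is only a sketch under the assumption that "without any diacritics" means dropping the stress accent and folding ё to е, which is what the test above suggests; `orth_value` is a hypothetical name. Note that й must survive, so the code removes only the accent marks instead of stripping every combining character.

```python
import unicodedata

# Combining acute and grave accents, used as stress marks in the headwords.
STRESS_MARKS = {"\u0301", "\u0300"}


def orth_value(headword: str) -> str:
    """Return the headword without stress marks and with ё folded to е."""
    # Decompose so accents become separate code points, even on precomposed letters.
    decomposed = unicodedata.normalize("NFD", headword)
    without_stress = "".join(ch for ch in decomposed if ch not in STRESS_MARKS)
    # Recompose so й and ё are single letters again, then fold ё to е.
    composed = unicodedata.normalize("NFC", without_stress)
    return composed.replace("ё", "е").replace("Ё", "Е")


print(orth_value("не\u0301бо"))  # stressed небо
print(orth_value("нёбо"))
```

Both calls give the same string, so the two entries would end up sharing one orth value, as in the third test case.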

j.p.s · 09-18-2025, 05:50 PM

Quote:

Originally Posted by Musrar

After some testing I found out the issue and I think I solved it.
The key thing is the orth value.
If нёбо has <idx:orth value="нёбо"> and небо has <idx:orth value="не́бо">, only нёбо gets picked up.
If нёбо has <idx:orth value="нёбо"> and небо has <idx:orth value="небо"> only небо gets picked up.
If both have <idx:orth value="небо">, both get picked up and display 1/2 with the arrows.
So I think it may be safe to assume that values in orth tags are to be written without any diacritics.

It would have probably been better to tick the "Disable smilies in text" box below "Additional options".

cdhigh · 09-18-2025, 06:06 PM

I found something informative in the Amazon docs.

https://kdp.amazon.com/en_US/help/to...HXJS944GL88DNV

Quote:

Exact-match Parameter
By default, the Kindle device uses a fuzzy algorithm for matching diacritics during word lookup. Languages that use contrastive diacritics to distinguish between distinct word forms should use the exact="yes" attribute in the <idx:iform /> tag to force exact match of diacritics during lookup.
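In a generator script, the attribute from the quote could be emitted like this. A small Python sketch with a hypothetical `iform_tag` helper; whether exact matching actually helps with the нёбо/небо case was not tested in this thread.

```python
from html import escape


def iform_tag(value: str, exact: bool = False) -> str:
    """Render an idx:iform tag, optionally requesting exact diacritic matching."""
    exact_attr = ' exact="yes"' if exact else ""
    return f'<idx:iform value="{escape(value, quote=True)}"{exact_attr}/>'


print(iform_tag("нёбо", exact=True))
print(iform_tag("небо"))
```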

Musrar · 09-19-2025, 11:30 AM

Quote:

I found something informative in the Amazon docs.

Yeah, makes sense that it uses a fuzzy algorithm. That was solved by removing the ё in the orth value.

After more testing I'm encountering more issues.

Apparently, some inflections don't get picked up in some words, whereas an inflection with the same form of another word gets picked up.

For example, the word капли can be an inflection of both ка́пнуть and ка́пля, but on my kindle only ка́пнуть gets picked up, and it makes no sense because in both cases it's just an iform, not even the orth. I tried deleting orth values but it changed nothing.

The order in the xhtml files can't explain it, as ка́пнуть (picked up) is 06 6696 (xhtml, line) and ка́пля (not picked up) is 01 7778.

The same happens with есть, for example. It's a form of both быть and есть, but even after removing the orth values only есть gets picked up, although есть (picked up) comes later in the files (00 955) than быть (not picked up, 00 113).

Guess I'm yet again stuck.

cdhigh · 09-20-2025, 05:28 AM

1. First, you need to check whether your requirement is actually feasible. The way to do this is by confirming whether Amazon’s official dictionary (downloadable from the Kindle store) supports this feature.

2. If Amazon’s official dictionary does support it, you can try using tools like mobiunpack to extract the dictionary and see how the official dictionary implements it.

3. If even Amazon’s official dictionary doesn’t have this feature, it means Kindle doesn’t support it. A possible workaround is to avoid using the inflection function and instead convert all inflected forms into orth entries. In other words, create one entry per inflection (copy the definition of the root word). Although this will make the source file much larger, since most of the content is repetitive, Kindle’s compression should handle it well, so the final MOBI file size likely won’t increase too much.

Musrar · 09-20-2025, 07:06 PM

Thanks for pointing out the possible route.

I tested the official dictionaries and the same happens. The dictionary only picks up the entries that have есть as their orth value (there are 3 in the official RU-EN dictionary); it ignores the entry for быть, which contains an iform value="есть".

So it's safe to assume that kindle gives priority to orth values over iforms in the sense that if there's an orth value with the same value as an iform value, the entry for the iform value doesn't get picked up.

I'll test option 3 when I have time, seems like the only possible workaround. Considering each entry has between 10 and 20 iforms, this will turn the 76 MB mobi into a monster, so maybe it's not super feasible.

cdhigh · 09-20-2025, 08:04 PM

1. You should apply the “-c1” compression, which gives about a 50% compression ratio in a reasonable time, or the “-c2” compression, which gives about a 30% compression ratio but takes much longer.
2. Only the duplicate iform needs to be extracted separately as orth; there’s no need to extract all of them.
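To find which forms would need their own orth entry, something like the following could be run over the source data. This is a rough Python sketch: `contested_forms` is a hypothetical name, the input layout (a list of orth value and iform list pairs) is an assumption about how the data is held before the xhtml is written, and the sample forms are only illustrative.

```python
from collections import defaultdict


def contested_forms(entries):
    """Return {form: [orth values]} for forms claimed by more than one entry.

    entries: list of (orth_value, iforms) pairs.
    """
    owners = defaultdict(list)
    for orth, iforms in entries:
        # dict.fromkeys de-duplicates repeated case forms while keeping order.
        for form in dict.fromkeys([orth, *iforms]):
            owners[form].append(orth)
    return {form: heads for form, heads in owners.items() if len(heads) > 1}


entries = [
    ("быть", ["есть", "был"]),
    ("есть", ["ем", "ешь"]),
    ("капля", ["капли", "каплю"]),
    ("капнуть", ["капли", "капнул"]),
]
print(contested_forms(entries))
```

Only the forms reported here would need to be duplicated as standalone orth entries; all other iforms could stay as they are.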
