Can anybody tell me about dictionaries?

09-26-2007, 03:32 AM
One of the things I like most about Mobipocket Reader is the 'lookup' facility. However, there is only a limited range of PRC-format dictionaries; for example, I haven't been able to find an Italian dictionary. So I'm wondering whether there is any way to convert a .Dict or other open-source dictionary to Mobipocket format.

Mobipocket have published a sample dictionary (uploaded; the Mobi documentation is here). There is a sample XML file which includes a definition that looks like this:

<definition>a seat for one person, which has a back,
usually four legs and sometimes two arms</definition>
<etymology>from Latin "cathedra"</etymology>

and html where the definition looks like this:

<h2><idx:orth>chair</idx:orth></h2><br /><ul>
<idx:gramgrp value="noun"><li>grammatical group : noun</li><br /><li>inflexions : chairs</li><br />
</idx:gramgrp><li>definition : a seat for one person, which has a back, usually four legs and sometimes two arms</li><br />
<idx:string name="usage" value="standard" />
<li>usage : standard</li><br />
<mbp:pagebreak />
<idx:subentry name="etymology"><li>etymology : from Latin "cathedra"</li><br /></idx:subentry></ul><br /></idx:entry>

I've unpacked a Dict dictionary and I get this for an entry:

abandonner, délaisser, livrer, quitter
abandonner, renoncer, résigner

(OK, I know I'm not comparing like with like, as a translation is not a definition.)

There's also an xsl stylesheet that includes this:

<xsl:template match="dictionary">

<xsl:call-template name="bottom_frame"/>
<xsl:call-template name="top_frame" />

<mbp:pagebreak crossable="no"/>
<xsl:for-each select="word">
<idx:entry name="word" scriptable="yes">
<xsl:attribute name="id">
<xsl:value-of select="id" />
<h2><idx:orth><xsl:value-of select="orth"/> </idx:orth></h2>
<xsl:attribute name="value">
<xsl:value-of select="gramgrp" />
<li>grammatical group : <xsl:value-of select="gramgrp" /></li><br/>
<xsl:attribute name="infl">
<xsl:value-of select="infl" />
<li>inflexions : <xsl:value-of select="infl" /></li><br/>
<li>definition : <xsl:value-of select="definition"/></li><br/>
<idx:string name="usage">
<xsl:attribute name="value">
<xsl:value-of select="usage" />
<li>usage : <xsl:value-of select="usage"/></li><br/>
<idx:subentry name="etymology">
<li>etymology : <xsl:value-of select="etymology"/></li><br/>
<mbp:pagebreak crossable="no"/>
<xsl:apply-templates select="word"/>

<xsl:template match="thesaurus">
<xsl:call-template name="top_frame_thesaurus" />
<idx:ext-subentry name="thesaurus">
<xsl:attribute name="extends">
<xsl:value-of select="../id" />
<xsl:value-of select="."/>

<xsl:template match="orth|id|infl|usage|gramgrp|etymology|definition"/>

But I don't know how you might get all of these bits to fit together.
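
As far as I can tell from the excerpts, the pieces fit together like this: the XML file holds the data, the XSL stylesheet walks each `word` element, and the output is the idx-tagged HTML. Below is a minimal Python sketch that imitates that transformation for a single record instead of running the real stylesheet. The function name `render_entry`, the dict-based record and the id value `"1"` are my own assumptions; the tag and attribute names are copied from the sample HTML above, and the placement of closing tags is a guess wherever the excerpt doesn't show them.

```python
from xml.sax.saxutils import escape, quoteattr


def render_entry(word):
    """Build idx-tagged HTML for one record, imitating the sample output.

    `word` is a plain dict (an assumption for this sketch) keyed by the
    element names the stylesheet selects: id, orth, gramgrp, infl,
    definition, usage, etymology.
    """
    # Escape &, < and > once so every text node is safe to embed.
    text = {key: escape(value) for key, value in word.items()}
    return "\n".join([
        '<idx:entry name="word" scriptable="yes" id=%s>' % quoteattr(word["id"]),
        "<h2><idx:orth>%s</idx:orth></h2><br /><ul>" % text["orth"],
        "<idx:gramgrp value=%s>" % quoteattr(word["gramgrp"]),
        "<li>grammatical group : %s</li><br />" % text["gramgrp"],
        "<li>inflexions : %s</li><br />" % text["infl"],
        "</idx:gramgrp>",
        "<li>definition : %s</li><br />" % text["definition"],
        '<idx:string name="usage" value=%s />' % quoteattr(word["usage"]),
        "<li>usage : %s</li><br />" % text["usage"],
        '<idx:subentry name="etymology">'
        "<li>etymology : %s</li><br /></idx:subentry></ul><br />" % text["etymology"],
        "</idx:entry>",
    ])


# The "chair" record from the sample; the id value is hypothetical.
chair = {
    "id": "1",
    "orth": "chair",
    "gramgrp": "noun",
    "infl": "chairs",
    "definition": "a seat for one person, which has a back, "
                  "usually four legs and sometimes two arms",
    "usage": "standard",
    "etymology": 'from Latin "cathedra"',
}
print(render_entry(chair))
```

So for a converted dictionary, the real work would be getting the source data into records with at least `orth` and `definition`; the other fields could probably be left out of the template when the source has nothing to put in them.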

Would the most sensible step be to take an XDXF-format dictionary and convert it to the XML used by Mobi? I downloaded an XDXF English-Spanish dictionary and a sample looks like this:

SIDA </ar>
<ar><k>ALGOL (abbr. for algorithmic language)</k>
ALGOL (abrev. inglesa de lenguaje basado en algoritmos) </ar>
<ar><k>AM (abbr. for amplitude modulation)</k>
AM (abrev. inglesa de modulación de amplitud) </ar>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>
<ar><k>AP (abbr. for automatic programming)</k>
programación automática </ar>
<ar><k>API (abbr. for application program interface)</k>
API (abrev. inglesa de interfaz para programas de aplicación) </ar>
<ar><k>APL (abbr. for A Programming Language)</k>
APL (lenguaje de programación -) </ar>
<ar><k>APPC (abbr. for advanced program-to-program communications)</k>
comunicaciones en condiciones de igualdad </ar>
<ar><k>APT (abbr. for automatic programming tools)</k>
herramientas de programación automática) </ar>

Which looks like it could be relatively easy to convert to the XML tags used by Mobi.
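
To make that idea concrete, here is a rough Python sketch of the conversion using only the standard library. It reads the `<ar>`/`<k>` markup shown in the sample above (this is how XDXF files mark up articles and their keys) and writes `<word>` elements with `<id>`, `<orth>` and `<definition>` children, which are the element names the Mobi stylesheet selects. The function name `articles_to_mobi_xml`, the sequential ids and the `<xdxf>` wrapper element in the sample are assumptions made for illustration; I haven't run the result through the Mobipocket tools.

```python
import xml.etree.ElementTree as ET


def articles_to_mobi_xml(source_text):
    """Convert <ar><k>key</k>translation</ar> articles to Mobi-style XML."""
    root = ET.fromstring(source_text)
    dictionary = ET.Element("dictionary")
    for number, article in enumerate(root.iter("ar"), start=1):
        key = article.find("k")
        if key is None or not (key.text or "").strip():
            continue  # skip articles that have no usable headword
        word = ET.SubElement(dictionary, "word")
        ET.SubElement(word, "id").text = str(number)
        ET.SubElement(word, "orth").text = key.text.strip()
        # The translation is the text that follows </k> inside <ar>.
        ET.SubElement(word, "definition").text = (key.tail or "").strip()
    return ET.tostring(dictionary, encoding="unicode")


# Two articles taken from the sample, wrapped in a root element so that
# the fragment parses as a single XML document.
sample = """<xdxf>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>
<ar><k>APL (abbr. for A Programming Language)</k>
APL (lenguaje de programación -) </ar>
</xdxf>"""

print(articles_to_mobi_xml(sample))
```

Since a translation dictionary has no grammatical group, inflexions, usage or etymology, the sketch leaves those elements out, so the matching parts of the stylesheet would need trimming or they would print empty labels.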

Any thoughts, comments or suggestions on where to go next? I did find a thread where someone had posted a prc format dictionary so I'm wondering if anyone has been down this path before.