Can anybody tell me about dictionaries?


andym
09-26-2007, 03:32 AM
One of the things I like most about Mobipocket Reader is the 'lookup' facility. However, there is only a limited range of PRC-format dictionaries; for example, I haven't been able to find an Italian dictionary. So I'm wondering whether there is any way to convert a .Dict or other open source dictionary to Mobipocket format.

Mobipocket have published a sample dictionary (uploaded; the Mobipocket documentation is here (http://www.mobipocket.com/dev/article.asp?BaseFolder=prcgen&File=indexing.htm)). It includes a sample XML file with a definition that looks like this:

<word>
<orth>chair</orth>
<id>1</id>
<definition>a seat for one person, which has a back,
usually four legs and sometimes two arms</definition>
<gramgrp>noun</gramgrp>
<infl>chairs</infl>
<usage>standard</usage>
<etymology>from Latin "cathedra"</etymology>
</word>


and an HTML file where the same definition looks like this:

<h2><idx:orth>chair</idx:orth></h2><br /><ul>
<idx:gramgrp value="noun"><li>grammatical group : noun</li><br /><li>inflexions : chairs</li><br />
</idx:gramgrp><li>definition : a seat for one person, which has a back, usually four legs and sometimes two arms</li><br />
<idx:string name="usage" value="standard" />
<li>usage : standard</li><br />
<mbp:pagebreak />
<idx:subentry name="etymology"><li>etymology : from Latin "cathedra"</li><br /></idx:subentry></ul><br /></idx:entry>

I've unpacked a Dict dictionary and I get this for an entry:


abandon
[əbændən]
abdiquer
abandonner, délaisser, livrer, quitter
abandonner, renoncer, résigner


(OK, I know I'm not comparing like with like, as a translation is not a definition.)
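
To map an entry like this onto Mobi-style fields, it first has to be split into its parts. A minimal Python sketch, assuming (from this one entry only) that the first line is the headword, an optional bracketed second line is the pronunciation, and every remaining line is one sense with comma-separated translations. The function name `parse_dict_entry` and the output keys are hypothetical.

```python
def parse_dict_entry(text):
    """Split one unpacked entry into headword, pronunciation and senses."""
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty entry")

    headword = lines[0]
    rest = lines[1:]

    # Assumption: an optional line in square brackets is the pronunciation.
    pronunciation = None
    if rest and rest[0].startswith("[") and rest[0].endswith("]"):
        pronunciation = rest[0][1:-1]
        rest = rest[1:]

    # Assumption: each remaining line is one sense, translations split on commas.
    senses = [[t.strip() for t in ln.split(",")] for ln in rest]
    return {"orth": headword, "pron": pronunciation, "senses": senses}


entry = parse_dict_entry(
    "abandon\n"
    "abdiquer\n"
    "abandonner, délaisser, livrer, quitter\n"
    "abandonner, renoncer, résigner\n"
)
print(entry["orth"], entry["senses"])
```

Other Dict dictionaries may lay their entries out differently, so the two assumptions above would need checking against the real data.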

There's also an xsl stylesheet that includes this:


<xsl:template match="dictionary">
<html>
<body>


<mbp:pagebreak/>
<mbp:frameset>
<xsl:call-template name="bottom_frame"/>
<xsl:call-template name="top_frame" />


<mbp:pagebreak crossable="no"/>
<xsl:for-each select="word">
<idx:entry name="word" scriptable="yes">
<xsl:attribute name="id">
<xsl:value-of select="id" />
</xsl:attribute>
<h2><idx:orth><xsl:value-of select="orth"/> </idx:orth></h2>
<br/>
<ul>
<idx:gramgrp>
<xsl:attribute name="value">
<xsl:value-of select="gramgrp" />
</xsl:attribute>
<li>grammatical group : <xsl:value-of select="gramgrp" /></li><br/>
<xsl:attribute name="infl">
<xsl:value-of select="infl" />
</xsl:attribute>
<li>inflexions : <xsl:value-of select="infl" /></li><br/>
</idx:gramgrp>
<li>definition : <xsl:value-of select="definition"/></li><br/>
<idx:string name="usage">
<xsl:attribute name="value">
<xsl:value-of select="usage" />
</xsl:attribute>
</idx:string>
<li>usage : <xsl:value-of select="usage"/></li><br/>
<mbp:pagebreak/>
<idx:subentry name="etymology">
<li>etymology : <xsl:value-of select="etymology"/></li><br/>
</idx:subentry>
</ul>
<br/>
</idx:entry>
<mbp:pagebreak crossable="no"/>
</xsl:for-each>
</mbp:frameset>
<xsl:apply-templates select="word"/>
</body>
</html>
</xsl:template>


<xsl:template match="thesaurus">
<mbp:frameset>
<xsl:call-template name="top_frame_thesaurus" />
<idx:ext-subentry name="thesaurus">
<xsl:attribute name="extends">
<xsl:value-of select="../id" />
</xsl:attribute>
<xsl:value-of select="."/>
</idx:ext-subentry>
</mbp:frameset>
</xsl:template>

<xsl:template match="orth|id|infl|usage|gramgrp|etymology|definition"/>


But I don't know how you might get all of these bits to fit together.
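
One possible reading is that the stylesheet turns each <word> element of the XML file into idx-tagged HTML like the sample above. A minimal Python sketch of that mapping, assuming only the orth, id, gramgrp and definition fields; `render_entry` is a hypothetical helper name, it covers only a subset of what the stylesheet emits, and the output has not been checked against Mobipocket's tools.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def render_entry(word):
    """Render one <word> element as idx-tagged HTML, loosely mirroring the stylesheet."""

    def field(name):
        # Missing child elements are treated as empty strings.
        return (word.findtext(name) or "").strip()

    gram = field("gramgrp")
    return "\n".join([
        # quoteattr() escapes the value and adds the surrounding quotes.
        f'<idx:entry name="word" scriptable="yes" id={quoteattr(field("id"))}>',
        f'<h2><idx:orth>{escape(field("orth"))}</idx:orth></h2><br/>',
        "<ul>",
        f"<idx:gramgrp value={quoteattr(gram)}>"
        f"<li>grammatical group : {escape(gram)}</li></idx:gramgrp>",
        f'<li>definition : {escape(field("definition"))}</li>',
        "</ul>",
        "</idx:entry>",
    ])


sample = ET.fromstring(
    "<word><orth>chair</orth><id>1</id>"
    "<definition>a seat for one person</definition>"
    "<gramgrp>noun</gramgrp></word>"
)
print(render_entry(sample))
```

An XSLT processor applied to the real stylesheet would presumably do the same job; the sketch is only meant to make the XML-to-HTML relationship visible.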

Would the most sensible step be to take an XDXF (http://xdxf.revdanica.com/) format dictionary and convert it to the XML used by Mobi? I downloaded an XDXF English-Spanish dictionary and a sample looks like this:

<ar><k>AIDS</k>
SIDA </ar>
<ar><k>ALGOL (abbr. for algorithmic language)</k>
ALGOL (abrev. inglesa de lenguaje basado en algoritmos) </ar>
<ar><k>AM (abbr. for amplitude modulation)</k>
AM (abrev. inglesa de modulación de amplitud) </ar>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>
<ar><k>AP (abbr. for automatic programming)</k>
programación automática </ar>
<ar><k>API (abbr. for application program interface)</k>
API (abrev. inglesa de interfaz para programas de aplicación) </ar>
<ar><k>APL (abbr. for A Programming Language)</k>
APL (lenguaje de programación -) </ar>
<ar><k>APPC (abbr. for advanced program-to-program communications)</k>
comunicaciones en condiciones de igualdad </ar>
<ar><k>APT (abbr. for automatic programming tools)</k>
herramientas de programación automática) </ar>


That looks like it could be relatively easy to convert to the XML tags used by Mobi.
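
A minimal Python sketch of that conversion, assuming each <ar> holds one <k> headword followed by plain text, as in the sample above. It rewrites the entries as <word> elements with <orth>, <id> and <definition> children under a <dictionary> root, which are the names the stylesheet matches. The function name `xdxf_to_mobi_xml` is hypothetical, and the sequential ids are an assumption about what Mobi expects.

```python
import xml.etree.ElementTree as ET


def xdxf_to_mobi_xml(xdxf_fragment):
    """Convert a run of XDXF <ar> entries into a Mobi-style <dictionary> element."""
    # Wrap in a dummy root so a bare run of <ar> elements parses as one document.
    root = ET.fromstring("<xdxf>" + xdxf_fragment + "</xdxf>")

    dictionary = ET.Element("dictionary")
    for number, ar in enumerate(root.iter("ar"), start=1):
        key = ar.find("k")
        if key is None or not (key.text or "").strip():
            continue  # skip entries without a usable headword
        word = ET.SubElement(dictionary, "word")
        ET.SubElement(word, "orth").text = key.text.strip()
        # The id is the entry's position in the input (assumed numbering scheme).
        ET.SubElement(word, "id").text = str(number)
        # Assumption: the translation is the plain text that follows </k>.
        ET.SubElement(word, "definition").text = (key.tail or "").strip()
    return dictionary


sample = """<ar><k>AIDS</k>
SIDA </ar>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>"""
print(ET.tostring(xdxf_to_mobi_xml(sample), encoding="unicode"))
```

This only handles a fragment like the one shown. A complete XDXF file has its own header and may nest further markup inside <ar>, which would need handling before the result is fed to the stylesheet.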

Any thoughts, comments or suggestions on where to go next? I did find a thread where someone had posted a prc format dictionary so I'm wondering if anyone has been down this path before.