Old 09-26-2007, 03:32 AM   #1
andym
Groupie
andym has learned how to read e-books
 
Posts: 189
Karma: 793
Join Date: Oct 2006
Can anybody tell me about dictionaries?

One of the things I like most about Mobipocket Reader is the 'lookup' facility. However, there is only a limited range of PRC-format dictionaries - for example, I haven't been able to find an Italian dictionary. So I'm wondering whether there is any way to convert a .Dict or other open source dictionary to Mobipocket format.

Mobipocket have published a sample dictionary (uploaded - the mobi documentation is here). There is a sample XML file which includes a definition that looks like this:

Code:
<word>
 <orth>chair</orth>
 <id>1</id>
 <definition>a seat for one person, which has a back, 
usually four legs and sometimes two arms</definition>
 <gramgrp>noun</gramgrp>
 <infl>chairs</infl>
 <usage>standard</usage>
 <etymology>from Latin "cathedra"</etymology>
</word>
and html where the definition looks like this:

Code:
<idx:entry name="word" scriptable="yes" id="1">
<h2><idx:orth>chair</idx:orth></h2><br /><ul>
<idx:gramgrp value="noun"><li>grammatical group : noun</li><br /><li>inflexions :  chairs</li><br />
</idx:gramgrp><li>definition : a seat for one person, which has a back, usually four legs and sometimes two arms</li><br />
<idx:string name="usage" value="standard" />
<li>usage : standard</li><br />
<mbp:pagebreak />
<idx:subentry name="etymology"><li>etymology : from Latin "cathedra"</li><br /></idx:subentry></ul><br /></idx:entry>
I've unpacked a Dict dictionary and I get this for an entry:

Code:
abandon
	[əbændən]
	abdiquer
	abandonner, délaisser, livrer, quitter
	abandonner, renoncer, résigner
(OK, I know I'm not comparing like with like, as a translation is not a definition.)
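To make that layout a bit more concrete, here is a rough Python sketch of how the unpacked text could be read back into entries. It assumes that a headword always starts at the beginning of a line and that every line belonging to it is indented, which is only what the sample above suggests. The function name `parse_dict_entries` is made up for illustration.

```python
def parse_dict_entries(text):
    """Group an unpacked Dict dump into (headword, [detail lines]) pairs.

    Assumption: a headword starts at column 0 and every line that
    belongs to it is indented with a tab or spaces.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between entries
        if line[0] in " \t":
            if entries:  # indented line: attach to the latest headword
                entries[-1][1].append(line.strip())
        else:
            entries.append((line.strip(), []))
    return entries


# Sample trimmed from the entry above (pronunciation line left out).
sample = (
    "abandon\n"
    "\tabdiquer\n"
    "\tabandonner, délaisser, livrer, quitter\n"
    "\tabandonner, renoncer, résigner\n"
)
entries = parse_dict_entries(sample)
```

Each entry comes back as a headword plus its list of translation lines, which is close to what an `<orth>` and a `<definition>` would need.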

There's also an xsl stylesheet that includes this:

Code:
 <xsl:template match="dictionary">
<html>
<body>


<mbp:pagebreak/>
<mbp:frameset>
		<xsl:call-template name="bottom_frame"/>
		<xsl:call-template name="top_frame" />
		

<mbp:pagebreak crossable="no"/>
<xsl:for-each select="word">
  <idx:entry name="word" scriptable="yes">
  	<xsl:attribute name="id">
	  <xsl:value-of select="id" />
	</xsl:attribute>
	<h2><idx:orth><xsl:value-of select="orth"/> </idx:orth></h2>
	<br/>
	<ul>
	<idx:gramgrp>
		<xsl:attribute name="value">
		  <xsl:value-of select="gramgrp" />
		</xsl:attribute>
		<li>grammatical group : <xsl:value-of select="gramgrp" /></li><br/>
		<xsl:attribute name="infl">
		  <xsl:value-of select="infl" />
		</xsl:attribute>
		<li>inflexions :  <xsl:value-of select="infl" /></li><br/>
	</idx:gramgrp>
	<li>definition : <xsl:value-of select="definition"/></li><br/>
   	<idx:string name="usage">
		<xsl:attribute name="value">
		  <xsl:value-of select="usage" />
		</xsl:attribute>
	</idx:string>
	<li>usage : <xsl:value-of select="usage"/></li><br/>
	<mbp:pagebreak/>
	<idx:subentry name="etymology">
	<li>etymology : <xsl:value-of select="etymology"/></li><br/>
   	</idx:subentry>
   	</ul>
    <br/>
   </idx:entry>
<mbp:pagebreak crossable="no"/>
</xsl:for-each>
</mbp:frameset>
<xsl:apply-templates select="word"/>
</body> 
</html> 
</xsl:template>


<xsl:template match="thesaurus">
<mbp:frameset>
	<xsl:call-template name="top_frame_thesaurus" />
	<idx:ext-subentry name="thesaurus">
	<xsl:attribute name="extends">
	<xsl:value-of select="../id" />
	</xsl:attribute>
	<xsl:value-of select="."/>
	</idx:ext-subentry>
</mbp:frameset>
</xsl:template>

<xsl:template match="orth|id|infl|usage|gramgrp|etymology|definition"/>
But I don't know how you might get all of these bits to fit together.
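My best guess at how they fit: the XML file holds the data, the stylesheet turns each `<word>` into an `<idx:entry>` block, and the resulting HTML is what gets built into the dictionary. The Python sketch below imitates only the per-word part of the stylesheet so the mapping is easier to follow. It is not Mobipocket's tooling and it does not run the XSL itself; the helper names `field` and `word_to_entry` are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SAMPLE_XML = """<dictionary>
<word>
 <orth>chair</orth>
 <id>1</id>
 <definition>a seat for one person, which has a back,
usually four legs and sometimes two arms</definition>
 <gramgrp>noun</gramgrp>
 <infl>chairs</infl>
 <usage>standard</usage>
 <etymology>from Latin "cathedra"</etymology>
</word>
</dictionary>"""


def field(word, tag):
    """Return the text of a child element with whitespace collapsed."""
    return " ".join((word.findtext(tag) or "").split())


def word_to_entry(word):
    """Render one <word> roughly as the stylesheet's for-each loop does."""
    gram, usage = field(word, "gramgrp"), field(word, "usage")
    lines = [
        f'<idx:entry name="word" scriptable="yes" id={quoteattr(field(word, "id"))}>',
        f'<h2><idx:orth>{escape(field(word, "orth"))}</idx:orth></h2><br/><ul>',
        f"<idx:gramgrp value={quoteattr(gram)}>"
        f"<li>grammatical group : {escape(gram)}</li><br/>"
        f'<li>inflexions : {escape(field(word, "infl"))}</li><br/></idx:gramgrp>',
        f'<li>definition : {escape(field(word, "definition"))}</li><br/>',
        f'<idx:string name="usage" value={quoteattr(usage)}/>',
        f"<li>usage : {escape(usage)}</li><br/>",
        "<mbp:pagebreak/>",
        '<idx:subentry name="etymology"><li>etymology : '
        f'{escape(field(word, "etymology"))}</li><br/></idx:subentry></ul><br/>',
        "</idx:entry>",
    ]
    return "\n".join(lines)


root = ET.fromstring(SAMPLE_XML)
entries_html = [word_to_entry(w) for w in root.iter("word")]
```

If that reading is right, the real work is getting a source dictionary into the `<word>` XML shape; the stylesheet should take care of the rest.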

Would the most sensible step be to take an XDXF-format dictionary and convert it to the XML used by Mobi? I downloaded an XDXF English-Spanish dictionary and a sample looks like this:

Code:
<ar><k>AIDS</k>
SIDA </ar>
<ar><k>ALGOL (abbr. for algorithmic language)</k>
ALGOL (abrev. inglesa de lenguaje basado en algoritmos) </ar>
<ar><k>AM (abbr. for amplitude modulation)</k>
AM (abrev. inglesa de modulación de amplitud) </ar>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>
<ar><k>AP (abbr. for automatic programming)</k>
programación automática </ar>
<ar><k>API (abbr. for application program interface)</k>
API (abrev. inglesa de interfaz para programas de aplicación) </ar>
<ar><k>APL (abbr. for A Programming Language)</k>
APL (lenguaje de programación -) </ar>
<ar><k>APPC (abbr. for advanced program-to-program communications)</k>
comunicaciones en condiciones de igualdad </ar>
<ar><k>APT (abbr. for automatic programming tools)</k>
herramientas de programación automática) </ar>
That looks like it could be relatively easy to convert to the XML tags used by Mobi.
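As a first stab, something like the Python below might do the conversion. It is only a sketch under a few assumptions: that the file is well-formed XML with the `<ar>` articles inside a single root element (written as `<xdxf>` here), that the text following `<k>` is the whole translation, and that the Mobi side only needs `<orth>`, `<id>` and `<definition>`. The function name `xdxf_to_mobi_xml` is my own.

```python
import xml.etree.ElementTree as ET


def xdxf_to_mobi_xml(xdxf_text):
    """Map each <ar><k>key</k>text</ar> article to a Mobi-style <word>."""
    source = ET.fromstring(xdxf_text)
    dictionary = ET.Element("dictionary")
    for number, article in enumerate(source.iter("ar"), start=1):
        key = article.find("k")
        if key is None or not (key.text or "").strip():
            continue  # no usable headword, skip the article
        word = ET.SubElement(dictionary, "word")
        ET.SubElement(word, "orth").text = key.text.strip()
        ET.SubElement(word, "id").text = str(number)
        # Only the text directly after </k> is used; any nested markup
        # inside the article would be lost by this simple approach.
        translation = " ".join((key.tail or "").split())
        ET.SubElement(word, "definition").text = translation
    return ET.tostring(dictionary, encoding="unicode")


sample = """<xdxf>
<ar><k>AIDS</k>
SIDA </ar>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>
</xdxf>"""
result = xdxf_to_mobi_xml(sample)
```

The ids are just running numbers, since the XDXF sample has nothing equivalent to carry over.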

Any thoughts, comments or suggestions on where to go next? I did find a thread where someone had posted a prc format dictionary so I'm wondering if anyone has been down this path before.
Attached Files
File Type: zip dictionary.zip (12.6 KB, 628 views)

Last edited by andym; 09-26-2007 at 03:36 AM.