Old 09-26-2007, 03:32 AM   #1
andym
Can anybody tell me about dictionaries?

One of the things I like most about Mobipocket Reader is the 'lookup' facility. However, there is only a limited range of PRC-format dictionaries; for example, I haven't been able to find an Italian dictionary. So I'm wondering whether there is any way to convert a .Dict or other open-source dictionary to Mobipocket format.

Mobipocket have published a sample dictionary (uploaded - the mobi documentation is here). It contains a sample XML file which includes a definition that looks like this:

Code:
<word>
 <orth>chair</orth>
 <id>1</id>
 <definition>a seat for one person, which has a back, 
usually four legs and sometimes two arms</definition>
 <gramgrp>noun</gramgrp>
 <infl>chairs</infl>
 <usage>standard</usage>
 <etymology>from Latin "cathedra"</etymology>
</word>
and html where the definition looks like this:

Code:
<h2><idx:orth>chair</idx:orth></h2><br /><ul>
<idx:gramgrp value="noun"><li>grammatical group : noun</li><br /><li>inflexions :  chairs</li><br />
</idx:gramgrp><li>definition : a seat for one person, which has a back, usually four legs and sometimes two arms</li><br />
<idx:string name="usage" value="standard" />
<li>usage : standard</li><br />
<mbp:pagebreak />
<idx:subentry name="etymology"><li>etymology : from Latin "cathedra"</li><br /></idx:subentry></ul><br /></idx:entry>
I've unpacked a Dict dictionary and I get this for an entry:

Code:
abandon
	[əbændən]
	abdiquer
	abandonner, délaisser, livrer, quitter
	abandonner, renoncer, résigner
(OK, I know I'm not comparing like with like, as a translation is not a definition.)
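To work with an entry like this, it first has to be split into its parts. The following is a minimal sketch, not a general Dict parser: it assumes the layout shown above (headword on the first line, a bracketed pronunciation line, then one line per sense with comma-separated translations). `parse_dict_entry` is a hypothetical helper name, and other unpacked dictionaries may well be laid out differently.

```python
def parse_dict_entry(text):
    """Split one unpacked entry into headword, pronunciation and senses."""
    lines = [line for line in text.splitlines() if line.strip()]
    headword = lines[0].strip()
    pronunciation = None
    senses = []
    for line in lines[1:]:
        body = line.strip()
        if body.startswith("[") and body.endswith("]"):
            # Assumption: a bracketed line is the pronunciation.
            pronunciation = body[1:-1]
        else:
            # Assumption: every other line is one sense, with
            # alternative translations separated by commas.
            senses.append([part.strip() for part in body.split(",")])
    return {
        "headword": headword,
        "pronunciation": pronunciation,
        "senses": senses,
    }


sample = (
    "abandon\n"
    "\t[əbændən]\n"
    "\tabdiquer\n"
    "\tabandonner, délaisser, livrer, quitter\n"
    "\tabandonner, renoncer, résigner\n"
)
entry = parse_dict_entry(sample)
print(entry["headword"], entry["senses"])
```

Keeping each sense as its own list preserves the grouping in the source, which could later become separate list items in a Mobipocket entry.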

There's also an xsl stylesheet that includes this:

Code:
 <xsl:template match="dictionary">
<html>
<body>


<mbp:pagebreak/>
<mbp:frameset>
		<xsl:call-template name="bottom_frame"/>
		<xsl:call-template name="top_frame" />
		

<mbp:pagebreak crossable="no"/>
<xsl:for-each select="word">
  <idx:entry name="word" scriptable="yes">
  	<xsl:attribute name="id">
	  <xsl:value-of select="id" />
	</xsl:attribute>
	<h2><idx:orth><xsl:value-of select="orth"/> </idx:orth></h2>
	<br/>
	<ul>
	<idx:gramgrp>
		<xsl:attribute name="value">
		  <xsl:value-of select="gramgrp" />
		</xsl:attribute>
		<li>grammatical group : <xsl:value-of select="gramgrp" /></li><br/>
		<xsl:attribute name="infl">
		  <xsl:value-of select="infl" />
		</xsl:attribute>
		<li>inflexions :  <xsl:value-of select="infl" /></li><br/>
	</idx:gramgrp>
	<li>definition : <xsl:value-of select="definition"/></li><br/>
   	<idx:string name="usage">
		<xsl:attribute name="value">
		  <xsl:value-of select="usage" />
		</xsl:attribute>
	</idx:string>
	<li>usage : <xsl:value-of select="usage"/></li><br/>
	<mbp:pagebreak/>
	<idx:subentry name="etymology">
	<li>etymology : <xsl:value-of select="etymology"/></li><br/>
   	</idx:subentry>
   	</ul>
    <br/>
   </idx:entry>
<mbp:pagebreak crossable="no"/>
</xsl:for-each>
</mbp:frameset>
<xsl:apply-templates select="word"/>
</body> 
</html> 
</xsl:template>


<xsl:template match="thesaurus">
<mbp:frameset>
	<xsl:call-template name="top_frame_thesaurus" />
	<idx:ext-subentry name="thesaurus">
	<xsl:attribute name="extends">
	<xsl:value-of select="../id" />
	</xsl:attribute>
	<xsl:value-of select="."/>
	</idx:ext-subentry>
</mbp:frameset>
</xsl:template>

<xsl:template match="orth|id|infl|usage|gramgrp|etymology|definition"/>
But I don't know how you might get all of these bits to fit together.
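
One reading of the three samples is that the XML file is the data, the XSL stylesheet is the template, and the HTML is what comes out when the stylesheet's `dictionary` template loops over each `<word>` and wraps its fields in `idx:` tags. The sketch below mimics that mapping for a single entry. It uses plain Python string templating rather than XSLT (Python's standard library has no XSLT processor), so it illustrates the mapping and does not replace the stylesheet. `render_entry` is a hypothetical helper, and the `<dictionary>` wrapper is inferred from `match="dictionary"` in the stylesheet.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SAMPLE = """<dictionary>
<word>
 <orth>chair</orth>
 <id>1</id>
 <definition>a seat for one person, which has a back,
usually four legs and sometimes two arms</definition>
 <gramgrp>noun</gramgrp>
 <infl>chairs</infl>
 <usage>standard</usage>
 <etymology>from Latin "cathedra"</etymology>
</word>
</dictionary>"""


def render_entry(word):
    """Render one <word> element as idx markup like the HTML sample."""

    def field(tag):
        # Collapse internal whitespace; a missing child becomes "".
        return " ".join((word.findtext(tag) or "").split())

    return "\n".join([
        f'<idx:entry name="word" scriptable="yes" id={quoteattr(field("id"))}>',
        f'<h2><idx:orth>{escape(field("orth"))}</idx:orth></h2><br/>',
        "<ul>",
        f'<idx:gramgrp value={quoteattr(field("gramgrp"))}>',
        f'<li>grammatical group : {escape(field("gramgrp"))}</li><br/>',
        f'<li>inflexions : {escape(field("infl"))}</li><br/>',
        "</idx:gramgrp>",
        f'<li>definition : {escape(field("definition"))}</li><br/>',
        f'<idx:string name="usage" value={quoteattr(field("usage"))}/>',
        f'<li>usage : {escape(field("usage"))}</li><br/>',
        '<idx:subentry name="etymology">',
        f'<li>etymology : {escape(field("etymology"))}</li><br/>',
        "</idx:subentry>",
        "</ul><br/>",
        "</idx:entry>",
    ])


entries = [render_entry(w) for w in ET.fromstring(SAMPLE).iter("word")]
print(entries[0])
```

The frames and page breaks from the stylesheet are left out here; only the per-entry markup is reproduced.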

Would the most sensible step be to take an XDXF-format dictionary and convert it to the XML used by Mobi? I downloaded an XDXF English-Spanish dictionary, and a sample looks like this:

Code:
<ar><k>AIDS</k>
SIDA </ar>
<ar><k>ALGOL (abbr. for algorithmic language)</k>
ALGOL (abrev. inglesa de lenguaje basado en algoritmos) </ar>
<ar><k>AM (abbr. for amplitude modulation)</k>
AM (abrev. inglesa de modulación de amplitud) </ar>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>
<ar><k>AP (abbr. for automatic programming)</k>
programación automática </ar>
<ar><k>API (abbr. for application program interface)</k>
API (abrev. inglesa de interfaz para programas de aplicación) </ar>
<ar><k>APL (abbr. for A Programming Language)</k>
APL (lenguaje de programación -) </ar>
<ar><k>APPC (abbr. for advanced program-to-program communications)</k>
comunicaciones en condiciones de igualdad </ar>
<ar><k>APT (abbr. for automatic programming tools)</k>
herramientas de programación automática) </ar>
This looks like it could be relatively easy to convert to the XML tags used by Mobi.
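
As a rough illustration of that conversion, here is a minimal Python sketch that turns `<ar>`/`<k>` entries into the `<word>` layout of the Mobi sample XML. It rests on several assumptions: the fragment is wrapped in a made-up `<articles>` root so it parses as a single document, the text after `</k>` is treated as the definition, and ids are simply assigned in order. `articles_to_mobi_xml` is a hypothetical name, and a complete dictionary file with its own header and root element would need different handling.

```python
import xml.etree.ElementTree as ET


def articles_to_mobi_xml(fragment):
    """Convert <ar><k>headword</k>text</ar> entries to <word> elements."""
    # The sample has no single root element, so wrap it before parsing.
    source = ET.fromstring(f"<articles>{fragment}</articles>")
    dictionary = ET.Element("dictionary")
    for number, article in enumerate(source.iter("ar"), start=1):
        key = article.find("k")
        if key is None or not (key.text or "").strip():
            continue  # skip entries without a headword
        word = ET.SubElement(dictionary, "word")
        ET.SubElement(word, "orth").text = key.text.strip()
        ET.SubElement(word, "id").text = str(number)
        # Text that follows </k> inside <ar> is taken as the definition.
        ET.SubElement(word, "definition").text = (key.tail or "").strip()
    return dictionary


sample = """<ar><k>AIDS</k>
SIDA </ar>
<ar><k>AP (abbr. for array processor)</k>
procesador de matrices </ar>"""

result = articles_to_mobi_xml(sample)
print(ET.tostring(result, encoding="unicode"))
```

Fields the source does not supply, such as `gramgrp`, `infl`, `usage` and `etymology`, are simply omitted; whether the Mobi stylesheet tolerates their absence is untested here.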

Any thoughts, comments or suggestions on where to go next? I did find a thread where someone had posted a PRC-format dictionary, so I'm wondering if anyone has been down this path before.
Attached Files
File Type: zip dictionary.zip (12.6 KB, 629 views)

Last edited by andym; 09-26-2007 at 03:36 AM.