05-22-2009, 12:50 AM   #11
jgray
Fanatic
Posts: 548
Karma: 2928497
Join Date: Mar 2008
Device: Clara 2E & Sage
MS Reader dictionary format

Since MS Reader also does dictionaries (and nicely, too), I was curious what sort of markup Reader used. I downloaded the Dictionary Authoring Kit from here: http://www.microsoft.com/reader/deve...loads/dak.aspx

It is a self-extracting archive, so I just unzipped it and found "dak.chm" in the Documentation folder. It seems that Microsoft uses a subset of TEI (Text Encoding Initiative) tags for Reader dictionaries. Very interesting that they used an existing standard.

I wonder if this existing method would be good to incorporate into epub, rather than reinventing the wheel? Of course, regardless of what method is used, we still need to wait until reading software supports dictionary lookup.

Here is a sample entry that I pasted from the "dak.chm" file:

Code:
Sample Dictionary Fragment
A typical EBDICT dictionary fragment might look like this:

<tei-ms:text>
 <tei-ms:body>
  <tei-ms:div0>
   <tei-ms:div1>
    <tei-ms:div2>
     <tei-ms:entry>
      <tei-ms:form>
       <tei-ms:orth>dictionary</tei-ms:orth>
       <tei-ms:syll>dic|tion|ar|y</tei-ms:syll>
      </tei-ms:form>
      <tei-ms:gramGrp><tei-ms:pos>n</tei-ms:pos></tei-ms:gramGrp>
      <tei-ms:sense n="1">
       <tei-ms:def>
       A reference book that contains words listed in alphabetical order and gives explanations of their meanings, often with additional information about grammar, pronunciation, and etymology.
       </tei-ms:def>
      </tei-ms:sense>
      <tei-ms:sense n="2">
       <tei-ms:def>
       A foreign-language reference book of words: a reference book that gives equivalents of words and phrases in two or more languages, often with translations from each language to the other in separate sections.
       </tei-ms:def>
       <tei-ms:eg>A Spanish-English dictionary</tei-ms:eg>
      </tei-ms:sense>
     </tei-ms:entry>
    </tei-ms:div2>
   </tei-ms:div1>
  </tei-ms:div0>
 </tei-ms:body>
</tei-ms:text> 

where:

<tei-ms:entry> delimits an entry 
<tei-ms:orth> gives the orthographic (written) form of the headword 
<tei-ms:syll> gives the syllabification 
<tei-ms:pos> specifies the part of speech (in this case, a noun) 
<tei-ms:sense> gives information about a particular sense of the word 
<tei-ms:def> gives the definition of the word in that sense 
<tei-ms:eg> gives an example of the usage of the word in that sense
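
To get a feel for how reading software might consume an entry like this, here is a minimal sketch in Python that pulls the headword, syllabification, part of speech, and senses out of one entry. It is only an illustration, not how MS Reader does it. Two assumptions: the pasted fragment never declares the tei-ms prefix, so the namespace URI below is a made-up placeholder, and the definitions are shortened to keep the example small. The function name parse_entry is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace URI. The pasted fragment does not show
# the real URI bound to the tei-ms prefix, so this one is invented.
TEI_MS = "urn:example:tei-ms"
NS = {"tei-ms": TEI_MS}

# One entry from the sample, with the definitions shortened for brevity.
FRAGMENT = f"""
<tei-ms:entry xmlns:tei-ms="{TEI_MS}">
 <tei-ms:form>
  <tei-ms:orth>dictionary</tei-ms:orth>
  <tei-ms:syll>dic|tion|ar|y</tei-ms:syll>
 </tei-ms:form>
 <tei-ms:gramGrp><tei-ms:pos>n</tei-ms:pos></tei-ms:gramGrp>
 <tei-ms:sense n="1">
  <tei-ms:def>A reference book that contains words listed in alphabetical order.</tei-ms:def>
 </tei-ms:sense>
 <tei-ms:sense n="2">
  <tei-ms:def>A foreign-language reference book of words.</tei-ms:def>
  <tei-ms:eg>A Spanish-English dictionary</tei-ms:eg>
 </tei-ms:sense>
</tei-ms:entry>
"""


def parse_entry(xml_text):
    """Return a plain dict for a single <tei-ms:entry> element (hypothetical helper)."""
    entry = ET.fromstring(xml_text.strip())
    senses = []
    for sense in entry.findall("tei-ms:sense", NS):
        senses.append({
            "n": sense.get("n"),
            "def": sense.findtext("tei-ms:def", default="", namespaces=NS).strip(),
            # <tei-ms:eg> is optional, so this is None when no example is given.
            "eg": sense.findtext("tei-ms:eg", default=None, namespaces=NS),
        })
    return {
        "orth": entry.findtext("tei-ms:form/tei-ms:orth", namespaces=NS),
        "syll": entry.findtext("tei-ms:form/tei-ms:syll", namespaces=NS),
        "pos": entry.findtext("tei-ms:gramGrp/tei-ms:pos", namespaces=NS),
        "senses": senses,
    }


entry = parse_entry(FRAGMENT)
print(entry["orth"], entry["syll"].split("|"), entry["pos"])
for sense in entry["senses"]:
    print(sense["n"], sense["def"], sense["eg"])
```

A lookup feature would only need the orth value as the key and the list of senses as the payload, which is part of why an existing structured format like this looks attractive.
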
I hope that the folks at IDPF are working on some type of dictionary format for epub. I commented last year over on Teleread that I thought dictionary support was something that was badly needed in epub.