MobileRead Forums - View Single Post

Nate the great · 05-30-2009, 02:47 PM

Quote:

Originally Posted by Peter Sorotokin

I see. So here is a minimalistic proposal based on the discussion so far:

1. Add metadata tags (exact tags TBD) indicating that the EPUB is a dictionary, an optional "input" language (the language that the dictionary articles are written in is indicated by the dc:language element), an optional reference to the index file, and an optional collation declaration that describes the order of terms in the dictionary.

2. The dictionary should be split into multiple sections. In addition, an index file can optionally be provided. The index file should have the linear="no" attribute in the spine. If an index is provided, it should be referenced by the metadata.

3. Each entry in the dictionary must be formatted using the XHTML dl tag. The first dt tag inside a dl is considered to be the primary term. Dictionary entries must go in the order specified by the collation, both inside a single section and across all sections as they are referenced in the spine.

4. Index is an XHTML file (exact structure TBD) that lists the sections of the dictionary itself (as opposed to supplementary material) and only the first term for each section. That both allows for efficient search and does not bloat the index.
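To illustrate why listing only the first term of each section is enough for a fast lookup, here is a minimal sketch. The section file names, the sample terms, and the `find_section` helper are all hypothetical, and a plain lowercase string comparison stands in for whatever collation the metadata would declare.

```python
from bisect import bisect_right

# Hypothetical index: (first term of the section, section file), listed in
# collation order. The terms and file names are made up for illustration.
SECTION_INDEX = [
    ("aardvark", "section_01.xhtml"),
    ("gazelle", "section_02.xhtml"),
    ("narwhal", "section_03.xhtml"),
    ("tapir", "section_04.xhtml"),
]


def find_section(term, index=SECTION_INDEX):
    """Return the file of the only section that could contain `term`.

    Returns None when the term sorts before the first term of the first
    section. Lowercase string comparison is an assumed stand-in for the
    dictionary's declared collation.
    """
    first_terms = [first for first, _ in index]
    # Binary search for the last section whose first term is <= the term.
    pos = bisect_right(first_terms, term.lower()) - 1
    if pos < 0:
        return None
    return index[pos][1]
```

A reader would then open just that one section and scan its dl entries, so the index needs only one line per section rather than one per term.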

Peter

Why don't we try to limit this thread to just the discussion of the index?

1, yes.

2, I think an index should be required due to the need for a speedy lookup.

3, The dl tags seem to duplicate what we are trying to do with the XML tags, and they can't achieve the specificity desired. Why use both?

4, Let me expand on what I wrote before.

A dictionary, for example, will have at a minimum a title index (or something that serves that purpose). It might also have one or more keyword indexes.

The title index will be in its own file, separate from the rest of the book and from the other indexes. Each index will be in a separate file (or files) from the other indexes. If there is more than one type of keyword (for example, "famous people" and "famous places"), each type of keyword will have its own index with its own files.

Here is where my explanation wasn't clear before. A keyword index, "famous people" for example, would be in the file "famous people_x.html". The entries would look like this:

Quote:

<a href="johnny appleseed.html">Johnny Appleseed</a><br />
<a href="Kevin Costner.html">Kevin Costner</a><br />
etc.

The file "johnny appleseed.html" would contain entries something like this:

Quote:

<a href="dictionary.html#d_somenumberX">an entry</a><br />
<a href="dictionary.html#d_somenumberY">another entry</a><br />

So a keyword index would actually consist of a group of files.
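
A rough sketch of how that group of files could be generated, assuming each keyword maps to a list of anchors in dictionary.html. The function name `build_keyword_index` and the anchor ids used in the usage note are hypothetical; only the file layout follows the description above.

```python
from html import escape


def build_keyword_index(index_name, keywords):
    """Return {file name: HTML text} for a two-level keyword index.

    `keywords` maps each keyword to a list of (anchor_id, link_text) pairs
    that point into dictionary.html. All names here are illustrative.
    """
    files = {}
    top_level_lines = []
    for keyword, entries in keywords.items():
        keyword_file = f"{keyword}.html"
        # Top-level file: one link per keyword, pointing at that keyword's file.
        top_level_lines.append(
            f'<a href="{escape(keyword_file)}">{escape(keyword)}</a><br />'
        )
        # Per-keyword file: one link per dictionary entry tagged with it.
        files[keyword_file] = "\n".join(
            f'<a href="dictionary.html#{escape(anchor)}">{escape(text)}</a><br />'
            for anchor, text in entries
        )
    files[f"{index_name}_x.html"] = "\n".join(top_level_lines)
    return files
```

Calling `build_keyword_index("famous people", {"Johnny Appleseed": [("d_1", "an entry")]})` returns two files, "famous people_x.html" and "Johnny Appleseed.html", which is the smallest case of the group of files described here.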