Go Back   MobileRead Forums > E-Book Formats > ePub

Old 05-29-2009, 03:20 PM   #16
Nate the great
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
Quote:
Originally Posted by Peter Sorotokin:
That's a legitimate point of view, of course. The problem is that building index on the device would be slow and drain the battery and building it elsewhere would mean that special software needs to be used to transfer the book to the device. I think that support for indexing is just too central for a dictionary to leave it out.
I only wrote that because I misunderstood how Mobipocket handled indexes. I was also hoping this thread would die, but oh well.

I think it's safe to assume that the indexes will be made during the ebook creation process. I would suggest that each index in an ebook be in a separate file(s). Given that Epub is basically zipped HTML, an index will likely consist of 1 or more links (that lead to other places in the ebook).

I think we should consider copying the behavior of Mobipocket indexes. A link in the title index doesn't lead to the respective title. Instead, it leads to the beginning of the entry containing the title. (I also agree with Igorsk about the need for multiple head words.) In the keyword index, each keyword will be listed once, and link to a separate file consisting of links to each of the entries containing the keyword. The links won't lead to the keyword, but to the beginning of the entry.
Old 05-30-2009, 01:15 AM   #17
Peter Sorotokin
speaking for myself
Posts: 139
Karma: 2166
Join Date: Feb 2008
Location: San Francisco Bay Area
Device: PRS-505
Quote:
Originally Posted by Nate the great:
I think it's safe to assume that the indexes will be made during the ebook creation process. I would suggest that each index in an ebook be in a separate file(s). Given that Epub is basically zipped HTML, an index will likely consist of 1 or more links (that lead to other places in the ebook).

I think we should consider copying the behavior of Mobipocket indexes. A link in the title index doesn't lead to the respective title. Instead, it leads to the beginning of the entry containing the title. (I also agree with Igorsk about the need for multiple head words.) In the keyword index, each keyword will be listed once, and link to a separate file consisting of links to each of the entries containing the keyword. The links won't lead to the keyword, but to the beginning of the entry.
So what you are saying is that an entry in the index will look like this (assuming specialized XML mark-up for the index):
Code:
<entry href="QU.html#queen">queen</entry>
And the corresponding dictionary article would look like this (assuming XHTML for the content):
Code:
<dl id="queen">
<dt>queen</dt>
<dd>a female sovereign or monarch</dd>
</dl>
That would work on the syntax level, but I don't think a flat index file containing all words is going to cut it: it is still going to be too big.

Also, can we use any existing mark-up (e.g. XHTML or perhaps NCX) for the index file? Should we just use an XHTML-based index with some metadata marking it as such?

I'll think a bit more about it.
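
To make the link between the two snippets above concrete, here is a minimal Python sketch of how a reader might follow an index entry to its article. The in-memory FILES table, the file name index.xml and the wrapping index and body elements are assumptions made for illustration; nothing in the thread defines them.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the files inside the EPUB container.
FILES = {
    "index.xml": '<index><entry href="QU.html#queen">queen</entry></index>',
    "QU.html": (
        "<body>"
        '<dl id="queen"><dt>queen</dt>'
        "<dd>a female sovereign or monarch</dd></dl>"
        "</body>"
    ),
}


def lookup(word):
    """Return the definition text for `word`, or None if it is not indexed."""
    index = ET.fromstring(FILES["index.xml"])
    for entry in index.iter("entry"):
        if entry.text == word:
            # Split "QU.html#queen" into the file name and the fragment id.
            filename, _, fragment = entry.get("href").partition("#")
            article = ET.fromstring(FILES[filename])
            for dl in article.iter("dl"):
                if dl.get("id") == fragment:
                    return dl.findtext("dd")
    return None


print(lookup("queen"))
```

Note that this walks the index entry by entry, which is exactly the cost that makes a flat index holding every word a concern for a large dictionary.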
Old 05-30-2009, 07:28 AM   #18
Nate the great
Actually, I was thinking of straightforward HTML for the index entry:

<a href="dictionary.html#d_somenumberX">dictionary</a><br />

It would have a corresponding link in the body of the ebook, of course.
Old 05-30-2009, 07:36 AM   #19
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
It's certainly no problem having multiple headwords for a single entry in Mobi dictionaries. The Chambers dictionary I have on my Gen3 will, for example, find words with variant spellings, e.g. "center" or "centre".
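
For the EPUB side of the discussion, one way to picture multiple headwords is a dl entry carrying several dt elements that all resolve to the same entry. The Python sketch below assumes that layout; the sample section, its definition text and the file name C.html are invented for illustration and do not describe how Mobipocket stores its headwords.

```python
import xml.etree.ElementTree as ET

# Hypothetical section file: one entry with a primary and a variant headword.
SECTION = """<body>
<dl id="centre"><dt>centre</dt><dt>center</dt><dd>the middle point</dd></dl>
</body>"""


def headword_map(xhtml, filename):
    """Map every dt headword (lowercased) to the anchor of its dl entry."""
    targets = {}
    for dl in ET.fromstring(xhtml).iter("dl"):
        for dt in dl.findall("dt"):
            targets[dt.text.strip().lower()] = f"{filename}#{dl.get('id')}"
    return targets


print(headword_map(SECTION, "C.html"))
```

Both spellings end up pointing at the beginning of the same entry, which is the behavior asked for earlier in the thread.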
Old 05-30-2009, 01:12 PM   #20
Peter Sorotokin
Quote:
Originally Posted by Nate the great:
Actually, I was thinking of straightforward HTML for the index entry:

<a href="dictionary.html#d_somenumberX">dictionary</a><br />

It would have a corresponding link in the body of the ebook, of course.
I see. So here is a minimalistic proposal based on the discussion so far:

1. Add metadata tags (exact tags TBD) indicating that the EPUB is a dictionary, an optional "input" language (the language that the dictionary articles are in is indicated by the dc:language element), an optional reference to the index file and an optional collation declaration that describes the order of terms in the dictionary.

2. The dictionary should be split into multiple sections. In addition, an index file can optionally be provided. The index file should have the linear="no" attribute in the spine. If an index is provided, it should be referenced by the metadata.

3. Each entry in the dictionary must be formatted using XHTML dl tag. The first dt tag inside dl is considered to be a primary term. Dictionary entries must go in the order specified by collation - both inside a single section and across all sections as they are referenced in the spine.

4. Index is an XHTML file (exact structure TBD) that lists the sections of the dictionary itself (as opposed to supplementary material) and only the first term for each section. That both allows for efficient search and does not bloat the index.

Peter
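
Points 3 and 4 above amount to a two-level lookup: search the small index for the right section, then search inside that section only. Here is a minimal Python sketch of that idea. The section names, the word lists and the use of plain Python string comparison as the collation order are all assumptions made for illustration; a real dictionary would use the collation declared in its metadata.

```python
from bisect import bisect_right

# Hypothetical sparse index: (first term of section, section file),
# listed in collation order.
SECTION_INDEX = [
    ("aardvark", "A.html"),
    ("mace", "M.html"),
    ("quack", "QU.html"),
]

# Hypothetical section contents, each sorted in the same collation order.
SECTIONS = {
    "A.html": ["aardvark", "bee", "cat"],
    "M.html": ["mace", "noon", "pear"],
    "QU.html": ["quack", "queen", "zebra"],
}


def find_section(term):
    """Return the file whose range of terms would contain `term`, or None."""
    first_terms = [first for first, _ in SECTION_INDEX]
    pos = bisect_right(first_terms, term) - 1
    if pos < 0:
        return None  # term sorts before the first section
    return SECTION_INDEX[pos][1]


def lookup(term):
    """Return (file, position) of `term`, or None if it is absent."""
    filename = find_section(term)
    if filename is None:
        return None
    terms = SECTIONS[filename]
    pos = bisect_right(terms, term) - 1
    if pos >= 0 and terms[pos] == term:
        return filename, pos
    return None


print(lookup("queen"))
```

Only the index and one section are touched per lookup, so the index stays at one line per section no matter how many entries the dictionary holds.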
Old 05-30-2009, 02:47 PM   #21
Nate the great
Quote:
Originally Posted by Peter Sorotokin:
I see. So here is a minimalistic proposal based on the discussion so far:

1. Add metadata tags (exact tags TBD) indicating that the EPUB is a dictionary, an optional "input" language (the language that the dictionary articles are in is indicated by the dc:language element), an optional reference to the index file and an optional collation declaration that describes the order of terms in the dictionary.

2. The dictionary should be split into multiple sections. In addition, an index file can optionally be provided. The index file should have the linear="no" attribute in the spine. If an index is provided, it should be referenced by the metadata.

3. Each entry in the dictionary must be formatted using XHTML dl tag. The first dt tag inside dl is considered to be a primary term. Dictionary entries must go in the order specified by collation - both inside a single section and across all sections as they are referenced in the spine.

4. Index is an XHTML file (exact structure TBD) that lists the sections of the dictionary itself (as opposed to supplementary material) and only the first term for each section. That both allows for efficient search and does not bloat the index.

Peter
Why don't we try to limit this thread to just the discussion of the index?

1, yes.

2, I think an index should be required due to the need for a speedy lookup.

3, The dl tags seem to be duplicating what we are trying to do with the XML tags, and can't achieve the specificity desired. Why use both?

4, Let me expand on what I wrote before.

A dictionary, for example, will have at a minimum a title index (or something that will serve that purpose). It might also have one or more keyword indexes.

The title index will be in its own file that is separate from the rest of the book as well as being separate from the other indexes. Each index will be in a separate file (or files) from the other indexes. If there is more than one type of keyword (example: "famous people" & "famous places"), each type of keyword will have its own index with its own files.

Here is where my explanation wasn't clear before. A keyword index, "famous people" for example, would be in the file "famous people_x.html". The entries would look like this:
Quote:
<a href="johnny appleseed.html">Johnny Appleseed</a><br />
<a href="Kevin Costner.html">Kevin Costner</a><br />
etc.
The file "johnny appleseed.html" would contain entries something like this:
Quote:
<a href="dictionary.html#d_somenumberX">an entry</a><br />
<a href="dictionary.html#d_somenumberY">another entry</a><br />
So a keyword index would actually consist of a group of files.
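
The layout described above (one top-level file per keyword index, plus one file per keyword listing the entries that mention it) can be generated mechanically at ebook creation time. A minimal Python sketch follows. The KEYWORDS table and the d_1, d_2, d_3 anchors are placeholders invented for illustration, and percent-encoding the spaces in file names is a choice made in this sketch rather than something taken from the post.

```python
from html import escape
from urllib.parse import quote

# Hypothetical input: keyword -> list of (link text, entry anchor in the body).
KEYWORDS = {
    "Johnny Appleseed": [
        ("an entry", "dictionary.html#d_1"),
        ("another entry", "dictionary.html#d_2"),
    ],
    "Kevin Costner": [("an entry", "dictionary.html#d_3")],
}


def build_keyword_index(index_name, keywords):
    """Return {file name: markup} for one keyword index and its keyword files."""
    files = {}
    top_links = []
    for keyword in sorted(keywords):
        page = f"{keyword}.html"
        # The top-level index links to the per-keyword file, not to the body.
        top_links.append(f'<a href="{quote(page)}">{escape(keyword)}</a><br />')
        # The per-keyword file links to the beginning of each matching entry.
        files[page] = "\n".join(
            f'<a href="{escape(target)}">{escape(text)}</a><br />'
            for text, target in keywords[keyword]
        )
    files[f"{index_name}.html"] = "\n".join(top_links)
    return files


for name, markup in build_keyword_index("famous people_x", KEYWORDS).items():
    print(name)
    print(markup)
```

Each additional keyword type ("famous places", say) would be a second call with its own table, giving it its own group of files.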
Old 05-31-2009, 01:34 PM   #22
Peter Sorotokin
Quote:
Originally Posted by Nate the great:
Why don't we try to limit this thread to just the discussion of the index?
OK, but we'll need to discuss XHTML vs. TEI or XDXF. I am leaning towards using XHTML.

Quote:
A dictionary, for example, will have at a minimum a title index (or something that will serve that purpose). It might also have one or more keyword indexes.
While I see similarity between the title index and the keyword index, for practical purposes they may need to be treated somewhat differently (like in p-word). For foreign language dictionaries, the title index is going to bloat to the same size as the dictionary itself (since a definition is often about as long as the link would be). On the other hand, each individual piece of the dictionary body is already self-indexing, since words go in alphabetical order.

Keyword indices, by contrast, have to list every word (since they cannot rely on the document structure), but typically won't be as large (judging by printed books). Also, in many cases, a keyword index includes a short definition for each term, in addition to the link(s) to the book body. From that perspective, keyword indices are more similar to small dictionaries than to the title index.

Finally, my instincts are to avoid the br tag. Wrap each link in p, li, dt - whatever - instead.

Peter
Old 06-02-2009, 09:10 PM   #23
Nate the great
Quote:
Originally Posted by Peter Sorotokin:
OK, but we'll need to discuss XHTML vs. TEI or XDXF. I am leaning towards using XHTML.
I would prefer XHTML because simpler is usually better.
Quote:
While I see similarity between the title index and the keyword index, for practical purposes they may need to be treated somewhat differently (like in p-word). For foreign language dictionaries, the title index is going to bloat to the same size as the dictionary itself (since a definition is often about as long as the link would be). On the other hand, each individual piece of the dictionary body is already self-indexing, since words go in alphabetical order.
I disagree about the entry length. I looked at the WordNet Mobi dictionary. The average length was at least twice as long as the link.

Also, while the entries of a dictionary are alphabetical, having a list of just headwords without the entries means you can look at and discard more entries at a time. This will make finding a word (with uncertain spelling) faster.
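
As a rough illustration of the uncertain-spelling case, here is a minimal Python sketch that suggests candidates from a bare headword list using the standard library's difflib. The HEADWORDS list and the 0.6 similarity cutoff are assumptions made for the example; a reading device would need something far cheaper than comparing against every headword.

```python
from difflib import get_close_matches

# Hypothetical headword list, as it might be read from a title index file.
HEADWORDS = ["quack", "quail", "queen", "queue", "quiet"]


def suggest(word, limit=3):
    """Return up to `limit` headwords that resemble `word`, best match first."""
    return get_close_matches(word, HEADWORDS, n=limit, cutoff=0.6)


print(suggest("quene"))
```

The point is only that the scan runs over headwords alone, so no definition text has to be read or skipped while searching.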

Question: would it be possible to build the headword index into the toc.ncx file? If so, could it behave like an index?

Quote:
Keyword indices, by contrast, have to list every word (since they cannot rely on the document structure), but typically won't be as large (judging by printed books). Also, in many cases, a keyword index includes a short definition for each term, in addition to the link(s) to the book body. From that perspective, keyword indices are more similar to small dictionaries than to the title index.

Peter
A definition would be in the body of the text, not the keyword index. I've never seen a reference title that had an index with definitions. I've seen books with both glossaries and indices, but they were separate entities.
Old 06-04-2009, 10:04 AM   #24
Peter Sorotokin
Quote:
Originally Posted by Nate the great:
I disagree about the entry length. I looked at the WordNet Mobi dictionary. The average length was at least twice as long as the link.
Oh, but 2 is approximately 1 ;-). I bet a full index for a 100M Russian-English dictionary is going to be at least 10M, and my gut feeling tells me that's about 10 times more than practical.
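
The arithmetic behind that kind of estimate can be sketched in a few lines of Python. Every number below is an illustrative assumption, not a measurement of any real dictionary: the 100M body is read as bytes, and the average entry size, average link size and section size are simply picked to show how a one-link-per-entry index compares with a one-link-per-section index.

```python
import math


def flat_index_bytes(body_bytes, avg_entry_bytes, avg_link_bytes):
    """Size of an index holding one link per dictionary entry."""
    entries = body_bytes // avg_entry_bytes
    return entries * avg_link_bytes


def sparse_index_bytes(body_bytes, section_bytes, avg_link_bytes):
    """Size of an index holding one link per section (first term only)."""
    sections = math.ceil(body_bytes / section_bytes)
    return sections * avg_link_bytes


# Illustrative numbers only; none of them come from a real dictionary.
BODY = 100_000_000  # "100M" body, read here as bytes
AVG_ENTRY = 400     # assumed average entry size in bytes
AVG_LINK = 40       # assumed average size of one index link in bytes
SECTION = 100_000   # assumed size of one section file in bytes

print(flat_index_bytes(BODY, AVG_ENTRY, AVG_LINK))
print(sparse_index_bytes(BODY, SECTION, AVG_LINK))
```

With different assumptions the absolute figures change, but the flat index grows with the number of entries while the per-section index grows only with the number of sections.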

Quote:
Also, while the entries of a dictionary are alphabetical, having a list of just headwords without the entries means you can look at and discard more entries at a time. This will make finding a word (with uncertain spelling) faster.
You can think of my proposal as a search tree (although a very shallow one). I think it is better for searches than a flat array in almost all cases.

Quote:
Question: would it be possible to build the headword index into the toc.ncx file? If so, could it behave like an index?
Per spec, I do not see how, but I'd rather someone else confirm it.
Old 06-04-2009, 10:10 AM   #25
Peter Sorotokin
BTW, judging by the level of interest that this thread generated, people care about dictionaries even less than I thought ;-)
Old 06-04-2009, 11:33 AM   #26
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Peter Sorotokin:
BTW, judging by the level of interest that this thread generated, people care about dictionaries even less than I thought ;-)
Due to the technical nature of this thread I don't see how you can jump to that conclusion. Most people don't care how a dictionary is implemented, they just want one.

Dale
Old 06-04-2009, 01:25 PM   #27
jgray
Fanatic
Posts: 554
Karma: 2928497
Join Date: Mar 2008
Device: Clara 2E & Sage
Quote:
Originally Posted by DaleDe:
Due to the technical nature of this thread I don't see how you can jump to that conclusion. Most people don't care how a dictionary is implemented, they just want one.

Dale
I agree with Dale on this. Peter, perhaps you are thinking too much like an engineer and not enough like an average reader? As for interest in this thread, I have been following it closely, as I am sure others have.
Old 06-04-2009, 01:38 PM   #28
Sabardeyn
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
I've been following this discussion since the beginning, although I have not commented. The discussion is about a format that I don't use (mobi/mobipocket), have no personal knowledge of, and provides detailed discussion of the programming of same.

In other words, I don't see that I can contribute to the discussion in any constructive manner. So...
Old 06-04-2009, 01:43 PM   #29
zelda_pinwheel
zeldinha zippy zeldissima
Posts: 27,827
Karma: 921169
Join Date: Dec 2007
Location: Paris, France
Device: eb1150 & is that a nook in her pocket, or she just happy to see you?
Quote:
Originally Posted by DaleDe:
Due to the technical nature of this thread I don't see how you can jump to that conclusion. Most people don't care how a dictionary is implemented, they just want one.

Dale
Quote:
Originally Posted by jgray:
I agree with Dale on this. Peter, perhaps you are thinking too much like an engineer and not enough like an average reader? As for interest in this thread, I have been following it closely, as I am sure others have.
Quote:
Originally Posted by Sabardeyn:
I've been following this discussion since the beginning, although I have not commented. The discussion is about a format that I don't use (mobi/mobipocket), have no personal knowledge of, and provides detailed discussion of the programming of same.

In other words, I don't see that I can contribute to the discussion in any constructive manner. So...
add my vote and agreement to that. i CARE about dictionaries, a LOT. and i desperately want dictionary support for epub. but i don't know anything about mobipocket to contribute. i have been following a bit though and the discussion interests me. and i'm very glad that nate has decided to tackle the question.
Old 06-06-2009, 10:09 PM   #30
Valloric
Created Sigil, FlightCrew
Posts: 1,982
Karma: 350515
Join Date: Feb 2008
Device: Kobo Clara HD
Quote:
Originally Posted by Peter Sorotokin:
BTW, judging by the level of interest that this thread generated, people care about dictionaries even less than I thought ;-)
I would like to chime in with the rest and say that although I haven't added to the discussion, I'm following it closely. Dictionaries are important.