12-26-2009, 01:02 PM   #3
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi,

I contributed the first version of the DC (Dublin Core) metadata code to Sigil (with a lot of help from Valloric!) and I agree with your assessment of the 2008 spec. It is completely useless.

It seems to assume that the metadata in an xhtml/html document will always be used on a device that has live network access, and as such it uses links to abstract classes to implement many of its standards.

This is a huge assumption! It means that no offline reading will ever be done and that xhtml/html files are always, and only, served up by web servers.

I, of course, threw out the 2008 document as completely worthless and went with the 2003 document, since ebooks may be read offline in all cases and are certainly not being served up by a web server (as in the case of epub). The funny thing is that the DC website points out that no one seems to want to implement the 2008 specs and that everyone is instead using the 2003 specs, and then complains about it. They really are clueless.

So I tried to implement their regular "dc" namespace and the "dcterms" namespace, BUT ONLY where they overlap with the epub standard.

I also assumed that case was not significant in the "name" field but was significant in the "content" and "scheme" fields. The problem is that many refinements are encoded and stored in the name field itself (for example "dc.date.modified"), so I had to work around that.
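For anyone wondering what "working around" refinements in the name field looks like, here is a minimal Python sketch (not Sigil's actual C++ code). It lowercases only the name, leaves content and scheme alone, and splits the name into a prefix, an element, and an optional refinement. The names `MetaName` and `split_meta_name` are hypothetical, and treating anything other than "dc"/"dcterms" as a free-form name is my assumption.

```python
from typing import NamedTuple, Optional


class MetaName(NamedTuple):
    prefix: str                 # "dc", "dcterms", or "" for free-form names
    element: str                # e.g. "contributor", "date", "identifier"
    refinement: Optional[str]   # e.g. "aut", "modified", "doi", or None


# Hypothetical: the only namespace prefixes this sketch recognizes.
KNOWN_PREFIXES = {"dc", "dcterms"}


def split_meta_name(name: str) -> MetaName:
    """Split a <meta name=...> value; case is ignored here (content/scheme are untouched)."""
    parts = name.strip().lower().split(".")
    if len(parts) > 1 and parts[0] in KNOWN_PREFIXES:
        prefix, rest = parts[0], parts[1:]
    else:
        # Free-form names such as "Title" or "EISBN" have no prefix.
        prefix, rest = "", parts
    element = rest[0]
    # Anything after the element is kept as the refinement.
    refinement = ".".join(rest[1:]) or None
    return MetaName(prefix, element, refinement)
```

For example, "DC.contributor.aut" and "dc.contributor.AUT" both come out as ("dc", "contributor", "aut"), which is exactly why the name comparison has to be case-insensitive.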

I also wanted to support free-form html metadata as generated from PML (eReader .pdb books), namely Title, Author, Publisher, Copyright, and EISBN, as well as simplistic attempts by others, wherever they overlapped with the epub metadata spec.
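A rough Python sketch of how those free-form PML names could be mapped onto internal fields. The table and `map_freeform` are hypothetical; in particular, storing Copyright as Rights and EISBN as an ISBN identifier are my assumptions, not something taken from Sigil's source.

```python
from typing import Optional, Tuple

# Hypothetical table: lowercased free-form name -> (internal field, qualifier).
FREEFORM_FIELDS = {
    "title": ("Title", None),
    "author": ("Author", "aut"),       # "aut" is the MARC relator code for author
    "publisher": ("Publisher", None),
    "copyright": ("Rights", None),     # assumption: copyright text goes to Rights
    "eisbn": ("ID", "ISBN"),           # assumption: stored as an ISBN identifier
}


def map_freeform(name: str, content: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Return (field, qualifier, content) for a known free-form name, else None."""
    entry = FREEFORM_FIELDS.get(name.strip().lower())
    if entry is None:
        return None  # no internal slot, so the entry is ignored
    field, qualifier = entry
    return field, qualifier, content
```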

To give you some idea of the range of things supported, here is one of my test cases:

<meta name="Title" content="Test Case" />
<meta name="Author" content="Kevin Hendricks" />
<meta name="Copyright" content="Copyright &copy; 2005, 2006" />
<meta name="Publisher" content="My Super PublishingHouse" />
<meta name="EISBN" content="0-06-124666-2" />
<meta name="DC.contributor" content="Another me" />
<meta name="DC.contributor.aut" content="Another me1" />
<meta name="DC.contributor.arc" content="Another me2" />
<meta name="dc.date" content="2009-12-15" />
<meta name="dc.date.modified" content="2009-12-16" />
<meta name="DCTERMS.issued" content="2008-10-22" />
<meta name="dcterms.creator.aut" content="Another me3" />
<meta name="dc.identifier" scheme="ISSN" content="123456789" />
<meta name="dcterms.identifier.doi" content="987654321" />
<meta name="dc.identifier.lccs" content="123-123-123-123" />

Please note that the last line is a valid metadata identifier under DC, but Sigil will ignore it since LCCS is not one of its supported internal identifier formats.
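To illustrate the identifier rule, here is a hypothetical Python sketch (again not Sigil's code). It keeps DOI, ISBN, ISSN, and CustomID and drops everything else, so the LCCS line falls through. A refinement in the name is matched case-insensitively and a scheme attribute case-sensitively, following the case rule described earlier. Ignoring an identifier with no scheme at all is my assumption.

```python
from typing import Optional, Tuple

# Assumed mapping from a (lowercased) name refinement to an internal scheme.
ID_FROM_REFINEMENT = {"doi": "DOI", "isbn": "ISBN", "issn": "ISSN", "customid": "CustomID"}
ID_SCHEMES = set(ID_FROM_REFINEMENT.values())


def classify_identifier(name: str, content: str,
                        scheme: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (scheme, value) for a storable identifier, or None if it is ignored."""
    parts = name.strip().lower().split(".")
    if parts == ["eisbn"]:
        # Free-form EISBN from PML-generated html is treated as an ISBN.
        return ("ISBN", content)
    if len(parts) < 2 or parts[0] not in ("dc", "dcterms") or parts[1] != "identifier":
        return None
    if len(parts) > 2:
        # Refinement in the name, e.g. dcterms.identifier.doi (case-insensitive).
        found = ID_FROM_REFINEMENT.get(".".join(parts[2:]))
    elif scheme is not None:
        # The scheme attribute is compared case-sensitively.
        found = scheme if scheme in ID_SCHEMES else None
    else:
        found = None  # assumption: no scheme means nowhere to store it
    return (found, content) if found else None
```

With the test case above, dc.identifier with scheme="ISSN" and dcterms.identifier.doi are kept, while dc.identifier.lccs returns None.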

Also note that, like you, I ignored ALL <link> elements since the book may be read offline.
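Here is a small sketch of collecting the <meta> entries while skipping <link> entirely, using Python's standard html.parser module. The class and function names are hypothetical. HTMLParser lowercases tag and attribute names but leaves attribute values as written, apart from decoding entities such as &copy;. That happens to match the "name is case-insensitive, content/scheme are not" rule, because the name is only lowercased later when it is interpreted.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (name, content, scheme) from <meta> tags; <link> tags are skipped."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased. Only <meta> is considered; <link> points
        # at network resources that may be unreachable when reading offline.
        if tag != "meta":
            return
        a = dict(attrs)
        name, content = a.get("name"), a.get("content")
        if name is None or content is None:
            return
        # Values keep their original case; scheme is None when absent.
        self.metas.append((name, content, a.get("scheme")))


def collect_metas(html_text):
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.metas
```

Self-closing tags such as <meta ... /> are routed through handle_starttag by the default handle_startendtag, so no extra handling is needed.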

All of the rest are mapped to something in Sigil.

Another thing I have not supported yet (again because there was no place for it in the internal Sigil structure) is "refinements" on the "Relation" field, such as "IsPartOf", "IsVersionOf", "IsFormatOf", "IsReferencedBy", "IsBasisFor", "IsBasedOn", and "Requires".

The **internal** structure of Sigil supports the following data items. Please note that everything from the metadata must be mapped to one of these to be supported; otherwise it is ignored, since there is no place to store it internally inside Sigil (which focused specifically on the official epub standard for metadata).


Title
Author (or *any* of the Marc relator codes)
Subject
Descriptions
Publisher
Date of publication
Date of creation
Date of modification
Type
Format
Relation
Coverage
Rights
ID (must be one of DOI, ISBN, ISSN, or CustomID)
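A hypothetical Python sketch of routing a DC element plus an optional refinement onto the internal fields listed above. Anything without a slot returns None and is ignored. This is only an illustration, not Sigil's code. In particular, these choices are my assumptions: treating a three-letter refinement on creator/contributor as a MARC relator code, treating a bare dc.date as the publication date, and including the non-standard "published" term.

```python
from typing import Optional, Tuple

Field = Tuple[str, Optional[str]]  # (internal field, qualifier such as a role code)

SIMPLE_FIELDS = {
    "title": "Title", "subject": "Subject", "description": "Description",
    "publisher": "Publisher", "type": "Type", "format": "Format",
    "relation": "Relation", "coverage": "Coverage", "rights": "Rights",
}

DATE_FIELDS = {
    "issued": "Date of publication",
    "published": "Date of publication",  # non-standard term, see the note on "published"
    "created": "Date of creation",
    "modified": "Date of modification",
}


def route(element: str, refinement: Optional[str] = None) -> Optional[Field]:
    """Map a DC element (+ optional refinement) to an internal field, or None to ignore it."""
    element = element.lower()
    refinement = refinement.lower() if refinement else None
    if element in ("creator", "contributor"):
        # Assumption: a refinement is a MARC relator code; none means plain author.
        role = refinement or "aut"
        return ("Author", role) if len(role) == 3 and role.isalpha() else None
    if element == "date":
        # Assumption: a bare dc.date is treated as the publication date.
        field = DATE_FIELDS.get(refinement or "issued")
        return (field, None) if field else None
    if element in DATE_FIELDS and refinement is None:
        # dcterms.issued, dcterms.modified, ...: the refinement is the element itself.
        return (DATE_FIELDS[element], None)
    if element == "identifier":
        # The scheme must still be one of DOI, ISBN, ISSN, or CustomID.
        return ("ID", refinement)
    if element in SIMPLE_FIELDS and refinement is None:
        return (SIMPLE_FIELDS[element], None)
    # Anything else (e.g. relation refinements like IsPartOf) has no slot.
    return None
```

The design choice here is to fail closed: an unknown element or refinement returns None rather than being squeezed into the nearest field. That mirrors the "if there is no place to store it, it is ignored" behavior described above.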

It sounds like Sigil has decided to add "published" as a dcterm to augment "issued", which makes a lot of sense, although it is not one of the official dcterms.

Please ask me and I can tell you what is supported. I would be happy to offer a patch to Sigil to support something that is very important to you, as long as it fits with the epub metadata spec and, of course, is acceptable to the author of Sigil!!!

Hope this helps,

KevinH