ePUB metadata


Jellby
12-04-2008, 04:30 AM
Hi all,

I'm considering publishing all the books I've created (and those I make in the future) in ePUB format. Even if there are no full-featured ePUB readers (or none I could find or try), for someone like me who likes to hand-craft the (X)HTML of each book, ePUB has the advantage of storing the XHTML and image files unmodified. That means there's no need to keep the source files separately, and the ePUB itself could be used as the source for editions in other formats.

OK, so my question is, what is the best way to include the publication metadata in the ePUB files? I mean, in particular:

- Having MobileRead appear somewhere. I'm currently putting it in "publisher".

- Adding my name. I have it in the "dc:contributor" element, with role="bkp" (book producer).

- Stating the source of the original text used for the conversion (Project Gutenberg, Wikimedia, etc.). I put that in the "dc:source" element, of course, but I'm worried that could be seen as a sort of "association" with Project Gutenberg, which is not allowed if the text is modified (and I usually modify it quite a bit).


Do you think this system is OK? Any comments? More data to add?
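A minimal sketch of that setup, built with Python's standard xml.etree.ElementTree. It assumes the OPF 2.0 namespaces (dc for http://purl.org/dc/elements/1.1/, opf for http://www.idpf.org/2007/opf); the add_dc helper and the title are hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# OPF 2.0 namespaces for Dublin Core elements and opf:* attributes
DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
ET.register_namespace("dc", DC)
ET.register_namespace("opf", OPF)


def add_dc(parent, name, text, **opf_attrs):
    """Append <dc:NAME>text</dc:NAME>, turning keyword args into opf:* attributes."""
    el = ET.SubElement(parent, f"{{{DC}}}{name}")
    el.text = text
    for key, value in opf_attrs.items():
        # keyword args cannot contain '-', so file_as would become opf:file-as
        el.set(f"{{{OPF}}}{key.replace('_', '-')}", value)
    return el


# Serializes as <opf:metadata ...>, namespace-equivalent to <metadata> in a real OPF
metadata = ET.Element(f"{{{OPF}}}metadata")
add_dc(metadata, "title", "Placeholder Title")        # hypothetical title
add_dc(metadata, "publisher", "MobileRead")
add_dc(metadata, "contributor", "Jellby", role="bkp")  # bkp = book producer
add_dc(metadata, "source", "Project Gutenberg")

xml_text = ET.tostring(metadata, encoding="unicode")
print(xml_text)
```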

Jellby
12-04-2008, 01:24 PM
Oh, and about the different date "events", I use:

"publication" for the original publication year.
"creation" for the day I create the first version of the ePUB file.
"modification" for the day I made the last change in the ePUB file.
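The same three events as dc:date elements carrying an opf:event attribute, sketched in Python. As we read the OPF 2.0 spec, event values are not a fixed list (creation, publication and modification are given as possibilities) and dates take the form YYYY[-MM[-DD]]; the add_date helper, its loose regex check and the dates themselves are placeholders, not taken from Jellby's files.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
ET.register_namespace("dc", DC)
ET.register_namespace("opf", OPF)

# YYYY, YYYY-MM or YYYY-MM-DD (a loose shape check, not full date validation)
DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def add_date(metadata, event, value):
    """Append <dc:date opf:event="EVENT">VALUE</dc:date>."""
    if not DATE_RE.fullmatch(value):
        raise ValueError(f"expected YYYY[-MM[-DD]], got {value!r}")
    el = ET.SubElement(metadata, f"{{{DC}}}date")
    el.set(f"{{{OPF}}}event", event)
    el.text = value
    return el


metadata = ET.Element(f"{{{OPF}}}metadata")
add_date(metadata, "publication", "1884")         # placeholder: original publication year
add_date(metadata, "creation", "2008-12-04")      # placeholder: first ePUB version
add_date(metadata, "modification", "2008-12-05")  # placeholder: latest change

xml_text = ET.tostring(metadata, encoding="unicode")
print(xml_text)
```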

Hadrien
12-04-2008, 01:50 PM
Sounds pretty good, Jellby. I'd like to see more files using dc:description too. I've added support for this in both our Mobipocket and ePub files on Feedbooks, and it is very nice to have a description displayed in Mobipocket Desktop.

Jellby
12-05-2008, 03:30 AM
Sounds pretty good, Jellby. I'd like to see more files using dc:description too. I've added support for this in both our Mobipocket and ePub files on Feedbooks, and it is very nice to have a description displayed in Mobipocket Desktop.

Yes, I'd also like to add dc:description, dc:subject, and dc:type, but I think we need some kind of standard for the last two and a creative writer for the first :D

Hadrien
12-05-2008, 04:52 AM
Yes, I'd also like to add dc:description, dc:subject, and dc:type, but I think we need some kind of standard for the last two and a creative writer for the first :D

dc:type is usually quite limited: a MIME-type or something like "Text".
For dc:subject, sure, a controlled vocabulary might be better but you can use anything (use multiple dc:subject elements).

dc:rights is useful too: you can state whether the book is in the public domain, or include the URI of the license if it's a CC-licensed book.
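A sketch of how those elements might sit together, again with Python's ElementTree. "Text" is a DCMI Type Vocabulary term; if a media type such as application/epub+zip is wanted, Dublin Core recommends putting MIME types in dc:format rather than dc:type. The subjects, description and rights statement below are placeholders.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
ET.register_namespace("dc", DC)
ET.register_namespace("opf", OPF)


def add_dc(parent, name, text):
    """Append a simple <dc:NAME>text</dc:NAME> element."""
    el = ET.SubElement(parent, f"{{{DC}}}{name}")
    el.text = text
    return el


metadata = ET.Element(f"{{{OPF}}}metadata")
add_dc(metadata, "type", "Text")  # term from the DCMI Type Vocabulary
# One dc:subject per subject, rather than one comma-separated string
for subject in ("Fiction", "Adventure stories"):  # placeholder subjects
    add_dc(metadata, "subject", subject)
add_dc(metadata, "description", "Placeholder blurb for reading software to display.")
# Placeholder; for a CC-licensed book this could be the license URI instead
add_dc(metadata, "rights", "Public domain in the USA.")

xml_text = ET.tostring(metadata, encoding="unicode")
print(xml_text)
```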

Jellby
12-05-2008, 12:15 PM
dc:type is usually quite limited: a MIME-type or something like "Text".

Really? What if it's a book with illustrations? Or a picture book with some verses?

Ah, I see... I've found this page (http://dublincore.org/documents/2000/07/11/dcmi-type-vocabulary/). But it looks like almost every ePUB will be "Text".

Hadrien
12-05-2008, 01:10 PM
Really? What if it's a book with illustrations? Or a picture book with some verses?

Ah, I see... I've found this page (http://dublincore.org/documents/2000/07/11/dcmi-type-vocabulary/). But it looks like almost every ePUB will be "Text".

Yeah, although you could potentially use a different controlled vocabulary.

Jellby
12-05-2008, 01:20 PM
Yeah, although you could potentially use a different controlled vocabulary.

Apparently, Dublin Core suggests using multiple Type elements, with as many tags from the "DCMIType vocabulary" as needed (for instance "Text" and "Image" for an illustrated book), and also with tags from other vocabularies:

http://dublincore.org/documents/usageguide/elements.shtml

But I don't know if the ePUB specification allows multiple <dc:type> tags...

llasram
12-05-2008, 01:38 PM
But I don't know if the ePUB specification allows multiple <dc:type> tags...

From the OPF spec:

The metadata or dc-metadata (deprecated) elements may contain any number of instances of any Dublin Core elements. Dublin Core metadata elements may occur in any order; in fact, multiple instances of the same element type (e.g. multiple Dublin Core creator elements) can be interspersed with other metadata elements without change of meaning.

So there you go.
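Since any element may repeat, an illustrated book could carry two dc:type elements, as Jellby suggests (DaleDe questions further down whether combining Text and Image is wise). This Python sketch checks each term against the terms of the current DCMI Type Vocabulary before adding it; the add_types helper is hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
ET.register_namespace("dc", DC)
ET.register_namespace("opf", OPF)

# Terms of the current DCMI Type Vocabulary
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}


def add_types(metadata, types):
    """Add one <dc:type> per term, rejecting terms outside the DCMI vocabulary."""
    for term in types:
        if term not in DCMI_TYPES:
            raise ValueError(f"not a DCMI Type term: {term!r}")
        ET.SubElement(metadata, f"{{{DC}}}type").text = term


metadata = ET.Element(f"{{{OPF}}}metadata")
add_types(metadata, ["Text", "Image"])  # an illustrated book
xml_text = ET.tostring(metadata, encoding="unicode")
print(xml_text)
```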

Jellby
12-05-2008, 01:50 PM
From the OPF spec:

Thanks! That's much better. I saw that the possibility of multiple instances was explicitly stated for some elements, but not for all of them.

Hadrien
12-05-2008, 09:02 PM
Thanks! That's much better. I saw that the possibility of multiple instances was explicitly stated for some elements, but not for all of them.

Yeah, that's very common, actually. I use several dc:subject both in our files and on our webpages (using RDFa).

DaleDe
12-06-2008, 12:52 PM
Apparently, Dublin Core suggests using multiple Type elements, with as many tags from the "DCMIType vocabulary" as needed (for instance "Text" and "Image" for an illustrated book), and also with tags from other vocabularies:

http://dublincore.org/documents/usageguide/elements.shtml

But I don't know if the ePUB specification allows multiple <dc:type> tags...

Multiple types are allowed, but both TEXT and IMAGE would be confusing, as one is defined to be primarily text while the other is defined to be primarily images (a coffee table book). Which is it?

Hadrien
12-06-2008, 01:07 PM
Multiple types are allowed, but both TEXT and IMAGE would be confusing, as one is defined to be primarily text while the other is defined to be primarily images (a coffee table book). Which is it?

TEXT most of the time.

Hadrien
12-06-2008, 01:09 PM
Aside from the files themselves, I also recommend using RDFa on the webpage of the book to describe it:
For example, on Feedbooks: http://www.feedbooks.com/book/348
... can be extracted as: http://www.w3.org/2007/08/pyRdfa/extract?uri=http://www.feedbooks.com/book/3483&format=turtle&warnings=false&parser=lax&space-preserve=true&submit=Go!

Jellby
12-06-2008, 02:20 PM
And what about an author's alternative names (pseudonyms, real name...)? Is there any field/way to store them?

For instance, I would put "Mark Twain" as a creator with role="aut" and file-as="Twain, Mark"... but where could I put "Samuel Langhorne Clemens", his real name?

llasram
12-06-2008, 03:06 PM
And what about an author's alternative names (pseudonyms, real name...)? Is there any field/way to store them?

For instance, I would put "Mark Twain" as a creator with role="aut" and file-as="Twain, Mark"... but where could I put "Samuel Langhorne Clemens", his real name?

Ooh. Tricky. The full list of MARC relator codes (http://www.loc.gov/marc/relators/relaterm.html) has one that looks promising:

Attributed name [att]
Use for an author, artist, etc., relating him/her to a work for which there is or once was substantial authority for designating that person as author, creator, etc. of the work.

I'd probably go with an @role="aut" of "Mark Twain" and an @role="att" of "Samuel Clemens", but I have no formal experience in how the MARC codes are used.
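llasram's tentative suggestion written out as OPF, via a small Python sketch. Whether "att" is the right code (and which name gets which role) is exactly what the following posts debate, so treat the role assignment as an assumption; add_creator is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
ET.register_namespace("dc", DC)
ET.register_namespace("opf", OPF)


def add_creator(metadata, name, role, file_as=None):
    """Append <dc:creator opf:role=ROLE [opf:file-as=FILE_AS]>NAME</dc:creator>."""
    el = ET.SubElement(metadata, f"{{{DC}}}creator")
    el.text = name
    el.set(f"{{{OPF}}}role", role)  # MARC relator code
    if file_as is not None:
        el.set(f"{{{OPF}}}file-as", file_as)
    return el


metadata = ET.Element(f"{{{OPF}}}metadata")
add_creator(metadata, "Mark Twain", "aut", file_as="Twain, Mark")
# "att" is in the full MARC relator list but not in the shorter OPF list
add_creator(metadata, "Samuel Clemens", "att")

xml_text = ET.tostring(metadata, encoding="unicode")
print(xml_text)
```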

Hadrien
12-06-2008, 03:46 PM
There's also a shorter list in the EPUB specs: http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html#Section2.2.6

llasram
12-06-2008, 05:51 PM
There's also a shorter list in the EPUB specs: http://www.idpf.org/2007/opf/OPF_2.0_final_spec.html#Section2.2.6

Indeed, but the short list does not contain "att" :p.

tompe
12-06-2008, 05:51 PM
And what about an author's alternative names (pseudonyms, real name...)? Is there any field/way to store them?

For instance, I would put "Mark Twain" as a creator with role="aut" and file-as="Twain, Mark"... but where could I put "Samuel Langhorne Clemens", his real name?

Does this kind of information really belong in a book? What happens when an author adds a new pseudonym? What happens when an author changes his name? And so on. Would it not be better to have a pointer from the book to the current information about the author?

Jellby
12-07-2008, 05:08 AM
Besides, can a given dc:creator element have more than one opf:role attribute?

I've been trying to find something about this, but I couldn't find anything. Nowhere have I seen it stated that only one opf:role is allowed, or that multiple opf:role attributes are possible.

Usually, each dc:creator would have only one opf:role, but I guess one would desire to have several in cases where, for example, the author also drew (some of) the illustrations, or an author translated his/her own work, or the translator added extensive notes and comments...

So, does anyone have more information about this possibility?

[EDIT]: Damn! I clicked "edit" instead of "quote"... I intended to quote my own previous post, but ended up editing (and deleting) it... at least the quoted part still survives (and it's the most important).

Jellby
12-10-2008, 07:18 AM
Just a bump, since I failed to add a new post... (please read the one above again.)

llasram
12-10-2008, 10:55 AM
I've been trying to find something about this, but I couldn't find anything. Nowhere have I seen it stated that only one opf:role is allowed, or that multiple opf:role attributes are possible.

Basic XML rules require all attribute names on a given element to be unique. However, what you could do is just have multiple <dc:creator/> (or <dc:contributor/>) tags with the same content and different @opf:role values.
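A Python sketch of that workaround for an author who also illustrated the book: one dc:creator per role, with identical text and opf:file-as. The name and file-as value are placeholders. The last lines show why the workaround is needed: a duplicated attribute makes the XML ill-formed, and ElementTree's parser refuses it.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"
ET.register_namespace("dc", DC)
ET.register_namespace("opf", OPF)

metadata = ET.Element(f"{{{OPF}}}metadata")

# One element per role: same name and file-as, different opf:role.
# "A. N. Author" is a placeholder; aut = author, ill = illustrator (MARC codes).
for role in ("aut", "ill"):
    el = ET.SubElement(metadata, f"{{{DC}}}creator")
    el.text = "A. N. Author"
    el.set(f"{{{OPF}}}role", role)
    el.set(f"{{{OPF}}}file-as", "Author, A. N.")

xml_text = ET.tostring(metadata, encoding="unicode")
print(xml_text)

# Two attributes with the same name on one element are not well-formed XML
# (the same rule applies to opf:role), so the parser rejects this outright.
try:
    ET.fromstring('<creator role="aut" role="ill">A. N. Author</creator>')
    rejected = False
except ET.ParseError:
    rejected = True
print("duplicate attribute rejected:", rejected)
```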

Oh, and I missed your note about @opf:role="att". I think you may be right, but the phrasing "there is or once was substantial authority" makes it unclear. Consider a book such as the Rhetorica ad Herennium, which was attributed to Cicero, and let's say that we discover definitively that it was actually by Cannutius. The book was "published" under the name "Cicero," but was actually by someone named "Cannutius." What you're proposing is that the @role="aut" is "Cannutius" and @role="att" is "Cicero". But if you do the same thing for Huckleberry Finn, then you should have your @role="aut" be "Samuel Clemens" and your @role="att" be "Mark Twain".

If you wanted to be really specific, you could use the @dcterms:coverage (http://dublincore.org/documents/dcmi-terms/#terms-coverage) property with @role="att" to specify different time periods of attribution by "substantial authority."

-Marshall

Jellby
12-10-2008, 11:20 AM
What you're proposing is that the @role="aut" is "Cannutius" and @role="att" is "Cicero". But if you do the same thing for Huckleberry Finn, then you should have your @role="aut" be "Samuel Clemens" and your @role="att" be "Mark Twain".

Yes, from the specification, that's what I think should be done in the first case. The difference is that Cicero and Cannutius are (I assume) two different people, while Mark Twain and Samuel Clemens are one and the same person (just as "Twain, Mark", in an opf:file-as attribute, is the same person).

So, I guess having two different dc:creator elements for Cicero and Cannutius is all right, but having two for Mark Twain and Samuel Clemens is not practical: there's no way of knowing or marking that they are the same person.

For the case of several opf:role for a single author, the reading systems could "merge" multiple dc:creator (each one with a single opf:role) if they have the same opf:file-as... I wonder if any system does this or if any developer will think of this.
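What such a merge might look like on the reading-system side, as a hypothetical Python sketch (it illustrates the idea above, not the behavior of any existing reader): dc:creator and dc:contributor entries are grouped by opf:file-as, falling back to the element text, and their roles are collected. The sample metadata is made up.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"


def merge_creators(metadata):
    """Group dc:creator/dc:contributor entries by opf:file-as (else by text).

    Returns {key: {"name": ..., "roles": [...]}}; roles keep the order found,
    creators before contributors, without duplicates.
    """
    people = {}
    for tag in ("creator", "contributor"):
        for el in metadata.findall(f"{{{DC}}}{tag}"):
            name = (el.text or "").strip()
            key = el.get(f"{{{OPF}}}file-as") or name
            entry = people.setdefault(key, {"name": name, "roles": []})
            role = el.get(f"{{{OPF}}}role")
            if role and role not in entry["roles"]:
                entry["roles"].append(role)
    return people


SAMPLE = f"""<metadata xmlns:dc="{DC}" xmlns:opf="{OPF}">
  <dc:creator opf:role="aut" opf:file-as="Author, A. N.">A. N. Author</dc:creator>
  <dc:creator opf:role="ill" opf:file-as="Author, A. N.">A. N. Author</dc:creator>
  <dc:contributor opf:role="bkp">Jellby</dc:contributor>
</metadata>"""

merged = merge_creators(ET.fromstring(SAMPLE))
for person in merged.values():
    print(person["name"], person["roles"])
```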


It would be different if we discover, instead, that the person we know as Cicero was actually named Cannutius.

Hadrien
12-10-2008, 11:27 AM
Yes, from the specification, that's what I think should be done in the first case. The difference is that Cicero and Cannutius are (I assume) two different people, while Mark Twain and Samuel Clemens are one and the same person (just as "Twain, Mark", in an opf:file-as attribute, is the same person).

So, I guess having two different dc:creator elements for Cicero and Cannutius is all right, but having two for Mark Twain and Samuel Clemens is not practical: there's no way of knowing or marking that they are the same person.

For the case of several opf:role for a single author, the reading systems could "merge" multiple dc:creator (each one with a single opf:role) if they have the same opf:file-as... I wonder if any system does this or if any developer will think of this.


It would be different if we discover, instead, that the person we know as Cicero was actually named Cannutius.

You'll probably need another controlled vocabulary to do this. FOAF (http://xmlns.com/foaf/spec/) is the most commonly used vocabulary in the semantic web community to describe people.

Chang
12-02-2009, 08:03 AM
And what about an author's alternative names (pseudonyms, real name...)? Is there any field/way to store them?

For instance, I would put "Mark Twain" as a creator with role="aut" and file-as="Twain, Mark"... but where could I put "Samuel Langhorne Clemens", his real name?

I'm just curious where these attributes come in handy. Using them isn't compulsory according to the OPF spec. This thread is already a year old, but has anything changed since then? Do any reading devices pick up info from the attributes and show it to the user, e.g. pseudonyms of the author?
To sum up, is there any practical reason to use these attributes?

quillaja
12-03-2009, 01:07 AM
If Samuel Clemens wrote the book, his name would be on it. But it's not. "Mark Twain" wrote the book, whether he's a real person or not. That we know they are the same person is immaterial to the book.

Chang
12-10-2009, 07:44 AM
I'm sorry but I don't quite get your point.

If Samuel Clemens wrote the book, his name would be on it.

Would be on what/where?

And I would still like to know: is there any practical reason to use these attributes? Does the regular user get any info from them, or are they just for people who unzip ePUB files?

Chang
12-17-2009, 01:36 AM
bump
Anyone? :)

rogue_ronin
12-17-2009, 07:56 PM
By definition, metadata is outside the document as you might read it (although it may be included). So, its usefulness depends on the reader software. The standardized Dublin Core metadata is available to whatever system might want it, although most readers fail to make much use of it.

Of course, as it's machine readable, it can be used for a lot of purposes besides reading.
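As one example of machine use, here is a Python sketch that opens an ePUB (a ZIP archive), follows META-INF/container.xml to the OPF package file, and collects every Dublin Core element it finds. It assumes a container with a single rootfile and builds a tiny made-up book in memory for the demonstration; read_dc_metadata is a hypothetical name.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def read_dc_metadata(epub):
    """Return {dc element name: [values]} from an ePUB's OPF package file.

    `epub` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is None:
            raise ValueError("no rootfile in META-INF/container.xml")
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    result = {}
    for el in opf.iter():
        if el.tag.startswith(f"{{{DC}}}"):
            name = el.tag[len(DC) + 2:]  # strip "{namespace}"
            result.setdefault(name, []).append((el.text or "").strip())
    return result


# Build a tiny in-memory ePUB for the demonstration
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr(
        "META-INF/container.xml",
        f'<container version="1.0" xmlns="{CONTAINER_NS}"><rootfiles>'
        '<rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>',
    )
    zf.writestr(
        "OEBPS/content.opf",
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        f'<metadata xmlns:dc="{DC}"><dc:title>Placeholder</dc:title>'
        "<dc:subject>Fiction</dc:subject><dc:subject>Humor</dc:subject>"
        "</metadata></package>",
    )
buf.seek(0)
meta = read_dc_metadata(buf)
print(meta)
```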

Don't mistake the current level of exploitation as the ceiling of what is possible. It's always good to add good metadata and clean markup. Future-you, and others, will appreciate it.

m a r