Metadata best practices


Ryn
02-08-2012, 09:58 AM
First off, I'm sorry to barge in like this with post count all-of-1, but I've been in lurk mode for quite some time and I just couldn't restrain myself any longer.

With today's myriad options for obtaining eBooks, discoverability is an obvious boon. As with web properties, metadata may play a key role in this process. Title, author, genre, publisher, language and even subject and description seem logical candidates for enhancement.

The subject tag clearly lends itself to keyword stuffing, but how do the different marketplaces respond to this? Do they even process/read/spider the subject tag? Do they penalize obvious mischaracterizations?

The description tag can be used for the backcover text, praise, introduction, or a short summary that tells the marketplace exactly who this publication is aimed at.

I'd like this thread to be a developing compendium of some of the wisdom on the subject of metadata that is present within this community. By exploring our common wisdom we can help improve the ePub standard while at the same time perhaps selling more books. Moreover, by being aware of the developmental distance between ePub and html proper, we can explore ways in which publishers and creators of eBooks are able to wrest back some control from almighty Amazon and Apple.

I would like to invite everyone to share:
* current thinking on the OPF metadata tags
* ways to tweak individual tag data
* experiences with metadata manipulation
* our understanding of how metatags influence discovery
* possible best practices suggestions

Thanks for not tl;dr'ing!

Adjust
02-08-2012, 08:18 PM
This is the template metadata block I start from; I always replace the placeholder values.

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- Placeholder values throughout; replace each one per book. -->
  <dc:identifier opf:scheme="ISBN">9781234567890</dc:identifier>
  <dc:title>Book Title</dc:title>
  <!-- opf:file-as holds the sortable form of the author's name. -->
  <dc:creator opf:role="aut" opf:file-as="Name, Author">Author Name</dc:creator>
  <dc:publisher>Insert Publisher</dc:publisher>
  <dc:rights>Copyright Author Name 2012</dc:rights>
  <dc:format />
  <dc:date>2012</dc:date>
  <dc:description>Insert Blurb Here</dc:description>
  <dc:subject>Humanities; sciences; social sciences; scientific rationalism</dc:subject>
  <dc:language>en</dc:language>
  <!-- content must match the id of the cover image item in the manifest. -->
  <meta name="cover" content="my-cover-image"/>
</metadata>
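
For anyone who wants to check what is actually inside a file, here is a minimal Python sketch that reads the dc: fields back out of a block like the one above. It uses only the standard library; the function name `read_metadata` and the shortened sample are made up for illustration, and the values are the same placeholders as in the template.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used by the template above.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the template; all values are placeholders, not real data.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="ISBN">9781234567890</dc:identifier>
<dc:title>Book Title</dc:title>
<dc:creator opf:role="aut" opf:file-as="Name, Author">Author Name</dc:creator>
<dc:language>en</dc:language>
</metadata>"""


def read_metadata(xml_text):
    """Map each dc: element name to the list of its text values."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for child in root:
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


if __name__ == "__main__":
    for name, values in read_metadata(SAMPLE).items():
        print(name, values)
```

Values are kept in lists because elements such as dc:creator and dc:subject may appear more than once.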

twedigteam
02-09-2012, 01:06 PM
Great post. I think this is worthy of a steady conversation; metadata always seems to be shifting in terms of application in the eBook world.

I have the same questions about keyword/search term metadata. Do online retailers actually recognize this field? I feel like it would be the "webby" way to present BISAC data, but it's unclear if it's worth the effort or not. Obviously, "discoverability" is the goal here, and it seems that custom keywords could lend a much more specific representation of the content than an archaic categorization system.

I'd love to hear any/everyone else's experience + expertise on this one!!!

chrisridd
02-09-2012, 04:34 PM
Great post. I think this is worthy of a steady conversation; metadata always seems to be shifting in terms of application in the eBook world.

I agree. It does seem like ebook publishers put minimal effort into it, which is surprising given the amount of metadata they must have.

Does the epub 3 spec have better metadata support than epub 2?

I have the same questions about keyword/search term metadata. Do online retailers actually recognize this field? I feel like it would be the "webby" way to present BISAC data, but it's unclear if it's worth the effort or not. Obviously, "discoverability" is the goal here, and it seems that custom keywords could lend a much more specific representation of the content than an archaic categorization system.

I'd love to hear any/everyone else's experience + expertise on this one!!!

I don't have many store-bought epubs yet (most are PDFs I've converted myself) but the ones I have bought have been quite metadata-poor.

It doesn't help that readers don't seem to display much of it. Why put it in the books if the readers don't show it?

Toxaris
02-10-2012, 03:55 AM
Actually, the metadata support in ePUB2 is quite good. The only thing that I am really missing is the option for series metadata. I know Calibre uses a custom metadata field, but it is not in the official specs.

The main problem is not the allowable metadata, but the reluctance of publishers to actually use it. The fields that are there are mostly used by library programs and not by the readers. The readers usually support only the fields they consider most useful.
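
To make the series point concrete: as far as I know, Calibre stores series information in OPF `<meta>` elements named `calibre:series` and `calibre:series_index`. That is Calibre's own convention, not part of the ePUB specs. A minimal Python sketch for reading them, with a made-up sample and a hypothetical helper name `read_series`:

```python
import xml.etree.ElementTree as ET

# Sample metadata; the series name and index are made-up placeholder values.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Book Title</dc:title>
<meta name="calibre:series" content="Example Series"/>
<meta name="calibre:series_index" content="2"/>
</metadata>"""


def read_series(xml_text):
    """Return (series_name, series_index), or (None, None) if no series meta exists."""
    root = ET.fromstring(xml_text)
    found = {}
    for el in root.iter():
        # Strip any namespace so both "meta" and "{ns}meta" match.
        if el.tag.rsplit("}", 1)[-1] == "meta":
            found[el.get("name")] = el.get("content")
    name = found.get("calibre:series")
    index = found.get("calibre:series_index")
    return name, (float(index) if index is not None else None)


if __name__ == "__main__":
    print(read_series(SAMPLE))
```

A reader or store that does not know these names will simply ignore them, which is the limitation described above.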

Ryn
02-10-2012, 05:03 AM
Metadata are obviously not *completely* neglected by publishers, as marketplaces would have a hard time presenting their customers with content without the <creator> and <title> tags being filled in.

It seems that there might be opportunities for eBook publishers who DO take advantage of the complete ePub2 metadata possibilities, i.e. <subject> and <description>. This of course presumes that we as publishers have any idea of how the different marketplaces process these tags, or how this may affect discoverability and ultimately sales.

I for one don't know these things, but I'm hoping some of the more seasoned members here do, and don't mind sharing some of their "trade secrets" :)

twedigteam
02-10-2012, 02:38 PM
This of course presumes that we as publishers have any idea of how the different marketplaces process these tags, or how this may affect discoverability and ultimately sales.

Exactly. We can include as much metadata as we want [100s of fields], but it could ultimately end up being an arbitrary practice, depending on what the retail outlets choose to support. I really, really hold out hope for broad keyword support... and I wonder if some retailers do consider this on their back end, similar to SEO for broader search engines, encouraging easier searching and discoverability of titles. A title could potentially be loaded with tens (or even hundreds) of tags that represent the content.
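
On the mechanics of loading a title with many tags: the OPF format lets dc:subject repeat, so each keyword can sit in its own element instead of one semicolon-separated string. Whether any store reads them is exactly the open question here. A minimal Python sketch, assuming a semicolon separator and using a hypothetical helper name `split_subjects`:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # keep the "dc:" prefix when serializing

# Sample with placeholder values; the subject string is the one from the template.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Book Title</dc:title>
<dc:subject>Humanities; sciences; social sciences; scientific rationalism</dc:subject>
</metadata>"""


def split_subjects(metadata, separator=";"):
    """Replace each multi-keyword dc:subject with one dc:subject per keyword."""
    tag = "{%s}subject" % DC
    for old in list(metadata.findall(tag)):
        keywords = [k.strip() for k in (old.text or "").split(separator) if k.strip()]
        position = list(metadata).index(old)
        metadata.remove(old)
        for offset, keyword in enumerate(keywords):
            new = ET.Element(tag)
            new.text = keyword
            metadata.insert(position + offset, new)
    return metadata


if __name__ == "__main__":
    root = split_subjects(ET.fromstring(SAMPLE))
    print(ET.tostring(root, encoding="unicode"))
```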

chrisridd
02-11-2012, 03:10 PM
Actually, the metadata support in ePUB2 is quite good. The only thing that I am really missing is the option for series metadata. I know Calibre uses a custom metadata field, but it is not in the official specs.

I took a quick look at the epub 3 specs, and there's no significant change in its metadata requirements over epub 2. In particular it still doesn't look like there's any concept of "series" :(

The main problem is not the allowable metadata, but the reluctance of publishers to actually use it. The fields that are there are mostly used by library programs and not by the readers. The readers usually support only the fields they consider most useful.

Are the reader devices the most important consumers of metadata, or is it the stores?

twedigteam
02-13-2012, 01:46 PM
I would say retail/stores/libraries are the more important consumers of metadata, since this is how readers will discover new titles.

It's generally assumed that an eBook should carry the same metadata a print edition offers, no more, no less.

Jim Lester
02-13-2012, 03:02 PM
Having a retailer acquire their metadata from inside the ePub is problematic:


* Retailers may sell content other than ePub (e.g. PDF, Mobi, or even paper), with metadata capabilities ranging from none to ePub-level.
* Retailers may require metadata that should not be stored in the file, because of sensitivity, size or variability (such as pricing information, larger thumbnails, etc.).

So I'd be willing to bet that most retailers are using a metadata feed that is separate from the files.

chrisridd
02-13-2012, 05:09 PM
Having a retailer acquire their metadata from inside the ePub is problematic:

The epub 3 specs appear to refer to a metadata publishing standard called ONIX. Is that used by retailers?

Whackatagin
02-14-2012, 10:11 AM
The epub 3 specs appear to refer to a metadata publishing standard called ONIX. Is that used by retailers?

In general, yes! It is submitted as a separate XML document for both paper and digital publications, as an industry standard. (Most retailers are currently upgrading from ONIX 2 to ONIX 3.)

It is also generally far more definitive than "in-book" metadata, containing all pricing & currency rates, territory & licensing, description, publishing account ID info, BISAC code(s), encryption status, and alternative formats & editions, all in addition to the standard/mandatory "in-book" metadata of title, author & language. None of this extra data is really required for self-publishing. Title, author and language, plus date of publication & ISBN (if required), are all you really need, and all a retailer will expect or indeed look at. Unless you are a publisher with a vending agreement/contract with a retailer, you will not be asked for, or be expected to provide, an ONIX file with your titles.

Most self-publishing upload processes have a facility to add the description text and other basic info during the upload, Amazon KDP being a prime example. You can add as much "other" metadata as you wish to your ePub/Mobi; what's not required will be ignored.
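
The short list above (title, author, language, date of publication, plus ISBN if required) is easy to check before uploading. A minimal Python sketch under those assumptions; the name `missing_core_fields` is hypothetical, and the check only confirms that a dc:identifier exists, not that it is a valid ISBN:

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Fields the post above describes as sufficient for self-publishing.
# "creator" is the dc: element that carries the author.
CORE_FIELDS = ("title", "creator", "language", "date")

# Sample with placeholder values; dc:date is deliberately left out.
SAMPLE = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Book Title</dc:title>
<dc:creator>Author Name</dc:creator>
<dc:language>en</dc:language>
</metadata>"""


def missing_core_fields(xml_text, require_isbn=False):
    """Return the names of core dc: fields that are absent or empty."""
    root = ET.fromstring(xml_text)
    present = {
        el.tag[len(DC):]
        for el in root.iter()
        if el.tag.startswith(DC) and (el.text or "").strip()
    }
    wanted = list(CORE_FIELDS) + (["identifier"] if require_isbn else [])
    return [name for name in wanted if name not in present]


if __name__ == "__main__":
    print(missing_core_fields(SAMPLE, require_isbn=True))
```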

Tip
I've noticed Sigil has a fairly extensive list of metadata headings (Adv tab) if you want to add additional info for other reasons, such as self retail/distribution, or to credit contributors.

Hope that helps:)

DaleDe
02-14-2012, 07:44 PM
I built a wiki page on ONIX. It may help clarify this. For self-published books that are sold by someone else, though, this extra data is needed, and the description is quite useful inside the ePub itself if the reader supports it.

Dale

twedigteam
02-17-2012, 01:05 PM
Do you have a link for your ONIX Wiki? I'd love to check that out.....

chrisridd
02-17-2012, 03:27 PM
Do you have a link for your ONIX Wiki? I'd love to check that out.....

It seems to be ONIX.