<!-- eBook Content Metadata -->
We now come to the portion of the metadata that is common to all ebook formats, so this will apply to any ebook that you make, whether fixed layout or reflowable. This section can be just a few short lines giving only the bare essentials of title, language, and identifier (the only three that are required), or add a host of other information that can be useful for identifying and indexing your title. In this, more is always better, as individual systems can ignore non-relevant portions, but cannot make them up if they are not provided. Furthermore, much of this information is searchable in online catalogs, such as Amazon's search engine and the Google Play bookstore, helping potential readers find your work.
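There are 15 elements, known as "properties," in the standard Dublin Core Metadata Element Set (DCMES). These are (in alphabetical order): contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, and type.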
Not all of these are relevant to all ebooks, nor were they created specifically for ebooks, but rather for electronic publications in general. The purpose of using a standard reference set such as Dublin Core is so that everyone who might reference or add to the data pool will be using a common vocabulary.
I will describe each property - grouped into a more or less logical order - along with their modifiers and options, but you can also find a complete table of all fifteen in Appendix C of the ebook tutorial, along with concise descriptors and reference links (included at the end of this post).
The <dc:title> element is one of three required elements in any ePub-based ebook, Kindle included. This is the title that will appear on the Home screen or Kindle Bookshelf in list view. You can add more than one, using separate entries, either for subtitles or series/volume titles. But the primary title by which you want your ebook to be listed must be given first. Simply insert the title of your ebook between the opening and closing tags, like so:
<dc:title>YOUR TITLE HERE</dc:title>
There is no opf modifier to designate a given title entry as the series title or a subtitle, so the order in which they are listed is critical. You can enter it all as a single title, or as separate entities, but whatever is in the first entry will be how the book is listed anywhere that it appears.
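As a minimal sketch of the ordering rule above, here is how a primary title, subtitle, and series title might be entered as separate entries. The subtitle and series text are placeholders of my own, and only the title entries are shown; the wrapping metadata tag is included just so the namespace prefixes are declared.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- Primary title: listed first, so this is how the book will be listed -->
  <dc:title>YOUR TITLE HERE</dc:title>
  <!-- Additional entries (placeholder text): subtitle, then series and volume -->
  <dc:title>YOUR SUBTITLE HERE</dc:title>
  <dc:title>YOUR SERIES TITLE, Volume 1</dc:title>
</metadata>
```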
You will also be allowed to enter additional title information during the upload process to KDP, including Series titles and/or Volume numbers. But be sure to include it in the metadata section, so that no matter where the ebook goes, the information will go with it. For example, it may be sold to libraries or academic institutions, or find its way to secondary resellers if used ebooks become a legitimate market in the future. And, of course, you want it to appear correctly in the reader's own device library.
As a side note, you should be aware that ebooks will only appear on the Books tab of the Kindle device after they are downloaded from Amazon. Otherwise, they are considered personal documents and will appear on the Docs tab, even when you put them in the Books folder on your device. This is unfortunate for those who wish to sell ebooks directly to their readers, but you cannot change it, so don't try. A metadata entry called "CDE Type" with the content value EBOK (or EBSP for samples) will be created during the KDP upload process, but it is ignored if added manually. Incidentally, Calibre will add the EBOK entry during its conversion of reflowable ebooks, but unfortunately it cannot yet convert fixed layout files correctly.
You are required to include a title, for obvious reasons, but not an author or other creator, as works can be anonymous. There can, of course, also be multiple creators of a work, and each of these should be entered as a separate entity. The OPF spec adds two optional attributes to the <dc:creator> element that greatly aid with this and extend its value as a data attribute. The first of these is file-as, which allows you to specify the last-name, first-name sorting order for creator names, as in:
<dc:creator opf:file-as="Last, First">First Last</dc:creator>
If you leave this out, your ebook will be listed under your first name in many catalogs and databases, such as the Calibre library.
The second opf modifier is the role attribute, which has a wide range of values to specify the exact function of each creator or contributor to the work, including author, illustrator, editor, and many others. In addition, there can be multiple entries of each.
These are entered using a three-character code, such as "aut" for author or "ill" for illustrator, as seen in the template metadata. For a complete list of the 223 role values and their 3-character codes, along with a description of each role's function, see the MARC Code List for Relators.
<dc:creator opf:role="aut">Your Name</dc:creator>
In the unlikely event this extensive list does not contain a value for a particular function utilized in the production of your work (the required skills for creating ebooks are expanding rapidly, after all), you can add generic contributor elements using the "oth" value for any others whose roles remain unspecified.
All role values can be used for both the creator and contributor elements, each of which is entered in the same way. The distinction between these is that creators are the primary producers of the work, such as the author or an artist of a heavily illustrated children's book or comic, while translators or designers who contribute to the work should be listed using the <dc:contributor> element. In many instances this might be a purely arbitrary distinction, but in general those who produce the work are creators, and those who help to shape the work are contributors. But there is no hard and fast rule for usage, so use your best judgment.
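To pull the creator and contributor options together, here is a rough sketch combining the role and file-as attributes on several entries. All names are placeholders, and the split between creators and contributors is just one plausible arrangement. The codes used (aut, ill, trl, oth) come from the MARC relator list.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- Primary producers of the work, one element per person -->
  <dc:creator opf:role="aut" opf:file-as="Last, First">First Last</dc:creator>
  <dc:creator opf:role="ill" opf:file-as="Artist, Second">Second Artist</dc:creator>
  <!-- Those who helped shape the work -->
  <dc:contributor opf:role="trl" opf:file-as="Translator, Third">Third Translator</dc:contributor>
  <!-- "oth" covers a role with no specific code -->
  <dc:contributor opf:role="oth" opf:file-as="Helper, Fourth">Fourth Helper</dc:contributor>
</metadata>
```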
The publisher of a work is defined as the entity responsible for making the publication available in its present form. This is somewhat less obvious than one might think for self-published books, for the simple reason that the legal "publisher of record" is the entity to whom the assigned ISBN is registered, and not necessarily the person or organization on whose book it ends up.
For example, if you publish your ebook through Smashwords, they will provide you with a free ISBN. However, contrary to popular belief, you will not be a "self-published" author, since the publisher will be listed as Smashwords at every retail vendor who carries your ebook. This is a technicality for all intents and purposes, since you will have "self-published" your own work. But you are not the publisher; Smashwords is.
Conversely, if you do not provide an ISBN when uploading your ebook to Amazon (or Barnes & Noble or Google Play, who do not require one), they will assign their own "unique identifier" to your work (an ASIN in the case of Amazon), but you will still be listed as the publisher.
This might seem a fine idea from a financial standpoint, since ISBNs do not come cheap (at least in the United States: in Canada they're free). However, this makes cross-referencing your works vastly more difficult for archivists and collectors, not to mention librarians who might want to buy your ebook. With an ISBN assigned to the ebook edition of your book (it cannot be the same one used for a print edition, if there is one), your title can easily be found anywhere that it is listed. In addition, all sales for that title will be aggregated in bestseller lists, whereas this is not always the case for titles with a different number assigned at every vendor.
The sole reason I point this out here is that simply entering your personal or business name into the publisher metadata entity does not make you the publisher of record - legally, at least, which may become an issue down the line if you decide to sell your Smashwords published ebook to another publisher. It is always best to have an ISBN assigned to your work, and for legal reasons it is always best to have it registered to you.
The opf spec adds the event attribute to the <dc:date> element, allowing you to specify the nature of a given point in time, such as the date of creation, publication, modification or revision, of which you can include one, all, or none. There is no defined set of date values, so you can enter any that you feel might be useful, such as an entire revision history, or just the date that it was finished. Only the first one will be recognized by Kindlegen during conversion, but the rest will be retained in the internal metadata.
The date element is optional (though recommended by Amazon), and if used you can enter just the four-digit year as shown and call it good for most books. However, if you're working on a periodical or other timely content (or simply like to be precise), you may want to include the month and day as well. If so, you must use the year-month-day format of YYYY-MM-DD in order to conform to ePub standards.
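Here is a small sketch of how dated events might be entered. The event names and dates are illustrative only, since no fixed set of event values is defined, and the publication date is placed first because only the first entry is recognized by Kindlegen.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- First entry: the one Kindlegen recognizes (placeholder date) -->
  <dc:date opf:event="publication">2013-06-15</dc:date>
  <!-- Further entries are retained in the internal metadata -->
  <dc:date opf:event="creation">2013</dc:date>
  <dc:date opf:event="modification">2013-09-01</dc:date>
</metadata>
```

A year alone is enough for most books; the full YYYY-MM-DD form matters mainly for periodicals and revision tracking.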
A statement of your rights within the metadata is recommended, though not required. This can be anything from a simple "All Rights Reserved" (or Creative Commons, public domain, or whatnot), to a full assertion of your Intellectual Property Rights in different territories and/or under various conditions (as, for example, if you're doing translations into different languages, or have retained the ebook rights but not the print edition rights). In most cases, if you're getting this involved you'll want to consult a literary agent or legal representative who knows publishing law.
A reference to the system of protection should be included along with any assertion of rights, such as "Copyright" or "Trademark" for print and graphic elements. Thus, a standard concise statement of universal rights might be:
<dc:rights>© Copyright YYYY - All Rights Reserved</dc:rights>
Within the XML code framework of the OPF file you should include the full word "Copyright," and/or use a universally recognized string to stand in for the © symbol, as it may not be recognized by reading systems throughout the world. Three standard, code-based alternatives are given here:
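As a sketch of what a code-based stand-in can look like, XML numeric character references are one option that any conforming XML parser will resolve to the © symbol. The two forms below (decimal and hexadecimal) are my assumption about the kind of alternative meant here; use only one rights entry of this kind in practice.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- Decimal character reference for the copyright sign -->
  <dc:rights>&#169; Copyright YYYY - All Rights Reserved</dc:rights>
  <!-- Hexadecimal character reference for the same character -->
  <dc:rights>&#xA9; Copyright YYYY - All Rights Reserved</dc:rights>
</metadata>
```

Note that the named entity familiar from HTML (an ampersand followed by "copy;") is not predefined in plain XML, so it is safer to avoid it in an OPF file unless it has been declared.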
The second required element is at least one entry that identifies the primary language of the content. This should employ the standard RFC 3066 language tags, using a base two-letter code, either alone or with a secondary string, or "subtag," for a language variant. So, for example, English can be either plain EN, or EN-US for United States dialects, or EN-GB for British idioms, or any of a host of others. Values are not case-sensitive, so en, En, and EN are all valid entries.
<dc:language>En-US</dc:language>
Generally it is best to avoid using subtags unless they are clearly relevant to the work, as doing so inherently alienates speakers of related languages for which the work is equally accessible. On the other hand, readers might be interested to know that they are in for British slang when diving into Harry Potter for the first time.
You can also include more than one language identifier if more than one language is used in the work, as is found in dual-language translations, or where large portions are given in another tongue, such as Latin passages in a history text. It is not necessary to include a language identifier for single foreign terms scattered throughout a book, unless it is required to know the language for comprehension of the content, or where a large number of foreign terms have been inserted.
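For instance, a history text written in English with substantial Latin passages might carry two language entries, sketched below. I have put the primary language first as my own convention; nothing in this section says the order is significant.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- Primary language of the content, without a region subtag -->
  <dc:language>en</dc:language>
  <!-- Secondary language used for large portions of the text (Latin) -->
  <dc:language>la</dc:language>
</metadata>
```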
The third and final element you are required to include is at least one unique identifier. As mentioned in the introduction to the OPF file earlier, there must be one identifier with an id value that matches the value of the unique-identifier attribute in the header declaration, and this is where it goes.
An identifier is a reference to the publication from a given system of documentation, the most common of which is the International Standard Book Number (ISBN). However, it can be any string of data that is specific to this particular incarnation of your work, though preferably one drawn from a formal identification system, such as a Globally Unique Identifier (GUID), Uniform Resource Identifier (URI), or Digital Object Identifier (DOI). No particular identifier schema is required, nor has any been endorsed by either Dublin Core or the IDPF, although, of course, an ISBN is universally recognized as the standard for publications.
Multiple identifiers can be used instead of, or in addition to, an ISBN, but as mentioned, one must be the globally unique identifier. This is specified by an XML id attribute that matches the unique-identifier value in the header. The value itself can be any descriptive term or key phrase, such as the common "BookId" or "book-id" employed both by Amazon and Apple in their samples. The value is case sensitive only in that its two entries must match exactly. Otherwise the id value is essentially arbitrary, functioning merely as a label for the data string, although, of course, you will want it to make sense and clearly describe the contents of this particular identifier.
In addition to the id value, the OPF spec adds an optional scheme attribute that allows you to name the specific system of authority that generated or assigned the data used as the identifier. Here is where you would enter "ISBN" or "DOI", or whatever the source of the data string may be. Here again, the OPF spec does not endorse or restrict you to any given identifier scheme.
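The relationship between the header and the identifier entries can be sketched as follows. This is not the template's own arrangement, just a minimal illustration: the id labels and the placeholder values are hypothetical, and the other required metadata elements are left out for brevity.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <!-- The id here matches unique-identifier in the package tag exactly -->
    <dc:identifier id="BookId" opf:scheme="ISBN">YOUR-ISBN-HERE</dc:identifier>
    <!-- An additional identifier; its id is only a descriptive label -->
    <dc:identifier id="doi" opf:scheme="DOI">YOUR-DOI-HERE</dc:identifier>
  </metadata>
</package>
```

To make a different entry the unique identifier, change either its id or the unique-identifier value so that the two match.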
In the templates I have included three identifiers, each with a particular function. The first of these is labeled "BookId" and is used to identify the Version number of the ebook, since there is no other means of doing so within the metadata, aside from a "revision" event in a date element. It is good practice to include a version or edition number on the copyright page within the ebook itself for the reader's reference, but in order for electronic databases to recognize a revised edition, you will need to include it in the metadata.
The second line in the template is given only as an example of how to enter an ISBN, since the template itself does not have one, being free. You would simply enter your ISBN between the opening and closing tags, where it now says NONE. The id that I've entered for it here is just an arbitrary label, and has no actual value. But I could make this the unique identifier simply by changing either the id value here or in the header so that they match.
The final entry is the one I have used as the template's globally unique identifier, and as you can see it matches the unique-identifier value in the header declaration.
Here I've used a standard data string known as a UUID, or Universally Unique Identifier, which consists of 32 hexadecimal digits, usually written in five hyphen-separated groups. In its time-based form (version 1) it contains an encoded timestamp based on the moment and node location where it was created. These can be generated at no cost on many websites, including UUIDGEN, where one is automatically created the moment you land on the page. The string itself is the portion after the prefixed urn:uuid: tag. The "urn" element stands for Uniform Resource Name, which, like the URL, is one kind of URI.
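A UUID-based identifier entry might look like the sketch below. The UUID shown is a widely used example value, not a real assignment, and the id label "uid" and scheme name "UUID" are my own hypothetical choices; generate a fresh UUID for your own ebook.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- Placeholder UUID: replace with one generated for your own title -->
  <dc:identifier id="uid" opf:scheme="UUID">urn:uuid:123e4567-e89b-12d3-a456-426614174000</dc:identifier>
</metadata>
```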
The dtb:uid value of the id will also be found in the metadata section of the NCX file (if used), where it should match this data string, although it is not strictly necessary. I'll discuss that further when we get there.
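For reference, here is a rough sketch of where the matching value sits in the NCX file. Only the head entry is shown, with the rest of the NCX omitted, and the UUID is the same placeholder example value rather than a real one.

```xml
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <!-- Should carry the same string as the OPF unique identifier -->
    <meta name="dtb:uid" content="urn:uuid:123e4567-e89b-12d3-a456-426614174000"/>
  </head>
  <!-- docTitle and navMap omitted from this sketch -->
</ncx>
```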
The UUID is particularly useful in that a time-based one can be decoded to discover the exact moment and location of creation, but looks like utter gibberish otherwise. Each time you update the ebook's content, however trivial, you should alter this data string, as well as the version number if used, in order to allow users (or yourself) to identify the particular version they are holding.
By using the type element you can add category data such as whether the work is Fiction, Non-Fiction, Poetry, etc., and/or a specific genre or classification, such as Science Fiction or Art History. You can also include functional descriptors for such works as Almanacs and Working Papers. The values here are limitless, but should use commonly recognized terminology.
The subject element, on the other hand, is used to define the topic or subject matter of the content itself. This is where you would add Library of Congress classification codes, BISAC Subject Headings, or others such as those used by Amazon to categorize their entries (many of which will be drawn from these metadata entries). One reason you're adding all this extra information is precisely so that retailers can add it to their product pages. Adding it here facilitates the quick and accurate transfer of metadata concerning your work, and this is your chance to make sure it's right.
The template provides half a dozen examples of subject headings that might be appropriate for such a work, drawn from the BISAC listing. You can add as many as you like, and as always, more is generally better than fewer.
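To show the difference between the two elements, here is a brief sketch for a hypothetical science fiction novel. The values are illustrative only and are not the template's own entries; check any subject heading against the current BISAC list before using it.

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:opf="http://www.idpf.org/2007/opf">
  <!-- type: the category and genre of the work -->
  <dc:type>Fiction</dc:type>
  <dc:type>Science Fiction</dc:type>
  <!-- subject: the topic, here written in the style of a BISAC heading -->
  <dc:subject>FICTION / Science Fiction / General</dc:subject>
</metadata>
```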
The description element is where you place the back jacket blurb or other descriptive content that tells the reader what the book is about, with the intention of enticing them to read it. Anything you might desire a potential customer to know could go here, including reviews, extracts, or a table of contents to let them know exactly what's included.
Aside from the content of the book itself, this is the most important piece of writing you will do with regard to your published work. Give it some serious thought and apply your finest craftsmanship, as it will show up all over the Internet, on every ebook retailer and social reading site, as well as book review blogs and library lending catalogs, and once it's there it's there for good.
One other tag you might include is format (Kindle in this case, but ePub or iBooks or whatever is appropriate otherwise). This might seem redundant since you've got the ebook right there in front of you, but not everyone reading this data will, and it's one more way this specific iteration of your work can be identified. For example, a library may be looking at a metadata listing in search of a particular format to include in their catalog, and other general ebook retailers will want to identify the format for their customers before selling it to them via whatever systems are put in place down the line as ebooks become more common.
There are a handful of other elements that you can add in order to expand upon or define further aspects of your work, such as its sources or relation to other works. Following is the Appendix C table containing the complete list and their definitions.
Before moving on to the next section of the OPF, be sure to close the metadata section by using the closing </metadata> tag (the one with the forward slash). It's also a good idea to check through your list of entries to be sure that they're all closed as well.
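Putting the bare essentials together, a minimal metadata section with only the three required elements might look like this sketch. All values are placeholders, and the namespace declarations are those used by OPF 2.0.

```xml
<package xmlns="http://www.idpf.org/2007/opf" version="2.0"
         unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <!-- Required: title, language, and one unique identifier -->
    <dc:title>YOUR TITLE HERE</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="BookId">YOUR-IDENTIFIER-HERE</dc:identifier>
  </metadata>
  <!-- manifest, spine, and guide sections follow here -->
</package>
```

Every entry opened inside the section is closed before the closing metadata tag, which is the check recommended above.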
DUBLIN CORE METADATA ATTRIBUTES
contributor: Used for persons making contributions to the publication in a manner that is secondary to the role of creator, such as editors, translators, or designers. The significance of the role is arbitrary, as, for example, an illustrator may be a primary creator or a secondary contributor. The semantics and attributes are the same as those used for creator.
coverage: Identifies the spatial or temporal topic of the publication, such as the geographic applicability of a resource or jurisdiction under which it is relevant. Spatial coverage refers to a physical region, using place names or coordinates, while temporal coverage specifies a time period covered by the content, such as Neolithic or Elizabethan. See the DC Coverage Element for further descriptive commentary.
creator: The primary authors or creators of the publication, whether person, service, or organization. A separate element should be used for each name, and these should be given in the order to be presented to the reader. The OPF 2.0 spec adds the optional attributes role and file-as. See MARC Code List for Relators for complete list of the 223 role values and their 3-character codes, along with a description of each role's function.
date: Any relevant date(s) for the publication, such as creation, publication, and/or revision. The OPF spec adds the optional event attribute to define these temporal points, using standard defined Date & Time Formats (i.e. YYYY-MM-DD), of which the 4-digit year is required (this is not required by Amazon, although it is recommended). A full set of event values has not been defined.
description: The description(s) of the publication content. This may include, but is not limited to, an abstract, table of contents, graphical representation, or free-form account of the content. It is often used to provide the sales copy for online retailers, as well as book reviewers and bloggers.
format: The media-type or physical dimensions of the publication. This might also include such factors as duration, software version, or operating system, along with hardware required to use the resource. The recommendation is to use a MIME type.
identifier: Required. One or more unique references to the publication within a given context, such as ISBN, DOI, URN or URI. The OPF spec provides an optional scheme attribute that names the system or authority that generated the identifier, although no specific schema are endorsed or defined as required. One element must be defined as the unique-identifier, with an id value specified.
language: Required. One or more language identifiers, with optional region subtag, such as en-us for English as spoken in the United States, or fr-ca for Canadian French. Avoid using subtags unless necessary. The OPF spec requires that these conform to RFC 3066 or its successor.
publisher: The publisher(s) of the resource, defined as the entity responsible for making the publication available in its present form, such as a publishing house, university department, or a corporate entity. This is generally the entity to whom the ISBN is registered (the "publisher of record").
relation: Identifies related resources and their relationship to the current publication. Used to express linkages between entities, such as soundtracks to a movie, or reference works dependent on support materials. There is no standard schema adopted.
rights: An assertion of the rights held by the publisher/creator with respect to the publication and its content. A statement of Copyright notice should appear, including Intellectual and other Property Rights, as well as links and/or references to the rights protection system or service employed.
source: Identification of primary or secondary resource documents or publications from which the current publication is derived, either in whole or in part, including necessary discovery metadata.
title: Required. There must be at least one title for the publication, although multiple titles and/or subtitles are allowed, such as for series or volumes. The first one listed should be the primary title.
type: The nature or genre of the publication. Includes terms for general categories (such as Fiction or Non-Fiction, etc.), genres (such as Young Adult, Fantasy, Romance), as well as those describing functions (such as Technical Report or Dictionary). To describe the physical medium or coding mechanism of the resource, use the format element.