Alyssa's Coding Journal


EPUB 3.2 Structure: A Simplified Overview

Author: Alyssa Riceman


Introduction

This is an overview of the internal structure of EPUB 3.2 files. (Which I’ll henceforth just call ‘EPUB’, no version number specified, when not specifically contrasting with other EPUB versions.) It’s written as a followup / companion piece to my prior overview of EPUB 2.0.1 files, with similar goals in mind: to serve as a resource for programmers who want to make EPUB-generating software, as an easier-to-read (albeit less thorough) alternative to the official format specification (archive).

Like the previous summary, this one is targeted specifically at creators of EPUB-writing software, not EPUB-reading software, and will omit summaries of reader-specific requirements and of features which are deprecated or otherwise unlikely to be of relevance to creators of EPUB-writing software. For those interested in a more complete picture, see the format specification linked in the prior paragraph.

EPUB 3.2, as a format, is substantially more elaborate and feature-rich than EPUB 2.0.1 was. Moreover, even in those parts of the format which are superficially similar, many changes have been made. Thus, this summary will be very long, and will include many elements which are similar to but subtly different from those in the 2.0.1 summary.

High-Level Structure

An EPUB file, at the highest level, is a ZIP file (with .epub extension, traditionally) whose internal structure meets various criteria. The broad structure looks something like this:

[Zip file root]/
    mimetype
    META-INF/
        container.xml
        [Optionally some other metadata files]
    [The actual book content, including, at minimum, an OPF package file and an XHTML navigation document, and probably a bunch more than that in practice]

The mimetype file identifies to readers that this is an EPUB file and should be processed accordingly.

The META-INF directory holds high-level metadata for the EPUB file as a whole. Of particular note is its container.xml file, which lists one or more OPF files, each of which defines a complete rendition of the book. (In practice, there will usually be just a single one; readers mostly lack support for books with multiple renditions, at present.)

Each OPF file (traditionally equipped with a .opf extension) lays out the metadata, constituent files, and linear reading order which, together, define a rendition of the book. Each OPF file also has to contain a link to an XHTML file, the navigation document, which serves as the rendition’s table of contents.

The navigation document (traditionally equipped with a .xhtml extension) contains a list of links to sections of the book, with annotations to ensure that the list is machine-readable and can thus function as a source of table-of-contents metadata.

Aside from these required elements, a typical EPUB book will consist of additional XHTML files (or, less traditionally, SVG files) containing the book’s text and other content (one doesn’t typically create a book containing a table of contents and nothing else, after all), with support for most of the traditional features of webpages: stylesheets, images, font embeds, and suchlike.

The full list of media formats supported for inclusion in an EPUB without the need for fallbacks (more on fallbacks later) is:

(The EPUB specification allows use of alternate media types for some of the formats here, for backwards-compatibility reasons. The ones listed here are the recommended ones for newly-created EPUBs.)

Note that, unlike EPUB 2.0.1, which specifies concrete version numbers for each of its formats-supported-without-fallback, EPUB 3.2 allows use of whatever version one wants, albeit with the disclaimer that reader support is likely to be limited for excessively recent versions. The links above point at the versions linked to by the specification; but, in cases where later versions have since been released, there’s no restriction against using them instead.

The EPUB Media Overlay format is part of the EPUB specification, and so I’ll be summarizing it below rather than relying on a link; with that said, for those interested in reading the full specification, it can be read here. The EPUB 2.0.1 NCX format, meanwhile, is purely a legacy feature to improve backwards compatibility with EPUB 2.0.1 readers; its original definition is here, and I’ve previously written up a simplified summary as part of my EPUB 2.0.1 summary.

Over the course of this summary, I’ll cover all of these layers of the EPUB file structure, starting at the highest level and working down through the essentials, finishing off with the various optional frills.

Zip Container

As mentioned, at the highest level, an EPUB file is a ZIP file meeting certain specifications internal-structure-wise. Before getting into the intricacies of its internal structure and constituent files, though, there are a bunch of restrictions which apply at the ZIP abstraction-layer itself.

High-level limitations on the structure of the ZIP file include:

Path names within the ZIP file need to meet the following criteria:

Furthermore, all file and directory names (‘filenames’, for conciseness) need to meet the following criteria:

And, finally, all files stored within the ZIP file need to meet the following criteria:

…and now, with all of these limits out of the way, we can move on to describing all the things which make the EPUB file actually EPUB-like and not just a ZIP file with some restrictions.

mimetype

(Note: the mimetype file is the same in EPUB 3.2 as in EPUB 2.0.1; feel free to skip this section if already familiar with the EPUB 2.0.1 equivalent.)

As mentioned above, the first file within the ZIP’s internal ordering has to be the mimetype file. It has to reside in the zip root, uncompressed and unencrypted, without any extra fields in its ZIP header. Its contents should be the ASCII string:

application/epub+zip

The mimetype file serves as a quick way for readers to confirm, when loading a file, that it’s an EPUB file of some sort. (It doesn’t disambiguate between different EPUB versions; that, the readers have to handle elsewhere.) Thus, leaving it out will cause you problems.

(If you set it up right, then, looking at the EPUB file in a hex editor, you should see the string mimetypeapplication/epub+zip starting at offset 30.)
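As a rough illustration of the above, here’s a minimal Python sketch using the standard library’s zipfile module. The start_epub helper and the placeholder.txt entry are hypothetical names of my own, and the result is not yet a valid EPUB (it has no META-INF/container.xml); the point is only to show the mimetype entry going in first, stored rather than compressed, and then to check the offset-30 claim.

```python
import io
import zipfile

MIMETYPE = b"application/epub+zip"


def start_epub(stream):
    """Open a ZIP for writing and add the mimetype entry first, uncompressed."""
    zf = zipfile.ZipFile(stream, "w", compression=zipfile.ZIP_DEFLATED)
    # A freshly created ZipInfo has an empty extra field; ZIP_STORED keeps the
    # entry's bytes uncompressed even though later entries default to deflate.
    info = zipfile.ZipInfo("mimetype")
    zf.writestr(info, MIMETYPE, compress_type=zipfile.ZIP_STORED)
    return zf


buffer = io.BytesIO()
with start_epub(buffer) as zf:
    # Hypothetical later entry, just to show that other files may be compressed.
    zf.writestr("placeholder.txt", "later entries may be compressed")

data = buffer.getvalue()
# The local file header is 30 bytes long, so the filename starts at offset 30
# and, with no extra field, the file's contents follow it directly.
print(data[30:58])
```

If the entry were compressed, or carried an extra field, the two strings would no longer sit back to back at that offset.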

META-INF

Alongside the mimetype file, the zip root is required to contain a folder, META-INF, which is used to store various high-level metadata files. Within said folder, it’s required that there be a file, container.xml, which identifies the locations of the EPUB’s OPF files; there are also a variety of other optional files within META-INF which can be used to specify other potentially-relevant information about the EPUB. These are:

Those five files are all optional, while container.xml is mandatory. Also optionally, other files can be included in the META-INF folder, as long as their filenames don’t collide with any of the aforementioned six (whose names remain reserved for their stated purposes, even when no files filling those purposes are present).

(The format specification is somewhat unclear regarding exactly what limits there are, if any, on what other files can go into META-INF, aside from the ’no reserved names’ limit. For safety, my personal recommendation would be to avoid putting any files into META-INF which are referenced in any way by files in non-META-INF parts of the EPUB.)

When files in META-INF contain relative references to other files, the references should use the zip root, rather than META-INF, as their base. (In other words, there’s an implicit ../ at the start of any relative reference made from a file in META-INF.)

container.xml

container.xml is used to list the OPF files associated with each rendition of the EPUB. A simple container.xml file might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml" />
    </rootfiles>
</container>

…and a more complex one might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path="OEBPS/default.opf" media-type="application/oebps-package+xml" />
        <rootfile full-path="Alt/alternate_rendition.opf" media-type="application/oebps-package+xml" />
    </rootfiles>
    <links>
        <link href="resources/sample-resource.pdf" rel="placeholder-rel" media-type="application/pdf" />
    </links>
</container>

The <container> element is required, and it requires the displayed version and xmlns attribute values and <rootfiles> child; it can also optionally have a <links> child after the <rootfiles>. The <rootfiles> element needs to contain one or more <rootfile> elements, each of which needs a full-path attribute pointing to an OPF file and a media-type attribute of "application/oebps-package+xml". The <links> element, if present, needs to contain one or more <link> elements, each of which needs an href attribute pointing to a remote resource and a rel attribute describing the relation of said resource to the EPUB, and each of which can also optionally contain a media-type attribute specifying the media type of the linked resource.

Each rootfile in <rootfiles> should point at the OPF defining a rendition of the EPUB. In practice, you’ll usually have just a single one; most readers don’t support use of multiple renditions, and will just default to the first rootfile listed. But you can include multiple, if you expect doing so to be useful for whatever reason.

<links>, if present, is used to point at one or more files necessary for the EPUB to be processed correctly. This is unlikely to come up in practice, since most readers aren’t going to know what to do with a given link; but it might be useful for certain EPUB subformats. Each <link> in <links> should href to one such file, either internally or externally. There’s no official specification of what values rel can take; but it should be a space-separated list of relations, details likely to be supplied by whatever subformat you’re using which is driving you to use a <links> element at all.
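For software that writes this file, a sketch along the following lines might be enough for the common rootfiles-only case. It uses Python’s xml.etree.ElementTree; build_container_xml is a hypothetical helper name, the optional <links> element is left out, and the xml_declaration argument assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def build_container_xml(opf_paths):
    """Return container.xml as bytes, with one <rootfile> per OPF path."""
    if not opf_paths:
        raise ValueError("container.xml needs at least one rootfile")
    # Serialize the container namespace as the default (unprefixed) namespace.
    # Note that this registration is process-wide, not local to this call.
    ET.register_namespace("", CONTAINER_NS)
    root = ET.Element(f"{{{CONTAINER_NS}}}container", {"version": "1.0"})
    rootfiles = ET.SubElement(root, f"{{{CONTAINER_NS}}}rootfiles")
    for path in opf_paths:
        ET.SubElement(
            rootfiles,
            f"{{{CONTAINER_NS}}}rootfile",
            {"full-path": path, "media-type": OPF_MEDIA_TYPE},
        )
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


xml_bytes = build_container_xml(["OEBPS/package.opf"])
print(xml_bytes.decode("utf-8"))
```

Since most readers only look at the first rootfile, the order of the paths passed in matters: the default rendition’s OPF should come first.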

encryption.xml

encryption.xml holds all encryption information for any encrypted files within the EPUB. It has to exist iff any file within the EPUB is encrypted (or obfuscated). You’re not allowed to encrypt mimetype, container.xml, or any of the other reserved files in META-INF (including encryption.xml itself), or any of the EPUB’s OPF files; but you can encrypt any other files in the EPUB. See Section 3.5.2.2 and Section 5 of the EPUB OCF specification for details, if you want them.

manifest.xml

manifest.xml can, in some theoretical sense, be used to provide a manifest of files in the EPUB. In a practical sense, there is literally no detail given regarding how one should format it, and the format specification directly admits that the only reason it’s included at all is for the sake of ODF compatibility. This file’s use case is so underspecified as to make it useless, and even if you do use it for some reason, readers are likely to ignore it. Thus I’d recommend against using it.

metadata.xml

metadata.xml can be used for container-level metadata. (With ‘container-level’ meaning, approximately, “for the overall EPUB rather than for any specific rendition”.)

There’s very little detail given regarding the format of metadata.xml. Its root element should be a <metadata> element with namespace "http://www.idpf.org/2013/metadata", and all its elements should be namespaced; that’s about it, when it comes to requirements from the EPUB specification. In practice, you should probably not use this unless you’re working with an EPUB subformat which defines it further; for a basic EPUB, even if you put things in metadata.xml, readers are unlikely to know what to do with said things.

rights.xml

rights.xml is reserved for DRM information. All elements in rights.xml should be namespaced, but aside from that it has no specified format; details are up to whoever is defining the DRM scheme at play.

The EPUB specification neglects to define the phrase “rights governed” before using it; but it states that, “if the rights.xml file is not present, no part of [the EPUB file] is rights governed”. Thus, for anyone who wants their EPUB’s contents to be rights governed, this file’s presence is important.

signatures.xml

signatures.xml can be used to hold digital signatures for the EPUB and/or its contents. See Section 3.5.2.6 of the EPUB OCF specification for details, if you want them.

OPF Package

Each EPUB has at least one OPF file linked from its container.xml. Each OPF file serves as a high-level description of the structure of a rendition of the book; while many specific functions are filled by other files, it’s the OPF file which serves as the central point of cohesion holding all the other files together. It should have a .opf extension.

An OPF file is a relatively elaborate XML file, and there are certain XML attributes which recur as legitimate options across many different elements in the OPF. These attributes are:

The broad structure of an OPF file is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="BookId">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <!--Metadata contents go here-->
    </metadata>
    <manifest>
        <!--Manifest contents go here-->
    </manifest>
    <spine>
        <!--Spine contents go here-->
    </spine>
    <!--Optionally a <guide> element + contents, for backwards compatibility-->
    <!--Optionally some number of <collection> elements + contents-->
</package>

The <package> element is required, and it requires the displayed version attribute value and a defined unique-identifier attribute. (I’ll say more on the unique-identifier below, in the metadata section.) It also can, optionally, have: an id attribute, a dir attribute, an xml:lang attribute, and/or a prefix attribute.

The prefix attribute can be used, if relevant, to declare your use of XML vocabularies beyond those defined in the main specification, useful for EPUB format extensions. The attribute’s value should be a whitespace-separated list of mappings of the form prefix: iri, where prefix is a prefix which can be placed before the names of terms drawn from a vocabulary (in attribute values such as a <meta> element’s property) and iri is the IRI of that vocabulary. (The space between prefix and iri is required, to be clear; the parser is smart enough to recognize it as part of the pattern, rather than as whitespace.)

A few limits exist on what prefixes you can define: you can’t define prefixes mapping onto the already-available-and-non-prefix-requiring default vocabularies; you can’t define prefixes mapping into the Dublin Core Elements 1.1 namespace; and you can’t define prefixes named _.
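A small sketch of how a writer might assemble the attribute’s value while enforcing two of those limits follows. The build_prefix_attribute helper and the example.com IRIs are hypothetical, and the check against the default vocabularies is left out, since it would need their IRIs spelled out.

```python
DC_ELEMENTS_NS = "http://purl.org/dc/elements/1.1/"


def build_prefix_attribute(mappings):
    """Join {prefix: IRI} pairs into a value for the package's prefix attribute."""
    parts = []
    for prefix, iri in mappings.items():
        if prefix == "_":
            raise ValueError("the prefix '_' can't be defined")
        if iri == DC_ELEMENTS_NS:
            raise ValueError("prefixes can't map to the Dublin Core Elements namespace")
        # The space after the colon is part of the required pattern.
        parts.append(f"{prefix}: {iri}")
    return " ".join(parts)


value = build_prefix_attribute(
    {
        "ex": "https://example.com/vocab#",
        "other": "https://example.com/other#",
    }
)
print(value)  # ex: https://example.com/vocab# other: https://example.com/other#
```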

Default reserved prefixes, which can be used in OPF files without declaration (although declaring them if applicable is still recommended, for the sake of compatibility with imperfectly standards-conformant readers and other tools) and which you should probably avoid name collisions with in your own prefix-definitions, are:

…so that’s the <package>’s attributes covered. Now, on to its contents. The <package> needs to contain, in this order: a <metadata> element, a <manifest> element, and a <spine> element. It can optionally follow those three with, in this order if more than one is present: a <guide> element (for purposes of backwards compatibility with EPUB 2.0.1 readers), a <bindings> element (deprecated, so I won’t be going into further detail on it), and/or arbitrarily many <collection> elements.

<metadata>

The <metadata> element, as the name suggests, contains metadata. There’s a very large amount of possible metadata which can go into it, but a relatively minimal <metadata> element and its contents might look something like this:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="BookId">urn:uuid:40439CB8-FA73-4F56-93DD-EEB7BB82C9CF</dc:identifier>
    <meta property="dcterms:modified">2022-01-01T00:00:01Z</meta>
    <dc:language>en</dc:language>
    <dc:title>EPUB 3.2 Structure: A Simplified Overview</dc:title>
</metadata>

…and a somewhat more complex one might look something like this:

<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="BookId">urn:uuid:40439CB8-FA73-4F56-93DD-EEB7BB82C9CF</dc:identifier>

    <meta property="dcterms:modified">2022-01-01T00:00:01Z</meta>

    <dc:language>en</dc:language>

    <dc:title id="t">EPUB 3.2 Structure: A Simplified Overview</dc:title>
    <meta property="alternate-script" refines="#t" xml:lang="ja">EPUB 3.2の構造:簡略化の概要</meta>
    <link refines="#t" href="audio/title.mp3" rel="voicing" media-type="audio/mpeg" />

    <dc:creator id="c">Alyssa Riceman</dc:creator>
    <meta property="file-as" refines="#c">Riceman, Alyssa</meta>
    <meta property="alternate-script" refines="#c" xml:lang="ja">アリッサライスマン</meta>
    <link refines="#c" href="audio/author.mp3" rel="voicing" media-type="audio/mpeg" />

    <meta property="belongs-to-collection" id="s">eBook Format Overviews</meta>
    <meta property="collection-type" refines="#s">series</meta>
    <meta property="group-position" refines="#s">2</meta>

    <link href="meta/additional_metadata.xml" rel="record" media-type="application/xml" properties="onix" />
</metadata>

The <metadata> element itself needs the displayed xmlns:dc attribute value. Content-wise, there are four types of element which can be placed inside the <metadata> element:

  1. dc:-prefixed elements drawn from this list
  2. non-self-closing <meta> elements with defined property attributes (henceforth called ‘<meta> elements’ or synonyms thereof)
  3. self-closing <meta> elements with defined name and content attributes (henceforth called ‘EPUB 2.0.1 <meta> elements’ or synonyms thereof)
  4. <link> elements.

There are no ordering restrictions; feel free to intermix all of these in whatever arbitrary order you find most convenient.

dc:-prefixed elements

Let’s start with the dc:-prefixed elements. There are three mandatory ones, of which at least one of each has to be present: <dc:title>, <dc:language>, and a <dc:identifier> element with a defined id attribute with the same value as that of the package’s unique-identifier.

There can be arbitrarily many additional elements of any of these types (with the id being optional on all <dc:identifier>s aside from the mandatory one), along with arbitrarily many ones of any of the other types in the Dublin Core metadata set. Every dc:-prefixed element can optionally have an id attribute, and is required to contain at least one non-whitespace character of text in its contents.

If there’s more than one fitting value for a given element (e.g. if a rendition has multiple titles, or multiple authors, or multiple publishers, or suchlike), multiple instances of that element should be used, one for each such value.
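To make the rules above concrete, here’s a sketch of how a writer might emit the three mandatory dc:-prefixed elements plus a repeated element for multiple values. The add_dc helper and the author names are hypothetical, and the OPF namespace IRI (http://www.idpf.org/2007/opf) is supplied by me rather than taken from the samples in this post. The result is not yet a complete <metadata> element, since the mandatory modification timestamp is missing.

```python
import uuid
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OPF_NS = "http://www.idpf.org/2007/opf"


def add_dc(metadata, name, text, **attrs):
    """Append one dc:-prefixed element; its text must not be blank."""
    if not text.strip():
        raise ValueError(f"dc:{name} needs at least one non-whitespace character")
    element = ET.SubElement(metadata, f"{{{DC_NS}}}{name}", attrs)
    element.text = text
    return element


# Process-wide registrations, so that serialization uses "dc:" and no OPF prefix.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("", OPF_NS)

metadata = ET.Element(f"{{{OPF_NS}}}metadata")
# The id here has to match the package's unique-identifier attribute.
add_dc(metadata, "identifier", f"urn:uuid:{uuid.uuid4()}", id="BookId")
add_dc(metadata, "title", "EPUB 3.2 Structure: A Simplified Overview")
add_dc(metadata, "language", "en")
# Multiple values get one element apiece rather than one combined string.
for author in ["First Author", "Second Author"]:
    add_dc(metadata, "creator", author)

print(ET.tostring(metadata, encoding="unicode"))
```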

To briefly summarize each of the dc:-prefixed elements:

Those last five (from <dc:coverage> onward) are, in practice, not likely to be all that useful, although I listed them anyway for the sake of completeness.

<meta> elements

<meta> elements are the EPUB format’s system for tagging any metadata not covered by the dc:-prefixed elements. Of particular note, they can be used, not only for tagging metadata on the rendition, but also for tagging metadata on other metadata, thereby allowing for arbitrarily-complex nested metadata structures.

Each <meta> element is required to have a property attribute, whose value defines what sort of metadata it is. In terms of optional attributes, it’s allowed id, dir, and/or xml:lang, all of which are already familiar, plus the two new attributes scheme and refines.

The scheme attribute, if present, defines what system or scheme the meta element’s value is drawn from, if the value is drawn from some specific schema. (Whether it’s needed or not will generally be a function of which property you’re using, and I’ll note it where relevant in my property summaries below.)

The refines attribute, if present, defines what other element of the OPF the <meta> element is providing metadata on. Its value should be a relative IRI (typically of the form #ID, where ‘ID’ is the ID of an element elsewhere in the OPF), aimed at either another metadata element (allowing for arbitrarily-long chains of refinement) or a file elsewhere in the EPUB (identifiable by reference to its manifest entry (more on the manifest below)).

The required contents of a <meta> element will vary based on its property value; but it’s required, like the dc:-prefixed elements are, to contain at least one non-whitespace character. (With rare exceptions. Technically speaking, the rule is that it needs to contain at least one character after whitespace normalization, and different properties are allowed to normalize whitespace differently; but in practice most properties normalize whitespace by trimming all leading and trailing whitespace, which will remove all characters unless there’s at least one non-whitespace one.)

With prefixes, essentially anything can be defined as a <meta> property. Various among those will be covered later in this summary, and people interested in extending the EPUB format can define their own additional ones. But here’s the default vocabulary of properties which can be used without prefixes:

(There’s also a deprecated property, meta-auth, which I’m not going to bother going into detail about.)

Unless otherwise specified, any given element can only be refined with a given one of these properties once. (So, for example, a single <dc:creator> can’t have two separate file-as refinements, although two separate <dc:creator>s can have one apiece, and a single <dc:creator> can be refined with one of each of two separate properties (e.g. one file-as and one role).)

<meta> elements involving any of the above properties are entirely optional, aside from the specific case where, if a <dc:subject> is refined with an authority, then it also needs to be refined with a term.

I will, for the most part, not be going into namespaced properties here; there are too many of them, and their uses are too scattered. But there’s one specific one which warrants immediate discussion, and that one is dcterms:modified.

Unlike every other <meta> element, the presence of a dcterms:modified-property-bearing <meta> is mandatory. Its content should be a representation of the UTC time at which the EPUB was last modified, structured as YYYY-MM-DDThh:mm:ssZ, where YYYY is year, MM is month, DD is day, T is the character T, hh is hour, mm is minute, ss is second, and Z is the character Z. (So, for example, 2022-01-01T00:00:01Z for the first second of 2022. The time is expressed in UTC, so this would represent a time near the end of 2021 for anyone in a time zone behind UTC.)

The presence of an element of this form, with no refines value, is mandatory. It’s also mandatory that there be only one such element; any other dcterms:modified <meta> present in the metadata needs to have a refines value clarifying what it is, exactly, that it’s marking modification time on.

Whenever a given EPUB is changed, even if the default rendition in particular hasn’t been changed, the default rendition’s dcterms:modified <meta> should be updated, in order to make it easier for readers to be aware of the update and thus not conflate the updated EPUB with an older version of itself.
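Producing a timestamp in the required shape is straightforward with Python’s datetime module. The following is a minimal sketch; format_modified is a hypothetical helper name, and it insists on timezone-aware input so that the conversion to UTC is never a guess.

```python
from datetime import datetime, timedelta, timezone


def format_modified(moment=None):
    """Format an aware datetime as YYYY-MM-DDThh:mm:ssZ, converted to UTC."""
    if moment is None:
        moment = datetime.now(timezone.utc)
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 19:00:01 on 2021-12-31 at UTC-5 is one second past midnight on 2022-01-01 in UTC.
local = datetime(2021, 12, 31, 19, 0, 1, tzinfo=timezone(timedelta(hours=-5)))
print(format_modified(local))  # 2022-01-01T00:00:01Z
```

Calling format_modified() with no argument at save time is one easy way to make sure the value changes whenever the EPUB does.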

EPUB 2.0.1 <meta> elements

EPUB 2.0.1 <meta> elements, unlike EPUB 3.2 ones, are very simple and straightforward. They have two required attributes, no optional attributes, and no contents. The attributes are name and content; name is a string representation of a metadata element’s name, and content is a representation of that element’s value, leading to an overall product looking something like <meta name="translated-from" content="ja" />. No prefixing, no refinement, no standardization, just pure “here’s an arbitrary metadata name/value pair”.

EPUB 2.0.1 <meta> elements are supported in EPUB 3.2 purely for the sake of backwards compatibility, for use if you want to tag your EPUB with non-dc:-prefixed metadata which non-EPUB-3.2-supporting EPUB 2.0.1 readers will still be able to comprehend. If your reader supports EPUB 3.2, the EPUB 2.0.1 <meta> elements will be ignored; so you shouldn’t rely on them as your primary form of non-dc:-prefixed metadata, only as a thing to optionally include as fallbacks for your EPUB 3.2 <meta> elements.

<link> elements

<link> elements let you link to files which sit outside of the OPF and its manifest but which are nonetheless associated in some way with the rendition.

A <link> element has two required attributes, and one conditionally-required one. The required two are href and rel, where href is an IRI reference to the linked file (absolute if external, relative if internal) and rel is a space-separated list of values describing what sort of association the thing being linked has with the rendition. If the link is an external one, then media-type is also required, with its value identifying the media type of the linked file; if the link is internal, then media-type is optional.

In terms of always-optional attributes, a <link> element can have id, properties, and/or refines. Much as on <meta> elements, refines is used to indicate that the linked thing is relevant to a specific object-being-refined rather than to the EPUB as a whole; its contents should be of the same form as the contents of <meta> elements’ refines attributes. properties, meanwhile, is a space-separated list of values which can disambiguate what sort of file is being linked in case the media type isn’t enough. (For instance, when linking a file whose media type is application/xml, to disambiguate exactly what sort of XML file it is.)

Much as with <meta> property values, links’ rel and properties values can be essentially arbitrary given prefixing; however, here are their respective default vocabularies for use without prefixes:

rel values:

properties values:

(There are also a variety of deprecated rel values, which I’m not going to go into detail about: marc21xml-record, mods-record, onix-record, xml-signature, and xmp-record.)

You can’t link a file which is listed in the <manifest> (described below). You can, however, link a file which is embedded within a file listed in the <manifest> while not being listed there directly, identifying it by fragment; if you do, that file has to be of one of the media types which EPUB supports without need for fallback.

Even if you link something, that doesn’t mean the reader will have any idea what to do with it; be accordingly cautious about relying overly heavily on <link> elements. (This warning applies especially for external linking, but still to a non-negligible extent even for internal linking.)

<manifest>

The <manifest> contains a list of the rendition’s constituent files. A simple <manifest> might look something like this:

<manifest>
    <item href="text/toc.xhtml" media-type="application/xhtml+xml" id="toc" properties="nav" />
    <item href="images/cover.svg" media-type="image/svg+xml" id="cover" />
    <item href="text/chapter 01.xhtml" media-type="application/xhtml+xml" id="ch1" />
    <item href="text/chapter 02.xhtml" media-type="application/xhtml+xml" id="ch2" />
</manifest>

…and a more complicated one might look something like this:

<manifest>
    <item href="images/cover.svg" media-type="image/svg+xml" id="cover_svg" />
    <item href="images/cover.blend" media-type="application/x-blender" id="cover_blend" fallback="cover_svg" />

    <item href="text/toc.xhtml" media-type="application/xhtml+xml" id="toc" media-overlay="overlay" properties="nav" />

    <item href="text/chapter 01.xhtml" media-type="application/xhtml+xml" id="ch1" media-overlay="overlay" />
    <item href="text/chapter 02.xhtml" media-type="application/xhtml+xml" id="ch2" media-overlay="overlay" />

    <item href="audio/chapter 01.mp3" media-type="audio/mpeg" id="ch1a" />
    <item href="audio/chapter 02.mp3" media-type="audio/mpeg" id="ch2a" />

    <item href="overlay.smil" media-type="application/smil+xml" id="overlay" />

    <item href="https://example.com/font/externally_linked_font.otf" media-type="font/otf" id="font" properties="remote-resources" />
</manifest>

The only attribute allowed on the <manifest> is, optionally, an id. Content-wise, it should consist of one or more <item> elements.

Each <item> is required to have an href attribute (whose value should be an IRI reference to a file, absolute if external or relative if internal), an id attribute, and a media-type attribute identifying the referenced file’s media type. Under certain circumstances, it may additionally be required to have a fallback attribute and/or a properties attribute, each of which I’ll discuss below. And it can optionally have a media-overlay attribute whose value should be the ID of another manifest item, where said item is the media-overlay-attribute-bearing item’s associated media overlay document (more on those below).

Every file which is involved in the rendering of the rendition—excepting the OPF itself, and any <link> elements’ referenced files—should be listed as an <item> in the manifest.

If a file is of a media type other than those listed above as not requiring fallbacks, or is referenced from the <spine> (discussed below) while not being an XHTML or SVG file, it’s required to have a fallback attribute, whose value should be the ID of another manifest item for the reader to fall back on displaying if it doesn’t know how to display the item with the fallback.

Multiple fallback-laden <item>s can be chained together in sequence, such that item A falls back on item B, item B falls back on item C, and so forth, with the full sequence of fallbacks being called the item’s “fallback chain”. An item of a fallback-requiring media type needs its fallback chain to contain an item of a non-fallback-requiring media type, while a non-XHTML-or-SVG file referenced from the spine needs its fallback chain to contain an XHTML or SVG file. Fallback chains aren’t allowed to cycle; they need to bottom out eventually.
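A writer can check these rules before emitting the manifest. The sketch below models the manifest as a plain dict, which is my own simplification; fallback_chain and has_acceptable_fallback are hypothetical helper names, and the set of acceptable media types is left to the caller (for instance, XHTML and SVG for items referenced from the spine).

```python
def fallback_chain(items, start_id):
    """Return the IDs reached by following fallback attributes from start_id.

    items maps each manifest item ID to a dict with a "media-type" key and an
    optional "fallback" key. Raises ValueError on a cycle or an unknown ID.
    """
    chain = []
    seen = {start_id}
    current = items[start_id].get("fallback")
    while current is not None:
        if current not in items:
            raise ValueError(f"fallback points at unknown item {current!r}")
        if current in seen:
            raise ValueError(f"fallback chain from {start_id!r} cycles at {current!r}")
        seen.add(current)
        chain.append(current)
        current = items[current].get("fallback")
    return chain


def has_acceptable_fallback(items, start_id, acceptable_types):
    """True if some item in the fallback chain has an acceptable media type."""
    return any(
        items[item_id]["media-type"] in acceptable_types
        for item_id in fallback_chain(items, start_id)
    )


manifest = {
    "cover_svg": {"media-type": "image/svg+xml"},
    "cover_blend": {"media-type": "application/x-blender", "fallback": "cover_svg"},
}
print(fallback_chain(manifest, "cover_blend"))  # ['cover_svg']
print(has_acceptable_fallback(manifest, "cover_blend", {"image/svg+xml"}))  # True
```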

By default, a reader will go down an item’s fallback chain until it finds something it knows how to display, then display that. Readers are allowed to be opinionated in such a way as to display items from further down the fallback chain than that instead; but, in practice, they probably won’t. As a result, there’s not usually all that much reason to provide fallback chains for items of non-fallback-requiring media types, even if in theory it might be useful to do so (e.g. to provide a static fallback for a script-containing XHTML file).

One edge case: you’re allowed to include items with ordinarily fallback-requiring media types without any fallback, if those items are never referenced from the <spine> or rendered directly in the files which are referenced from the <spine>. This can be used to pack the EPUB with easter eggs, or datasets, or suchlike; stuff which will be irrelevant to the reader’s rendering of the book, but which people might want to pull out of the book ZIP to view separately.

Much like <link>s above, <item>s can have properties attributes. Unlike <link>s, there are circumstances under which it’s mandatory for an <item> to have a defined properties attribute. Much like a <link>’s properties attribute, an <item>’s properties attribute’s value should be a space-separated list of values, which can be more-or-less arbitrary prefixed things, but with a default list of non-prefix-requiring values. Those values are:

None of the above requirements propagate up through layers of embedding. If a script-free XHTML file uses an <iframe> to embed a scripted one, for example, only the latter of the two needs the scripted property value.

<spine>

The <spine> element contains a list of files which make up the EPUB’s primary content, the ones whose content would be printed if the EPUB were a print book; and it defines a linear reading order through those files. A <spine> might look something like this:

<spine>
    <itemref idref="cover" linear="no" />
    <itemref idref="ch1" />
    <itemref idref="ch2" />
</spine>

The <spine> can optionally have an id attribute, a page-progression-direction attribute, and/or a toc attribute.

The page-progression-direction attribute serves as a global definition for which way the rendition’s pages turn; it can have value "ltr" (left-to-right), "rtl" (right-to-left), or "default" (let the reader do whatever it wants). If no page-progression-direction is defined, the reader will treat it as "default".

The toc attribute exists purely for the sake of backwards compatibility with EPUB 2.0.1. Its value, if present, should be the id of an EPUB 2.0.1 NCX file in the <manifest>, to be used as a source of table-of-contents information by EPUB 2.0.1 readers which don’t know how to read EPUB 3 files and thus can’t use the navigation document.

Then, content-wise, the <spine> needs to contain one or more <itemref> elements. Each <itemref> needs an idref attribute whose value is the id of an <item> in the <manifest>, that item being the one being reffed. Optionally, an <itemref> can also have an id attribute, a linear attribute, and/or a properties attribute.

By default, when advancing through the book, readers will display the spine’s itemreffed files in the order they’re listed. The linear attribute allows you to more explicitly control this behavior. Its value can be set to either "yes" or "no", where "yes" is the default behavior, and "no" indicates that the relevant itemref is not part of the book’s linear reading order and should be skipped over rather than displayed when advancing through the book. This is useful for, for example, footnotes, or other things which should be reachable by link but which aren’t really part of the main book flow. (However, not all readers respect the linear attribute; some will ignore it and treat the full content of the spine as linear.) At least one entry in the <spine> is required to have either linear="yes" or no linear attribute at all; the rendition can’t be entirely nonlinear.

An <itemref>’s properties attribute follows the same pattern as <link>s’ and <item>s’ properties attributes. Its value should be a space-separated list of values; prefixes allow infinite extensibility; there’s a non-prefix-requiring native vocabulary. In the <itemref>’s case, there are only two non-prefix-requiring values to worry about, page-spread-left and page-spread-right, and I’ll discuss them both below, in the Page Spreads section, rather than here.

The <spine> is required to contain <itemref>s to all files in the EPUB which are hyperlinked from files which are, themselves, itemreffed from the spine; it shouldn’t be possible to follow a link from a file in the spine to one in the book but not in the spine. For similar reasons, it needs to contain <itemref>s to all files in the EPUB which are hyperlinked from the navigation document (discussed below), even in the event that the navigation document itself isn’t in the spine.

Any <item> itemreffed from the <spine> has to be XHTML or SVG, or else to have a fallback chain containing an XHTML or SVG file. No one <item> should have more than one <itemref> pointing at it.
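Several of the spine rules above are mechanical enough to check in code. Here’s a rough sketch, using Python’s standard xml.etree.ElementTree, which checks three of them: every idref points at a real manifest item, no item is itemreffed twice, and the spine isn’t entirely nonlinear. The function name and the sample package are invented for illustration, and the sketch doesn’t attempt the media-type or hyperlink-coverage rules.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def check_spine(package_xml):
    """Return a list of human-readable problems found in the spine (empty if none)."""
    root = ET.fromstring(package_xml)
    manifest_ids = {item.get("id") for item in root.iter(OPF_NS + "item")}
    itemrefs = list(root.iter(OPF_NS + "itemref"))
    problems = []
    if not itemrefs:
        problems.append("spine contains no itemrefs")
    seen = set()
    for ref in itemrefs:
        idref = ref.get("idref")
        if idref not in manifest_ids:
            problems.append(f"itemref points at unknown manifest item {idref!r}")
        if idref in seen:
            problems.append(f"manifest item {idref!r} is itemreffed more than once")
        seen.add(idref)
    # A missing linear attribute counts as linear="yes".
    if itemrefs and all(ref.get("linear", "yes") == "no" for ref in itemrefs):
        problems.append('every itemref is linear="no"')
    return problems


# Hypothetical package with two deliberate mistakes: ch1 is listed twice, ch2 is undeclared.
EXAMPLE_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="cover" href="cover.svg" media-type="image/svg+xml" />
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml" />
  </manifest>
  <spine>
    <itemref idref="cover" linear="no" />
    <itemref idref="ch1" />
    <itemref idref="ch1" />
    <itemref idref="ch2" />
  </spine>
</package>"""

spine_problems = check_spine(EXAMPLE_PACKAGE)
```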

<guide>

Optional. Not really part of EPUB 3.2 proper, but allowed by EPUB 3.2 for the sake of backwards compatibility with EPUB 2.0.1; you can include a <guide> in your OPF for use by EPUB 2.0.1 readers which can’t understand EPUB 3 navigation documents, to fill a similar function to that filled by the navigation document’s landmarks element. For details of how these are structured, see the Guide section of my EPUB 2.0.1 summary.

<collection>

Optional. A <collection> element “defines a group of related resources”, and yes, that is as vague as it sounds; they’re intended more as a point of extensibility for EPUB subformats than as something to be used in normal EPUBs, although you can use them in normal EPUBs, albeit with the warning that readers are unlikely to do anything with them.

Unlike the <metadata>, <manifest>, <spine>, and <guide>, of which you can have only one apiece, you can have arbitrarily many <collection> elements in your OPF’s <package>. Each one needs a role attribute, whose value should be either one of the role names from the EPUB Collection Roles Registry or else an absolute IRI pointing at a third-party role definition; this attribute describes what the <collection> is a collection of. And then, off in the realm of optional attributes, each collection can have dir, id, and/or xml:lang attributes.

Contents-wise, a <collection> can optionally contain a <metadata> element, following which it’s required to contain some nonzero number of <collection> and/or <link> elements, <collection>s before <link>s if both are present. (It can have zero or more <collection>s, it can have zero or more <link>s, but it has to have at least one of one of those two.) So you can nest <collection>s in arbitrarily-elaborate tree structures.

A <collection>’s <metadata> element is much like the OPF’s, with the following exceptions:

A <collection>’s <link> elements are much like <metadata> <link> elements, with the following exceptions:

Each <link> is required to “reference a resource that is a member of the group”, where ’the group’ is presumably the previously-vaguely-gestured-at “group of related resources” that the <collection> is defining.

Since <collection>s are mostly allowed for the sake of extensibility, rather than as part of core EPUB functionality, it’s required that any given rendition be renderable even by readers which don’t support <collection>s; you’re not allowed to make a rendition which is only comprehensible to <collection>-supporting readers. (Non-<collection>-supporting readers, to be clear, won’t automatically error upon seeing <collection>s; they just won’t do anything with them.)

Navigation Document

Each OPF file has an associated navigation document. This is an XHTML file (and thus traditionally has a .xhtml extension), with certain structures contained within it which allow it to serve as a machine-readable source of navigation information for the rendition as a whole. Most prominently, this includes being where the rendition’s table of contents is defined. A simple navigation document might look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
    <title></title>
</head>
<body>
    <nav epub:type="toc">
        <ol>
            <li>
                <a href="text/Chapter 01.xhtml">Chapter 1</a>
            </li>
            <li>
                <a href="text/Chapter 02.xhtml">Chapter 2</a>
            </li>
        </ol>
    </nav>
</body>
</html>

A more complex one might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
    <title></title>
</head>
<body>
<nav epub:type="toc">
    <ol>
        <li>
            <span>Section 1</span>
            <ol>
                <li>
                    <a href="text/Chapter 01.xhtml">Chapter 1</a>
                    <ol>
                        <li><a href="text/Chapter 01.xhtml#sec1">Section 1.1</a></li>
                        <li><a href="text/Chapter 01.xhtml#sec2">Section 1.2</a></li>
                    </ol>
                </li>
                <li>
                    <a href="text/Chapter 02.xhtml">Chapter 2</a>
                    <ol>
                        <li><a href="text/Chapter 02.xhtml#sec1">Section 2.1</a></li>
                        <li><a href="text/Chapter 02.xhtml#sec2">Section 2.2</a></li>
                    </ol>
                </li>
            </ol>
        </li>
        <!--Section 2 and onward go here-->
    </ol>
</nav>
<nav epub:type="page-list" hidden="">
    <ol>
        <li><a href="text/Chapter 01.xhtml#p1">Page 1</a></li>
        <li><a href="text/Chapter 01.xhtml#p2">Page 2</a></li>
        <li><a href="text/Chapter 01.xhtml#p3">Page 3</a></li>
        <!--Et cetera, continuing into Chapter 2 and onward when relevant-->
    </ol>
</nav>
<nav epub:type="landmarks" hidden="">
    <ol>
        <li><a href="cover.svg" epub:type="cover">Cover</a></li>
        <li><a href="text/Chapter 01.xhtml" epub:type="bodymatter">Start</a></li>
    </ol>
</nav>
</body>
</html>

Alongside whatever other XHTML content might be included in it, the navigation document is required to contain exactly one <nav> element with an epub:type attribute of "toc", and can optionally have additional ones with other epub:type values (more on possible epub:type values below). These elements—<nav>s with defined epub:type values—serve as the machine-readable parts which allow the navigation document to do its job.

(It’s recommended in the specification, likely for reasons of not confusing readers, that the navigation document not contain any <nav>s which don’t have epub:type attributes.)

All such <nav>s are required to adhere to the following structure: the <nav> is required to contain an <ol>, optionally preceded by a header (<h1>-<h6>, or <hgroup>). The <ol> is required to contain at least one <li>. The <li> is required to contain either an <a> or a <span> which can contain any normal HTML (or, more precisely, HTML Phrasing Content), followed (optionally for <a>, required for <span>) by an <ol>, which if present needs to meet the same structure criteria as the first <ol>. (Thus allowing for arbitrarily deep nesting.) There are no restrictions on what attributes any of these elements can have.
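The structural rules in the previous paragraph are recursive, which makes them easier to express in code than in prose. Here’s a sketch of a checker, assuming the <nav> has already been parsed with xml.etree.ElementTree; check_nav and check_ol are names I’ve made up, and the sketch only looks at element structure (it doesn’t check labels’ text content, and it ignores stray text between elements).

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
HEADINGS = {XHTML + name for name in ("h1", "h2", "h3", "h4", "h5", "h6", "hgroup")}


def check_ol(ol):
    """Return a list of problems with one <ol> and everything nested inside it."""
    problems = []
    items = list(ol)
    if not items:
        problems.append("<ol> has no <li>")
    for li in items:
        if li.tag != XHTML + "li":
            problems.append("<ol> child is not an <li>")
            continue
        children = list(li)
        if not children or children[0].tag not in (XHTML + "a", XHTML + "span"):
            problems.append("<li> does not start with <a> or <span>")
            continue
        label, rest = children[0], children[1:]
        if len(rest) > 1 or (rest and rest[0].tag != XHTML + "ol"):
            problems.append("<li> label may only be followed by a single <ol>")
            continue
        if label.tag == XHTML + "span" and not rest:
            problems.append("<span> label needs a nested <ol>")
        if rest:
            problems.extend(check_ol(rest[0]))  # same rules apply at every depth
    return problems


def check_nav(nav):
    """Check one <nav>: an optional heading, then exactly one <ol>."""
    children = list(nav)
    if children and children[0].tag in HEADINGS:
        children = children[1:]
    if len(children) != 1 or children[0].tag != XHTML + "ol":
        return ["<nav> must contain exactly one <ol>, optionally preceded by a heading"]
    return check_ol(children[0])


good_nav = ET.fromstring(
    '<nav xmlns="http://www.w3.org/1999/xhtml"><h1>Contents</h1><ol><li><span>Part 1</span>'
    '<ol><li><a href="a.xhtml">A</a></li></ol></li></ol></nav>'
)
bad_nav = ET.fromstring(
    '<nav xmlns="http://www.w3.org/1999/xhtml"><ol><li><span>Part 1</span></li></ol></nav>'
)
```

Here check_nav(good_nav) comes back empty, while check_nav(bad_nav) complains that the <span> has no nested <ol>.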

Each <a> or <span>’s contents need to correspond to a nonempty text label following whitespace normalization, or else the <a> or <span> needs a title or alt attribute giving it a textual rendition. If it contains any HTML which lacks an intrinsic text alternative, it specifically needs a title attribute, whose value should be a text label for use by any readers which can’t or don’t want to display the full content.

Semantically, what this all translates to is:

The epub:type attribute admits arbitrary values. However, there are three values which readers are likely to understand on a <nav>. Those values are:

The "toc" is the only mandatory one of those three. There should be exactly one "toc" in the navigation document. Its hrefs need to be ordered linearly relative to the <spine>’s ordering: you can’t href to an earlier spine item after a later one, or to an earlier fragment within a given item after a later one.
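The spine-order half of that ordering rule is easy to check once you have both lists of hrefs in hand. A minimal sketch, assuming both lists are relative to the same directory and written identically (no percent-encoding differences); toc_out_of_order is a made-up name, and fragment order within a single file isn’t checked, since that would require opening the file.

```python
def toc_out_of_order(toc_hrefs, spine_hrefs):
    """Return the toc hrefs which point at an earlier spine item than a previous entry did."""
    position = {href: index for index, href in enumerate(spine_hrefs)}
    offenders = []
    furthest = -1
    for href in toc_hrefs:
        path = href.split("#", 1)[0]  # drop any fragment identifier
        index = position.get(path)
        if index is None:
            continue  # not in the spine at all; a separate problem from ordering
        if index < furthest:
            offenders.append(href)
        furthest = max(furthest, index)
    return offenders


spine_order = ["ch1.xhtml", "ch2.xhtml", "ch3.xhtml"]
toc_order = ["ch1.xhtml", "ch2.xhtml#a", "ch1.xhtml#b", "ch3.xhtml"]
backwards_links = toc_out_of_order(toc_order, spine_order)
```

In this example the third entry jumps back from ch2 to ch1, so it’s the one reported.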

The "page-list" is optional, but if it’s present there should only be one of it. Its hrefs need to be ordered to match spine order and in-file order, much like the "toc"’s do. Ideally, although not mandatorily, its internal structure should be flat rather than nested. To improve ease of reading, you can optionally mark the link targets in the content files with epub:type="pagebreak" (typically on an otherwise-empty element carrying the targeted id), to annotate exactly where the page list is linking to.

The "landmarks" is, like the "page-list", optional but with a maximum of one present. Each <a> element descended from it needs a defined epub:type attribute providing semantics on what the pointed-to landmark is, and you can’t have more than one link with the same type pointing to the same file or fragment.8 The default epub:type vocabulary, usable without prefixes, is here; for more on epub:types, see the XHTML section below.

If you give a <nav> an epub:type value other than those three (e.g. for use by an EPUB subformat), it’s required that the relevant nav start with a human-readable header. (Whereas the header’s presence is optional, for those three values.)

To hide a <nav> or a subsection thereof from the HTML view of the navigation document while keeping it machine-readable, you can add the attribute hidden="" to the relevant element. This can be useful if you’re using the navigation document as your inline table of contents but don’t want to distract your readers with an inline page list, for example.

Since the navigation document is, at its core, an XHTML file, it can be part of the spine, same as any other XHTML file can. But it’s not mandatory that it be part of the spine, if you’d rather have it not be.

Content Documents

Having now summarized the essentials of the OPF and the navigation document, let’s get on to summarizing the files which comprise the actual book content which people read: the XHTML and/or SVG files, along with the various supplemental files (such as CSS) which support them.

In EPUB 2.0.1, specific HTML and CSS versions were defined as the official ones for use in one’s books. Not so, in EPUB 3.2; in EPUB 3.2, there are no version numbers specified, so you can be as close to the cutting edge as you want. (Although, of course, as you get closer to the cutting edge, the odds will go up that readers won’t know how to parse everything, so there’s a tradeoff there.)

Prefix-declaration can be performed in XHTML and SVG files as in OPFs, with two exceptions. First: prefixes are declared in the root <html> element of the XHTML or the root <svg> element of the SVG. (Instead of in the root <package> element of the OPF, which is nonpresent in XHTML and SVG.) Second: the prefix attribute lives in the OPS namespace (http://www.idpf.org/2007/ops), which is traditionally prefixed as epub:; thus you need to declare your use of that namespace before you can use the prefix attribute. (Which will, if you’re being traditional, then be rendered as epub:prefix.)
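If your software needs to read or merge prefix declarations, the attribute’s value is simple to pick apart: it’s a whitespace-separated run of alternating "name:" and IRI tokens. Here’s a small sketch of a parser under that assumption; parse_prefix_attribute is a made-up name and the example vocabularies are placeholders.

```python
def parse_prefix_attribute(value):
    """Parse a prefix value such as "a: http://a.example/ b: http://b.example/" into a dict."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating 'name:' and IRI tokens")
    mapping = {}
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        # Each prefix name has to end in a colon and be non-empty before it.
        if not name.endswith(":") or len(name) == 1:
            raise ValueError(f"bad prefix name {name!r}")
        mapping[name[:-1]] = iri
    return mapping


declared = parse_prefix_attribute("ex: http://example.com/vocab# other: http://example.org/terms/")
```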

XHTML

XHTML files (as defined in the HTML Living Standard, albeit referred to there as ’the XML syntax’ rather than as ‘XHTML’) are likely to comprise your primary book content. They traditionally have a .xhtml extension.

(I should note: all discussion here about XHTML files applies to the navigation document, too, since it itself is an XHTML file.)

For the most part, standard HTML rules apply, albeit with various EPUB-specific extensions to and restrictions on the format (summarized below). However, reader support for certain HTML features is limited, even aside from those restrictions. In particular, readers are explicitly not required to support scripting, HTML forms, or the HTML DOM, and as such many don’t.

Semantic Tagging

For purposes of general semantic tagging, you can (given the epub: namespace import discussed above) apply an epub:type attribute to any element in your XHTML file. You saw one use of these in the context of the navigation document, to annotate the <nav>s for purposes of machine-readability; but the attribute is intended as a fully-general semantic tagging mechanism, not just as a means of tagging <nav>s in specific.

The attribute’s value should be a whitespace-separated list of property values. As is the pattern with these things, there’s a default unprefixed vocabulary, plus the opportunity to use arbitrary other vocabularies given prefixes. However, the default vocabulary this time is somewhat too large to conveniently summarize in full here; instead, I’ll just link it, here (Archive), for perusal by anyone interested. (The linked page, unlike most of the EPUB definition, is a pretty straightforward read, just long.)

Two prefixes are reserved for use with epub:type even without declaration, although declaring them is still recommended for maximum compatibility. Those prefixes are:

The semantic annotations you add to a given element with epub:type aren’t allowed to be redundant with the element’s default semantics. So, for instance, if you’ve got a prefixed vocabulary with a "paragraph" property defined, you shouldn’t attach that property to a <p> element; the <p> element already denotes paragraph-hood.

For purposes of additional machine-readable semantic annotation, you can also use RDFa Core attributes and Microdata attributes in your XHTML, although readers won’t necessarily know how to interpret them.

Text-to-Speech

For purposes of enabling text-to-speech reading of your book, you can use two SSML attributes, alphabet and ph. In order to use these, you’ll need to use the W3C Speech Synthesis namespace (http://www.w3.org/2001/10/synthesis), which is traditionally prefixed as ssml:; thus, they’ll end up being written as ssml:alphabet and ssml:ph, respectively. They each inherit from the equivalent attributes on the SSML phoneme element except where otherwise specified.

ssml:alphabet specifies a phonetic alphabet for use by ssml:ph attributes. You can attach it to any element in the XHTML tree; a given ph will, if the element it’s attached to has a defined alphabet, use that one, or, if not, use the alphabet of its nearest ancestor which does have a defined alphabet. There’s no official registry of alphabets, despite what the SSML specification claims; but the most likely alphabet for a given reader to support, if it supports any, is "ipa".

ssml:ph, meanwhile, should have as its value a phonemic or phonetic representation of the text of the element it’s attached to, as written in whatever alphabet it’s using. It can be attached to any element, with the exception that you can’t have an ssml:ph-tagged element as a descendant of another ssml:ph-tagged element. Instead, when an element with descendants has an ssml:ph attribute, that attribute’s value should represent the text of all of its descendants, in the order they appear within the XHTML.
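The no-nesting rule is the sort of thing that’s easy to violate by accident when generating markup, so here’s a sketch of a check for it. It assumes the XHTML has been parsed with xml.etree.ElementTree and that the attribute is in the namespace http://www.w3.org/2001/10/synthesis; nested_ph_elements is a made-up name, and the sample markup is deliberately wrong.

```python
import xml.etree.ElementTree as ET

SSML_PH = "{http://www.w3.org/2001/10/synthesis}ph"


def nested_ph_elements(root):
    """Return elements which carry ssml:ph while sitting inside another ssml:ph element."""
    offenders = []

    def walk(element, inside_ph):
        has_ph = SSML_PH in element.attrib
        if has_ph and inside_ph:
            offenders.append(element)
        for child in element:
            walk(child, inside_ph or has_ph)

    walk(root, False)
    return offenders


# Deliberately invalid sample: the <em> has ssml:ph inside a <span> which also has it.
sample = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:ssml="http://www.w3.org/2001/10/synthesis" ssml:alphabet="ipa">'
    '<span ssml:ph="t\u0259\u02c8m\u0251\u02d0to\u028a">tom<em ssml:ph="\u0251\u02d0">a</em>to</span></p>'
)
nested = nested_ph_elements(sample)
```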

Other Attributes

There are also two deprecated elements, epub:switch and epub:trigger, which I won’t bother discussing here.

You can also have arbitrary other attributes defined by third parties. (Designers of readers, for instance, might define attributes which allow books to be fine-tuned to display nicely in their specific readers.) They need to be namespaced to namespaces which are neither http://www.w3.org/1999/xhtml nor http://www.idpf.org/2007/ops.

Embedded MathML and SVG

You can embed MathML in your XHTML, albeit only a subset thereof.

<math> elements need, for the most part, to contain only Presentation MathML. The exception is within <annotation-xml> children of <semantics> elements, which can also contain Content MathML. When Content MathML is included in an <annotation-xml> element, the element’s encoding attribute’s value has to be either "MathML-content" or "application/mathml-content+xml", and its name attribute’s value has to be "contentequiv". You can’t include any deprecated MathML.

Readers are required to support Presentation MathML, but support for Content MathML is optional, so the former is more likely to display correctly than the latter is.

You can also embed SVG in your XHTML, following the SVG rules summarized below. If you do so by linking an SVG file, CSS styles applying to the XHTML file won’t apply to the SVG; if you do so by way of defining the <svg> element inline in the XHTML, though, the XHTML’s CSS styles will apply to the SVG.

When embedding SVG, if the SVG uses any prefixes, those prefixes need to be declared on the embedding XHTML’s root <html> element.10

Content and Specific Elements

When writing text in your XHTML, if that text contains codepoints from any of the Private Use Area Unicode ranges, it’s required that said text be styled with embedded (not merely externally linked) fonts which contain glyphs for those codepoints.

It’s unnecessary to use <rp> elements, when using ruby tags in your HTML; they’re designed as a fallback method for readers which don’t know how to handle ruby tags, but EPUB 3 readers are required to know how to handle ruby tags, rendering them pointless.

It’s not required, but it is recommended, that you avoid using <embed> elements to embed anything which includes scripting, because <embed> doesn’t have a fallback mechanism for use if the reader doesn’t support scripting. Instead, the recommendation is to use <object> elements, which do have an intrinsic fallback system.

Normally, you can reference fallback-requiring image formats within <img> elements, given appropriate fallbacks in the manifest. However, if an <img> element is a child of a <picture> element, you can’t do that; <img> children of <picture>s need to reference only non-fallback-requiring image formats. Moreover, any <source> siblings of those <img>s are also banned by default from referencing fallback-requiring media types, although they’re allowed to do so if they explicitly specify in their type attributes what fallback-requiring format they’re referencing.

On the opposite side of things, <track> and <video> elements (including <video>s’ child <source> elements) can refer to ordinarily-fallback-requiring media types even without any defined fallbacks, as can <link> elements whose rel values are set to "pronunciation".

SVG

While XHTML is the standard default format for EPUB content, SVG (whose files traditionally have a .svg extension) is an option too, for use in cases where the book’s content (or some part thereof) is fundamentally image-centric such that displaying images directly is a more appropriate rendering method than displaying XHTML containing embedded images. So, for instance, you might use an SVG for your book’s cover image, since there’s nothing there except the image. You might use SVGs as your primary book content format if your book is a comic book, for similar reasons. Et cetera.

SVGs’ elements can be annotated with epub:type attributes much like XHTMLs’ elements can, subject to all the same rules and restrictions.

The SVG’s <title> element has to contain only HTML phrasing content valid in XHTML.

When using a <foreignObject> element in your SVG, it has to contain either HTML flow content or a single HTML <body> element. (If the SVG is embedded in XHTML, then the <body> option isn’t allowed.) Its content has to be a valid XHTML document fragment. If it has a requiredExtensions attribute, said attribute’s value has to be "http://www.idpf.org/2007/ops".

The EPUB specification contains an inconveniently vague warning to the effect that not all SVG features are supported by all readers, with no detail as to which features tend to be missing. Be accordingly cautious, when including SVG in your book.

CSS

CSS in EPUB works mostly like it does elsewhere, but with a few discrepancies:

It’s also worth noting that some readers will struggle with some parts of CSS. In particular, fixed or absolute positioning on the CSS level may interact poorly with readers’ pagination (due to discrepancies between software-reported and actual viewport sizes, especially if the reader is also splitting its view into multiple columns), and some readers run on devices with high-latency screens which will render CSS animations and transitions poorly.

Scripting

Although reader support for it is, as previously mentioned, not guaranteed, you can use scripts in your EPUB, using either XHTML or SVG <script> elements.

When scripts generate files for display, it’s required that those files be of non-fallback-requiring media types, since there’s no way to dynamically implement fallbacks for them.

The EPUB specification draws a distinction between two sorts of scripts: “spine-level” scripts, where the script is used directly by one of the files in the spine, and “container-constrained” scripts, where the script is in a file not in the spine, then embedded in an XHTML file that is in the spine via <iframe>. Of the two, container-constrained scripts are likely to have fewer compatibility issues with readers, although even they still will often not work and thus should in practice be avoided when practical.

If a file contains spine-level scripting, it’s required to, when rendered by a reading system which doesn’t support scripting (or which supports it but has it disabled), still be consumable by the reader without substantial information-loss or reduction in the quality of the content. So you can’t make the script essential to the reading experience; it has to be purely an optional nice-but-nonmandatory frill.

Container-constrained scripts, meanwhile, are banned from modifying any file’s DOM11, and from changing the sizes of their containing rectangles. They can’t escape their containers, in other words.

The EPUB format includes a small extension to JavaScript, adding in an object, navigator.epubReadingSystem, which allows scripts to check the nature of the readers they’re being run on. It has two non-deprecated properties:

…plus the deprecated property navigator.epubReadingSystem.layoutStyle, which I won’t go into.

Furthermore, it has one method, navigator.epubReadingSystem.hasFeature, which takes as input a string representation of a feature name plus, optionally, a string representation of feature version number; if it recognizes the feature name, it returns a bool indicating whether or not the reader supports that feature, and otherwise it returns undefined. (Version is there so that, if there’s some feature which has evolved over time, you can avoid incompatibilities.)

There are six built-in features whose names any script-compatible EPUB reader will recognize (and which are officially versionless, such that there’s no need for a version parameter when querying them). Third parties can then define arbitrary additional features whose names readers may or may not recognize, albeit at the risk (since there’s no prefixing system in place) that reliance on them will break forward-compatibility with some future EPUB version in the event of a name collision. The six built-in features are:

Pronunciation Lexicons

Each XHTML document can be associated with zero or more PLS lexicons (traditional file extension: .pls) providing pronunciation information for use by text-to-speech systems. The associations should be indicated with HTML <link> elements with rel="pronunciation", type="application/pls+xml", and (ideally, not strictly mandatory) a defined hreflang attribute on each link indicating the language for which the linked lexicon is relevant.

If PLS lexicons and the previously-discussed SSML pronunciation attributes are both present in a given file, the SSML attributes will take precedence over the PLS lexicons.

Page Flow Control

With that, we’ve covered all of the essentials of the EPUB format. However, there exist various optional features which can be used to provide additional layers of functionality. Three such features—fixed layouts, page spreads, and rendition:-prefixed properties—allow you to control how the book is displayed, on the page-flow level. Several parts of these features are, unfortunately, pretty underdeveloped, such that the features end up less useful than they could be; but nonetheless they exist and seem worth summarizing.

Fixed Layouts

Fixed layouts, also known as pre-paginated layouts, are an EPUB feature which allows you to define the dimensions, in pixels, of the space in which your book is intended to be rendered. Thus, instead of reflowable text in the traditional HTML style, where the reader can change viewport dimensions in arbitrary fashion and have the text automatically adjust, fixed layouts will ensure that the book has the same layout irrespective of viewport dimensions, with readers employing black bars or equivalent technology as needed to hold to this requirement.

(However, the requirements on how readers render the content within the defined dimensions are relatively limited. You can trust that dimensions will be the same between readers; you can’t trust that, for example, font size or line spacing will be. Thus, while not entirely meaningless, fixed layouts don’t get you anywhere close to e.g. PDF-level control over how page contents are displayed.)

To define a fixed layout for an XHTML file, add a <meta> tag to the document’s <head>, with the <meta>’s name attribute set to "viewport" and with its content attribute’s value being, to simplify somewhat, "width=X, height=Y", with X and Y being replaced with numbers (representing numbers of pixels).12 So, for example, <meta name="viewport" content="width=1200, height=800" /> gets you a 1200x800 viewport.
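For software that needs to read those dimensions back out (say, to check that every page of a fixed-layout book is the same size), here’s a small sketch of a parser. It only handles the simplified "width=X, height=Y" form described above, with plain integer values; parse_viewport is a made-up name.

```python
def parse_viewport(content):
    """Parse a viewport content string like "width=1200, height=800" into (width, height)."""
    values = {}
    for part in content.split(","):
        key, separator, value = part.partition("=")
        if separator:
            values[key.strip().lower()] = value.strip()
    if "width" not in values or "height" not in values:
        raise ValueError("viewport needs both width and height")
    # int() raises ValueError for anything that isn't a plain whole number.
    return int(values["width"]), int(values["height"])


page_size = parse_viewport("width=1200, height=800")
```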

To define a fixed layout for an SVG file, add a viewBox attribute to the root <svg> element. This is already a defined part of the SVG specification, technically, rather than part of EPUB per se. But, to briefly summarize the EPUB-relevant bit: the viewBox attribute’s value should be a space-separated list of four numbers, of which the first two do SVG things I won’t attempt to describe here, the third is width in pixels, and the fourth is height in pixels. So, for a 1200x800 viewport, you’d do something like <svg [...] viewBox="0 0 1200 800">.13
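The SVG equivalent is similarly short. This sketch pulls the width and height out of a viewBox value, accepting either whitespace or commas between the numbers; viewbox_size is a made-up name, and the sketch treats a non-positive width or height as an error rather than trying to do anything clever with it.

```python
import re


def viewbox_size(view_box):
    """Return (width, height) from an SVG viewBox value such as "0 0 1200 800"."""
    numbers = [float(token) for token in re.split(r"[\s,]+", view_box.strip())]
    if len(numbers) != 4:
        raise ValueError("viewBox needs exactly four numbers")
    _min_x, _min_y, width, height = numbers
    if width <= 0 or height <= 0:
        raise ValueError("viewBox width and height must be positive")
    return width, height


svg_size = viewbox_size("0 0 1200 800")
```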

Page Spreads

The spine has two non-prefix-requiring properties, both of which I previously refrained from summarizing: page-spread-left and page-spread-right. These properties allow you to define page spreads, making sure the reader renders a given pair of pages horizontally adjacent to one another. (That is: a spread of pages A and B will place A on the left side of the screen and B on the right side of the screen.) They apply both to fixed and non-fixed layouts’ pages, although in both cases only when the reader is set to display spreads (which can be controlled through the rendition:spread property, discussed below).

The page-spread-left property, when applied to a spine itemref, indicates that the item should be rendered on the left half of a spread, even if this requires that the reader insert a blank page before it. The page-spread-right property indicates the same, but for the right half of a spread.

When used together on a sequential pair of <itemref>s—one with page-spread-left, then the next with page-spread-right—these properties will add up into a true spread, with the two items being displayed side-by-side.

(The specification fails to discuss what happens when a file marked as part of a spread exceeds one page in size.)

These two properties—alongside the below-defined rendition:page-spread-center—take precedence over the CSS page-break-before property, if that property is present.

You can only have one of page-spread-left, page-spread-right, or rendition:page-spread-center on a given itemref. No marking an item as belonging only on the left and only on the right simultaneously, or anything along those lines.
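That exclusivity rule is a one-liner to enforce when you’re assembling itemref properties in code. A minimal sketch, with the spine modeled as a list of (idref, properties string) pairs and spread_conflicts as a made-up name:

```python
SPREAD_PROPERTIES = {"page-spread-left", "page-spread-right", "rendition:page-spread-center"}


def spread_conflicts(itemrefs):
    """Given (idref, properties_string) pairs, return the idrefs with more than one spread property."""
    conflicts = []
    for idref, properties in itemrefs:
        used = SPREAD_PROPERTIES.intersection(properties.split())
        if len(used) > 1:
            conflicts.append(idref)
    return conflicts


# Hypothetical spine: "p3" is wrongly marked as both left and right.
example_itemrefs = [
    ("p1", "page-spread-left"),
    ("p2", "page-spread-right"),
    ("p3", "page-spread-left page-spread-right"),
    ("p4", ""),
]
conflicting = spread_conflicts(example_itemrefs)
```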

rendition:-prefixed properties

There exist various properties from the EPUB Package Rendering Vocabulary, usable in the OPF via the reserved prefix rendition:, which allow you to control your book’s page flow in various ways. These properties are:

(There’s also a deprecated one, rendition:viewport, which I won’t go into.)

For the spine <itemref> properties overriding global default <meta> properties, you can only have a maximum of one override per <meta> per itemref. You can’t mark an item as having both rendition:flow-paginated and rendition:flow-scrolled-doc, for instance. (But you can mark it as having both rendition:flow-paginated and rendition:orientation-landscape, since those override different <meta>s.)
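Here’s a sketch of how you might catch doubled-up overrides before writing them out. It leans on an assumption about naming which holds for the examples above but which I’m stating rather than guaranteeing: that each override is written as rendition:, then the name of the <meta> it overrides, then a hyphen, then the value. override_conflicts is a made-up name.

```python
from collections import defaultdict


def override_conflicts(properties):
    """Return {meta_name: [properties]} for every <meta> overridden more than once."""
    groups = defaultdict(list)
    for prop in properties.split():
        if not prop.startswith("rendition:"):
            continue
        # Assumed naming pattern: rendition:<meta>-<value>, e.g. rendition:flow-paginated
        meta, separator, _value = prop[len("rendition:"):].partition("-")
        if separator:
            groups[meta].append(prop)
    return {meta: props for meta, props in groups.items() if len(props) > 1}


allowed = override_conflicts("rendition:flow-paginated rendition:orientation-landscape")
clashing = override_conflicts("rendition:flow-paginated rendition:flow-scrolled-doc")
```

The first call finds nothing to complain about, since the two properties override different <meta>s; the second reports both flow properties under "flow".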

Media Overlays

Media overlays are an EPUB feature which lets you set up hybrid text-and-audio books, automatically scrolling through the text and highlighting the part currently being read as the audio plays. Not all readers support them (in fact, while I haven’t surveyed the market fully, I’d go so far as to predict that most readers don’t support them), but they’re really cool, and enable a far richer set of playback features than currently-conventional pure-audio audiobooks support, so I very much hope that people will adopt them further over time.15

Although media overlays technically can be laid over both XHTML and SVG, they’re primarily optimized for XHTML, with an explicit lack of guarantee of consistent behavior when used on SVG. So, in this section, I’ll be assuming that the content files containing the text with which the audio is being synced are XHTML ones.

A media overlay file (traditionally equipped with a .smil extension) might look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<smil version="3.0" xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops">
    <body>
        <seq epub:textref="../Text/Chapter 01.xhtml">
            <par>
                <text src="../Text/Chapter 01.xhtml#par1sen1" />
                <audio src="../audio/Chapter 01.mp3" clipBegin="00:00" clipEnd="00:13.56" />
            </par>
            <par>
                <text src="../Text/Chapter 01.xhtml#par1sen2" />
                <audio src="../audio/Chapter 01.mp3" clipBegin="00:13.56" clipEnd="00:23.2" />
            </par>
            <!--Imagine a bunch more pars here, because realistically there's probably more than one paragraph-->
        </seq>
        <seq epub:textref="../Text/Chapter 02.xhtml">
            <!--More pars go here-->
        </seq>
        <!--More seqs go here-->
    </body>
</smil>

Syntax

The root element of the media overlay file (which, in traditional EPUB fashion, is yet another XML subformat) is the <smil>, which is required to have a version of "3.0" and to be in the SMIL namespace (xmlns="http://www.w3.org/ns/SMIL"), and which can optionally have an id attribute and/or an epub:prefix attribute. (The latter of which is equivalent to the XHTML epub:prefix attribute described in the section on content documents, and, like in the XHTML case, requires that you first add in an xmlns:epub="http://www.idpf.org/2007/ops" declaration, as do all the future epub:-prefixed attributes I’ll be mentioning throughout my summary of the media overlay format.) It should contain a single <body> element, optionally preceded by a single <head> element.

(The <head> is there only for extensibility. Its only allowed contents are a single <metadata> element, which can contain arbitrarily many elements from any namespace, leaving room for subformats to do their own things with it. So you can in practice ignore it.)

The <body> has three optional attributes: epub:type (as previously discussed in the context of XHTML), id, and epub:textref. The epub:textref’s value, if present, should be a relative IRI pointing to the XHTML file the media overlay applies to, optionally with a fragment identifier pointing to the specific element of that file which is being overlaid. Either way, this is mostly useful if you’re running a one-media-overlay-file-per-XHTML-file pattern, rather than using a single media overlay file for the whole book. Content-wise, the <body> should contain a nonzero number of <par> and/or <seq> elements intermixed in any order, such that you’re allowed to have zero <par>s or zero <seq>s but aren’t allowed to have zero of both.

<par> elements serve as the actual core of the media overlay format, with each one defining a pairing of text and audio to be shown and played in parallel. Attributes-wise, they can optionally have epub:type and/or id. Contents-wise, they need to contain a <text> element and, usually, an <audio> element; the latter is technically optional, but in practice you’re generally going to want it there.

(The <audio> can be not-included if the <text>’s target fragment is an image, audio, or video embedded directly in the target XHTML, or if the <text>’s target fragment is intended to be read via text-to-speech instead of via pre-recorded narration.)

The <text> element needs a src attribute pointing by relative IRI to a text fragment (a fragment identifier is mandatory here), and can optionally have an id attribute; it can’t have any contents. The <audio> element similarly lacks contents, and similarly needs a src attribute, this time pointing by relative or absolute IRI to a non-fallback-requiring audio file. In terms of optional attributes, the <audio> can optionally have an id, as well as clipBegin and/or clipEnd attributes. Those latter two are very important, if you’re not running on a one-audio-file-per-sentence basis or something. Each of those attributes takes as its value a SMIL clock value, representing respectively where in the target audio file to start playback and where in the target audio file to end playback in order to play the section corresponding with the <text>. Thus you can use them to, for instance, have one audio file per chapter, but still sync the text to the audio one sentence or paragraph at a time.

(SMIL clock values allow a wide range of input formats, but to oversimplify and thus let you avoid having to read the not-maximally-easy-to-read documentation there: the basic format is Hour:MM:SS.fraction, where Hour is an arbitrarily-large integer number of hours, MM is minute count from 00 to 59, SS is second count from 00 to 59, and fraction is arbitrary-precision fraction-of-a-second. The Hour: part and the .fraction part are both optional and can be left off if unnecessary.)
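
To make those shapes concrete, here are a few <audio> elements (shown outside their enclosing <par>s) whose clipBegin and clipEnd values use different forms. The file path just continues the hypothetical example above. The last element uses a timecount form (a number plus a unit such as h, min, s, or ms), which is one of the other input formats the simplified description glosses over.

```xml
<!-- Full form, Hour:MM:SS.fraction: from 1 hour, 2 minutes, 3.5 seconds in -->
<audio src="../audio/Chapter 01.mp3" clipBegin="1:02:03.5" clipEnd="1:02:10" />
<!-- Hour: part left off, MM:SS.fraction: from 2 minutes, 3.5 seconds in -->
<audio src="../audio/Chapter 01.mp3" clipBegin="02:03.5" clipEnd="02:10" />
<!-- Timecount form: from 13.56 seconds in to 23.2 seconds in -->
<audio src="../audio/Chapter 01.mp3" clipBegin="13.56s" clipEnd="23.2s" />
```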

And then we get to the <seq> element, which is the media overlay format’s implementation of the same sort of nested structure that we saw previously in e.g. the navigation document’s <nav>s. A <seq>, like the <body>, can optionally have epub:type and id attributes, and then contains at least one <seq> or <par>, with any number of either intermixed in any order, thus allowing for arbitrary nesting. Its only difference from the <body> is that its epub:textref is mandatory rather than optional; this is to make sure that the reader knows what text-chunk the <seq> corresponds to, when piecing together the linear audio path through the book.

Practicalities

Any given XHTML (or SVG) file can be associated with only a single media overlay file. The inverse, however, is not the case. So you can have a media overlay file per chapter (or however you divide your book filewise), but you can also just have a single big media overlay file covering your whole book, or large sections thereof, or suchlike.

Within a given media overlay file, the <body> represents the main linear playback sequence; order of elements within the <body> corresponds with playback order. <seq>s represent subsequences. And <par>s represent individual text:audio pairs to be played back in sync. Standard implementation is to have each <par> correspond to a sentence or a paragraph of the book’s text, along with the audio rendition thereof.

While in theory you can have all your <par>s as a flat linear sequence within the <body>, in practice, there are two good reasons to instead use a nested structure with <seq>s approximately corresponding to your table of contents structure. The first is for the sake of human-readability, making it easier to understand the structure of the media overlay file on a skim. The second, and more important, is to enable reader-level nuanced understanding of the playback structure. Through use of <seq>s, it becomes possible for the reader to offer easy “skip to the next section” or “skip to the next chapter”-style playback-control options, whereas those would be a lot less convenient given a fully flat structure. Thus I highly recommend use of <seq>s rather than just <par>s.
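
As a sketch of what such a nested structure might look like, here's a <body> with one <seq> per chapter and one nested <seq> per section. The IDs, fragment identifiers, and file paths are all hypothetical, and the enclosing <smil> element is omitted.

```xml
<body>
    <!-- Hypothetical layout: one <seq> per chapter, one nested <seq> per section -->
    <seq id="ch1" epub:type="chapter" epub:textref="../Text/Chapter 01.xhtml">
        <seq id="ch1sec1" epub:textref="../Text/Chapter 01.xhtml#section1">
            <par>
                <text src="../Text/Chapter 01.xhtml#sec1par1" />
                <audio src="../audio/Chapter 01.mp3" clipBegin="00:00" clipEnd="00:13.56" />
            </par>
        </seq>
        <seq id="ch1sec2" epub:textref="../Text/Chapter 01.xhtml#section2">
            <par>
                <text src="../Text/Chapter 01.xhtml#sec2par1" />
                <audio src="../audio/Chapter 01.mp3" clipBegin="00:13.56" clipEnd="00:23.2" />
            </par>
        </seq>
    </seq>
</body>
```

With a structure like this, a reader that understands <seq>s has natural boundaries to jump between, which a flat run of <par>s wouldn't give it.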

Not every element in the source XHTML has to have an associated <par>; only those with audio narration. You can also include <par>s without associated audio; the three major use cases for this are:

  1. Pointing at text without audio-file-based narration, in order to have it be narrated by text-to-speech.
  2. Pointing at non-narrated image embeds, so that the playback still will visually focus on them rather than skipping over them.
  3. Pointing at audio or video embeds, which the media overlay playback will then play directly, without requiring that users pause the media overlay playback in order to play the embeds.
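
A couple of audio-less <par>s might look roughly like this. The fragment identifiers are hypothetical, and the second one assumes that the target fragment is a <video> element embedded in the XHTML.

```xml
<!-- No <audio>: left for the reader to narrate via text-to-speech, where supported -->
<par>
    <text src="../Text/Chapter 01.xhtml#par2sen1" />
</par>
<!-- No <audio>: the fragment is assumed to be an embedded <video>, played directly -->
<par>
    <text src="../Text/Chapter 01.xhtml#interviewclip" />
</par>
```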

You should avoid using scripts to control playback of any audio or video which is going to be controlled by the media overlay; otherwise there’s a risk of the overlay playback and the script interfering with one another.

Ordering of elements within the media overlay playback sequence is required to line up with the ordering of the corresponding text. So you can’t have your overlay go from a file later in the spine to one earlier in the spine, or from a fragment later in a given file to one earlier in that file. (This is why <seq>s are required to have epub:textref values, in order to make this easier to enforce.)

epub:type annotation of the overlay, while not mandatory, can be helpful, because it lets readers recognize, and offer the option of skipping over, parts of the book content which users might want to examine in detail but also might want to bypass. Specific types which the EPUB specification points out as particularly likely to be useful here are "footnote", "endnote", "pagebreak", "table", "table-row", "table-cell", "list", "list-item", and "figure".
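
For instance, a footnote might be annotated along these lines. This is only a sketch: the fragment identifiers and clip times are hypothetical, and it assumes that the footnote sits directly after the sentence which references it in the XHTML, so that the playback order still matches the text order.

```xml
<!-- The sentence containing the footnote reference -->
<par>
    <text src="../Text/Chapter 01.xhtml#par3sen2" />
    <audio src="../audio/Chapter 01.mp3" clipBegin="00:41.1" clipEnd="00:48" />
</par>
<!-- The footnote itself, annotated so that a reader can offer to skip it -->
<par epub:type="footnote">
    <text src="../Text/Chapter 01.xhtml#footnote1" />
    <audio src="../audio/Chapter 01.mp3" clipBegin="00:48" clipEnd="00:59.3" />
</par>
```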

Marking media overlays in the OPF

If a given XHTML or SVG file has an associated media overlay, then its manifest <item> has to have a media-overlay attribute whose content is the manifest ID of the relevant overlay.
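
In manifest terms, that might look something like the following sketch. The IDs and file names are hypothetical; the point is just that the XHTML item's media-overlay value matches the overlay item's id.

```xml
<manifest>
    <!-- The content file points at its overlay by manifest ID -->
    <item id="prologue" href="Text/Prologue.xhtml" media-type="application/xhtml+xml" media-overlay="prologue_overlay" />
    <!-- The overlay itself, plus the audio file it plays clips from -->
    <item id="prologue_overlay" href="Overlays/Prologue.smil" media-type="application/smil+xml" />
    <item id="prologue_audio" href="audio/Prologue.mp3" media-type="audio/mpeg" />
</manifest>
```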

Off in the metadata, meanwhile, there are a variety of media:-prefixed properties you can put on <meta> elements, all relating to media overlays. One of them, in particular—media:duration—is mandatory, so I’ll talk about that one first.

You need a top-level media:duration <meta> element, not refining anything, whose value should be a SMIL clock value (as described above in the context of <audio> elements) representing the summed-up playback duration of all <audio> elements in the book’s media overlay files. Additionally, for each media overlay file, you need a media:duration <meta> element refining it and listing the sum duration of all <audio> elements in that file.
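
As a sketch, for a hypothetical book with two media overlay files whose manifest IDs are prologue_overlay and chapter1_overlay, the duration metadata might look like this. The durations are made up, and the rest of the metadata is omitted.

```xml
<metadata>
    <!-- Per-overlay durations, each refining its overlay's manifest item by ID -->
    <meta property="media:duration" refines="#prologue_overlay">0:04:12.5</meta>
    <meta property="media:duration" refines="#chapter1_overlay">0:31:07.25</meta>
    <!-- Book-wide total, refining nothing: 0:04:12.5 plus 0:31:07.25 -->
    <meta property="media:duration">0:35:19.75</meta>
</metadata>
```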

Off in the realm of non-mandatory media:-prefixed <meta> properties, there are a few:

Note that media:active-class and media:playback-active-class aren’t guaranteed to be supported even by readers which otherwise support media overlays.
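
For illustration, here's roughly what declaring those two might look like. The class names are arbitrary choices rather than anything mandated, and this sketch assumes the usual arrangement wherein the first class is applied to whichever element is currently being narrated and the second to the content document's root element while playback is active. You'd then style both classes in your own CSS.

```xml
<metadata>
    <!-- Class applied to the element currently being played back -->
    <meta property="media:active-class">-epub-media-overlay-active</meta>
    <!-- Class applied to the document's root element while playback is active -->
    <meta property="media:playback-active-class">-epub-media-overlay-playing</meta>
</metadata>
```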

Accessibility

I’ll close things off with a summary of the EPUB accessibility guidelines, which are a set of guidelines for how to design one’s EPUB for maximum ease of access to readers with disabilities or unusual preferences which might otherwise interfere with their book-reading. These aren’t an EPUB feature, per se—they’re very much guidelines rather than rules, dealing far more with fuzzy subjectivity and far less with easily machine-checkable well-formedness than the rest of the EPUB specification—but they’re still a worthwhile thing to at least be aware of when making EPUBs, whether or not you intend to follow them in full, since following them even partially can potentially be helpful in widening your book’s audience.

At a high level, the structure of the EPUB accessibility guidelines is: there are various bits of metadata you can put in your EPUB in order to indicate ways in which it can or can’t be easily accessed. A book which includes this metadata, plus is distributed in a manner that won’t impair users’ assistive tech, is known as a Discoverable EPUB, one which meets the minimum bar of “assistive tech can discover and tell you the ways the book is or isn’t accessible”.

From there, there are then two sub-specializations for additional layers of accessibility: Accessible EPUBs, which are discoverable EPUBs that also conform to various standardized requirements for general-purpose accessibility, and Optimized EPUBs, which are discoverable EPUBs optimized for accessibility to some specific audience, potentially at the cost of general-purpose accessibility. The two aren’t mutually exclusive—a single EPUB can be both Accessible and Optimized—but neither does either imply the other.

(These technical terms, ‘Discoverable’ and ‘Accessible’ and ‘Optimized’, collide inconveniently with normal day-to-day terms. An EPUB can potentially be accessible while still not being Accessible. Take care not to let the technical meanings and the normal meanings run together in your mind; otherwise you’ll be setting yourself up for confusion.)

Discoverable EPUBs

For an EPUB to count as Discoverable, it needs to meet two requirements: inclusion of discovery metadata (which allows readers to easily discover a book’s accessibility status), and accessible distribution (which allows readers-in-the-other-sense, as in people reading the book, to properly include the book in their workflow).

Discovery Metadata

All discovery metadata should be listed by way of <meta> elements in the OPF’s <metadata>, not refining any other elements. It can also optionally, for backwards-compatibility, be additionally used by way of EPUB 2.0.1 <meta> elements, albeit with standard disclaimers that those need to be in addition to rather than instead of the normal <meta> elements.

There are a total of seven discovery metadata properties, of which four are mandatory to include in an EPUB for it to count as Discoverable, while the remaining three are optional. Each can be included arbitrarily many times, not just zero or one. The mandatory four are:

The optional three are:

Including these in <link>ed metadata records won’t fill the inclusion requirement for the required ones, if they’re not also in the OPF.
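
As a sketch of what discovery metadata can look like in the OPF, here's a fragment for a hypothetical illustrated novel. It assumes the property names from the schema.org accessibility vocabulary, which the reserved schema: prefix gives access to. The specific values are examples only, and should be replaced with whatever actually describes your book.

```xml
<metadata>
    <!-- How the content is perceived: this hypothetical book has text and images -->
    <meta property="schema:accessMode">textual</meta>
    <meta property="schema:accessMode">visual</meta>
    <!-- Accessibility features the book offers; repeat the property once per feature -->
    <meta property="schema:accessibilityFeature">alternativeText</meta>
    <meta property="schema:accessibilityFeature">tableOfContents</meta>
    <!-- Known hazards, or an explicit statement that there are none -->
    <meta property="schema:accessibilityHazard">none</meta>
    <!-- Human-readable summary -->
    <meta property="schema:accessibilitySummary">All images have text alternatives, and the book can be navigated by its table of contents.</meta>
    <!-- Optional: a set of access modes sufficient to consume the whole book -->
    <meta property="schema:accessModeSufficient">textual</meta>
</metadata>
```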

Accessible Distribution

There are two requirements, when distributing an EPUB, for its distribution to be counted as accessible (and thus for the EPUB to be counted as Discoverable).

First: if the EPUB is being distributed together with an ONIX record or similar external metadata-containing record, that record should contain as much of the accessibility-related metadata—the discovery metadata, as well as any metadata that might have been added in the process of meeting the accessible EPUB or optimized EPUB requirements—as can be included in it.

Second: if the EPUB is being distributed with DRM, the DRM has to be designed in such a way as to not impair any assistive technology people might use to help with their reading. In practice, almost every DRM scheme will impair at least some assistive technologies—a custom-built assistive reader isn’t going to be any more able to decrypt a DRM-laden EPUB than any other not-the-official-decryption-tool reader is—so this is going to end up collapsing down into the EPUB being DRM-free, at least unless the DRM-builders get very creative in some way I’m currently failing to anticipate.

Accessible EPUBs

For an EPUB to count as Accessible (the general-purpose “this will probably be accessible for most people” criterion), it needs to be discoverable, it needs to meet all the WCAG Level A accessibility criteria (and it’s recommended, although not mandatory, that it meet all the Level AA ones, too), and then there are some additional EPUB-specific recommendations and requirements on top of the WCAG ones.

In terms of recommendations, there are two major ones.

First: if your EPUB is designed to be the EPUB equivalent of a statically-paginated book (print or PDF or suchlike), it’s recommended that you include metadata marking where the page breaks are in that statically-paginated edition. In practice, there are a few actionable steps to take here:

Second: if you’re using media overlays (which, themselves, help accessibility, although the specification doesn’t go so far as to recommend their inclusion), you should use epub:type annotations to help with skippability. Also, you should make sure to include a media overlay for the navigation document.

So that’s the recommendations. Now, in terms of mandatory EPUB-specific inclusions, there are two, both of which go in the OPF’s metadata.

First: you need a <link> with a rel value of "dcterms:conformsTo" and an href value indicating the highest level of WCAG conformance your book meets. The three allowed values are:

Second: you need a <meta> with a property value of "a11y:certifiedBy", refining nothing, whose content should be the name of whatever person or organization has certified that the book is accessible as advertised. (This could potentially be the publisher, or some third-party accessibility consulting service the publisher outsources to, or the author, or a variety of other figures.)

You can optionally supplement your a11y:certifiedBy tag with two additional pieces of metadata: another <meta>, also not refining anything, with property a11y:certifierCredential, naming whatever formalized credentials the certifier might have relevant to marking their certification as trustworthy; and a <link>, with rel "a11y:certifierReport", whose href points at a report the certifier has written up with a more detailed assessment of the book’s accessibility.
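
Put together, the conformance and certification metadata might look something like this sketch. The certifier name, credential, and report URL are hypothetical. The conformsTo href is, to the best of my knowledge, the identifier for WCAG Level AA conformance, but check it against the accessibility specification before relying on it.

```xml
<metadata>
    <!-- Mandatory: the highest level of WCAG conformance the book meets -->
    <link rel="dcterms:conformsTo" href="http://www.idpf.org/epub/a11y/accessibility-20170105.html#wcag-aa" />
    <!-- Mandatory: who certified the book's accessibility -->
    <meta property="a11y:certifiedBy">Example Accessibility Consultants</meta>
    <!-- Optional: the certifier's credentials -->
    <meta property="a11y:certifierCredential">Hypothetical Certification Program, Level 2</meta>
    <!-- Optional: a link to the certifier's detailed report -->
    <link rel="a11y:certifierReport" href="https://example.com/reports/my-book.html" />
</metadata>
```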

Optimized EPUBs

For an EPUB to count as Optimized—the “specially engineered to be accessible for This Specific Demographic” criterion—it needs to be discoverable, and its OPF metadata needs to contain a <link> with a rel of "dcterms:conformsTo", linking to whatever standard or guideline of accessibility-to-a-specific-demographic it’s optimized in accordance with. And then it needs to be optimized in accordance with that standard.

If the link isn’t sufficient to make clear, to the unfamiliar, what being-optimized-to-that-standard entails—if the standard’s specification is paywalled, for instance—then it’s also required that the metadata include a <meta>, refining nothing, property schema:accessibilitySummary, whose content should be a description of how the book has been optimized.
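
As a sketch, the metadata for an Optimized EPUB might look like this. The standard's URL is a placeholder and the summary text is invented; the discovery metadata needed for the book to be discoverable is omitted here.

```xml
<metadata>
    <!-- Hypothetical URL for the demographic-specific standard being followed -->
    <link rel="dcterms:conformsTo" href="https://example.com/standards/audio-first-epub" />
    <!-- Required if the link alone doesn't make clear what the optimization entails -->
    <meta property="schema:accessibilitySummary">This edition is optimized for audio playback: every content document has a full media overlay, but images lack text alternatives.</meta>
</metadata>
```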

Conclusion

This has been my summary of the EPUB 3.2 format. It is… really dramatically more elaborate than EPUB 2.0.1, and makes up for this by being also-dramatically more powerful, able to do all sorts of interesting things that EPUB 2.0.1 can’t. I hope this writeup will help to make writing EPUB 3.2 files more accessible, for anyone interested in doing so.

At the time of this writing, EPUB 3.3 is a work-in-progress; you can read the latest draft here. I don’t know when it’s going to be officially released, but EPUB 3.0 was released in 2011, EPUB 3.0.1 in 2014, EPUB 3.1 in 2017, and EPUB 3.2 in 2019, so if the timing stays reasonably on-pattern I’d expect 3.3 to be released some time this year. (2022, for any future readers.) EPUB 3.3 is going to be entirely backwards-compatible with 3.2, so this summary should hopefully remain useful (if no longer quite as complete) even after 3.3’s release.


  1. EPUB Media Overlay format is a subset of SMIL format. ↩︎

  2. This applies only at the ZIP level; some file formats, such as .png, have their own non-Deflate compression built-in, and this rule doesn’t exclude files of those formats, only the application of further non-Deflate compression to them as part of the zipping process. ↩︎

  3. For those unfamiliar with IRIs (as I was, prior to researching this post): IRIs are like URIs, but extended to also support Unicode. Any valid URI is a valid IRI, although not all valid IRIs are valid URIs. See here for details if interested. ↩︎

  4. By ‘file-relevant-to-rendering-the-book’, I mean to exclude e.g. web links which are purely there to be clicked on and thereby opened in-browser, but whose content will never be displayed in any way in-book. ↩︎

  5. It also says, literally just a single paragraph earlier: “When the rights.xml file is not present, no part of the container is rights governed at the [EPUB file] level. Rights expressions might exist within the contained Renditions.” It’s unclear how this statement is supposed to be compatible with the statement that no part of the container is rights governed if rights.xml is nonpresent; I prioritized the latter over the former in my summary on the principle that, when in that sort of ambiguous situation, it tends to be wise to assume the less convenient reading rather than the more convenient reading. ↩︎

  6. At the time of this writing, this page is no longer being served. Nonetheless it’s the standardized IRI for use with the onix: prefix as defined in the EPUB 3.2 specification. You can read an archived version of the page here, or get links to the latest versions of the specification here. ↩︎

  7. Scripts are an exception here. Much as they’re allowed to render external files not in the manifest, so similarly they’re allowed to render internal files lacking otherwise-necessary fallback chains. ↩︎

  8. The specification’s phrasing is ambiguous regarding whether this means you can’t have two links of the same type to different fragments of the same file. I expect that you can, though, since otherwise they wouldn’t have bothered with the ‘or fragment’ qualifier at all. ↩︎

  9. As of the time of this writing, this page is no longer being served. Nonetheless it’s the standardized IRI for use with the prism: prefix as defined in the EPUB 3.2 specification. You can read an archived version of the page here, or, for what seems to be the new home of the PRISM specification (although I haven’t vetted it beyond skimming the table of contents), see here. ↩︎

  10. It’s unclear whether this is instead of or in addition to being declared in the SVG’s root svg element. ↩︎

  11. The precise phrasing in the specification is that a container-constrained script isn’t allowed to “contain instructions for modifying the DOM of the parent Content Document or other contents of the EPUB Publication”. It’s not entirely clear to me whether ‘the parent Content Document’ means the XHTML file in whose <iframe> the script-containing file runs, or whether it means the script-containing file itself; out of caution, I’d recommend avoiding editing any DOMs at all, but the phrasing leaves room for an interpretation wherein editing the DOM of the script-containing file is allowed even if editing any other file’s DOM isn’t. ↩︎

  12. The fully-complicated summary is: it has to be CSS Device Adaptation Module Level 1 syntax, but readers are only required to (and thus, in practice, most likely only will) understand the width and height expressions. ↩︎

  13. The [...] in this example should be understood to represent textual omission of irrelevant attributes, rather than an actual sequence of brackets and periods within the SVG. ↩︎

  14. The specification doesn’t discuss how readers are supposed to deal with part of the book being set to scrolled-continuous and other parts being set to not-that. ↩︎

  15. The cynic in me says that this won’t happen because, in the current state of the book market, audiobook rights and non-audio ebook rights tend to be sold separately and thus it would be hard for either rightsholder to pull together a proper text-and-audio EPUB audiobook. The idealist in me says that, no, workarounds to realign the incentives exist, we just need to be clever about finding them. ↩︎


Tags: EPUB