What is EPUB
EPUB stands for electronic publication. It is an open file format used for eBooks. While eBooks typically refer to digital books, EPUB is meant more generically to include the kinds of documents that were previously represented as PDFs. EPUB is the distribution and interchange format standard for digital publications and documents based on Web Standards, and is a standard of the IDPF (now part of the W3C).
What the EPUB format does
EPUB defines a means of representing, packaging and encoding structured and semantically enhanced Web content (including XHTML, CSS, SVG, images, and other resources) for distribution in a single-file format. EPUB allows publishers to produce and send a single digital publication file through distribution and offers consumers interoperability between software/hardware for unencrypted reflowable digital books and other publications. The EPUB file is a ZIP archive, so it is portable. It contains:
- XML structures
- HTML and CSS resources
And, with EPUB 3:
- audio and
- video assets.
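Because an EPUB is a ZIP archive, its contents can be inspected with ordinary ZIP tooling. The following is a minimal Python sketch, not a validator. It relies on two conventions from the EPUB Open Container Format that are not described in the text above: a `mimetype` entry holding `application/epub+zip`, and a `META-INF/container.xml` file pointing at the package document. The other file names (`EPUB/package.opf`, `EPUB/chapter1.xhtml`, `EPUB/style.css`) and both function names are hypothetical and chosen for illustration only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of META-INF/container.xml as defined by the Open Container Format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

CONTAINER_XML = (
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
    "<rootfiles>"
    '<rootfile full-path="EPUB/package.opf" '
    'media-type="application/oebps-package+xml"/>'
    "</rootfiles>"
    "</container>"
)


def build_sample_epub() -> bytes:
    """Build a tiny EPUB-like archive in memory (illustrative content only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("EPUB/package.opf", "<package/>")  # placeholder, not a valid package
        zf.writestr("EPUB/chapter1.xhtml", "<html/>")  # placeholder content document
        zf.writestr("EPUB/style.css", "body { margin: 1em; }")
    return buf.getvalue()


def inspect_epub(data: bytes) -> dict:
    """Return the entry names, the mimetype, and the package document path."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return {
            "names": zf.namelist(),
            "mimetype": zf.read("mimetype").decode("ascii"),
            "package": rootfile.get("full-path"),
        }


if __name__ == "__main__":
    info = inspect_epub(build_sample_epub())
    print(info["mimetype"])  # application/epub+zip
    print(info["package"])   # EPUB/package.opf
```

To inspect a real file, the same `inspect_epub` function can be given the bytes read from disk, for example `inspect_epub(open("book.epub", "rb").read())`, where `book.epub` is a placeholder path.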
More information on EPUB 3 can be found in the EPUB 3 overview on the EPUBZone website.
Using open Web Standards in EPUB brings many advantages to the publishing industry:
- Web Standards are interoperable, meaning they aim to be usable on any kind of device; so does the EPUB standard.
- Developers of the EPUB specification benefit from the work of the entire Web community. For example, ebook accessibility builds on the work done by the W3C on the subject [WAI].
- Developers of EPUB authoring solutions can create such tools as variants of Web authoring solutions. Developers of reading applications can use an off-the-shelf Web browser engine as the core of their rendering engine.
By using Web Standards, the publishing industry avoids reinventing the wheel, although it must continue to adapt this “wheel” to the chapters and pages of ebooks so that their historical context is preserved. As the reference format for distribution and interchange in the digital publishing industry for books and media, EPUB allows publishers to produce and send a single digital publication file through digital distribution networks and offers consumers interoperability between software/hardware for reflowable or fixed-layout ebooks.
History of EPUB
EPUB was initially standardized in 2007, as EPUB 2, as a successor format to the Open eBook Publication Structure or "OEB", which was developed in 1999.
EPUB3 vs EPUB2
Added and/or improved in EPUB3:
- HTML5: EPUB 2 supports XHTML 1.1 and DTBook. With the support of the XML properties of HTML 5 in EPUB 3, it is now possible to use more detailed semantic markup (e.g. use
- Semantic Inflection: A new epub:type attribute, when added to HTML 5 markup, defines the precise nature of structural markup, in line with the publisher's intended book semantics.
- Navigation: EPUB3 defines a new human-and-machine readable grammar for the navigation document, based on the HTML 5 nav element. It replaces the EPUB 2 .ncx file, which is now deprecated.
- SVG documents: They can now appear directly in the spine (they no longer need to be nested within an XHTML file).
- MathML: The XML markup language dedicated to the presentation of mathematical notation is now a first-class citizen in EPUB publications.
- Content switching: It has been simplified by having its processing model defined so that it does not require document preprocessing.
- EPUB Navigation Documents: These supersede the NCX grammar used in EPUB 2.
- Linking: Linking schemes have been added. At the moment only one is available; see the Canonical Fragment Identifiers specification.
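As an illustration of the nav element and the epub:type attribute mentioned in the list above, here is a minimal Python sketch that reads a table of contents from a navigation document. The sample document, its file names, and the `read_toc` helper are hypothetical. The two namespace URIs are the standard XHTML and EPUB ones, and `toc` is the epub:type value used for the table of contents.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical navigation document, for illustration only.
NAV_DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol>
        <li><a href="chapter1.xhtml">Chapter 1</a></li>
        <li><a href="chapter2.xhtml">Chapter 2</a></li>
      </ol>
    </nav>
  </body>
</html>"""


def read_toc(nav_xml: str) -> list[tuple[str, str]]:
    """Return (href, label) pairs from the first nav element typed as 'toc'."""
    root = ET.fromstring(nav_xml)
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        # epub:type may hold several space-separated values.
        types = (nav.get(f"{{{EPUB_NS}}}type") or "").split()
        if "toc" in types:
            return [
                (a.get("href"), "".join(a.itertext()).strip())
                for a in nav.iter(f"{{{XHTML_NS}}}a")
            ]
    return []


if __name__ == "__main__":
    for href, label in read_toc(NAV_DOC):
        print(href, label)
    # chapter1.xhtml Chapter 1
    # chapter2.xhtml Chapter 2
```

The sketch flattens nested lists into a single sequence of links. A reading system would normally preserve the nesting of the ol elements to show a hierarchical table of contents.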
Scripting and Interactivity
- Triggers: The trigger element, included in HTML5 for EPUB, allows declarative bindings of activation events (such as “play” or “pause” for an audio event).
- Bindings: You can now script your own handlers for uncommon media types.
Styling and Layout
- Fixed Layout: see Reflowable vs Fixed Layout.
- Added modules from CSS3: It also includes alternate style tags, allowing the creation of custom viewing modes, such as day, night, etc.
- Embedded Fonts: EPUB 3 requires Reading Systems to support the OpenType and WOFF font formats for embedded fonts in conjunction with the CSS @font-face rules.
- Font Obfuscation: A new normative section on Font Obfuscation [OCF3] has been added to the Open Container Format specification.
Rich Media and Speech
- Media overlays: With the possibility of adding audio, EPUB includes a way to synchronize it with the text.
- Text-to-speech: Text-to-speech ebooks are now possible (using properties such as SSML attributes in XHTML content documents).
- Audio and video: EPUB 2 supports images but has no native support for audio or video. Thanks to HTML 5, EPUB 3 publications can reference audio or video assets via the audio and video tags, and therefore audio and video assets can be natively processed by modern browser engines.
- Publication Metadata and Identity: New mandatory metadata has been added: dcterms:modified, the date on which the resource was changed.
- Resource Metadata: New properties attributes in the Package Document allow additional metadata to be declared about the resources.
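The mandatory dcterms:modified value mentioned above can be generated programmatically. The sketch below assumes the timestamp format that the EPUB 3 package specification prescribes for this property (UTC, written as CCYY-MM-DDThh:mm:ssZ). The helper name `modified_meta` is hypothetical, and the element is serialized without the namespace declarations that a full Package Document would carry.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


def modified_meta(when: datetime) -> str:
    """Serialize a dcterms:modified meta element for a timezone-aware datetime."""
    if when.tzinfo is None:
        # A naive datetime would be silently interpreted as local time.
        raise ValueError("expected a timezone-aware datetime")
    # Convert to UTC and format without fractional seconds.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    meta = ET.Element("meta", {"property": "dcterms:modified"})
    meta.text = stamp
    return ET.tostring(meta, encoding="unicode")


if __name__ == "__main__":
    example = datetime(2011, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(modified_meta(example))
    # <meta property="dcterms:modified">2011-01-01T12:00:00Z</meta>
```

In practice the function would be called with `datetime.now(timezone.utc)` each time the publication is regenerated, so that the value reflects the latest change.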
Removed or changed:
- DTBook: A file format similar to HTML, designed with special regard to the requirements of the visually impaired.
- Out-of-Line XML Islands: An XML document that is not authored in a Preferred Vocabulary.
- Tours: A tag to mark points of interest in a publication.
- Filesystem Container
- NCX: Part of the specification for digital talking books (DTB), used in EPUB documents to define the Table of Contents (TOC).
- 2.0.1 meta element: The meta element annotating the version.