Overview of Lightweight DITA (XDITA and HDITA)

The goal of this proposal is to align a lightweight DITA profile in XML with an equivalent markup specification based on HTML5. This is not a complete specification, just a starting point for discussion. There is still plenty of room for change, as well as for adding specific mappings for additional semantics for learning and training content, EPUBs, or other formats.

While XML-based publishing chains remain the industry standard for many content-centric industries (such as publishing, pharmaceutical, and aerospace), concerns have been raised about their complexity, especially as a barrier to new adopters or contributing authors.

Many of the responses fall into one of two categories:

  • Simplify the XML model, as seen with many of the lightweight DITA tools that have come to market over the last few years

  • Rebase on an HTML5 model, as seen with O'Reilly and more recently Pearson

The challenge with the lightweight DITA approach is that historically it has not been standardized, so each implementation introduces a slightly different flavor of lightweight DITA. In addition, some adopters are reluctant to accept any solution that uses XML at all, simply because it requires at least one transform step in the publishing process.

The challenges with the HTML5-based approach are again based on a lack of standardization: each new extension of HTML5 introduces its own additional semantics and constraints, locking the content into a particular tool or vendor pipeline. The additional semantics and constraints also may require a custom authoring environment, resulting in another barrier to content portability, without the advantages of authoring-time validation that an XML-based approach provides. Finally, even though the approach may eliminate processing steps for the case of simple content, more complex content scenarios – such as content reuse and filtering, or indexing and link redirection – require additional processing steps that reintroduce the complexity of an XML-based approach, without the advantage of existing standards-based solutions.

This proposal suggests a third way – defining both a lightweight XML model based on DITA that can be used for validated authoring and complex publishing chains, and a lightweight HTML5 model that can be used for either authoring or display.

The two schemes – provisionally named XDITA and HDITA – are designed for full compatibility with each other as well as conformance with the OASIS DITA and W3C HTML5 standards. They give HTML5 users a set of standardized mechanisms to access the power and flexibility of DITA's reuse and specialization capabilities, and give DITA users a way to integrate and interact with HTML5-based content systems without complex mapping or cleanup steps.

The key areas for alignment are as follows:

  • Structural elements – paragraphs, lists, etc.

  • Specialized sections

  • Inline markup – highlighting, links, etc.

  • Tables

  • Images and multimedia

  • Navigation/Maps

  • Attributes (including ARIA)



Structural elements

The common structural elements in XDITA and HDITA are almost identical. This is because DITA was originally designed by deliberately borrowing markup conventions from HTML, and HTML has since evolved to add more semantic elements in line with the additional semantics that DITA had added.

HTML5 and DITA are now close enough to achieve a reasonable and semantic mapping with the application of a few simple constraints. The elements below were chosen based on their utility and simplicity. Some elements that exist in both DITA and HTML5, such as block quotes and figures, have still been omitted to keep the overall complexity low.

XDITA | HDITA
<topic> | <article>
<title> | <h1> (in <article>) or <h2> (in <section>)
<shortdesc> | <p>
<body> | No equivalent (ignored)
<section> | <section>
<ul> | <ul>
<ol> | <ol>
<li> | <li>
<dl> | <dl>
<dlentry> | No equivalent (ignored)
<dt> | <dt>
<dd> | <dd>
<p> | <p>
<pre> | <pre>
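
The structural mapping above is mechanical enough to sketch in a few lines. The following Python is only an illustration under stated assumptions, not part of the proposal: the function name to_hdita and the two lookup tables are hypothetical, attributes are copied through unchanged, and only the elements listed in the table are considered.

```python
import xml.etree.ElementTree as ET

# Renames taken from the structural table; any tag not listed keeps its name.
RENAME = {"topic": "article", "shortdesc": "p"}
# XDITA wrappers with no HDITA equivalent: their children are promoted.
UNWRAP = {"body", "dlentry"}


def to_hdita(elem, heading="h1"):
    """Return the list of HDITA elements that replace one XDITA element."""
    if elem.tag in UNWRAP:
        return [h for child in elem for h in to_hdita(child, heading)]
    # A <title> becomes <h1> directly under the topic and <h2> inside a section.
    tag = heading if elem.tag == "title" else RENAME.get(elem.tag, elem.tag)
    new = ET.Element(tag, dict(elem.attrib))
    new.text, new.tail = elem.text, elem.tail
    inner = "h2" if elem.tag == "section" else heading
    for child in elem:
        new.extend(to_hdita(child, inner))
    return [new]


xdita = ET.fromstring(
    "<topic><title>The point of it all</title>"
    "<shortdesc>I can sum it up here</shortdesc>"
    "<body><section><title>Stuff</title><p>And so on</p></section></body>"
    "</topic>"
)
print(ET.tostring(to_hdita(xdita)[0], encoding="unicode"))
```

The reverse direction is harder, because a <p> that follows the <h1> has to be recognized as the short description and the remaining content re-wrapped in <body>; that step is not shown here.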



Specialized structural elements

DITA also has elements associated with structural specializations such as concept, task, and reference. In HDITA, the specialized topics and sections are represented by generic article or section elements with a custom attribute that uses a simplified form of the DITA class syntax.

To align with HTML5, specialized sections for <address> and <aside> are added to XDITA.

XDITA | HDITA
<concept> | <article data-hd-class="concept">
<task> | <article data-hd-class="task">
<reference> | <article data-hd-class="reference">
<example> | <section data-hd-class="topic/example">
<context> | <section data-hd-class="task/context">
<prereq> | <section data-hd-class="task/prereq">
<steps-informal> | <section data-hd-class="task/steps-informal">
<postreq> | <section data-hd-class="task/postreq">
<refsyn> | <section data-hd-class="reference/refsyn">
<address> (from section) | <address>
<aside> (from section) | <aside>
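
As a rough illustration of the table above, the lookup from a specialized XDITA element name to its HDITA start tag could be written as follows. This is a sketch only: the function name hdita_start_tag and the two tables are hypothetical, and the class values are copied directly from the table rather than derived from a schema.

```python
# Topic-level specializations become <article> with a bare class name.
TOPIC_TYPES = {"concept", "task", "reference"}

# Section-level specializations become <section> with a "module/name" value.
SECTION_CLASSES = {
    "example": "topic/example",
    "context": "task/context",
    "prereq": "task/prereq",
    "steps-informal": "task/steps-informal",
    "postreq": "task/postreq",
    "refsyn": "reference/refsyn",
}

# Elements added to XDITA so that they match HTML5 names one to one.
PASS_THROUGH = {"address", "aside"}


def hdita_start_tag(xdita_name):
    """Return the HDITA start tag for a specialized XDITA element name."""
    if xdita_name in TOPIC_TYPES:
        return f'<article data-hd-class="{xdita_name}">'
    if xdita_name in SECTION_CLASSES:
        return f'<section data-hd-class="{SECTION_CLASSES[xdita_name]}">'
    if xdita_name in PASS_THROUGH:
        return f"<{xdita_name}>"
    raise KeyError(f"not a specialized element in this sketch: {xdita_name}")


print(hdita_start_tag("task"))
print(hdita_start_tag("prereq"))
```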



Example

The following example shows a simple article with no specializations used, first in XDITA and then in HDITA.

XDITA

<topic>
  <title>The point of it all</title>
  <shortdesc>I can sum it up here</shortdesc>
  <body>
    <p>I can say some more stuff</p>
    <section>
      <title>Stuff</title>
      <p>And so on</p>
      <ul>
        <li><p>This</p></li>
        <li><p>Is</p></li>
        <li><p>A List</p></li>
      </ul>
    </section>
    <section>
      <title>And more stuff</title>
      <p>With its own explanation</p>
      <dl>
        <dlentry>
          <dt><p>This</p></dt>
          <dd><p>Is explained</p></dd>
        </dlentry>
        <dlentry>
          <dt><p>This</p></dt>
          <dd><p>Is also explained</p></dd>
        </dlentry>
      </dl>
    </section>
  </body>
</topic>

HDITA

<article>
  <h1>The point of it all</h1>
  <p>I can sum it up here</p>
  <p>I can say some more stuff</p>
  <section>
    <h2>Stuff</h2>
    <p>And so on</p>
    <ul>
      <li><p>This</p></li>
      <li><p>Is</p></li>
      <li><p>A List</p></li>
    </ul>
  </section>
  <section>
    <h2>And more stuff</h2>
    <p>With its own explanation</p>
    <dl>
      <dt><p>This</p></dt>
      <dd><p>Is explained</p></dd>
      <dt><p>This</p></dt>
      <dd><p>Is also explained</p></dd>
    </dl>
  </section>
</article>



Inline markup

 

DITA and HTML5 have many of the same inline semantics, although XDITA separates some elements into optional domains. For the sake of increasing alignment, these optional elements will be packaged together into a single domain with element names matched to HTML5.

XDITA | HDITA
<strong> (specialized from <b>) | <strong>
<em> (specialized from <i>) | <em>
<sup> | <sup>
<sub> | <sub>
<cite> | <cite>
<ph> | <span>
<data name="x" value="y"> | <data title="x" value="y">
<a href="..."> (specialized from <xref>) | <a href="...">
<code> (from <codeph>) | <code>
<var> (from <varname>) | <var>
<kbd> (from <userinput>) | <kbd>
<samp> (from <systemoutput>) | <samp>
User-added specializations | <span data-hd-class="...">



Tables

The table model has been deliberately constrained to allow only simple forms of tables without column- or row-spanning. However, XDITA is still using a constrained form of the CALS <table> model instead of <simpletable>, so that more complex tables could be pulled in from other DITA sources using the conref attribute if needed.

XDITA | HDITA
<table> | <table>
<title> | <caption>
<tgroup> | No equivalent (ignored)
<thead><row><entry></entry></row></thead> | <tr><th></th></tr>
<tbody> | No equivalent (ignored)
<row> | <tr>
<entry> | <td>
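
Because spanning is excluded, the table mapping reduces to renaming cells and dropping the CALS wrappers. The sketch below assumes a well-formed constrained table; the function name cals_to_html is hypothetical, and for brevity it copies only the plain text of each cell.

```python
import xml.etree.ElementTree as ET


def cals_to_html(table):
    """Convert a constrained CALS <table> (no spans) to an HTML5 <table>."""
    html = ET.Element("table")
    title = table.find("title")
    if title is not None:
        ET.SubElement(html, "caption").text = title.text
    for tgroup in table.findall("tgroup"):
        for part in tgroup:
            if part.tag not in ("thead", "tbody"):
                continue  # e.g. <colspec>, which this sketch ignores
            cell_tag = "th" if part.tag == "thead" else "td"
            for row in part.findall("row"):
                tr = ET.SubElement(html, "tr")
                for entry in row.findall("entry"):
                    # Only plain text is copied; inline markup is dropped here.
                    ET.SubElement(tr, cell_tag).text = entry.text
    return html


cals = ET.fromstring(
    "<table><title>Sizes</title><tgroup>"
    "<thead><row><entry>Name</entry><entry>Width</entry></row></thead>"
    "<tbody><row><entry>Small</entry><entry>100</entry></row></tbody>"
    "</tgroup></table>"
)
print(ET.tostring(cals_to_html(cals), encoding="unicode"))
```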



Images and multimedia

The image and object elements are nearly identical between HTML and DITA, since DITA borrowed the markup from HTML. However, HTML5 adds new <video> and <audio> elements, which can be added to XDITA as specializations of, and replacements for, <object>.

XDITA | HDITA
<image href="..." height="100" width="100"><alt>...</alt></image> | <img src="..." alt="..." height="100" width="100">
<audio> (from object) with <controls value="y"/> (from param) | <audio controls>
<video> (from object) with <controls value="y"/> (from param) and <poster value="..."/> (from param) | <video controls poster="...">
<source value="..." type="..."/> (from param) | <source src="..." type="..."/>
<track value="..." type="..."/> (from param) | <track src="..." kind="...">
<fallback><p>...</p></fallback> (from desc) | <p>...</p>



Example

Here's an example of a video element in both XDITA and HDITA. The fallback content appears first in the DITA case to match requirements of the DITA specification.

XDITA

<video>
  <fallback><p>Here's a video of stuff you can't see.</p></fallback>
  <controls value="y"/>
  <poster value="screengrab.png"/>
  <source value="mymovie.mp4" type="video/mp4"/>
  <source value="backupformat.xyz" type="video/xyz"/>
  <track value="captions.vtt" type="captions"/>
</video>

HDITA

<video controls poster="screengrab.png">
  <source src="mymovie.mp4" type="video/mp4"/>
  <source src="backupformat.xyz" type="video/xyz"/>
  <track src="captions.vtt" kind="captions"/>
  <p>Here's a video of stuff you can't see.</p>
</video>
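
The video mapping moves some child elements onto attributes and reorders the fallback content, which is easier to follow in code. This is a minimal sketch under the assumptions of the tables above; the function name video_to_hdita is hypothetical, and the boolean controls attribute is written as an empty string because ElementTree has no notion of valueless attributes.

```python
import xml.etree.ElementTree as ET


def video_to_hdita(video):
    """Convert an XDITA <video> to the HDITA <video> form."""
    html = ET.Element("video")
    fallback = []
    for child in video:
        value = child.get("value")
        if child.tag == "controls":
            if value == "y":
                html.set("controls", "")
        elif child.tag == "poster":
            html.set("poster", value)
        elif child.tag == "source":
            ET.SubElement(html, "source", src=value, type=child.get("type"))
        elif child.tag == "track":
            # XDITA @type on <track> corresponds to HTML5 @kind.
            ET.SubElement(html, "track", src=value, kind=child.get("type"))
        elif child.tag == "fallback":
            fallback.extend(child)
    # Fallback content comes last in HTML5, after <source> and <track>.
    html.extend(fallback)
    return html


xdita = ET.fromstring(
    "<video><fallback><p>Here's a video of stuff you can't see.</p></fallback>"
    '<controls value="y"/><poster value="screengrab.png"/>'
    '<source value="mymovie.mp4" type="video/mp4"/>'
    '<track value="captions.vtt" type="captions"/></video>'
)
print(ET.tostring(video_to_hdita(xdita), encoding="unicode"))
```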



Navigation/Maps

DITA maps can serve many different functions, most of which (such as aggregation, link management, variable management, or taxonomy management) have no direct equivalent in an HTML5 universe. However, HTML5 has added a <nav> element that provides a match for at least the role of navigation tree or table of contents.

If a DITA map is being used for other purposes, it can still be mapped to a <nav> container for the sake of editing or viewing in an HTML5 context, even if it is not providing an equivalent function. DITA-specific features such as key attributes are managed using custom attributes.

XDITA provides a slimmed-down version of DITA maps without relationship tables or advanced link management features; they are still capable of managing taxonomy values, link indirection, and variable text, as well as simple TOC-like navigation.

XDITA | HDITA
<map> | <nav>
<topicmeta> | No equivalent – ignored
<navtitle> | <h1> (for a navtitle that functions as title for whole map)
<data name="x" value="y"/> | <data title="x" value="y"/>
No equivalent – topicrefs nest directly | <ul>
<topicref href="..."><topicmeta><navtitle>link text</navtitle></topicmeta></topicref> | <li><a href="...">link text</a></li>

Examples

The following examples show a map used purely for navigation, and a map used to manage a taxonomy of classification values, or a list of variable text strings (the same pattern is used for both taxonomy and variable use cases).

You can see that XDITA requires a number of container elements for compatibility with the DITA specification that can be skipped on the HDITA side.

XDITA (navigation)

<map>
  <topicmeta>
    <navtitle>Navigation</navtitle>
  </topicmeta>
  <topicref href="abc.dita">
    <topicmeta>
      <navtitle>Topic A</navtitle>
    </topicmeta>
  </topicref>
  <topicref href="bcd.dita">
    <topicmeta>
      <navtitle>Topic B</navtitle>
    </topicmeta>
    <topicref href="b123.dita">
      <topicmeta>
        <navtitle>Topic B1</navtitle>
      </topicmeta>
    </topicref>
  </topicref>
</map>

HDITA (navigation)

<nav>
  <h1>Navigation</h1>
  <ul>
    <li><a href="abc.html">Topic A</a></li>
    <li><a href="bcd.html">Topic B</a>
      <ul>
        <li><a href="b123.html">Topic B1</a></li>
      </ul>
    </li>
  </ul>
</nav>

XDITA (taxonomy/variable management)

<map>
  <topicmeta>
    <navtitle>Taxonomy/variable management</navtitle>
  </topicmeta>
  <topicref keys="abc">
    <topicmeta>
      <navtitle>Value A</navtitle>
    </topicmeta>
  </topicref>
  <topicref keys="bcd">
    <topicmeta>
      <navtitle>Value B</navtitle>
    </topicmeta>
    <topicref keys="b123">
      <topicmeta>
        <navtitle>Value B1</navtitle>
      </topicmeta>
    </topicref>
  </topicref>
</map>

HDITA (taxonomy/variable management)

<nav>
  <h1>Taxonomy/variable management</h1>
  <ul>
    <li data-hd-keys="abc">Value A</li>
    <li data-hd-keys="bcd">Value B
      <ul>
        <li data-hd-keys="b123">Value B1</li>
      </ul>
    </li>
  </ul>
</nav>
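
Both map examples follow the same recursive pattern, which can be sketched as a single conversion. The code below is an assumption-laden illustration, not a reference implementation: the function names are hypothetical, and rewriting the .dita file extension to .html simply mirrors the example above.

```python
import xml.etree.ElementTree as ET


def topicref_to_li(ref):
    """Convert one <topicref> (and its nested topicrefs) to an <li>."""
    li = ET.Element("li")
    label = ref.findtext("topicmeta/navtitle", default="")
    href = ref.get("href")
    if href:
        if href.endswith(".dita"):
            href = href[: -len(".dita")] + ".html"
        ET.SubElement(li, "a", href=href).text = label
    else:
        li.text = label
    if ref.get("keys"):
        li.set("data-hd-keys", ref.get("keys"))
    children = ref.findall("topicref")
    if children:
        # HDITA needs an explicit <ul>; XDITA topicrefs nest directly.
        ul = ET.SubElement(li, "ul")
        ul.extend([topicref_to_li(child) for child in children])
    return li


def map_to_nav(map_elem):
    """Convert an XDITA <map> to an HDITA <nav>."""
    nav = ET.Element("nav")
    title = map_elem.findtext("topicmeta/navtitle")
    if title:
        ET.SubElement(nav, "h1").text = title
    ul = ET.SubElement(nav, "ul")
    ul.extend([topicref_to_li(ref) for ref in map_elem.findall("topicref")])
    return nav


xdita_map = ET.fromstring(
    "<map><topicmeta><navtitle>Navigation</navtitle></topicmeta>"
    '<topicref href="abc.dita"><topicmeta><navtitle>Topic A</navtitle>'
    "</topicmeta></topicref></map>"
)
print(ET.tostring(map_to_nav(xdita_map), encoding="unicode"))
```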



Attributes

The attribute model is where DITA and HTML5 differ the most, and where DITA adds the most capabilities that have no direct HTML5 equivalent. The attributes fall into the following groups:

  • Architectural attributes (domains, DITAArchVersion, class)

  • Filtering attributes (props, others)

  • Linking and content reuse attributes (id, conref, href)

  • Variable management/link indirection/taxonomy management (keys/keyref)

  • Localization attributes (dir, xml:lang, translate)

  • Accessibility attributes

Architectural attributes

There are two architectural attributes (@domains and @DITAArchVersion) that need to appear on the root elements for DITA structures (<map> or <topic>) or their HTML5 equivalents (<article> or <nav>). Along with class mechanisms, these attributes allow for the roundtripping of content between XDITA and HDITA.

DITA has two mechanisms for expressing the semantic class of an element: an optional user-added @outputclass attribute, and the @class attribute, which stores the element's semantic identity as well as its specialization ancestry.

In XDITA, the @class attribute is not actually stored as part of each element instance, but only as part of its declaration in the schema (DTD/XSD/RNG) file. Since HDITA has no schema to validate against or derive default attribute values from, any semantic classes have to be directly expressed as an attribute value. To minimize the amount of attribute noise in the HDITA expression, only a simplified set of class values is stored, and only for specialized elements.

XDITA | HDITA
@domains (on <map> or <topic>) | @data-hd-domains (on <article> or <nav>)
@DITAArchVersion (on <map> or <topic>) | @data-hd-DITAArchVersion (on <article> or <nav>)
@class="-xxx/yyy" (on all elements) | @data-hd-class="yyy" (on <article> or <nav>); @data-hd-class="xxx/yyy" (on specialized <section> or <span> elements)
@outputclass="xxx" (on any element) | @class="xxx" (on any element)



Filtering attributes

HTML5 has no equivalent to the @props attribute and its equivalents or specializations in DITA. When an HDITA document needs metadata to enable adaptive display based on contextual criteria such as audience or geography, that metadata can be carried in custom attributes that map to XDITA and can make use of OASIS standard DITA filtering logic.

XDITA | HDITA
@props | @data-hd-props
@abc123 (any attribute specialized from @props) | @data-hd-abc123



In both XDITA and HDITA, an attribute's specialization history is found by looking in the @domains attribute of the containing <map>/<topic> or <nav>/<article>.
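
To make the filtering idea concrete, here is a minimal sketch of exclusion applied to HDITA content. It assumes the DITA rule that an element is removed when every value in a filtering attribute is excluded, and it assumes the HDITA source is well-formed enough to parse as XML; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def filter_props(root, attr, excluded):
    """Remove elements whose every value in `attr` is in `excluded`."""
    for parent in list(root.iter()):
        for child in list(parent):
            values = child.get(attr, "").split()
            if values and all(v in excluded for v in values):
                _remove_keep_tail(parent, child)
    return root


def _remove_keep_tail(parent, child):
    """Remove child but keep the text that followed it in the parent."""
    index = list(parent).index(child)
    if child.tail:
        if index == 0:
            parent.text = (parent.text or "") + child.tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + child.tail
    parent.remove(child)


doc = ET.fromstring(
    '<article><p data-hd-props="novice">Start here.</p>'
    '<p data-hd-props="novice expert">Shared step.</p>'
    "<p>Always shown.</p></article>"
)
filter_props(doc, "data-hd-props", {"novice"})
print(ET.tostring(doc, encoding="unicode"))
```

The same function can be pointed at a specialized attribute such as data-hd-abc123 by changing the attr argument.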



Linking and content reuse

The basic ability to create hypertext links is virtually identical between HTML5 and DITA, although the DITA syntax for @href includes the id of the containing <topic> when targeting lower-level elements, to provide a simple form of namespacing that helps protect link targets from clashing when multiple topics are assembled into a single document. This can be easily resolved during transform between XDITA and HDITA.

HTML5 has no equivalent to the DITA @conref attribute, which allows reuse of any element with an @id attribute. HDITA therefore needs a custom attribute, which requires either special code to process at display time or a preprocessing step to resolve at publishing time.

XDITA | HDITA
@id (required on <map>, <topic>; optional on all others) | @id (required on <nav>, <article>; optional on all others)
@conref (on section, table, paragraph, and list elements) | @data-hd-conref (on section, table, paragraph, and list elements)
@href (on <a>, <topicref>) | @href (on <a>)
@href (on <image>) | @src (on <img>)
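
The @href difference described above can be illustrated with a small rewriting function. The proposal does not define the HDITA fragment form, so this sketch assumes one possible convention: the topic id is dropped from element-level targets and the .dita extension becomes .html. The function name dita_href_to_html is hypothetical.

```python
def dita_href_to_html(href):
    """Rewrite a DITA href such as file.dita#topicid/elementid for HTML5.

    Assumed convention (not defined by the proposal): keep only the element
    id in the fragment, or the topic id when no element id is present.
    """
    path, _, fragment = href.partition("#")
    if path.endswith(".dita"):
        path = path[: -len(".dita")] + ".html"
    if not fragment:
        return path
    topic_id, _, element_id = fragment.partition("/")
    return f"{path}#{element_id or topic_id}"


print(dita_href_to_html("install.dita#install/prereqs"))
```

Dropping the topic id gives up the namespacing that DITA provides, so a real transform would also need to check for id clashes when several topics end up in one HTML5 document.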



Variable management/Link indirection

Variable text and link indirection are both accomplished in DITA using the @keyref attribute to code the reference, and the @keys attribute on a <topicref> to provide either the link destination or variable text. Again, HTML5 has no equivalent native capability, so a custom attribute is required, which will require either special processing at runtime or a preprocessing step.

XDITA | HDITA
@keys (on topicref) | @data-hd-keys (on <li> inside <nav>)
@keyref (on <image>, <a>, and <topicref> for link indirection, on all inline elements for variable text) | @data-hd-keyref (on <img> and <a> for link indirection, on all inline elements for variable text)
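
The preprocessing step mentioned above could look roughly like the following. This is a sketch under several assumptions: key definitions come from <li data-hd-keys="..."> items in a <nav>, the first definition of a key wins (as in DITA maps), and the function names build_keys and resolve_keyrefs are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_keys(nav):
    """Collect key definitions from <li data-hd-keys="..."> items in a <nav>."""
    keys = {}
    for li in nav.iter("li"):
        link = li.find("a")
        text = (link.text if link is not None else li.text) or ""
        href = link.get("href") if link is not None else None
        for name in li.get("data-hd-keys", "").split():
            # The first definition of a key wins, as in DITA maps.
            keys.setdefault(name, {"text": text.strip(), "href": href})
    return keys


def resolve_keyrefs(root, keys):
    """Fill in link targets and variable text for data-hd-keyref attributes."""
    for elem in root.iter():
        entry = keys.get(elem.get("data-hd-keyref"))
        if entry is None:
            continue
        if elem.tag == "a" and entry["href"]:
            elem.set("href", entry["href"])
        elif elem.tag == "img" and entry["href"]:
            elem.set("src", entry["href"])
        elif not elem.text:
            elem.text = entry["text"]
    return root


nav = ET.fromstring(
    '<nav><ul><li data-hd-keys="product">Widget Pro</li>'
    '<li data-hd-keys="home"><a href="index.html">Home</a></li></ul></nav>'
)
body = ET.fromstring(
    '<article><p>Install <span data-hd-keyref="product"/> from '
    '<a data-hd-keyref="home">the start page</a>.</p></article>'
)
resolve_keyrefs(body, build_keys(nav))
print(ET.tostring(body, encoding="unicode"))
```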



Localization attributes

HTML5 adds the @translate attribute, which aligns it nicely with DITA. While there are more attributes that could be added and aligned, for lightweight purposes three should be sufficient.

XDITA | HDITA
@dir | @dir
@xml:lang | @lang
@translate | @translate

Accessibility attributes

Since DITA assumes a preprocessing or publishing step, most of its accessibility features, such as ARIA roles and table navigation cues, are added during the publishing step, with the exception of captions or alternative text for non-text media. Since HTML5 does not assume a preprocessing step, accessibility attributes must be manually maintained in the source, although they can be stripped out and re-added whenever the content is transformed between its XDITA and HDITA expressions.

XDITA | HDITA
Generated from source semantics. | @role



Edited to correct typo: "dita-hd-" attributes to "data-hd-" attributes (HTML5 allows custom attributes that start with "data-")

Not sure how to fix the column headers, but the left column is XDITA and the right column is HDITA throughout.

Michael Priestley
