Revision of DITA Processing from Wed, 2008-12-17 14:17

DITA Processing in the Open Toolkit

...or, What's all that scrolling through the command window when I run a build?

Short explanation

When you run a build in the DITA Open Toolkit, a lot of processing goes on before your topics are built to XHTML, PDF, or your favorite output format. A big chunk of that processing is collectively known as the "preprocess" step; most DITA builds consist of the common preprocess, followed by output-specific processing. For example, a build to XHTML runs the preprocess, followed by a final rendering step to convert each topic to XHTML. A build to Eclipse XHTML runs the preprocess, followed by that same rendering step and a step to create Eclipse indexing and navigation. During the preprocess steps, your files are copied to a temporary directory and modified several times.
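
As a rough illustration of the build structure described above, the following Python sketch models each build as the common preprocess followed by output-specific steps. The step labels and the build_plan function are hypothetical names made up for this example; they are not DITA-OT target names.

```python
# Hypothetical labels for illustration only; not actual DITA-OT target names.
PREPROCESS = ["preprocess"]

# Output-specific steps, as described in the paragraph above.
OUTPUT_STEPS = {
    "xhtml": ["render topics to XHTML"],
    "eclipse-xhtml": [
        "render topics to XHTML",
        "create Eclipse indexing and navigation",
    ],
}


def build_plan(output_format):
    """Return the common preprocess followed by the output-specific steps."""
    return PREPROCESS + OUTPUT_STEPS[output_format]


for step in build_plan("eclipse-xhtml"):
    print(step)
```

Both plans share the same first entry, which is the point of the common preprocess: every output format starts from the same prepared set of files in the temporary directory.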

Detailed explanation

The following steps take place during the typical preprocessing stage. If you know that you do not use some aspects of DITA processing, such as inline linking, you may be able to skip those stages to optimize your process. Note that in many cases the order is significant; changing the order may produce different results. Also note that this information is not yet considered complete (we are actively updating it for DITA-OT 1.5):

  • Generate file lists - these lists determine what files are used by the following steps
  • Filter based on ditaval and insert debug information; updated files are placed in the temp directory. All files are touched during this process.
  • Evaluate conref push in maps and topics (new for DITA-OT 1.5).
  • Evaluate conref in maps and topics. Only files that use conref are modified, though files with conref targets are also accessed.
  • Move metadata from maps to topics. Pushes index terms, product info, and other metadata from the map into referenced topics.
  • Evaluate keyref (new for DITA-OT 1.5)
  • Pull in <coderef> references (new for DITA-OT 1.5). <coderef> is a new DITA 1.2 element that references external files, such as code samples, which are then rendered inline.
  • Evaluate references from one map to another; merge maps into a single map
  • Mappull step - pull metadata from topics into the map (updates titles, short descriptions), and make inherited information explicit. For example, if a topicref inherits scope="external", that value will be made explicit.
  • Chunking - process both the map and topics to resolve chunk commands. May create new topics and/or modify existing topics.
  • Maplink step - generate links based on the map - navigational, reltable, etc. Makes use of the updated titles and short descriptions pulled in with mappull. Creates a temporary file with all of the generated links.
  • Move generated links from the temporary file into referenced topics.
  • Topicpull step - resolve references within topics. For example, resolves link text for in-line cross references that were specified without text. Runs on every topic.

After the preprocess, processing branches based on the selected output type. XHTML processing converts every individual DITA file to XHTML, and may convert the map to some form of navigation. PDF processing merges all of the DITA content into a single file with the "topicmerge" process, and then converts that file to XSL-FO for rendering with your favorite FO processor. Other output formats may make use of the merge or XHTML process, and may add their own additional steps to generate entirely new formats.
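
The preprocessing order listed earlier in this section can be sketched as an ordered list from which unused stages are dropped without disturbing the sequence of the rest. This is only a minimal sketch: the step labels paraphrase the list, and plan_preprocess is a hypothetical helper, not part of the toolkit.

```python
# Step labels paraphrase the list in this section; they are not actual
# DITA-OT target names.
PREPROCESS_STEPS = [
    "generate file lists",
    "filter and add debug info",
    "conref push",
    "conref",
    "move metadata from maps to topics",
    "keyref",
    "coderef",
    "map references",
    "mappull",
    "chunking",
    "maplink",
    "move links into topics",
    "topicpull",
]


def plan_preprocess(skip=()):
    """Return the steps to run, in their original order, minus skipped steps."""
    unknown = set(skip) - set(PREPROCESS_STEPS)
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    return [step for step in PREPROCESS_STEPS if step not in skip]


# Example: a document set that uses neither conref push nor <coderef>.
for step in plan_preprocess(skip={"conref push", "coderef"}):
    print(step)
```

Skipping removes entries but never reorders them, which matches the caution above that the order of the remaining steps is often significant.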

Why are the steps done in that order?

The order of the preprocessing steps is often significant, and changing the order may produce different results.

  • Filtering is done first because it saves us from extra processing of elements that will never be used. If a branch of the map is filtered immediately, files referenced from that branch will never be accessed and do not need to be present for a build.
  • Conref is evaluated before metadata is pushed so that metadata elements with conref do not cascade within the map or get pushed into topics; if they did, the same conref would have to be resolved in many locations rather than one. Conref is also evaluated before keyref so that all key definitions are stable in the maps (location in the map helps determine which key takes priority).
  • Keyref appears before map references are flattened into a single map because when a key is defined twice, map nesting depth determines which definition is used.
  • Maps are flattened before the Mappull step because it simplifies inheritance of metadata attributes within the map.
  • Chunking takes place before maplink because chunking may create new files and targets for the links.
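
The ordering rationale above can be written down as "must run before" pairs and checked against a proposed step order. The sketch below is illustrative only; the step labels, CONSTRAINTS, and the violations function are hypothetical names chosen for this example.

```python
# Hypothetical step labels. Each pair (a, b) means "a must run before b",
# following the reasons given in the list above.
ORDER = [
    "filter", "conref", "move metadata", "keyref",
    "map references", "mappull", "chunking", "maplink",
]

CONSTRAINTS = [
    ("filter", "conref"),            # filtering is done first
    ("conref", "move metadata"),     # resolve conref once, before metadata is pushed
    ("conref", "keyref"),            # key definitions must be stable in the maps
    ("keyref", "map references"),    # map nesting depth decides key priority
    ("map references", "mappull"),   # flattening simplifies attribute inheritance
    ("chunking", "maplink"),         # chunking may create new link targets
]


def violations(order, constraints=CONSTRAINTS):
    """Return every (a, b) pair where a does not come before b in order."""
    position = {step: i for i, step in enumerate(order)}
    return [(a, b) for a, b in constraints if position[a] > position[b]]


print(violations(ORDER))  # the documented order satisfies every pair

# Swapping the last two steps breaks the chunking-before-maplink rule.
swapped = ORDER[:-2] + ["maplink", "chunking"]
print(violations(swapped))
```

A check like this only captures the dependencies named in this section; other, undocumented dependencies between steps may exist.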