DITA and DocBook (Day 2 of DITA North America 2006)

The day kicked off with Susan Carpenter of IBM, one of DITA's first real users and a big influence on its evolution, talking about process: how DITA maps can be used to distribute workload among writers, manage reviews, and manage translation. In her words, the maps become the process currency of the team. As always, an excellent presentation with lots to chew on.

The highlight was Norm Walsh's talk on DITA and DocBook. He noted that they do have different characteristics: DocBook is large but very flexible, while DITA is more constrained and explicitly focused on topic-based authoring. All good so far.

He then took on the hype around specialization, DITA's main extension mechanism. He pointed out that there are very real costs associated with defining new tags that have nothing to do with the technology (analysis and modeling, for example); that if you add a new tag without new behavior, nothing distinguishes it in the output to the user; and that in some cases there may be no logical equivalent for the new tag, so the processing must be updated to get any meaningful output.

All this is true as far as it goes. DocBook is a larger doctype with subsetting and parameterization capabilities but no support for creating new tags, while DITA (looking just at topic and map) is a very small doctype that allows extension by creating new tags, although those new tags aren't completely free either. Generally speaking, it's a difference between starting big and identifying the subset you need, and starting small and growing (either by specializing, or by adding new modules) as you go.
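To make the "start big and subset" side concrete, here is a minimal sketch of a DocBook 4.x DTD customization layer: whole element modules are switched off via their marker parameter entities before the base DTD is read. The particular entity names assume DocBook's standard *.module naming pattern.

    <!-- customization layer: start with all of DocBook, subtract what
         we don't need (entity names assume the dbpool *.module pattern) -->
    <!ENTITY % sidebar.module "IGNORE">  <!-- no sidebars in our subset -->
    <!ENTITY % msgset.module  "IGNORE">  <!-- no message sets either -->
    <!ENTITY % docbook PUBLIC
      "-//OASIS//DTD DocBook XML V4.4//EN"
      "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
    %docbook;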

The topic-based authoring approach does make DITA more prescriptive than DocBook, but it also delivers a lot of very specific value: a scalable unit of reuse that is also, for readers, a unit of use, so that extensive reusability can be achieved without compromising usability. It also means that rewriting may be required to move to the architecture: it's not just a technological decision, it's also a decision about the best model for your content.

As far as specialization hype goes, it is definitely true that analysis is a necessary part of successful specialization: I can whip up a working message specialization in an hour, but it takes six months to gather content experts, review various message content models, and ensure you've got something acceptable to a wide range of message authoring situations within a company.
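For the record, that mechanical hour looks roughly like this: a minimal sketch of a message topic specialized from topic, with hypothetical element names, omitting the entity plumbing and document-type shell a real module needs. Each class attribute records the ancestry that lets existing topic processing handle the new tags.

    <!-- hypothetical message specialization: message is a topic;
         msgnum and msgtext are specialized paragraphs -->
    <!ELEMENT message  (title, msgbody)>
    <!ATTLIST message
      id    ID    #REQUIRED
      class CDATA "- topic/topic message/message ">

    <!ELEMENT msgbody  (msgnum, msgtext)>
    <!ATTLIST msgbody
      class CDATA "- topic/body message/msgbody ">

    <!ELEMENT msgnum   (#PCDATA)>
    <!ATTLIST msgnum
      class CDATA "- topic/p message/msgnum ">

    <!ELEMENT msgtext  (#PCDATA)>
    <!ATTLIST msgtext
      class CDATA "- topic/p message/msgtext ">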

It is also true that some tags will only be useful if there is processing associated with them. The syntaxdiagram specialization in DITA's programming domain is a good example. However, most of the specialized tags in DITA do not require specialized processing support. A quick check of the DITA Open Toolkit should confirm this - and even the support the toolkit does provide is in most cases value-add rather than required. In other words, syntaxdiagram is the exception, not the rule. When you create a new kind of list, it doesn't stop being a list - and so displaying it as a list doesn't stop being appropriate.
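The reason is visible in how DITA processing matches elements. A sketch in the Open Toolkit's style: stylesheets key on the class attribute's ancestry tokens rather than on tag names, so a specialized list with no templates of its own still hits the base list template.

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- matches ul and anything specialized from it: a specialized
           list's class attribute still carries the ' topic/ul ' token -->
      <xsl:template match="*[contains(@class, ' topic/ul ')]">
        <ul>
          <xsl:apply-templates select="*[contains(@class, ' topic/li ')]"/>
        </ul>
      </xsl:template>
      <xsl:template match="*[contains(@class, ' topic/li ')]">
        <li><xsl:apply-templates/></li>
      </xsl:template>
    </xsl:stylesheet>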

We are seeing real savings from DITA specialization: first, it allows us to customize in cases where we simply wouldn't have before because of technical cost; second, it has been reported as reducing the time to deployment of a new feature by 50%, effectively by reducing the technical costs (the analysis costs remain constant). In other words, you still need to know what you want, but once you do, it's a lot easier to get.

Stealing from my own specialization workshop, where I create a new specialization with seven new tags:

  • Still work involved to define the seven new elements
  • But no work for the other 100-odd elements already defined in the base
  • No work to get those tags enabled in existing processes (most cases)
  • No work to get the content integrated in existing books and Webs (most cases; see the map sketch after this list)
  • And reuse by reference means you can pick up enhancements to both base design and base processes when you want to.
  • The real work is in figuring out what your tags need to be. What DITA does is simplify the mechanics of getting those tags into an authoring and processing environment you can test with your users: shorten the feedback cycle, improve more quickly.
  • A rapid prototyping architecture that scales to a production one.
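The integration bullet deserves a picture. A hedged sketch, with hypothetical file names: a topic authored with the new specialization drops into an existing map alongside ordinary topics, and nothing in the map needs to know it is specialized.

    <map title="Troubleshooting guide">
      <topicref href="overview.dita"/>
      <!-- a topic authored with the new message specialization -->
      <topicref href="msg4041.dita"/>
      <topicref href="contacts.dita"/>
    </map>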

That said, specialization isn't necessary for everyone, and as more and more people do specialize and share those specializations through the dita.xml.org focus area, the need to specialize should continue to shrink: you don't need to specialize, after all, if someone else has already done it for you.

So DITA and DocBook do have some differences, and different approaches. Norm and I both agree there's value in each other's architectures, and we both want to work to get seamless interoperability between them.

My dream is that I can someday point a DITA map at a DocBook section, and have it turned into a topic, pulled into my map, and managed by the DITA map just as if it were a native topic (links managed, metadata assigned, pulled into PDF or HTML, etc.). And on the reverse side, that a DocBook section could point out to a DITA topic or map, and have it turned into DocBook content, and published seamlessly as part of the DocBook pipeline.
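In map terms, the dream might look like the sketch below. The format attribute already flags non-DITA resources; a value like "docbook" (hypothetical here) could tell a conversion step in the pipeline to turn the section into a topic on the way in.

    <map title="Mixed-source deliverable">
      <topicref href="concepts/overview.dita"/>
      <!-- hypothetical: a DocBook section pulled in, converted, and
           then link-managed like any native topic -->
      <topicref href="legacy/install-section.xml" format="docbook"/>
    </map>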

This is not just a DocBook-DITA problem: this is a general document interchange problem. At the least, I'd like ODF, the OpenDocument format, to be an equal participant in the reuse story, so we can reuse DITA and DocBook in ODF, DocBook and ODF in DITA, and DITA and ODF in DocBook.

It's in all our interests to get along, so while the technical discussions are useful for evolving our architectures, it's helpful to keep the goal in mind: the right architecture to make your content usable, and the right architecture to make your content reusable.
