Level Four: Automation and integration

Wiki page: Submitted by Bob Doyle on Tue, 2008-03-25 18:43. Last updated on Mon, 2008-09-15 21:14.

Introduction to DITA Maturity Model
Level 1: Topics
Level 2: Scalable reuse
Level 3: Specialization and customization
Level 4: Automation and Integration
Level 5: Semantics on demand
Level 6: Universal semantic infosystem

Once content is specialized, you can leverage your investment in semantics with automation of key processes, and begin tying content together, even across different specializations or authoring disciplines. For example, you can share common content across marketing and training, or share common processes and infrastructure throughout your content life cycle.

Scenario

The software division of a large technology company stores its content in a CMS, which allows all the teams in the division to reuse the content. At this level, the division has moved beyond single-sourcing of content and achieved multiway reuse. Product descriptions created by the marketing team can be reused by the technical publications group to create product overviews, and by the training group to create product tours. At the same time, product architectural specifications created by technical publications can be reused by training, technical support groups, and the marketing team.

The following figure illustrates how content created by different teams can be reused in multiple deliverables by multiple teams across the division.

Figure 6: Content reuse across teams

By reusing its content across the teams in the division, the company can save a significant amount of money because it translates the content source once rather than each deliverable that instantiates the content.

Investment

Organizations need a CMS to effectively control and automate the content development life cycle. In addition to storing content and providing version control, the CMS provides workflow automation support that assists authors in creating, reusing, and publishing content. However, the investment in implementing a CMS is non-trivial in terms of preparation and cost.

In preparation for a CMS implementation, you must understand the structure of the content and where it is appropriate for reuse. This requires a significant amount of research, planning, and coordination to identify the reuse possibilities, requirements, and standards across disciplines. In addition, you need to define a robust metadata model to support the content model and apply it to all topics. Lastly, you must have agreed-upon content development processes in order to automate them with workflow control. This requires consensus and support from all stakeholders in the content life cycle.

The cost for implementing the CMS includes the following items:
• Price of the CMS software
• Hardware to run it and store the content
• Resource time to prepare and plan for implementation
• Resources to customize and maintain the CMS
• Resource time for training stakeholders to use it

Although such an undertaking may seem daunting, the initial implementation is a one-time cost, and the resulting improvements in speed and efficiency can allow you to recoup the investment in a relatively short time.

A translation management system is another key automation and integration investment, used to manage and automate content localization. If you are translating content into more than one language, you must have processes in place to handle this additional work. A translation management system provides automated process management for translating content and integrates with the CMS workflow support.

To implement a translation management system, you must have a defined translation process that can scale to meet your localization needs as they increase, and you must understand the requirements for a scalable system. In addition, you must build your translation memory, which is the library of localized content.

Return

The return on investment in a CMS is the ability to reuse content across disciplines and automate the content development workflow. If content is not stored in a repository that provides easy retrieval through metadata, it is effectively impossible to reuse content across teams. In addition to obvious features such as automated status change notification and reporting, workflow support enables you to see quickly what information is reused in which topics. This where-used visibility is a crucial feature of the fourth level of adoption: it enables true reuse and mitigates the risk of inadvertently propagating change throughout the content set.
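The where-used lookup described above can be sketched in a few lines. This is only an illustration, assuming that reuse is expressed through DITA conref attributes and that the topics are available as strings. The file names, topic ids, and the where_used function are hypothetical, and a real CMS would provide this through its own reporting.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical topics keyed by file name; a real CMS would supply these.
TOPICS = {
    "shared.dita": (
        '<concept id="shared"><title>Shared content</title>'
        '<conbody><p id="product-desc">The widget does things.</p></conbody></concept>'
    ),
    "overview.dita": (
        '<concept id="overview"><title>Product overview</title>'
        '<conbody><p conref="shared.dita#shared/product-desc"/></conbody></concept>'
    ),
    "tour.dita": (
        '<concept id="tour"><title>Product tour</title>'
        '<conbody><p conref="shared.dita#shared/product-desc"/></conbody></concept>'
    ),
}


def where_used(topics):
    """Map each conref target to the sorted list of files that reuse it."""
    index = defaultdict(set)
    for filename, source in topics.items():
        root = ET.fromstring(source)
        for element in root.iter():
            target = element.get("conref")
            if target:
                index[target].add(filename)
    return {target: sorted(files) for target, files in index.items()}


usage = where_used(TOPICS)
for target, files in usage.items():
    print(target, "is reused in", ", ".join(files))
```

Before changing the shared paragraph, an author could consult such an index to see which deliverables the change would reach.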

The following figure shows how users can share content stored in multiple repositories.

Figure 7: Multiple users sharing content from multiple repositories

Traditional publishing and translation processes involve sending each deliverable out for translation. Although you can leverage the translation memory for the content in each deliverable, the translation vendor must compare each deliverable to the translation memory to determine what content is new and what needs to be translated. If you have multiple deliverables with the same content, you pay for each analysis pass. If you have multiple deliverables with similar but non-identical information, you pay for the analysis pass, as well as the cost to translate each “version” of the information. Organizations that produce multi-language documentation can incur large, unnecessary costs if they have to multiply the number of languages by the number of versions of the content for each release.

In contrast, because DITA is an XML topic-based architecture, you send only the source topics that contain changed content to the translation vendor. This means that you can control the content in smaller units, and thus the amount of content the vendor analyzes for each language is significantly reduced. In addition, if you are reusing content rather than rewriting multiple versions of it, you simply pay to translate the original source instead of multiple versions of the same information. Translating content at the source, rather than at the level of each deliverable, radically changes the translation cost structure. The ability to translate content at the source, combined with the ability to identify changed content and to reduce the actual amount of content through reuse, gives you greater control over the translation process and your overall localization costs.
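One way to identify which source topics have changed since the last translation handoff is to compare content fingerprints. The following is a minimal sketch under that assumption. The file names and the fingerprint and topics_to_translate helpers are hypothetical, and a CMS or translation management system would normally track this itself.

```python
import hashlib


def fingerprint(text):
    """Return a stable digest of a topic's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def topics_to_translate(current, last_sent):
    """List topics that are new or whose source changed since the last handoff.

    current:   file name -> current source text
    last_sent: file name -> fingerprint of the version last sent for translation
    """
    return sorted(
        name
        for name, text in current.items()
        if last_sent.get(name) != fingerprint(text)
    )


# Hypothetical state recorded at the previous handoff.
last_sent = {
    "install.dita": fingerprint("<task id='install'>Install it.</task>"),
    "overview.dita": fingerprint("<concept id='overview'>Old text.</concept>"),
}

# Current repository content: one unchanged, one edited, one new topic.
current = {
    "install.dita": "<task id='install'>Install it.</task>",
    "overview.dita": "<concept id='overview'>New text.</concept>",
    "tour.dita": "<concept id='tour'>Take the tour.</concept>",
}

print(topics_to_translate(current, last_sent))
```

Only the edited and new topics would go to the vendor; the unchanged topic is skipped regardless of how many deliverables include it.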

By automating workflow support with a CMS and integrating the translation process, you can reuse content with confidence across teams and realize significant savings when localizing to multiple languages.

DITA features used

This adoption level uses the following DITA features:

Metadata

DITA provides some basic metadata attributes for all topics, including author, audience, resource ID, keywords, and index markers. Maps also have default metadata, including copyright information and critical dates. However, specializations provide additional, deliverable-specific attributes. For example, the bookmap specialization includes book-specific metadata including book identification numbers and publication data.
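As an illustration of how a retrieval process might read this metadata, the sketch below pulls author, audience, keywords, and resource ID out of a topic prolog. The sample topic, its values, and the topic_metadata function are hypothetical; the element names follow the base DITA prolog.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic; all values are made up for illustration.
TOPIC = """<concept id="widget-overview">
  <title>Widget overview</title>
  <prolog>
    <author>Example Author</author>
    <metadata>
      <audience type="administrator"/>
      <keywords><keyword>widget</keyword><keyword>overview</keyword></keywords>
    </metadata>
    <resourceid id="W-100"/>
  </prolog>
  <conbody><p>The widget does things.</p></conbody>
</concept>"""


def topic_metadata(source):
    """Collect the basic prolog metadata of a single topic into a dict."""
    root = ET.fromstring(source)
    return {
        "id": root.get("id"),
        "authors": [a.text for a in root.findall("prolog/author")],
        "audiences": [a.get("type") for a in root.findall("prolog/metadata/audience")],
        "keywords": [k.text for k in root.findall("prolog/metadata/keywords/keyword")],
        "resource_ids": [r.get("id") for r in root.findall("prolog/resourceid")],
    }


print(topic_metadata(TOPIC))
```

A repository search could index dictionaries like this one so that authors on other teams can find reusable topics by audience or keyword.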

Translation and language attributes

DITA provides the translate and xml:lang attributes to support localization. The translate attribute “indicates whether the content of the element should be translated or not.” The xml:lang attribute identifies the language of the element's content; in a translated file, that is the language into which the content was translated. You can specify these attributes at the element, topic, or map level.
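To show how a localization process might honor these attributes, here is a rough sketch that reads xml:lang from a topic and collects only the text not marked translate="no". It assumes the translate setting is inherited from the nearest ancestor that sets it. The sample task and the translatable_text function are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical task topic with one non-translatable command name.
TOPIC = (
    '<task id="install" xml:lang="en-us"><title>Install the widget</title>'
    "<taskbody><steps>"
    '<step><cmd>Run <cmdname translate="no">setup</cmdname> as an administrator.</cmd></step>'
    "</steps></taskbody></task>"
)


def translatable_text(element, inherited=True):
    """Return text fragments to translate, honoring inherited translate values."""
    setting = element.get("translate")
    translate = inherited if setting is None else setting == "yes"
    parts = []
    if translate and element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.extend(translatable_text(child, translate))
        # Tail text follows the child but belongs to this element.
        if translate and child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


root = ET.fromstring(TOPIC)
print("Source language:", root.get(XML_LANG))
print("Fragments to translate:", translatable_text(root))
```

The command name is left out of the fragments, while the text around it is kept.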

Generalization for cross-specialization reuse

When reuse happens across different content types, cross-type validation issues can quickly arise: some of the semantics in the source may not be valid in the context of reuse. For example, a <step> is allowed in a task topic but not in a concept topic. However, because a <step> is just a specialized type of list item (<li>), you can reuse a <step> any place where a <li> is allowed by stripping away the extra semantics that do not apply in the new context. In this way, you can reuse the content of a <step> between tasks and concepts, even if the specialized semantics and structure only apply in the source type.
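The <step> to <li> case can be sketched as a rename driven by the DITA class attribute, which lists an element's ancestry from most general to most specific. This is a simplified illustration, not a full generalization process. It assumes the class attributes are written out explicitly (they are normally defaulted by the DTD or schema), and the generalize function is hypothetical.

```python
import xml.etree.ElementTree as ET

# A task step with its class attributes written out explicitly (assumption:
# in practice these values are usually supplied as DTD or schema defaults).
STEP = (
    '<step class="- topic/li task/step ">'
    '<cmd class="- topic/ph task/cmd ">Open the cover.</cmd>'
    "</step>"
)


def generalize(element):
    """Rename each element in place to the base element named first in its class."""
    for node in list(element.iter()):
        tokens = node.get("class", "").split()
        # tokens[0] is the "-" or "+" marker; tokens[1] is the most general
        # module/element pair, for example "topic/li".
        if len(tokens) >= 2:
            node.tag = tokens[1].split("/")[1]
    return element


generalized = generalize(ET.fromstring(STEP))
print(ET.tostring(generalized, encoding="unicode"))
```

The class attribute is left in place, so the original specialized type remains recorded on the generalized element.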

There is a translate attribute on every topic (and, if I remember correctly, just about every element). However, the language attribute is useful so the processes can determine the codeset to use for output. Are you saying that you'd be unable to designate the language to which you are translating in the resulting files?

(Of course, it could very well be that I'm misinterpreting your post, for which I apologize, if I am.)

Julio J. Vazquez

SDI

jvazquez@sdicorp.com

919-354-1123

I suspect that there would be organizations that actually start automating and customizing key processes before they actually start specializing topics or domains. To me, it makes sense to start the automation process as a proof-of-concept even before you determine that the base DITA model may be insufficient for your information needs.

IMHO, specialization would come once you get the base working the way you'd like and as automated as possible. Then, as you specialize, you may only need to add a small delta to your processes if the default automation does not meet the needs of those modifications.

Thoughts?

Julio J. Vazquez

SDI

jvazquez@sdicorp.com

919-354-1123

Julio, I think you're absolutely right that organizations might begin automation process activities before specializing. The process of defining and refining those processes could be valuable input to specializations in a DITA architecture.

We took the "level 3 before 4" approach, however, because we knew one thing: If we didn't specialize from the outset, we probably never would. EMC has a lot of content, our conversion process is extensive, and we have Documentum as a CMS to control the content. Once information is in a particular format in DITA, it's hard to change that format. For example, we developed a cli_reference topic specialization from the outset because we knew people would never have the time/motivation to convert a plain ol' reference topic to that format (if the topic started out as a generic reference type).

We have made small updates to our specialized architecture after its debut, but surprisingly, not many. This will probably change as new organizations (such as our UXD and training groups) begin sharing the technical publications source objects in our repository, however.

Paul Masalsky

EMC

I agree with you also, Paul, that specialization may be a little more difficult if done after the automation, but I don't think it's insurmountable. It's just a matter of fitting in the change of format while also juggling other responsibilities dictated by new features you're documenting for a product. I find that those priorities do more to keep folks from switching to a new specialization (besides how well the specialization is documented) than the timing of the specialization.

I would hazard a guess that most of the topics that exist today are still base DITA topics because of inertia rather than desire. I think that, for the most part, specialization is an option that, while it may indicate DITA maturity, is optional enough that I might consider it an addendum to the model rather than an integral part of the model.

Julio J. Vazquez

SDI

jvazquez@sdicorp.com

919-354-1123