Integrating DITA at Idiom Technologies

Interview by Kay Ethier, Bright Path Solutions and Scott Abel, TheContentWrangler.com.

In Integrating DITA at Idiom Technologies, we ask Willie Williams, Senior Technical Writer at Idiom Technologies, Inc. to talk about the enterprise globalization software provider's venture into the world of the Darwin Information Typing Architecture (DITA).


Bright Path Solutions asked Willie Williams, Senior Technical Writer at Idiom Technologies, Inc., to talk about the enterprise globalization software provider’s venture into the world of the Darwin Information Typing Architecture (DITA). Idiom Technologies® optimizes the globalization supply chain by aligning global enterprises, language services providers, and translators. Proven WorldServer™ software solutions enable global organizations to expand market reach and accelerate multilingual communication by automating translation and localization.

BPS: Why did Idiom decide to adopt DITA?

WW: Idiom went to XML to benefit from its:

Reusability

Extensibility

Portability

Ease of Globalization (Translation & Localization)

Idiom chose DITA because:

It’s becoming a widely adopted standard, governed by an international body (OASIS)

It already offers an open toolkit, which is growing steadily

It supports specializations for meeting business needs

Tool developers can easily integrate it with editors (like Arbortext Epic, Adobe FrameMaker, and Blast Radius XMetaL) and provide XSLTs for PDF, HTML, and other output formats

Administrators can implement a framework for customizing processing without conflicting with the transforms and processing distributed in the open toolkit

BPS: What content did Idiom use DITA to create?

WW: The WorldServer OpenTopic User Guide and the OpenTopic Customization Guide were both written entirely in DITA, as was the Supported Configuration Guide. The other WorldServer guides (the Installation Guide, the Administrator Guide, and the User Guide) have all been converted to DITA, and are being updated in DITA now. WorldServer whitepapers have been converted and are also maintained in DITA.

BPS: Did Idiom have to specialize (customize) DITA for its purposes? If so, what did you need to specialize and why?

WW: I asked one of our software engineers, Chris Wong, about that. Here’s what he said: "We used a specialization of the DITA map: the bookmap/bookinfo elements were not, and still are not, part of the DITA standard. We needed bookmap/bookinfo for book-like deliverables, since it has appropriate metadata and structure for book semantics such as chapters and front matter content. Bookmap/bookinfo will almost certainly be part of the upcoming DITA 1.1 standard. This was the only specialization needed for our product documentation."

I also asked Chase Tingsley of Idiom about specialization. He added, "For the ‘DITA Cookbook’ (a document Idiom provides to its clients, with examples for using all DITA elements), we created specialized topics for ‘cookbook entries’ with a particular structure."

Custom structure for the specialized topics simplifies the editing process for people who aren't familiar with DITA, and makes it very clear how to go about laying out a glossary entry.

BPS: What lessons learned can you share with readers about specializing DITA?

WW: Authoring in a DITA-based system forces consistency: it simply won't allow 'illegal' structures. Besides consistency, this results in improved portability and ease of globalization. Once you've put a topic-based content repository scheme in place, you can create a wide variety of output (guides, online help, tutorials, Web content, specifications, and marketing collateral) from the same repository.

BPS: Legacy content conversion can be particularly challenging. What were the issues you had to overcome when converting MS Word content to DITA XML?

WW: Most of the material I converted was in unstructured FrameMaker. I did convert a dozen whitepapers (around 100 pages) to XML. There are Word-to-XML converters available now, but the job was small enough that I chose to simply create XML "dummy" files using our own DITA templates and then cut and paste all the text into them.

BPS: What advice can you give those who are trying to determine how much time it will take to convert legacy content to DITA?

WW: It took me 100 hours to convert 400 pages of unstructured FrameMaker content into DITA. The approach I took was to:

Clean up the styles in the source FrameMaker files (16 hours)

Create a mapping table that mapped these styles to DITA elements (8 hours)

Use the mapping table to get from unstructured to structured FrameMaker (1 hour)

Save the structured FrameMaker files as XML (1 hour)

Clean up the XML files to make them parse (24 hours)

Perform manually what didn’t come across in the conversion—see below (46 hours)

Create ditamaps for each book (4 hours)
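
The breakdown above can be tabulated to see where the time went and to get a rough figure for a different page count. This is only a sketch: the hours and the page count are the ones reported above, but the `estimate_hours` function and its linear scaling are assumptions for illustration, not a method described in the interview.

```python
# Hours per step as reported above (400 pages of unstructured FrameMaker).
STEP_HOURS = {
    "Clean up styles in the source files": 16,
    "Create the style-to-element mapping table": 8,
    "Convert unstructured to structured FrameMaker": 1,
    "Save structured files as XML": 1,
    "Clean up XML so it parses": 24,
    "Manual fixes (hierarchy, cross references, tables, graphics)": 46,
    "Create ditamaps": 4,
}
PAGES_CONVERTED = 400


def estimate_hours(pages: int) -> float:
    """Scale the reported effort linearly to another page count (an assumption)."""
    hours_per_page = sum(STEP_HOURS.values()) / PAGES_CONVERTED
    return pages * hours_per_page


if __name__ == "__main__":
    total = sum(STEP_HOURS.values())
    for step, hours in STEP_HOURS.items():
        # Show each step's hours and its share of the total effort.
        print(f"{hours:>3} h  {hours / total:>4.0%}  {step}")
    print(f"Total: {total} h, {total / PAGES_CONVERTED * 60:.0f} minutes per page")
    print(f"Linear estimate for 2,000 pages: {estimate_hours(2000):.0f} h")
```

The manual fixes and the XML cleanup together account for 70 of the 100 hours, so those two steps are where automation would save the most time on a larger job.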

I had to do the following tasks manually:

Apply hierarchy to nested topics and lists

Re-apply cross references

Cut-and-paste table cell text

Add graphics

If you have many thousands of pages to convert, this approach is too manual, and you should consider the following approaches:

Write XSLT scripts that apply the missing hierarchical structure to the "flat" XML produced by Save As XML from structured FrameMaker.

Use techniques added in Frame 7.2 to bring the cross-references, tables, and graphics across to XML in the conversion. (I had used Frame 7.1, because Frame 7.2 was not available yet.)

Consider the alternate technique of saving your unstructured FrameMaker source as MIF, and converting the MIF to XML programmatically, using a utility like a DOM parser or Perl script.
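
As an illustration of the first approach, here is a minimal sketch of rebuilding topic nesting from flat XML. It uses Python's standard `xml.etree.ElementTree` module rather than XSLT, purely to keep the example self-contained. The input element names (`flat`, `h1` to `h3`, `p`) and the `HEADING_LEVELS` lookup are hypothetical stand-ins for whatever a real export contains, and the output is not validated against the DITA DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat export: headings and paragraphs are all siblings.
FLAT_XML = """
<flat>
  <h1>Installing</h1>
  <p>Overview of installation.</p>
  <h2>Prerequisites</h2>
  <p>Check the supported configurations.</p>
  <h3>Database</h3>
  <p>Create the schema first.</p>
  <h2>Running the installer</h2>
  <p>Launch the setup program.</p>
</flat>
"""

# Hypothetical lookup from heading element name to nesting depth.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3}


def nest_topics(flat_root: ET.Element) -> ET.Element:
    """Rebuild topic nesting from a flat sequence of headings and paragraphs."""
    container = ET.Element("dita")
    stack = [(0, container)]  # (level, element); the container sits at level 0
    count = 0
    for child in flat_root:
        level = HEADING_LEVELS.get(child.tag)
        if level is not None:
            # Close every open topic at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            count += 1
            topic = ET.SubElement(stack[-1][1], "topic", id=f"topic{count}")
            ET.SubElement(topic, "title").text = child.text
            stack.append((level, topic))
        else:
            parent_level, topic = stack[-1]
            if parent_level == 0:
                raise ValueError("content found before the first heading")
            # Body content belongs to the most recently opened topic.
            body = topic.find("body")
            if body is None:
                body = ET.SubElement(topic, "body")
            ET.SubElement(body, "p").text = child.text
    return container


if __name__ == "__main__":
    nested = nest_topics(ET.fromstring(FLAT_XML))
    ET.indent(nested)  # pretty-printing helper, Python 3.9+
    print(ET.tostring(nested, encoding="unicode"))
```

Because the stack only compares levels, a level-3 heading that directly follows a level-1 heading simply becomes a child of that level-1 topic. A real script would also need to handle lists, tables, and inline markup, which this sketch ignores.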

BPS: Did converting unstructured content provide any challenges? If so, what were they?

WW: The primary challenge is that FrameMaker is not hierarchical. This makes it difficult to achieve nesting of topics, list items, and table elements.

All of the phases outlined above presented challenges:

The unstructured source itself was inconsistently styled

Much of the inline styling was done from the Format menu, rather than from the Character Catalog

There were Heading3’s nested in Heading1’s, with no intervening Heading2

There were triply-nested lists (for example, bullets within sub-numbered within numbered)

Because structural levels in FrameMaker are indicated only by the "header," and these headers all map to <title>, I was not able to capture the nesting hierarchy in the structured FrameMaker via the conversion mapping table.

Because of this "flat" structure, the XML produced by Saving As XML didn’t parse.

Because images were imported by reference, with no identifying tag, I had to bring the images over manually.

Because FrameMaker has an equation utility, but our DITA implementation does not support equations, I had to bring equations over as images generated from Visio.

Because I could not identify a style identifier for cross-references, I brought them over manually.

The only cell styles were cellhead and cellbody. These would map to <entry> within <row> within <tbody> or <thead> within <tgroup> within <table>. And then there’s the problem of not reproducing all the wrapping elements until all the "entries" are brought over.
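
To make the table nesting concrete, here is a minimal sketch of wrapping a flat run of cells in the table, tgroup, thead or tbody, row, and entry elements. It assumes the cells arrive as (style, text) pairs in reading order, that the column count is known, and that header rows come first. Those are assumptions for illustration; the `build_table` function is hypothetical and is not how the conversion described here was done.

```python
import xml.etree.ElementTree as ET


def build_table(cells: list[tuple[str, str]], cols: int) -> ET.Element:
    """Wrap a flat run of (style, text) cells in table/tgroup/thead|tbody/row/entry."""
    if cols < 1 or len(cells) % cols != 0:
        raise ValueError("cell count must be a positive multiple of the column count")
    table = ET.Element("table")
    tgroup = ET.SubElement(table, "tgroup", cols=str(cols))
    sections: dict[str, ET.Element] = {}
    for start in range(0, len(cells), cols):
        row_cells = cells[start:start + cols]
        # A row goes in <thead> only if every cell in it uses the cellhead style.
        is_head = all(style == "cellhead" for style, _ in row_cells)
        name = "thead" if is_head else "tbody"
        if name not in sections:
            # Create the wrapper the first time a row needs it.
            sections[name] = ET.SubElement(tgroup, name)
        row = ET.SubElement(sections[name], "row")
        for _, text in row_cells:
            ET.SubElement(row, "entry").text = text
    return table


if __name__ == "__main__":
    cells = [
        ("cellhead", "Setting"), ("cellhead", "Default"),
        ("cellbody", "Timeout"), ("cellbody", "30"),
    ]
    print(ET.tostring(build_table(cells, cols=2), encoding="unicode"))
```

The wrappers can only be created once the rows are known, which is the ordering problem mentioned above: nothing in a single cell style says where a row, a header section, or the table itself begins or ends.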

BPS: How did you begin learning about DITA? What resources can you recommend for newbies?

WW: While there is no "DITA for Dummies" yet (that I know of), there is a lot of good material online:

The OASIS DITA Technical Committee web site

Introduction to the Darwin Information Typing Architecture (IBM DeveloperWorks)

DITA Yahoo forum for FrameMaker users

Idiom is finalizing our DITA Cookbook, which we will make openly available to everyone. This cookbook gives usage examples for all DITA tags and attributes and can serve as a supplement to the OASIS DITA Language Reference.

BPS: Now that you’ve learned a little about DITA, what information was missing from those introductory materials? What things couldn’t you find information about? What was confusing?

WW: When I joined Idiom, the DITA decision had already been made. Having already authored in XML at SPSS, it was easy to create XML documents. I don’t find working in XML much slower than working in unstructured or structured FrameMaker, but the reusability and ongoing globalization benefits are significant.

Putting myself in the shoes of someone who hasn’t used XML at all, I think the quickest way to get up and running is to have someone with XML experience help you get started and be available for follow-up questions.

The most difficult stage is getting the environment set up. When it is set up, the XML authoring tools you use won’t let you make mistakes (unless you turn off validation).

You might have to request that the tools group add support for attributes controlling things like sidebars or callouts.

BPS: Are there others working on the DITA project with you? Are they geographically dispersed and what roles do they play?

WW: The tools developers for our DITA implementation are in Moscow, Ukraine, California, and here locally. The writers are local, and we are working closely together to migrate our legacy documentation to our content repository, which is organized into topic-level reusable files, recognizable by a file naming convention.

BPS: What benefits did DITA provide Idiom over the old way of creating content?

WW: We are already reusing content in multiple places, and only have to update one file to have the content change in these multiple documents. For our next release we intend to have the entire docset single-sourced from this content repository.

BPS: How much content (%) were you able to reuse?

WW: We are currently reusing about 5% of our docset. In the next release, we expect that figure to increase to around 60%, when we single-source the guides and the online help (which are currently not single-sourced).

BPS: How did DITA help with translation?

WW: We haven’t translated the docset yet. By the time we do, there should be enough single-sourcing that translation cost will be optimized because we’ll only need to translate reused content once.

Moreover, by working with smaller topic-oriented content units we will be able to adopt the "continuous translation" approach that we recommend to our customers. By translating smaller units as they are completed (but before they are assembled into completed books) we will be able to significantly shorten the translation timeline – as our customers do.

BPS: Did writers have any difficulties or issues using DITA? If so, what were some of the common ones?

WW: At first, editing DITA in FrameMaker was frustrating, primarily because of a FrameMaker bug in which the cursor didn’t align with the actual location you were editing. We’ve found a workaround for this.

We have some developers and managers using DITA now.

BPS: What challenges did you encounter?

WW: The next challenge for us at Idiom is to single-source the guides and the online help system; perhaps 50% of that material overlaps. The challenge will be to structure the topics so they work both as guide material (mostly conceptual and reference) and help (mostly procedural). Both our guides and our online help need to describe why you do things.

BPS: Does Idiom plan to use DITA for other projects?

WW: Eventually, we want to use DITA for content written by all our developers, professional services, sales engineers, and managers.

BPS: What "lessons learned" can you share with others interested in considering a move to DITA?

WW: We’ve developed a "Best Practices" document warning against things like embedding links inside paragraphs rather than in a separate Related Information topic. Embedded links aren’t always reusable.

Whole books have been written about how to structure topics for reuse. When you are writing a topic, it’s best to always ask, "Is there anything in here that might not work in another context?" Otherwise, if you find out later, while trying to reuse the material, that a piece of it doesn’t work in the other context, you’ll have to go back and break up the original topic.

BPS: If you had a DITA wish list, what functionality would you add to DITA and why?

WW: DITA maps didn’t support books in the traditional sense. But with bookmaps about to be added, that hole will be filled. I can’t think of features I had to work around because there isn’t DITA support. But perhaps I will in time.

BPS: Can you think of any content types that would not lend themselves to DITA?

WW: DITA is topic-oriented and based on standards that are widely accepted. I can’t think of a reason why it wouldn’t be appropriate for Web content, sales and marketing collateral, and so on. In fact Idiom is working with Blast Radius and others on the application of DITA to these and other forms of content.

