Migrating legacy content

Advice for preparing unstructured documents for conversion:

Label all headings intended for topics with styles that indicate the intended topic type.

For example, in FrameMaker unstructured docs, a simple heading2 style used for level 2 headings may be changed in the templates to make several heading2 styles available (heading2concept, heading2task, etc.). This enables the DITA conversion programs to determine which type of topic was desired when the documents are converted.
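As a rough illustration of how a conversion script could use such style names, here is a minimal sketch. The mapping table and function names are hypothetical, and "heading2reference" is an assumed extension of the heading2concept/heading2task pattern; no particular conversion tool is implied.

```python
# Hypothetical mapping from heading paragraph styles to DITA topic types.
# The style names follow the heading2concept / heading2task pattern described
# above; "heading2reference" is an assumed extension of that pattern.
STYLE_TO_TOPIC_TYPE = {
    "heading2concept": "concept",
    "heading2task": "task",
    "heading2reference": "reference",
}


def topic_type_for(style_name, default="topic"):
    """Return the DITA topic type for a heading style, or a generic fallback."""
    return STYLE_TO_TOPIC_TYPE.get(style_name.strip().lower(), default)


def unlabeled_headings(headings):
    """List (style, text) pairs whose style does not name a topic type."""
    return [(style, text) for style, text in headings
            if style.strip().lower() not in STYLE_TO_TOPIC_TYPE]
```

Running unlabeled_headings over a list of (style, text) pairs exported from a book gives a quick to-do list of headings that still need a typed style before conversion.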

This is not necessary if you use MIF2Go to convert unstructured FrameMaker content to DITA: MIF2Go can determine the correct infotype automatically, based on the styles used in the "body" section of the topic. For example, if you have a Heading2 followed by steps, MIF2Go will convert this to a task topic.
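The general idea of inferring a topic type from body styles can be sketched as follows. This is an illustrative heuristic only and is not MIF2Go's actual logic; the style names and the "mostly table cells means reference" rule are assumptions made for the example.

```python
# Illustrative heuristic only; NOT MIF2Go's actual logic.
# The body style names below are hypothetical placeholders.
STEP_STYLES = {"step", "numbered", "numbered1"}
TABLE_STYLES = {"cellbody", "cellheading"}


def infer_topic_type(body_styles):
    """Guess a topic type from the paragraph styles that follow a heading."""
    styles = [s.strip().lower() for s in body_styles]
    # Any step paragraph suggests a procedure, hence a task topic.
    if any(s in STEP_STYLES for s in styles):
        return "task"
    # A body made up mostly of table cells suggests reference material.
    if styles and sum(s in TABLE_STYLES for s in styles) > len(styles) / 2:
        return "reference"
    # Everything else is treated as conceptual content.
    return "concept"
```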

Try to fit all unstructured content to a DITA model. This involves moving all conceptual information out of task topics and into concept topics, moving tables that belong in reference topics out of concept/task topics, ensuring that all task topics have only one main procedure, moving prerequisites into a separate section before the main procedure in task topics, etc.
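One of these checks, that each task topic has only one main procedure, lends itself to a quick pre-conversion scan. The sketch below assumes you can export, for each heading, the list of paragraph style names that follow it, and that steps use a style named "Step"; both are assumptions for illustration.

```python
from itertools import groupby

STEP_STYLES = {"step"}  # hypothetical style name for procedure steps


def count_procedures(body_styles):
    """Count separate runs of consecutive step paragraphs under one heading."""
    flags = [s.strip().lower() in STEP_STYLES for s in body_styles]
    # groupby collapses consecutive equal flags; each True group is one run.
    return sum(1 for is_step, _ in groupby(flags) if is_step)


def topics_to_split(topics):
    """Return titles of topics whose body contains more than one procedure.

    `topics` maps a heading title to the list of paragraph styles below it.
    """
    return [title for title, styles in topics.items()
            if count_procedures(styles) > 1]
```

A body paragraph between two steps may simply be explanatory text for a step rather than the start of a second procedure, so treat the output as a list of candidates to review by hand.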

Clearly understand the difference between concepts and references, and create guidelines that you (and possibly others) will follow when you begin chunking your legacy content. This is crucial to ensure you don't end up with "concepts" that are actually "references", and vice versa.

Consider applying minimalism techniques early. Go through your content and make it minimalist prior to chunking.

Ensure that all books use the same paragraph and character tag definitions. In FrameMaker, all books should ideally share the same paragraph and character catalogs.

Remove overrides to paragraph and character tag attributes. Replace one-off bold, underline, and italic settings with catalog-based character tags. Consistent, catalog-based tagging helps any automation tools you use do a better job.

Use many of the items in your existing department style guide as a basis to create an Information Model that includes guidelines on how to use the collection of DITA elements. This model would define these elements in a way that would help you enforce the style and branding (look and feel) of all your docs. Having this Model gives authors the needed guidelines to develop new content in DITA. Put to paper the crucial items first (you'll discover what those are as you progress). This Model will develop and mature as time passes.

If you're using conditional tagging (such as that in FrameMaker), ensure that the tag names are consistent throughout all books. In the DITA world, these tag names will become "metadata" (values for element attributes such as "audience", "platform", and "product"). These tag names should be defined in a metadata schema, which would be included in your Information Model.
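A metadata schema can start as a simple lookup table. The sketch below is hypothetical: the attribute names come from the paragraph above, but the values, function names, and book names are invented examples, not a recommended vocabulary.

```python
# Hypothetical metadata schema: DITA attribute -> allowed values.
# The attribute names come from the text above; the values are invented
# examples, not a recommended vocabulary.
METADATA_SCHEMA = {
    "audience": {"administrator", "enduser"},
    "platform": {"linux", "windows"},
    "product": {"basic", "pro"},
}


def normalize(tag):
    """Fold case, spaces, and underscores so variants compare equal."""
    return "".join(ch for ch in tag.lower() if ch.isalnum())


def map_condition_tag(tag):
    """Return (attribute, value) for a condition tag, or None if unknown."""
    key = normalize(tag)
    for attribute, values in METADATA_SCHEMA.items():
        if key in values:
            return attribute, key
    return None


def unmapped_tags(tags_by_book):
    """Report condition tags that match nothing in the schema, per book."""
    return {book: sorted(t for t in tags if map_condition_tag(t) is None)
            for book, tags in tags_by_book.items()
            if any(map_condition_tag(t) is None for t in tags)}
```

Running unmapped_tags across all books before conversion surfaces inconsistent names (for example "Win" in one book and "Windows" in another) while they are still cheap to fix in the source files.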

Determine which content can be reused at the topic level (EULA, copyright, preface info) and at the phrase level (company names, product names).

Carefully consider what is worth the trouble to store in a single location and import by conref vs. what is better left typed in as normal text. Going overboard with the conrefs can be a maintenance nightmare, but more reuse means less writing work and lower translation costs. You may find you have several sections that serve the same purpose and are almost the same. Consider writing one generic section that can work in place of the several.
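To find sections that are "almost the same", a rough similarity scan over exported section text can help. This is a minimal sketch using Python's standard difflib module; the function name, the input shape (title mapped to plain text), and the 0.8 threshold are assumptions to tune for your own content.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_sections(sections, threshold=0.8):
    """Return (title_a, title_b, ratio) for section pairs at or above threshold.

    `sections` maps a section title to its plain text. The 0.8 threshold is
    an arbitrary starting point, not a recommended value.
    """
    pairs = []
    for (title_a, text_a), (title_b, text_b) in combinations(sections.items(), 2):
        # Compare word sequences rather than characters to reduce noise.
        ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
        if ratio >= threshold:
            pairs.append((title_a, title_b, round(ratio, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Pairs that score high are candidates for one generic section or a conref; the decision about whether reuse is worth the maintenance cost still has to be made by a writer.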


Contributors to this page:

- Paul Masalsky, EMC
- Yves Barbion, Scripto
- Jerry Pope
- Derek Adams, InfoPros
- Jan Brandego

See also:

- DITA-users mailing list thread

- Making Friends with Your DITA-Unfriendly Documents: http://dita.xml.org/making-friends-your-dita-unfriendly-documents
