Making Friends with Your DITA-Unfriendly Documents
By:
Don Bridges & Mikhail Vaysbukh
Data Conversion Laboratory
DITA is a hot topic in the 'Tech Docs' arena, and for good reason. DITA is an open standard that addresses many of the needs of technical documentation producers - most notably content reuse needs. The big question for many companies, once they've determined that authoring in this new standard would be beneficial for them, is what to do with the treasure trove of existing documents, known as legacy documents. Would they be useful converted into DITA, and is it worth the effort?
We found that it is indeed worth it: documents can be converted at far lower cost than rewriting and re-authoring them, which gets you moving forward faster. Converting a stack of still-valuable "older" documents to your new DITA-based system could give you a big boost in getting started. But you do need to prepare in order for this to be a smooth process.
This article discusses five common problems we've seen in the course of doing more than twenty conversions to DITA XML, where we've taken traditionally written documents, reorganized them per DITA rules, incorporated subject matter expertise to assure proper tagging, and created finished DITA XML documents.
DITA documents are structured differently than traditionally written documents. They incorporate a lot more tagging, and they show less tolerance for the creative approaches authors sometimes use to solve last-minute problems encountered in getting a document out. The potential problems identified below can be dealt with either before the conversion starts, while the writers can still work with the legacy authoring tools they are familiar with, or as part of the conversion process, using a combination of software and editorial rewrites (if the requirements are clearly defined).
Issue 1: Tables That Aren't Tables
Much formatting in traditional documents is about looks - how to make the document "look right" to best express the thought the writer is projecting. Therefore table structures are often used to line up information in a particular way and to present a certain look. This is especially true in HTML documents written for the web. In the example "note" below, a table might have been the easiest mechanism to align that light bulb image with the rest of the text.
Note: This is an example of a note inside a table. |
We find it useful to review documents in advance, looking for such ambiguities ("is this a table or a note?"), and either apply an explicit tag or rewrite the text segment as follows:
Note: This is an example of a note inside a table.
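This kind of advance review can be partly automated. The sketch below is only an illustration: it assumes each table has already been extracted as a list of rows of cell strings (a hypothetical representation, with image-only cells left empty), and it treats a one-row table whose only text cell starts with a label such as "Note:" as a note rather than a table. The function name and the list of labels are assumptions, not part of any conversion tool.

```python
import re
from xml.sax.saxutils import escape

# Labels that suggest the "table" is really an admonition (assumed list).
NOTE_LABEL = re.compile(r"^\s*(note|caution|warning|tip)\s*:\s*", re.IGNORECASE)

def table_as_note(rows):
    """Return DITA <note> markup if the table looks like a layout-only note, else None."""
    # Collect the non-empty cells; image-only cells are assumed to be empty strings.
    cells = [cell.strip() for row in rows for cell in row if cell.strip()]
    if len(rows) != 1 or len(cells) != 1:
        return None  # several rows or several text cells: treat as a real table
    match = NOTE_LABEL.match(cells[0])
    if not match:
        return None
    note_type = match.group(1).lower()
    body = cells[0][match.end():]
    return f'<note type="{note_type}">{escape(body)}</note>'

# The light bulb example: one empty (image) cell and one text cell.
print(table_as_note([["", "Note: This is an example of a note inside a table."]]))
```

Anything the heuristic does not recognize is left alone, so a reviewer still decides the borderline cases.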
Issue 2: Multiple procedures within a single task topic
Some documents are authored with multiple procedures under a single heading. Documents are usually broken up into DITA topics at the heading level, and the <task> topic does not allow multiple procedures in the same task, so such documents introduce additional ambiguity and difficulty during conversion to DITA XML. If the original layout and structure must be preserved, the first sequence of steps would need to be tagged as a list and only the last sequence of steps would be tagged as <steps>. In the sample text below, for example, the need to tag the first group of steps as a list could have been avoided if, before conversion, the section had been broken into two sections: 1) Chain Removal, and 2) Chain Installation.
Chain Removal and Installation
Before you can install a new chain, you'll need to remove an old one. To do that, follow the procedure below:
- Inspect chain for "master link", if any. Disengage master link according to manufacturer's instructions.
- If no master link is present, place a roller of the chain fully in the primary cradle of the chain tool.
- Drive chain-tool pin until it contacts chain rivet.
- For most non Park Tool brand chain tools, turn handle 5 complete turns. Use care not to drive out chain rivet. For Park Tool CT-3, drive T-handle until it is stopped by C-clip. For Park Tool CT-5, drive T-handle until body stops screw.
- Back out chain-tool pin and lift chain out of cradle.
- Grab chain on either side of protruding rivet. Flex chain toward the protruding chain rivet then pull on chain to separate.
- Remove from bicycle by pulling on rivet end of chain.
Next step is to install the new chain rivet:
- Re-install chain on bike with protruding rivet facing toward mechanic.
- Open empty outer plates slightly and insert inner plates. Push inner plates until hole aligns with chain rivet.
- Back chain-tool-pin into tool body to make room for chain rivet.
- Place roller into primary cradle with chain rivet facing chain tool pin.
- Drive chain rivet back into chain, taking care to center rivet exactly between both outer plates. If more chain rivet appears on one side of outer plate than other, push rivet until it is evenly spaced.
- Inspect for tight links and repair as necessary
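Sections like the one above can be found before conversion with a simple check. The following is a minimal sketch, assuming the document has already been reduced to a flat sequence of (kind, text) blocks where kind is "heading", "para", or "list"; that representation and the function name are hypothetical. It reports every heading that contains more than one step list, which marks it as a candidate for splitting.

```python
def headings_with_multiple_procedures(blocks):
    """Return the titles of sections that contain more than one step list."""
    sections = []  # each entry is [heading title, number of lists seen so far]
    for kind, text in blocks:
        if kind == "heading":
            sections.append([text, 0])
        elif kind == "list" and sections:
            sections[-1][1] += 1
    return [title for title, list_count in sections if list_count > 1]

blocks = [
    ("heading", "Chain Removal and Installation"),
    ("para", "Before you can install a new chain, you'll need to remove an old one."),
    ("list", "removal steps"),
    ("para", "Next step is to install the new chain rivet:"),
    ("list", "installation steps"),
]
print(headings_with_multiple_procedures(blocks))
```

Each reported heading can then be split by hand into one section per procedure, as suggested for Chain Removal and Chain Installation.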
Issue 3: Task/Procedure authored as a table in the input file
Tasks and procedures authored as tables in the source document add complexity to the conversion when they need to be deconstructed into <task>s with <step>s, since it is not clear, even to a human reader, in what order the paragraphs should be read. This is better handled if all topics being converted to tasks are authored using a simpler flow.
Example 1.
Step | Action |
1 | Check the phase sequence in the mains with the phase sequence indicator. |
2 | Check the direction of rotation of impeller. If a dry installation is made, check the direction of rotation through the inlet elbow access cover. |
Example 2.
Step | Question | If No... | If yes... |
1 | Is an alarm signal indicated on the control panel? | go to step 2. | Comment: If it is out of order, contact a repair shop. |
2 | Can the pump be started manually? | go to step 3. | Comment: If it is out of order, contact a repair shop. |
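For a simple two-column Step/Action table such as Example 1, the deconstruction into steps is mechanical once the reading order is known. The sketch below is one possible illustration using Python's standard xml.etree.ElementTree module. It assumes the rows have already been extracted as (step number, action) pairs, and the task id and title are hypothetical because the source table has neither. A branching table such as Example 2 does not fit this pattern and would need editorial decisions first.

```python
import xml.etree.ElementTree as ET

def step_table_to_task(task_id, title, rows):
    """Build DITA <task> markup from (step number, action) rows, ordered by step number."""
    task = ET.Element("task", id=task_id)
    ET.SubElement(task, "title").text = title
    steps = ET.SubElement(ET.SubElement(task, "taskbody"), "steps")
    # Sort on the numeric step column so the output order does not depend on extraction order.
    for _, action in sorted(rows, key=lambda row: int(row[0])):
        step = ET.SubElement(steps, "step")
        ET.SubElement(step, "cmd").text = action
    return ET.tostring(task, encoding="unicode")

rows = [
    ("2", "Check the direction of rotation of impeller."),
    ("1", "Check the phase sequence in the mains with the phase sequence indicator."),
]
# The id and title below are invented for the example.
print(step_table_to_task("check_rotation", "Checking the direction of rotation", rows))
```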
Issue 4: Untitled tasks/topics in the source and references by page number only
In most cases legacy manuals need to be "chunked" (i.e., broken down into smaller segments). For example, a typical document might be organized in "Chapters", while DITA requires that it be broken down into smaller topics based on the heading levels. This is normally done based on the existing heading titles in the input document. Often, however, there are "implied topics" that are not explicitly identified, and there are references to page numbers that exist in the print version but cease to be useful in the DITA XML output. All of this increases the risk that some topics will not be correctly identified, and it adds ambiguity when resolving cross references to the untitled topics.
For example, the text may contain something like:
Lost or forgotten password - Browse to the location of the private recovery key used for the project. (See page 121 for information about creating a recovery key.) Enter the password for the private part of the recovery key. Enter and confirm a temporary password for the user. You must communicate this temporary password to the user separately.
On page 121 you would find an untitled task like below:
To set a recovery key:
You can specify the key to be used for access to encrypted ACE instances. If you specify password protection for an ACE master and want to be able to reset the password for a deployed ACE instance from that master, you must specify a recovery key before you create the package that includes the virtual machine.
a. Click Set recovery key. The Recovery Key dialog box appears.
b. In the Recovery Key dialog box, select Use recovery key to configure a recovery key.
c. To use an existing PEM format key pair, click Browse for Existing Key to navigate to the public key of the pair you want to use. To create a new…
This case would be better handled if the task on page 121 had been titled and referenced by a title rather than a page number.
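Page-number references are easy to list ahead of time so that each one can be retargeted to a titled topic. The sketch below is a rough illustration with a regular expression. The phrasings it matches ("see page N", "refer to page N") and the page-to-topic mapping are assumptions, and a real document set may word its references differently.

```python
import re

# Matches references such as "See page 121" or "refer to page 7" (assumed phrasings).
PAGE_REF = re.compile(r"\b(?:see|refer to)\s+page\s+(\d+)", re.IGNORECASE)

def find_page_references(text):
    """Return the distinct page numbers cited in the text, in ascending order."""
    return sorted({int(number) for number in PAGE_REF.findall(text)})

def unresolved_pages(text, page_to_topic):
    """Return cited pages that have no titled topic assigned in the mapping yet."""
    return [page for page in find_page_references(text) if page not in page_to_topic]

sample = "(See page 121 for information about creating a recovery key.)"
print(find_page_references(sample))
# Hypothetical mapping from print pages to topic ids, filled in during review.
print(unresolved_pages(sample, {121: "set_recovery_key"}))
```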
Issue 5: More than two levels of steps
DITA allows only two levels of steps (<step> and, below it, <substep>), so when the source has more step levels, it is better to re-author the source so that steps are at most two levels deep. The best approach to re-authoring this kind of material depends on the individual case; possible options include using bulleted lists below the second level or rewriting the text to remove one level.
- See instructions below.
- Install the Hex Coupler Guard as follows:
  - Spread the inner guard and place it over the coupler.
    NOTE: Do not spread the inner and outer guards more than necessary for guard installation. Over spreading the guards may alter their fit and appearance.
  - With the inner guard straddling the support bracket, install a cap screw through the hole in the support bracket and guard located closest to the pump. Do not tighten the capscrew.
  - Spread the outer guard and place it over the inner guard.
  - Install the outer guard cap screws by following the step stated below which pertains to your particular pump:
    - For pumps with a motor saddle support bracket:
      Ensure the outer guard is straddling the support arm, and install but do not tighten the two remaining cap screws.
    - For pumps without a motor saddle support bracket:
      Insert the spacer washer between the holes located closest to the motor in the outer guard, and install but do not tighten the two remaining cap screws.
  - Position the outer guard so it is centered on the shaft, and so there is less than a 1/4" of shaft exposed.
  - Holding the guard in this position, tighten the three cap screws.
- Close the cover.
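The "bulleted list below the second level" option can be sketched as follows. This is an illustration only: it assumes the procedure is held as nested (text, children) pairs, a hypothetical representation, and it maps level one to <step>, level two to <substep>, and level three to a <ul> inside the substep's <info>. Anything nested deeper than three levels is ignored by this sketch and would still need re-authoring.

```python
import xml.etree.ElementTree as ET

def depth(items):
    """Nesting depth of a list of (text, children) pairs; an empty list has depth 0."""
    if not items:
        return 0
    return 1 + max(depth(children) for _, children in items)

def build_steps(items):
    """Map levels 1 and 2 to <step>/<substep> and level 3 to a bulleted list."""
    steps = ET.Element("steps")
    for text, subs in items:
        step = ET.SubElement(steps, "step")
        ET.SubElement(step, "cmd").text = text
        if not subs:
            continue
        substeps = ET.SubElement(step, "substeps")
        for sub_text, bullets in subs:
            substep = ET.SubElement(substeps, "substep")
            ET.SubElement(substep, "cmd").text = sub_text
            if bullets:
                # Third level: demoted to a bulleted list inside <info>.
                ul = ET.SubElement(ET.SubElement(substep, "info"), "ul")
                for bullet_text, _deeper in bullets:  # deeper levels are dropped here
                    ET.SubElement(ul, "li").text = bullet_text
    return steps

procedure = [
    ("Install the Hex Coupler Guard as follows:", [
        ("Install the outer guard cap screws:", [
            ("For pumps with a motor saddle support bracket: ...", []),
            ("For pumps without a motor saddle support bracket: ...", []),
        ]),
    ]),
    ("Close the cover.", []),
]
if depth(procedure) > 2:
    print(ET.tostring(build_steps(procedure), encoding="unicode"))
```

Running depth() over every procedure before conversion also gives a quick list of the places that need attention.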
Wrap-up
As is true for any standardized approach, moving to DITA XML requires change in the organization in many ways. But if you determine that it's worth it for your organization, converting your legacy documents can give you a big head start at a cost much lower than re-authoring. Doing it right requires that you carefully review your documents in advance to make the conversion process as smooth as possible.