January meeting of Boston DITA Users group

Hi all,
Last night's meeting with Nancy Harrison was the second best attended so far (after our Dave Schell kickoff this year).
Nancy discussed getting your content ready for migration to DITA.
She stressed the importance of separating the preparation and reorganization of content from the conversion/migration step. The use case is an organization with a large amount of documentation, and her advice draws on her own experience at Rational, getting their documentation to conform to IBM's practices (internal DocBook and then DITA).

She said there is always structure to be found in old unstructured documentation. Discover the regularities and see which tags may have semantic meaning. It may be styles or fonts that are always used for headings. For example, Rational always used a typewriter font to indicate code. You may discover that templates were used.
You will need to do both top-down analysis (look at your table of contents) and bottom-up analysis at the same time.
In the TOC, see which entries might map to the DITA topic types: concept, task, and reference.
From the bottom up, look for lists, tables, paragraphs, etc. that might suggest the right chunks.
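The bottom-up pass can start with a simple inventory of the tags actually in use. Here is a minimal Python sketch, assuming the legacy content has already been exported as (tag, text) pairs. The sample data and the `tag_inventory` name are hypothetical and were not shown at the meeting.

```python
from collections import Counter

# Hypothetical export of a legacy document: (paragraph or character tag, text).
SAMPLE = [
    ("Heading1", "Installing the product"),
    ("Body", "Before you begin, check the prerequisites."),
    ("Step", "Run the installer."),
    ("Step", "Restart the server."),
    ("Code", "setup.exe /quiet"),
    ("Body", "The installer writes a log file."),
]


def tag_inventory(items):
    """Count how often each tag occurs, most frequent first."""
    return Counter(tag for tag, _ in items).most_common()


# Print one line per tag so frequent (and rare, suspicious) tags stand out.
for tag, count in tag_inventory(SAMPLE):
    print(f"{tag:10} {count}")
```

Tags that appear often are candidates for a direct DITA mapping; tags that appear once or twice often point to ad hoc formatting that needs a human decision.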
You will build conversion tables that map all these identified items to DITA types. She showed an example FrameMaker document in which every character tag, paragraph tag, variable, condition, etc. was given a description and a corresponding DITA element.
DITA IDs were named after the original FrameMaker tag. All the variables were collected in a single boilerplate document. The unique IDs then allowed them to be "conref'd" for use anywhere.
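A conversion table and the variable-to-conref idea might be sketched in Python as follows. This is only an illustration: the tag names, the file name `boilerplate.dita`, the topic ID `variables`, and both function names are assumptions, not the table Nancy showed.

```python
# Hypothetical conversion table: legacy FrameMaker tag -> DITA element.
CONVERSION_TABLE = {
    "Heading1": "title",
    "Body": "p",
    "Bullet": "li",
    "Code": "codeph",   # e.g. the typewriter font used to indicate code
    "Emphasis": "i",
}


def to_dita_element(fm_tag):
    """Return the mapped DITA element, or None so the item can be flagged for manual cleanup."""
    return CONVERSION_TABLE.get(fm_tag)


def variable_conref(variable_name, boilerplate_file="boilerplate.dita", topic_id="variables"):
    """Build a conref to a variable that is stored once in a shared boilerplate topic.

    The element ID reuses the original variable name, so the name must be a
    valid XML ID (no spaces); normalizing names is left out of this sketch.
    """
    return f'<ph conref="{boilerplate_file}#{topic_id}/{variable_name}"/>'


print(to_dita_element("Code"))
print(variable_conref("ProductName"))
```

The conref value follows the DITA pattern `file#topicid/elementid`, which is why each variable needs a unique ID in the boilerplate document.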
Nancy encouraged a prototype project, in which sample documents representative of all the documentation would be converted first. Develop the conversion scripts until a large fraction (80-90%) of the content can be migrated successfully. Don't try for perfection. Some documents will always need manual cleanup. Strike a balance between effort put into automatic conversion and final cleanup.
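One way to track progress during a prototype is to measure how much of the sample content the conversion table already covers. The following Python sketch assumes a table like the one described above; the function name, tag names, and sample data are hypothetical.

```python
def conversion_coverage(tags, conversion_table):
    """Return (fraction of items with a mapping, set of unmapped tags)."""
    tags = list(tags)
    if not tags:
        return 0.0, set()
    unmapped = {t for t in tags if t not in conversion_table}
    mapped = sum(1 for t in tags if t in conversion_table)
    return mapped / len(tags), unmapped


# Hypothetical table and the tags found in one sample document.
table = {"Body": "p", "Bullet": "li", "Heading1": "title"}
sample_tags = ["Heading1", "Body", "Body", "Bullet", "Sidebar"]

coverage, unmapped = conversion_coverage(sample_tags, table)
print(f"coverage: {coverage:.0%}, unmapped: {sorted(unmapped)}")
```

When coverage on the representative samples reaches the 80-90% range, the remaining unmapped tags are the list of items to handle in manual cleanup.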
You may use a conversion tool and/or a consultant/vendor to do the actual conversion. FrameMaker offers tools to assist in conversion from unstructured to structured, as does Arbortext with the Epic EXchanger.
At http://www.cmsreview.com/Tools/Migration I have identified a number of such conversion/migration tools to help you move legacy content into a new CMS.  I include brief descriptions from the websites.
(Numbers in parentheses are the Google PageRank™ of the website).

    * Cambridge Docs (6)
      The xDoc Converter Desktop is a point-and-click application designed to make the process of transforming your legacy content into "meaningful XML" simpler. Any XML schema or DTD can be specified as the output, and multiple source formats can be consolidated into a single XML stream.

      With the xDoc Converter Desktop, you can manage your conversion project without writing custom code or manually converting documents.

    * eTouch (6)
      eTouch Content Migration solutions enable migration of existing content from the current infrastructure to the new environment with speed, accuracy, and efficiency. This automated, efficient, and unobtrusive process ensures that large volumes of content can be migrated quickly and accurately without impacting your current production environment.

    * GEMS mService (6)
      mService Accelerated Data Migration can reduce the time of a data migration project by as much as 90% with efficient Acceleration Packs for process repeatability. mService Accelerated Data Migration delivers productivity out-of-the-box, and can be completely customized to meet specific customer requirements.

    * Indigen Victor (6)
      The Victor migration platform allows retrieving, analysing and categorizing the content of an old site to export it to a web content management tool. Containers (appearance, layout) are also processed by the platform to ensure complete web migration.

    * Kapow Mashup Server (6)
      The Kapow Mashup Server makes the data migration task much easier and faster. Any Web site or application with a Web interface, regardless of location or platform, can be migrated into your system of choice. The solution is to use the browser front-end of each source system and, via the visual design environment of the Kapow Mashup Server, to automate the extraction of content. Content is then loaded into the target CMS using the browser front-end or directly into the underlying database. This simple and efficient solution even converts your Web site formats and templates into your new CMS, automatically adapting style info to your new templates.

    * Metalogix Migration Manager (6)
      Metalogix Migration Manager 3.0 is a powerful content migration solution that facilitates the process of discovering, extracting, tagging, and loading legacy content from websites, file shares, intranets and existing business solutions into SharePoint sites. Migration Manager also provides tools for migrating lists and libraries between sites and servers.

    * Vamosa Content Migrator (6)
      Vamosa Content Migrator allows you to move your data from your web site, disk directories, or existing content management system (CMS) into your required content store, standardising and enhancing it along the way.

      Vamosa Content Migrator allows you to crawl your existing infrastructure of web sites and disk directories. The web pages and documents are automatically captured, reformatted to suit the target system, given new template styles, and automatically loaded into your Content Management System.

      To fully support your migration to a CMS, Vamosa Content Migrator supports all data types for capture. Therefore all the files you have present on your web site or on your disk storage can be captured and moved to your CMS in an easy and consistent manner.

    * Watchfire WebXM™ (9)
      Watchfire® WebXM™ improves the speed, accuracy and reliability of your migration process. WebXM provides automated visibility into your website inventory so project managers do not have to conduct a manual audit or ask individual site owners to do so.

      WebXM can help managers decide which content is suitable for migration and which content to decommission by reporting on such things as Orphan Analysis, Outdated Pages, Deep Pages and Small Pages. WebXM can also be used to conduct an impact analysis to reduce problems (e.g., broken links) that may result from directory or page changes.

Vamosa offers a free Community Edition of their content analysis and content migration tool that is a very good place to start.