Translation Memory

We are planning to migrate our legacy HTML data to XML, and hopefully to DITA. The legacy data was created in RoboHelp HTML 2002. A major concern is the impact on our existing Translation Memory.

We do in-house translation into 12 non-English languages. Our translators use SDLX Trados and related Trados tools (Tag Editor) to localize from the English source, and they have an enormous existing Translation Memory for each language.

Is there a way to convert the legacy English HTML to DITA XML and have Trados recognize the content as the same, even though the tags and formatting will probably all change in the conversion?
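To make the question concrete: CAT tools such as Trados generally treat inline tags as part of a segment, so a segment whose words are unchanged but whose inline markup has changed may no longer come back as a 100% match. The Python sketch below is not a Trados feature, and every name in it is hypothetical. It pulls the text of each block element out of a legacy HTML fragment and a converted DITA fragment. It then labels each DITA segment as identical, same text with different inline tags, or new text. The block-element list and the sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical list of elements treated as one translatable segment each.
BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "title", "shortdesc", "cmd"}


class BlockTextExtractor(HTMLParser):
    """Collects (text, inline_tags) for each block element in HTML or DITA markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._parts = None  # None while outside a block element
        self._inline = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._parts, self._inline = [], []
        elif self._parts is not None:
            self._inline.append(tag)  # inline markup such as <b> or <uicontrol>

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._parts is not None:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            self.segments.append((text, tuple(self._inline)))
            self._parts = None

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)


def extract_segments(markup):
    parser = BlockTextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.segments


def classify(legacy_markup, dita_markup):
    """Label each DITA segment relative to the legacy HTML segments."""
    legacy = dict(extract_segments(legacy_markup))  # text -> inline tag sequence
    report = []
    for text, tags in extract_segments(dita_markup):
        if text not in legacy:
            report.append(("new text", text))
        elif legacy[text] != tags:
            report.append(("same text, tags differ", text))
        else:
            report.append(("identical", text))
    return report


legacy_html = "<html><body><p>Click <b>Save</b> to store the file.</p></body></html>"
dita_xml = (
    '<task id="save"><title>Saving</title><taskbody><steps><step>'
    "<cmd>Click <uicontrol>Save</uicontrol> to store the file.</cmd>"
    "</step></steps></taskbody></task>"
)
for label, text in classify(legacy_html, dita_xml):
    print(f"{label}: {text}")
```

In this sample the step text is word-for-word identical, but `<b>` has become `<uicontrol>`. That is the kind of difference that may lower a TM match even though the translatable content is unchanged. The sketch has limits. It does not handle a block element that contains text of its own and also wraps another block element. It also ignores the segmentation rules a real CAT tool applies inside a block.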

How have others migrating to DITA handled the issue of corrupting existing Translation Memory in the process?

The path we are currently investigating is as follows. First, convert a few sample files in English and in one test language to DITA XML. Next, use Trados WinAlign on the converted English and test-language files to create a new test-language Translation Memory. Finally, run the English DITA topics against that Translation Memory to see how well it handles them.

This path is extremely long and expensive, considering the number of topics and languages.
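Before running the full alignment pilot for every language, a rough pre-check could estimate how much of the converted English text would still hit the existing TM. This Python sketch makes two assumptions. First, the English source side of the TM is available as a list of plain-text segments (for example, extracted from a TMX export). Second, the DITA topics have already been reduced to plain-text segments. It uses `difflib`'s similarity ratio as a stand-in for fuzzy matching, which is not the algorithm Trados uses. The 75% threshold and the sample segments are hypothetical.

```python
from difflib import SequenceMatcher


def best_fuzzy_score(segment, tm_sources):
    """Highest similarity (0-100) between segment and any TM source segment.

    SequenceMatcher.ratio() is only a stand-in for a CAT tool's fuzzy-match score.
    """
    return max(
        (100 * SequenceMatcher(None, segment, src).ratio() for src in tm_sources),
        default=0.0,
    )


def leverage_report(new_segments, tm_sources, fuzzy_threshold=75):
    """Count exact, fuzzy, and unmatched segments (hypothetical threshold)."""
    exact_sources = set(tm_sources)
    counts = {"exact": 0, "fuzzy": 0, "no match": 0}
    for seg in new_segments:
        if seg in exact_sources:
            counts["exact"] += 1
        elif best_fuzzy_score(seg, tm_sources) >= fuzzy_threshold:
            counts["fuzzy"] += 1
        else:
            counts["no match"] += 1
    return counts


# Hypothetical sample data: English TM sources vs. segments from converted DITA topics.
tm_sources = [
    "Click Save to store the file.",
    "Select a printer from the list.",
    "The report opens in a new window.",
]
new_segments = [
    "Click Save to store the file.",
    "Select a printer from the list below.",
    "Configure the network proxy settings.",
]
print(leverage_report(new_segments, tm_sources))
# {'exact': 1, 'fuzzy': 1, 'no match': 1}
```

Because it compares text only, this check overestimates leverage whenever inline tags changed in the conversion. It is only meant to help decide whether the WinAlign pilot is worth scaling up, not to replace it.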

Any information, links, references to white papers, or other suggestions would be greatly appreciated.

Thanks

The DITA Translation TC has created a best-practices document on migrating TM to a DITA environment. It can be viewed at: http://www.oasis-open.org/apps/org/workgroup/dita-translation/document.p...

In addition, this may be a good opportunity to break with proprietary formats and look at computer-aided translation (CAT) software based on open standards and an open architecture, such as XTM from XML-INTL.com or the Heartsome tools from Heartsome.net.

Best Regards,

AZ
