DITA to XLIFF and back -- we all need to learn what it means

On September 18 and 23, 2008, the OASIS DITA Adoption Technical Committee will hold its first webinars to present the benefits of using DITA in a translation environment.

To learn more and register for these events, see the Events page on dita.xml.org.

I know that for many of you, the ROI you've promised to your senior management rests significantly on reducing translation costs. However, we have learned that sending a localization service provider (LSP) multiple small DITA files sometimes actually increases handling and administrative tasks and costs. The OASIS XML Localisation Interchange File Format (XLIFF) standard promises to solve the problem and facilitate translation tasks.

Members of the DITA Translation Subcommittee and the XLIFF Technical Committee have joined together to present two exciting webinars that will help you understand the value of the DITA to XLIFF and Back roundtrip and show you exactly how it works. The September 18 session emphasizes the business problem we're trying to solve. The September 23 session leads you through the technical solution.

In my opinion, every publications team that has DITA topics to translate into multiple languages or is planning to move to DITA to reduce translation costs needs to attend these webinars. And you need to tell your translation coordinators to attend as well.

We'd also like to invite the LSPs. So--if you're working with an LSP that is just learning about or struggling with DITA translation, invite them to attend as well.

If you are a CMS or a TMS vendor, come to the webinars to learn about XLIFF and understand how XLIFF can reduce processing complications and administrative costs.

We've assembled international experts on the two panels. They really do know what they're talking about. You can tell I'm excited about this. Please attend.

If you don't know what XLIFF is (or DITA for that matter), this is your opportunity. I didn't know what XLIFF was at first either.

 

 

The webinars can be downloaded from here:

OASIS WEBINARS

The specific links are:

DITA to XLIFF and Back - Two Complementary Standards, 95 mins, 54.3M WMV

DITA to XLIFF and Back (Roundtrip) - Understanding the Technical Solution, 85 mins, 67.2M WMV

Stefan mentioned the issue of using a proprietary solution like SDL Trados. True, translation processing software like Trados can certainly accept individual DITA files and prepare them for translation. The processing software is DITA-friendly, at least most of the time.

The problem is the packaging. Sending multiple individual files for translation often leads to increased handling and administrative costs. XLIFF, besides being an OASIS standard and therefore non-proprietary, combines multiple files, such as all the content of a DITA map, into a single file. The packaging simplifies the handling and ensures that all the DITA topics get translated.
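To make the packaging idea concrete, here is a minimal sketch of what an XLIFF 1.2 package might look like with two DITA topics combined into one file (the file names, languages and text are invented for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <!-- one <file> element per DITA topic from the map -->
  <file original="installing.dita" source-language="en-US"
        target-language="de-DE" datatype="xml">
    <body>
      <trans-unit id="1">
        <source>Installing the product</source>
        <target/>
      </trans-unit>
    </body>
  </file>
  <file original="configuring.dita" source-language="en-US"
        target-language="de-DE" datatype="xml">
    <body>
      <trans-unit id="1">
        <source>Configuring the product</source>
        <target/>
      </trans-unit>
    </body>
  </file>
</xliff>
```

Because all the topics travel inside one well-formed XML document, a missing <file> element is much easier to detect than a missing file in a folder of loose topics.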

We've run into problems in which topics got lost in the process. Send out 300 topics for translation and get back 295 translated topics, with 5 never translated at all.

Come to the webinar and learn more from the real experts. (Not me!)

 

JoAnn, okay, I get the idea. Handling such a volume with so many individual files can be a challenge for a not-so-perfectly organized LSP. Even if it introduces one more step, it might be helpful in some scenarios to take this approach or use one of the "glue/unglue tools" out there. SDL Trados ships with "SDL Trados Glue", which can be used to combine all 300 files into one big file. Other TMS should have something similar.

However, please consider that, depending on the size of your topics, the combined file can become quite big. I have run a quick and dirty test with a small dita topic:

[Screenshot: simple sample dita topic xml source code]
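For readers who can't see the screenshot, a comparably small topic might look like this (a generic example, not Stefan's actual test file):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="sample">
  <title>Sample topic</title>
  <body>
    <p>This is a short paragraph of translatable text.</p>
  </body>
</topic>
```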

Now I opened this dita topic in Trados TagEditor (actually it's a simple drag and drop from Windows Explorer to the TagEditor window) and saved the file as TTX (ctrl+s automatically creates the ttx). TTX is actually also an xml file that "masks" the native xml (or other file formats like FrameMaker mif, InDesign INX, RTF, DOC(X) and so on) and provides additional elements to create the "translation units".

In a second step I converted the dita file to xliff with a command line tool. Then I opened the xliff file in Trados TagEditor (again, as simple as drag and drop, since SDL Trados ships with support for both dita and xliff) and saved the file as ttx. Now, let's have a look at the file sizes:

[Screenshot: comparing file sizes of dita.xml, dita-xliff, and their trados ttx files]

My sample dita xml is as small as 367 bytes. The translation ttx file created from it is 3,779 bytes. That's ten times bigger (it looks like only 4 times here due to strong rounding to KB in Windows Explorer, but it is actually ten times more).

Now, the xliff file I created from my dita file is 1,300 bytes. Comparing dita to dita-in-xliff, that's 3.5 times the file size. The ttx file created from the dita-xliff file is actually 9,350 bytes. That's – wow – 25 times bigger than my original dita xml file! And it's 2.5 times bigger than the ttx created directly from the native dita xml file. Now, why is that? Let's compare the native dita topic with the xliff-masked dita topic in the translation software:

[Screenshot: comparing the native dita xml file with the xliff-masked dita xml file in SDL Trados TagEditor]

As you can see, there is quite an increase in the number of elements.

So, let's assume you have 1 MB of dita xml. It will become as big as 3.5 MB as an xliff file and 25 MB as the (yet untranslated) translation file. The translated (bilingual) file will be at least 50 MB. That is, the translation software's xml parser will have to parse roughly 50 MB. Remember: the source language xml was only 1 (one!) MB here. Now, 300 dita xml files might well add up to more than one MB.

Let's take these numbers and multiply them by ten. I'm pretty sure that your translator will get into serious trouble. Please remember that many translators are not as technology-addicted as you and me and sometimes have really old machines with years-old versions of their TMS.

I suggest "grouping" the 300 topics and sending the combined topic groups. I'm sure your translators will appreciate this ;-)

Hi,

I watched the two webinars, which were very informative -- especially the second one with the hands-on demos.

Following on from the discussion about an extra conversion: I think the tools used in the webinar actually work on the xliff file directly, whereas a tool like Trados apparently (according to Stefan) converts the xliff to its own ttx format.

Collecting the files in a single xliff file would be an answer to my concern about having to deal with so many topics, and using the ditamap to collect these files is more good news, especially if, like me, you're working with just a file-based system for small test projects. So now I'm looking forward to seeing Bryan's tool for collecting DITA topics and to translating the xliff file using one of the new tools.

Having said this, being able to work with DITA xml files directly in a tool like Trados is quite exhilarating and has worked very well for some small projects I have written, translated and published.

Thank you, both teams, for the information provided in the webinars.

Ray Lloyd

Hi,

 

You can translate DITA files with many tools, including Trados. However, there are things that Trados can't do, like reading a DITA map and following all its links, parsing each of them to extract the translatable text from the associated DITA topics in order to create a complete TTX file that you can translate.

DITA has a special attribute, conref, that lets you reuse content. Trados doesn't understand it and can't create a translation project from a DITA map.
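For readers unfamiliar with it, conref works roughly like this (the file names, ids and text are made up for illustration): one topic defines a reusable element with an id, and another topic pulls it in by reference. A translation tool that ignores conref will miss this shared content.

```xml
<!-- warnings.dita: defines a reusable note -->
<topic id="warnings">
  <title>Standard warnings</title>
  <body>
    <note id="power">Disconnect the power before servicing the unit.</note>
  </body>
</topic>

<!-- elsewhere, another topic reuses that note by reference -->
<note conref="warnings.dita#warnings/power"/>
```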

DITA is powerful because it lets you use specialization. You create your own DTDs or XML Schemas, put them in a catalogue and author your DITA files with your favourite XML editor. Trados doesn't understand OASIS catalogues and doesn't know how to deal with custom DITA-based elements.
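As a sketch, an OASIS XML catalogue that maps both the standard topic DTD and a custom specialization DTD might look like this (the ACME identifiers and paths are hypothetical):

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- standard DITA topic DTD -->
  <public publicId="-//OASIS//DTD DITA Topic//EN"
          uri="dtd/topic.dtd"/>
  <!-- a company-specific specialization -->
  <public publicId="-//ACME//DTD ACME Manual Topic//EN"
          uri="specializations/acmeTopic.dtd"/>
</catalog>
```

A tool that resolves public identifiers through the catalogue can parse and validate the specialized topics; a tool that doesn't will choke on the custom DTD.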

XLIFF is an OASIS standard and there are XLIFF-based tools that already support OASIS catalogues. Support for DITA customization is already provided by XLIFF-based translation editors, but I have not seen it in proprietary solutions.

 

Best regards,

Rodolfo

--

Rodolfo M. Raya

http://www.maxprograms.com

Interesting post. I'm looking forward to attending the webinar. I'm wondering why an additional conversion chain like DITA > XLIFF > TMS > XLIFF > DITA is better than DITA > TMS > DITA. Every translator with an up-to-date version of SDL TRADOS can open your dita xml files directly in the translation software, translate the file and save it back as dita xml. You can even use SDL TRADOS to verify and validate the dita. I'm looking forward to learning what benefit you would gain from introducing two additional steps in the process chain...

The simple answer to Stefan's question is to ask another question: Why is DITA so successful? The answer is simple: as an Open Standard it is much better and cheaper to use and implement than any proprietary solution. Open Standards create a level playing field where companies have to compete based on a clearly defined specification. Closed solutions cause lock-in and the continual upgrade treadmill. They are often not very well thought out either, because they have not been the subject of public scrutiny and peer review, as is the case with Open Standards. The fact that you can go DITA -> XLIFF means that you do not have to use a proprietary solution, but a much better and cheaper tool based on Open Standards and an Open Architecture such as OAXAL. As with DITA, Open Standards translate into better solutions and more choice.

Andrzej, I did not doubt the value of dita or the advantages of open standards over proprietary solutions at all. Actually I totally agree with you (and recommend it to our clients as well) to rely on open standards. And dita fits perfectly for many technical writing scenarios and by now has quite wide tool support. However, dita is "only" a standard, and a standard is a standard. It's not an editing tool. And if you think of all the xml tools to "edit" DITA, and of authoring tools for writing technical documentation: most of them are software tools from professional companies providing "proprietary" software. But with these proprietary tools like XMetaL, Oxygen or my favorite Adobe FrameMaker you can create perfect open standard files.

The fact that you can go DITA -> XLIFF means that you do not have to use a proprietary solution (...)

I don't agree that putting an open standard (xliff) over an open standard (dita) makes you more independent from proprietary solutions. Your translation solution, whether proprietary or open source, either supports dita and xliff or it doesn't. It is very likely that a current translation memory system will support xliff. But it is also very likely that it will come with native support for dita. At least, all the bigger translation memory systems today have dita support. I don't see how packing my dita into xliff first, then spreading the xliff files to the translators, and in the end unpacking the translated xliff back to dita (which will hopefully work -- translators tend to make more errors the more complex the tagged material is) is in any way more productive than simply sending out my dita files and getting back translated dita files.

So, my question was: Why wrap an open standard like dita into another open standard like xliff if everyone can handle the dita directly?

 

P.S.:

an Open Standard is much better and cheaper to use and implement than any proprietary solution

I agree that using open standards has many advantages. But cheaper? You can download Open Office, fire it up and write down your documentation. That's cheap.

However, think of the costs: visiting conferences and trade fairs; holding meeting after meeting to compare and discuss standards; buying books; spending hours on the web; spending weeks (and each working hour creates costs!) learning dita; spending days and weeks figuring out the necessary tools and processes (often ending up buying a professional tool like FrameMaker because you have to create, e.g., great-looking PDF documentation as well); managing a long and hard learning curve; and buying in consulting and training before and after buying tools.

Frankly, I have not seen a single company, at least over here in Germany, that has spent less than several hundred hours implementing dita in a tech writing scenario. Costs are usually in the higher five-digit area and can become a six-digit budget if introducing a cms is required as well.

Switching to an open standard like dita has many obvious advantages. So many, that companies are willing to spend tens of thousands of dollars on it. Switching to an open standard like dita for documentation today creates value and protects your investments in content creation. But it's anything but cheap.

Stefan, XLIFF is a localization format, so apart from the obvious reason that many translation tools support it out of the box, it has features for providing a lot more localization information than is present in the DITA content itself. One simple requirement could be that UI terms in a DITA software manual should not be translated. Within XLIFF you can specify that these terms should be 'locked' for translation. You could counter that this requirement can easily be added to the DITA content using an attribute, but XLIFF can host a lot more localization-oriented metadata (such as translator instructions or previous translations). I believe it's not such a good idea to "pollute" the DITA content with that type of metadata. And even this simple requirement could get complicated quite easily when one target language should have UI terms translated and another target language should not.
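For example, in XLIFF 1.2 a segment can be excluded from translation with the translate attribute on a trans-unit, and a note can carry translator instructions (a simplified sketch; real extractions carry more metadata):

```xml
<body>
  <trans-unit id="1">
    <source>Click the button to save your changes.</source>
  </trans-unit>
  <!-- a UI term that must stay in English -->
  <trans-unit id="2" translate="no">
    <source>Save</source>
    <note>UI label; keep in English.</note>
  </trans-unit>
</body>
```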

Without a format such as XLIFF you are at the mercy of the translator or LSP to do the right thing with your XML content. They have to configure the translation tool for each XML format they encounter. I am sure that it is possible to provide a reasonable DITA to XLIFF mapping which gets it right most of the time (if it can deal with DITA specializations intelligently). However, I am even more sure that a more adaptable process is required to handle the more complex requirements of real-world translation projects.

I hope to find good ideas about this in the webinar. My own thinking goes in the direction of yet another standard, or specification, which hasn't been mentioned yet. Using ITS one can specify a kind of rule sheet that is used to annotate the DITA with localization information. These rules can then be used to drive the DITA to XLIFF mapping. At the moment I'm a bit disappointed at the uptake of ITS (none?) in all the commercial localization tools that I know of, especially seeing that they already have problems putting all the information that is present in XLIFF to good use.
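For the curious, an ITS 1.0 rule sheet of the kind described might look like this (the XPath selectors are illustrative, assuming DITA's uicontrol and codeph elements mark UI terms and code phrases):

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- do not translate UI labels -->
  <its:translateRule selector="//uicontrol" translate="no"/>
  <!-- treat inline code phrases as non-translatable too -->
  <its:translateRule selector="//codeph" translate="no"/>
</its:rules>
```

Such a rule sheet could drive the extraction step, so the resulting XLIFF already carries translate="no" on the right segments.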

If I'm allowed to dream for a little bit, I could see a translation tool that doesn't really need XLIFF (only maybe for cases where the source is non-XML) and works directly on any XML content, aided by something like ITS. That way it would also be much easier to, for example, re-use existing HTML stylesheets to provide a good quality in-context preview and to validate the DITA (or any other XML format) in its original format during the translation. In that case I wouldn't see the point of first going from DITA to XLIFF either.
