DITA metrics at the CMS/DITA 2013 conference

I attended several sessions on metrics, including the conference keynote, and a few key points jumped out at me:

  • The wrong metrics can be worse than no metrics at all

  • Metrics should be stable over time to allow for tracking

  • Start by figuring out what decisions you need to make – then gather the metrics you need to make those decisions

Gerry McGovern, in his keynote, challenged the value of volume – suggesting that manuals became bloated in response to marketing demands for a heavier box to sell, and to the costs associated with binding: it was cheaper to print one book with redundant content than to print ten books with more specific content.

On the web, problematic metrics like page views can drive content teams to increase the number of pages, boosting traffic to the site without necessarily increasing revenue or user satisfaction.

Gerry cited several examples of companies culling their websites, deleting up to 90% of their content and seeing sales go up and support calls go down.

We need metrics that measure outcomes, not inputs, Gerry urged. Can the user find and download the software they need quickly and easily? Can the user do the research they need prior to purchase?

I found myself agreeing with Gerry on many of his points, but also had a few quibbles. While I absolutely agree we should be measuring outcomes rather than inputs, I felt that his message was coming dangerously close to encouraging a particular input (reduced page count) as a panacea for sales and support issues. Perhaps it's a necessary counter to the prevailing push to publish more content, but the goal of reducing volume in itself is no more meaningful than the goal of increasing volume.

If we go back to the book metaphor, perhaps the issue isn't just the volume of pages but also the way they're bound. If you have a website that covers one product, maybe it only needs one “binding”; but if it covers a hundred products, then maybe you need filters and collections that can be set as scopes for search, so that the website can behave for a given user as if the only content that existed was the content that was relevant to their needs or context.
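
To make that concrete, here's a rough sketch in Python of scope-as-filter – the data model and function are purely hypothetical, my own illustration rather than anything shown at the conference:

    # Hypothetical data model: each page is tagged with the products it
    # applies to, and a search scope filters the collection first.
    PAGES = [
        {"title": "Installing Widget Pro",    "products": {"widget-pro"}},
        {"title": "Installing Widget Lite",   "products": {"widget-lite"}},
        {"title": "Widget Pro release notes", "products": {"widget-pro"}},
    ]

    def search(query, scope=None):
        """Match titles against the query, optionally within one product scope."""
        candidates = (p for p in PAGES
                      if scope is None or scope in p["products"])
        return [p["title"] for p in candidates
                if query.lower() in p["title"].lower()]

    print(search("installing"))                      # hits for every product
    print(search("installing", scope="widget-pro"))  # behaves as if only
                                                     # widget-pro content existed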

I was heartened to see recognition of the value product documentation has for sales, as a credible research tool for potential customers. But I think it's important to recognize some of the other roles documentation plays as well. And some of those roles, oddly enough, might have very little to do with page views. For example, if you sell a product based on its ability to do a thing, you may be obligated to document how to make it do that thing, not only for sales but for legal reasons. And if you provide certification in how to use that product, you may be obligated to provide a public reference for the capabilities or skills you are certifying. Even if no one views them, they need to be there.

That said, the basic message was one I whole-heartedly agree with – measure outcomes, not inputs. And certainly the volume of content is a huge challenge for findability, and any organizational resistance to deleting content is something that will need to be addressed.

But at the same time, reducing volume isn't the only tool available to us: you can provide ways to limit search scope, and you can also choose different chunking levels for certain types of content. Gerry described a chunking solution in one of his other examples – the documentation for a set of command-line options was overwhelming the search results for common GUI tasks, and the answer was to move all the options documentation into a single file, where it was still findable through search but no longer hogging all the results real estate. This feels like another “binding” solution: whether at the level of the whole book/collection or at the level of the individually indexed page, grouping and scoping content can keep low-traffic content from cluttering up common search results, without the potential harm of deleting content that, despite being low-traffic, is vital to some portion of your user population.
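
Here's a sketch of that result-level grouping, again with made-up data – the idea is simply that pages sharing a group key collapse into one search result:

    # Made-up search results, in rank order; the "group" key marks pages
    # that belong to one collection (per-option CLI reference pages).
    results = [
        {"title": "Print dialog basics"},
        {"title": "--verbose option", "group": "command-line options"},
        {"title": "Saving your work"},
        {"title": "--dry-run option", "group": "command-line options"},
        {"title": "--output option",  "group": "command-line options"},
    ]

    def collapse(results):
        """Show at most one entry per group, so grouped pages stay findable
        without crowding everything else out of the results."""
        seen, out = set(), []
        for hit in results:
            group = hit.get("group")
            if group is None:
                out.append(hit["title"])
            elif group not in seen:
                seen.add(group)
                out.append(group + " (single combined page)")
        return out

    for title in collapse(results):
        print(title)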

After all, your product may have one system administrator for every 1000 regular users, but if the sysadmins can't do their jobs, neither can anyone else.

I attended two other sessions on metrics: one by Chris McGowan of EMC, who talked about articulating value through metrics, and one by Peter Fournier of Samalander, who talked about picking a good baseline metric for DITA reuse.

Chris's presentation was a good complement to Gerry's keynote, focusing on the value of metrics in making a specific decision – even in meeting the requirements of a specific decision-maker. If you don't know what your metrics are needed for, how can you be sure you're collecting the right ones? It builds logically on Gerry's point about measuring outcomes, not inputs: you need to measure the outcomes that matter to a specific decision-maker, or at least for specific types of decisions.

Peter's presentation was at a lower level, and definitely about how to measure inputs rather than outcomes – reuse isn't an end in itself, after all, or a measure of utility. But given the importance of reuse as part of the value proposition for DITA, having a meaningful way to measure reuse – over time and across organizations and architectures – is likely to be useful in any set of metrics that measures the return on investment for a DITA adopter. One of Peter's key points was the importance of selecting a measurement that is stable over time, and he argued that topic counts – or any measure that depends on chunking and writing choices – are not stable. A simple word count, by contrast, is likely to be more stable, remaining the same no matter how topics are merged or split. I can definitely see the merit of this, although we have so many metrics based on topic counts, and such a stable definition of a topic at IBM, that I'm not sure how easy it would be for us to change.
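
To illustrate the stability argument with made-up numbers: the same 1,500 words of content, chunked two different ways, gives two different topic-based reuse ratios but a single word-based one.

    # Sketch only – invented numbers, and reuse treated as all-or-nothing
    # per topic for simplicity.
    def reuse_by_topics(topics):
        return sum(1 for t in topics if t["reused"]) / len(topics)

    def reuse_by_words(topics):
        reused = sum(t["words"] for t in topics if t["reused"])
        return reused / sum(t["words"] for t in topics)

    # One 1000-word reused topic, versus the same content split in two:
    coarse = [{"words": 1000, "reused": True},
              {"words": 500,  "reused": False}]
    fine   = [{"words": 500,  "reused": True},
              {"words": 500,  "reused": True},
              {"words": 500,  "reused": False}]

    print(reuse_by_topics(coarse), reuse_by_topics(fine))  # 0.5 vs ~0.67 – shifts
    print(reuse_by_words(coarse),  reuse_by_words(fine))   # ~0.67 both – stable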

Finally, a note about Mark Lewis's book, DITA Metrics 101. It's a practical how-to book for constructing the business case for DITA adoption in an accountable, measurable way, with metrics on cost savings, authoring productivity, and reduced time to market. It doesn't cover some of the higher-level measures of utility that Gerry and Chris talked about – for example, ways to measure the ROI you get from your content in terms of sales growth or diminished support costs. But what's there today is eminently practical and immediately useful to any organization considering the move to DITA and topic-oriented authoring.
