How do you determine the size of a topic?

In pursuit of the ultimate techCom information architecture

Well, this subject has been discussed lately in various blog posts (see, for example, Tom Johnson's blog post about chunking). We chunk content into small pieces called topics. Chunking ad hoc as you write will most likely lead to a messy environment, especially if several writers co-operate in a content creation team.

But wait; why do we chunk content into topics in the first place? The obvious answer you often hear is “to allow single sourcing and reuse”. But what does that mean? To me there are two perspectives on why we chunk content. First of all, there is the consumer perspective: the user. A user who gets stuck with the product wants an answer and does not want to wade through content that does not apply to the current situation. Thus the answer should be given in a single standalone topic.

The other perspective is the producer perspective. We as technical communicators want to be effective, meaning that we do not want to manage the same content in many places. Chunking alone doesn't really serve the consumer perspective, since we often merge several topics into one deliverable (using a DITA map) and the user must still wade through the “topic pool” to find the answer. Of course, we can provide faceted search environments that help the user narrow down the set of topics to search.

So what factors determine the size of a topic? We need a “topic size constraining strategy” that guides us in determining the topic size. There are several topic size constraining factors. First of all, we must ask ourselves why we (as technical communicators) create content in the first place: to provide users of technical products with the answers they need when using the product. So how do we capture the questions? What are users searching for? The types of questions probably differ from user to user, but let's say that there were some sort of unified “resolution” of questions for a particular product. Then there is no point in splitting a particular answer into two or more topics. Why would you do that?

The types of questions users ask can be transformed into information types. DITA provides the task, concept and reference information types by default, but you probably need more detailed types. Does having three information types mean that we would end up with only three topics? Of course not. There are other topic size constraining factors.

Let’s look at the technical product. It is made up of several components (UI elements, functions, features, hardware sub-assemblies, etc.). The user does not use all components every time the product is used, but uses a specific part of the product when a question arises. So we can model the product components from a user perspective and, voilà, we have another topic size constraining factor. That means we need one task, one concept and one reference topic for every part of the product that the user can possibly interact with. This is important: we must model the product from a USAGE perspective, not from a manufacturing, logistics or product data exchange perspective.

But here the producer perspective interferes: product development is often modularized, meaning that a new product can be built by reusing components (code snippets, hardware modules, etc.) from existing products. We as technical communicators must mimic this behavior and relate our topics to the product development components, so that we can build a manual for the new product by reusing the corresponding topics. This allows us to implement efficient release management processes. It may also mean that we must split a topic into two topics to be able to reuse it efficiently.
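To make the link between topics and product development components concrete, here is a minimal Python sketch that assembles a flat DITA map for a new product from the topics registered for its components. The component IDs, topic file names and the `TOPICS_BY_COMPONENT` registry are hypothetical; in practice this link would more likely live in a CMS or in topic metadata, and a real map would usually nest its `topicref` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry linking product development components to topic files.
TOPICS_BY_COMPONENT: dict[str, list[str]] = {
    "pump-v2": ["pump_concept.dita", "pump_start_task.dita", "pump_specs_ref.dita"],
    "display-a": ["display_concept.dita", "display_settings_task.dita"],
    "display-b": ["display_b_concept.dita", "display_b_settings_task.dita"],
    "filter-unit": ["filter_concept.dita", "filter_replace_task.dita"],
}


def build_map(title: str, components: list[str]) -> ET.Element:
    """Return a minimal, flat DITA map referencing the topics of each component."""
    missing = [c for c in components if c not in TOPICS_BY_COMPONENT]
    if missing:
        # A component without topics means documentation work is still needed.
        raise KeyError(f"No topics registered for: {', '.join(missing)}")
    root = ET.Element("map")
    ET.SubElement(root, "title").text = title
    for component in components:
        for href in TOPICS_BY_COMPONENT[component]:
            ET.SubElement(root, "topicref", href=href)
    return root


# Usage: a new product reuses the existing pump and the newer display.
new_map = build_map("Model X200 User Guide", ["pump-v2", "display-b"])
print(ET.tostring(new_map, encoding="unicode"))
```

The reuse happens at component level: a product that shares `pump-v2` with an earlier model automatically gets the same pump topics, which is what makes the release management side manageable.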

Furthermore, another topic size constraining factor is the user's knowledge level or user type. Maybe there are novice, intermediate and advanced users (or different types of users, such as administrators, installers, etc.). There you go: you need three different task topics for the very same part of the product.

And the product might be used in different environments, such as a desert, a workshop or an office. The way the user is supposed to handle the product may differ depending on the environment. Aha! Another topic size constraining factor. Let’s say there are three different usage environments where the user must handle the product differently. Then you need nine different task topics for a given part of the product: three for environment A, three for environment B and three for environment C.
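The multiplication above can be sketched as a Cartesian product of facet values. This Python sketch assumes the three user levels and three environments from the examples; the facet names and the `task_topic_grid` helper are hypothetical. Each returned dictionary is one slot in the grid, meaning one task variant that must exist (as its own file or as a filtered version of a shared file) for the given part.

```python
from itertools import product

# Hypothetical facet values taken from the examples in the text.
USER_LEVELS = ["novice", "intermediate", "advanced"]
ENVIRONMENTS = ["desert", "workshop", "office"]


def task_topic_grid(product_part: str) -> list[dict[str, str]]:
    """Return one task-topic slot per (user level, environment) combination."""
    return [
        {"part": product_part, "type": "task", "audience": level, "environment": env}
        for level, env in product(USER_LEVELS, ENVIRONMENTS)
    ]


slots = task_topic_grid("filter unit")
print(len(slots))  # 3 user levels x 3 environments = 9 task slots
```

Adding one more facet with three values would multiply the count to 27, which is why it pays to decide on the facets before writing starts.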

But hey, there is more. In many situations we combine what would otherwise be many topics into one topic XML file and use filtering conditions to publish a specific version of the topic. Maybe the instructions for our novice and intermediate users are 90% the same; then there is no point in having two separate topic files to manage. Or we could use conref or keys to single source even further.
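In DITA, this kind of filtering is normally done by the publishing toolchain with a DITAVAL file, for example `<prop att="audience" val="intermediate" action="exclude"/>`. The Python sketch below only imitates the basic exclude rule on the standard `audience` attribute, to show what happens to a shared task file. The topic content and the `filter_conditions` helper are hypothetical, and the sketch ignores flagging, default actions and other details of the real DITAVAL semantics.

```python
import xml.etree.ElementTree as ET

# A hypothetical DITA task in which novice and intermediate steps share one file.
TASK_XML = """<task id="replace_filter">
  <title>Replace the filter</title>
  <taskbody>
    <steps>
      <step><cmd>Switch off the unit.</cmd></step>
      <step audience="novice"><cmd>Locate the filter cover on the left side.</cmd></step>
      <step><cmd>Remove the old filter.</cmd></step>
      <step audience="intermediate advanced"><cmd>Check the seal for wear.</cmd></step>
    </steps>
  </taskbody>
</task>"""


def filter_conditions(root: ET.Element, exclude: dict[str, set[str]]) -> ET.Element:
    """Remove elements whose condition values are all excluded.

    Simplified imitation of DITAVAL "exclude": an element is dropped when,
    for some attribute, every space-separated value on it is excluded.
    Elements without the attribute are always kept.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            for att, excluded in exclude.items():
                values = child.get(att, "").split()
                if values and all(v in excluded for v in values):
                    parent.remove(child)
                    break
    return root


# Usage: publish the novice version of the shared task.
novice = filter_conditions(ET.fromstring(TASK_XML), {"audience": {"intermediate", "advanced"}})
print([cmd.text for cmd in novice.iter("cmd")])
```

Only the steps that actually differ carry an `audience` value, so the shared 90% is written and maintained once.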

All these topic size constraining factors should be captured in a taxonomy before you start to create topics. The taxonomy then serves as a “granularity grid” that guides the writer when determining the topic size, and each topic must carry metadata from the taxonomy. You shouldn’t step outside the grid and chunk a topic into smaller pieces; if you do, what’s the point? Reuse possibilities come naturally when you have a robust taxonomy. There is no need to reuse anything smaller than the grid predicts.
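A taxonomy used as a granularity grid can be checked mechanically. This Python sketch assumes a hypothetical taxonomy with four facets (values taken from the examples in this post) and assumes that every topic must carry exactly one value per facet; `grid_violations` is a hypothetical helper that reports missing facets, values outside the grid and facets the taxonomy does not define. In a DITA environment the metadata could live in the topic prolog or in conditional attributes, and a subject scheme map can constrain the allowed values; the sketch only illustrates the principle.

```python
# Hypothetical taxonomy: each facet lists the values the grid allows.
TAXONOMY: dict[str, set[str]] = {
    "infotype": {"task", "concept", "reference"},
    "part": {"pump", "display", "filter unit"},
    "audience": {"novice", "intermediate", "advanced"},
    "environment": {"desert", "workshop", "office"},
}


def grid_violations(metadata: dict[str, str]) -> list[str]:
    """List the ways a topic's metadata falls outside the granularity grid."""
    problems = []
    for facet, allowed in TAXONOMY.items():
        value = metadata.get(facet)
        if value is None:
            problems.append(f"missing facet: {facet}")
        elif value not in allowed:
            problems.append(f"{facet}={value!r} is not in the taxonomy")
    # Facets the taxonomy does not know about suggest chunking outside the grid.
    for facet in sorted(metadata.keys() - TAXONOMY.keys()):
        problems.append(f"unknown facet: {facet}")
    return problems


topic = {"infotype": "task", "part": "pump", "audience": "expert", "step": "3"}
for problem in grid_violations(topic):
    print(problem)
```

An unknown facet such as `step` is exactly the kind of finer-than-the-grid chunking the taxonomy is meant to prevent.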

How do you develop a taxonomy? Look at SeSAM, which is a pre-defined taxonomy for technical communicators. You can start from SeSAM if you do not want to develop a taxonomy from scratch.
