How does a topic signal its content?

In pursuit of the ultimate techCom information architecture

The first thing a user does when viewing a topic found through a search system is not to read it, but to evaluate whether it contains the answers the user is looking for. How can we help our users determine whether the found topic contains the answers? Let’s explore how, but let’s first take a look at the search process in detail.

Users of technical documentation search, rather than read, to find answers. When searching, they express their need as keywords. A user enters keywords in a search engine to get a list of topics, looks for facets that match the keywords in a faceted search environment, or scans a table of contents for occurrences of the keywords. The user picks a topic to go to, and the result is a snippet of text (our topic) on the screen or on paper. The user then quickly skims the topic to find out whether it actually contains the answers. The user is not reading, but skimming for occurrences of the keywords in the topic title, short description, text body and so on. If the user concludes that the topic does not contain the answers, the user searches for another topic in the same or in another search system. If the user concludes that the topic contains the answers, the user reads it in more detail to actually understand and absorb the knowledge embedded in the topic. This is a proposed search process for users of technical communication, based on John Guthrie’s search process in school textbooks (for more information, see the winter 2010 issue of Communicator).

Every user has a search vocabulary, and no two users have the same vocabulary. The “size” and content of a search vocabulary depend on many factors, such as age, experience, gender and mental model. Note that users who open a topic by clicking a component in a context-sensitive software interface are not actually searching.

How does a topic signal its content? That is what this blog post is all about: metadata, classification and so on.

A topic treats something in the real world. Let’s call it the subject. It is often advised that a topic should treat only one (1) subject. The topic then describes the subject from various angles. One way of helping the user determine what a topic is about is to mark up all keywords related to the subject using <keyword> and to format them in bold or italic. The user can then focus on the highlighted keywords to determine the topic content.

But there is a more profound way of telling the user what a topic is about than highlighting parts of the topic content: classification. Classifying content means putting labels on the content, where some of the labels do not appear in the topic itself. Providing the classification can help the user determine whether the topic contains the answers. But how do we make the topic classification visible to the user? This is where taxonomies come into play. Let’s first explore topic classification and then look into various ways of providing the classification to the user.

Developing taxonomies. Metadata is used to classify content into topics. As discussed in previous blog posts, the metadata taxonomies should preferably be defined before you start creating topics. The taxonomies mirror the type of content your end users need, and the most important task of a technical communicator is to develop user-centered taxonomies. The type of information end users need must be assessed from what they can or must do (goals), their assumed prior knowledge and so on. Since no two users have the same search vocabulary, you might consider developing one set of taxonomies per user! This is of course not feasible, but you may consider developing separate sets of taxonomies for different markets, customer types and so on.

Consider the following taxonomies for a technical product, the CO2 gas detector:

A task oriented taxonomy:

  • Task
    • Installing
      • Assembling 
      • Mounting
      • Connecting
    • Operating
      • Switching on
      • Resetting
      • Customizing

A product oriented taxonomy (developed using, for example, STEP):

  • CO2 gas detector
    • Functionality (software)
      • Indication of CO2 level
      • Alarm for toxic level of CO2
    • Hardware
      • CO2 detection module
      • CO2 alarm indication module

And yet another taxonomy:

  • User type
    • Novice
    • Intermediate
    • Advanced
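
The three taxonomies above can be written down as plain data before any topic exists. The following is only a minimal sketch, assuming a nested-dictionary representation; the variable names and the helper `leaves` are made up for illustration and are not part of DITA or any other standard.

```python
# Each taxonomy is a nested dict: key = node label, value = dict of child nodes.
# A leaf node maps to an empty dict. All names here are illustrative assumptions.
TASK_TAXONOMY = {
    "Task": {
        "Installing": {"Assembling": {}, "Mounting": {}, "Connecting": {}},
        "Operating": {"Switching on": {}, "Resetting": {}, "Customizing": {}},
    }
}

PRODUCT_TAXONOMY = {
    "CO2 gas detector": {
        "Functionality (software)": {
            "Indication of CO2 level": {},
            "Alarm for toxic level of CO2": {},
        },
        "Hardware": {
            "CO2 detection module": {},
            "CO2 alarm indication module": {},
        },
    }
}

USER_TAXONOMY = {"User type": {"Novice": {}, "Intermediate": {}, "Advanced": {}}}


def leaves(tree):
    """Return the labels of all leaf nodes, in definition order."""
    result = []
    for label, children in tree.items():
        if children:
            result.extend(leaves(children))
        else:
            result.append(label)
    return result


print(leaves(TASK_TAXONOMY))
```

The leaf values are the ones a topic is most likely to be classified to; the parent nodes carry the context that the rest of this post is concerned with.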

Creating and classifying content into topics. Topics can be identified and created as soon as the taxonomies are available: for example, one (1) task topic telling a novice user how to mount the CO2 detection module, one (1) task topic telling an advanced user how to switch on the alarm for toxic level of CO2, or one (1) conceptual topic written for an intermediate user about the possibility to indicate CO2 levels.

The taxonomies help us determine the size of a topic. Each topic is classified to one or several values in each taxonomy. But this is another issue, which will be explored in a following blog post. Stay tuned!

Topic titles and short descriptions are metadata. The topic title and the short description in DITA are in fact metadata that reveal the topic content. The classification can help us create the topic title and the short description. Well-built taxonomies can help us write topic titles that are extrovert rather than introvert: extrovert because they are user centered. A problem arises when the topic title contains words that are very different from the taxonomy vocabulary. A task topic classified to “Mounting”, “CO2 detection module” and “Novice” but titled “Detection unit” does not reveal the topic content. Instead, the topic title should be something like “Mounting the CO2 detection module”.
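
To make the point about title vocabulary concrete, here is a small sketch that derives a title from the classification and reports which classified values a hand-written title fails to mention. Both function names are hypothetical, and the simple "task + the + product" title pattern is an assumption chosen to match the example above, not a general rule.

```python
def title_from_classification(task, product):
    """Build a title from the classified task and product values."""
    return f"{task} the {product}"


def missing_terms(title, terms):
    """Return the classified values that do not occur in the title (case-insensitive)."""
    lowered = title.lower()
    return [term for term in terms if term.lower() not in lowered]


classified = ["Mounting", "CO2 detection module"]

print(title_from_classification(*classified))
print(missing_terms("Detection unit", classified))
print(missing_terms("Mounting the CO2 detection module", classified))
```

A check like this could flag titles that drift away from the taxonomy vocabulary, although a real check would need to tolerate inflected forms and word order.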

Showing the complete classification explicitly. We often use only the current node in the taxonomy, the one used for classification, to build the topic title. The parent nodes in the taxonomy (for example, Task in the task oriented taxonomy above) are often not included in the topic title or short description, yet they play an important role in helping users find out what the topic is about.

Imagine a case where the topic does not have a manually written title, but a set of written-out keywords preceding the topic body:

  • Task: Installing: Mounting
  • CO2 gas detector: Hardware: CO2 detection module
  • User type: Novice

Does the above tell the user more than a topic title like “Mounting the CO2 detection module”?
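
Keyword lines like the ones above do not have to be typed by hand; they can be derived from the taxonomies. The following is a rough sketch, assuming the taxonomies are stored as nested dictionaries; the function `path_to` and the variable names are illustrative only.

```python
# Trimmed-down copies of the taxonomies, as nested dicts (an assumed representation).
TASKS = {"Task": {"Installing": {"Assembling": {}, "Mounting": {}, "Connecting": {}}}}
PRODUCT = {"CO2 gas detector": {"Hardware": {"CO2 detection module": {}}}}
USERS = {"User type": {"Novice": {}, "Intermediate": {}, "Advanced": {}}}


def path_to(tree, value, trail=()):
    """Return the labels from the root down to `value`, or None if it is absent."""
    for label, children in tree.items():
        here = trail + (label,)
        if label == value:
            return list(here)
        found = path_to(children, value, here)
        if found:
            return found
    return None


# The classified values of the example topic, one per taxonomy.
classification = [
    (TASKS, "Mounting"),
    (PRODUCT, "CO2 detection module"),
    (USERS, "Novice"),
]

for taxonomy, value in classification:
    print(": ".join(path_to(taxonomy, value)))
```

Only the classified value needs to be stored with the topic; the parent nodes are looked up at publishing time, so they stay correct when a taxonomy is reorganized.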

The search system reveals the topic content. The search system is a layer on top of the topics. Examples of search systems are the search engine (like Google), the list of facets in a faceted search navigation, the keyword index and the table of contents. A textual or image-based search system should be built up from the taxonomies. A search engine should search for keywords not only within the actual topic body but also within the taxonomies (regardless of whether the classification metadata is internal or external to the topic).
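
What searching "within the taxonomies" could mean in practice is sketched below. This is a toy substring search, not how any particular search engine works; the `Topic` class, the `search` function and the sample body text are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    title: str
    body: str
    # One root-to-value path per taxonomy the topic is classified in.
    classification: list = field(default_factory=list)


def search(keyword, topics, include_taxonomy=True):
    """Return titles of topics whose text (and optionally labels) contain the keyword."""
    needle = keyword.lower()
    hits = []
    for topic in topics:
        haystack = [topic.title, topic.body]
        if include_taxonomy:
            haystack += [label for path in topic.classification for label in path]
        if any(needle in text.lower() for text in haystack):
            hits.append(topic.title)
    return hits


topics = [
    Topic(
        title="Mounting the CO2 detection module",
        body="Fasten the module to the wall with the supplied screws.",  # invented sample text
        classification=[
            ["Task", "Installing", "Mounting"],
            ["CO2 gas detector", "Hardware", "CO2 detection module"],
            ["User type", "Novice"],
        ],
    )
]

print(search("installing", topics, include_taxonomy=False))  # title and body only
print(search("installing", topics))  # title, body and taxonomy labels
```

In this sketch the word "installing" appears nowhere in the title or body, so the topic is found only when the parent labels of its classification are searched as well.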

A user traversing a textual search system, like a table of contents or a faceted search environment, selects nodes when drilling down in the hierarchy to pick a topic. The user selects a child in each node, and each selection means that the user picks up metadata that helps determine the topic content. Let’s say that a user is traversing the task oriented taxonomy above. Selecting “Task”, “Installing” and “Mounting” allows the user to (hopefully) understand that the topic is about the task of mounting something when installing the product. This is helpful when the topic content itself says nothing about installation.

A user selecting a topic from a result list in a search engine does not pick up metadata in the same way. Providing the full classification explicitly within the topic, meaning not only the classified value but all its parents in all taxonomies, can help a user who picks a topic from a search result list determine whether the topic is the right one.

Using a search engine can mean that a user does not find a topic, even if it exists. Let’s say that the user enters the keyword “Unit” when trying to find the topic on how to mount the CO2 detection module. The user will never find the topic, since the writer does not use the word “Unit”; the writer uses “Module” instead. The search vocabulary is different from the design vocabulary, even though “Unit” is synonymous with “Module” in this case. A user traversing the table of contents may have a better chance of succeeding, since the parent value “Hardware” is visible. Hardware semantically includes both Unit and Module, whereas Module semantically excludes Unit.

Building one (1) static classification hierarchy. It is difficult to find anything in a static hierarchy, like a table of contents built from topic titles, when neither the topic titles nor the organization of the topics follows a systematic approach.

Consider building one (1) static hierarchy like the table of contents or the topic storage folder structure in a CMS, when a topic is classified into several taxonomies. How would you combine the above taxonomies into one folder structure? For example:

Task

  • Installing
    • Assembling
      • CO2 detection module
        • Task topic:
          • Novice user: Assembling CO2 detection module
          • Intermediate user: Assembling CO2 detection module
          • Advanced user: Assembling CO2 detection module
        • Concept topic: About CO2 detection module
      • CO2 alarm indication module
    • Mounting
      • CO2 detection module
      • CO2 alarm indication module
    • Connecting
      • CO2 detection module
      • CO2 alarm indication module 
  • Operating
    • Switching on
      • Indication of CO2 level
      • Alarm for toxic level of CO2
    • Resetting
      • Indication of CO2 level
      • Alarm for toxic level of CO2
    • Customizing
      • Indication of CO2 level
      • Alarm for toxic level of CO2
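
The single hierarchy above is, in effect, a cross product of the three taxonomies. A short sketch shows how quickly it grows. The pairing of installation tasks with hardware modules and of operating tasks with software functions is read off the example list and is an assumption, as are all names in the code.

```python
from itertools import product

TASKS = {
    "Installing": ["Assembling", "Mounting", "Connecting"],
    "Operating": ["Switching on", "Resetting", "Customizing"],
}

# Assumed pairing, taken from the example hierarchy: installation tasks act on
# hardware modules, operating tasks act on software functions.
SUBJECTS = {
    "Installing": ["CO2 detection module", "CO2 alarm indication module"],
    "Operating": ["Indication of CO2 level", "Alarm for toxic level of CO2"],
}

USER_TYPES = ["Novice", "Intermediate", "Advanced"]


def task_topic_paths():
    """List one folder path per possible task topic in the combined hierarchy."""
    paths = []
    for group, subtasks in TASKS.items():
        for subtask, subject, user in product(subtasks, SUBJECTS[group], USER_TYPES):
            paths.append(("Task", group, subtask, subject, user))
    return paths


paths = task_topic_paths()
print(len(paths))
print(" > ".join(paths[0]))
```

Even this small product yields 36 possible task topic slots (2 task groups × 3 tasks × 2 subjects × 3 user types), and the concept topics still have no obvious place in the structure.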

Where would a user, looking for conceptual information about the CO2 detection module, start? What is your choice when building one static hierarchy to organize content? Do you prefer a task or product oriented structure? Or does your manual contain both, either as a mix or as separate structures? The worst case is when you are building one organizational hierarchy by combining values from several taxonomies, ad hoc without any systematic approach, meaning that there are no design patterns the user can learn and follow. Finally, do you have a communicated strategy on how to classify and organize topics in your content creation team? Or are the classifying principles in the head of a specific member in the content creation team, making life hard for all the others trying to find a topic?

Furthermore, it is possible to classify values in a taxonomy using another taxonomy. We are then talking about typing individual metadata (metadata about metadata). Topic Maps allow us to do this. The value "CO2 gas detector" in the product oriented taxonomy above could be classified to a value "Products" defined in another taxonomy. Relations between subjects in the same or in different taxonomies can also be established. In DITA 1.2, the new subject scheme feature is used to create an external classification.
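
As a loose illustration of metadata about metadata, the sketch below types a taxonomy value with a value from a second taxonomy and records one relation between subjects. It is plain Python, not Topic Maps or DITA subject scheme syntax, and the relation shown is a hypothetical example rather than something taken from a real scheme.

```python
# A second taxonomy classifies values of the first one (metadata about metadata).
VALUE_TYPES = {
    "CO2 gas detector": "Products",  # the example given in the text
}

# Relations between subjects, possibly across taxonomies.
# This particular triple is a made-up example.
RELATIONS = [
    ("Mounting", "applies to", "CO2 detection module"),
]


def type_of(value):
    """Return the type a taxonomy value is classified to, or None if it is untyped."""
    return VALUE_TYPES.get(value)


def related(subject):
    """Return (relation, target) pairs recorded for a subject."""
    return [(relation, target) for source, relation, target in RELATIONS if source == subject]


print(type_of("CO2 gas detector"))
print(related("Mounting"))
```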

SeSAM is a new design methodology for technical communicators. SeSAM includes a set of pre-defined taxonomies that can help DITA users determine the topic granularity as well as how to organize and classify topics.
