Design for implementing OASIS Item 12026

What user need will be met by this feature?

Users can define glossary entries for specific terms or acronyms. When such a term is referenced, it is expanded automatically to the form appropriate for its context.

What is the technical design for the change?

Key definitions and references are already prepared during the key-definition parsing process, so we mainly focus on how to resolve keyrefs in the appropriate contexts. Several points need to be addressed when resolving the keys for glossary terms:

1. How do we determine in what context a key is referenced? The description of how to distinguish an introductory context from other contexts in “DITA Proposed Feature #12026 and #12038” is not clear enough to define every context we might encounter. For the time being, we therefore classify contexts into three specific situations. A more accurate definition and classification of contexts should be worked out thoroughly and added later.

a) The very first keyref pointing to a given glossentry in a deliverable book should be expanded to the term's surface form; later references should use the acronym form (or another appropriate form). Intuitively, the very first topic we encounter during preprocessing (whether it is a topic DITA file processed directly or one referenced from a ditamap) is the introductory context for the whole book. Thus, all keyrefs appearing in this first topic should be expanded to the surface form, while keyrefs in other contexts are replaced with other appropriate forms of the corresponding glossentry. There is a potential pitfall here: the same target (glossentry) may be referenced many times, even within the same topic, and through different keys. For example, keys “A” and “B” may both be bound to the same target file “glossary.dita”. Even when “A” and “B” are both referenced within the same topic, they must be treated differently: the reference that appears first may deserve an expansion, but the other does not. This requires tracking all keyrefs that resolve to the same target in one place. This context handling belongs in the Java keyref-resolution code, which determines what text a keyref resolves to. We need to override the normal keyref logic and add glossary-specific logic.

b) Terms appearing in online documents should have a hover tooltip showing their surface form. Unlike situation (a), terms in online documents don't need to be expanded, and it is also hard to determine which context is introductory, because users may visit the pages in any order. We therefore think it is enough to display the preferred form defined in the glossentry as the term's visible text, with its surface form as a hover tooltip. Adding hover tooltips mainly involves the XSL logic that generates the link for the keyref source: the current code only retrieves the link target, so we need to add code that also retrieves the tooltip text.

c) Terms appearing in a copyright declaration should be expanded to their surface form.

Note: Other contexts should also be considered; for example, a warranty-related context may need the surface forms of certain terms.
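The three situations above can be sketched as a single resolution step. This is a minimal Java sketch, assuming the glossary logic sits in the keyref-resolution code as described in (a); `GlossContext`, `GlossForms`, and `GlossaryContextResolver` are hypothetical names, and rendering the tooltip in (b) as an HTML `title` attribute on a `span` is an assumption about the generated output. It implements the first-appearance rule from (a), keyed by the resolved target rather than the key name. How a context is detected is left to the caller.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; only the resolution rules come from the design.
enum GlossContext { BOOK, ONLINE, COPYRIGHT }

/** Forms read from one glossentry: surface form plus the preferred (e.g. acronym) form. */
record GlossForms(String surfaceForm, String preferredForm) {}

class GlossaryContextResolver {
    // First-appearance state keyed by the resolved target, e.g. "glossgroup.dita#aTerm",
    // so keys "A" and "B" bound to the same glossentry are treated as one term.
    private final Set<String> seenTargets = new HashSet<>();

    String resolve(String target, GlossForms forms, GlossContext context) {
        return switch (context) {
            // (c) copyright declarations always use the surface form
            case COPYRIGHT -> forms.surfaceForm();
            // (b) online: preferred form as visible text, surface form as hover tooltip
            // (the span/title markup is an assumption about the generated HTML)
            case ONLINE -> "<span title=\"" + escape(forms.surfaceForm()) + "\">"
                    + escape(forms.preferredForm()) + "</span>";
            // (a) book: expand only the first reference to this target
            case BOOK -> seenTargets.add(target) ? forms.surfaceForm() : forms.preferredForm();
        };
    }

    // Minimal escaping for text placed in HTML content and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

If keys “A” and “B” both resolve to glossgroup.dita#aTerm, both calls pass the same target, so only the first one in the book context returns the surface form.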

2. How are the contents of glossentries stored and used? We prefer a lazy-loading mechanism, as follows:

a) First, we need a Hashtable that uses the target defined for a key as the hash key and the forms of the term as the hash value. For example:

<glossref keys="aKey" href="glossgroup.dita#aTerm"/>

defines a key with target “glossgroup.dita#aTerm”. In glossgroup.dita a glossentry is defined as:

<glossentry id="aTerm">
  <glossterm>A term</glossterm>
  <glossBody>
    <glossSurfaceForm>A term that should be resolved (aTerm)</glossSurfaceForm>
    <glossAlt>
      <glossAcronym>AT</glossAcronym>
    </glossAlt>
  </glossBody>
</glossentry>

The Hashtable entry is then stored as (glossgroup.dita#aTerm, combined forms), where the combined forms are a single string joining the surface form and the other preferred forms, separated by a vertical bar ("|"). As mentioned above, “aKey” is not used as the hash key: since multiple keys can refer to the same target, storing each key with the same value would waste memory, and it could also cause problems in detecting the first appearance of a term.

b) When a keyref is being resolved, we first check whether the Hashtable already contains an entry for the target (e.g. glossgroup.dita#aTerm) associated with the referencing key (e.g. aKey). If no such entry exists, the target is parsed, and its surface form together with the other preferred forms is loaded and saved in the Hashtable under that target. The appropriate form is then applied according to the context in which the keyref is being resolved.

c) If an entry already exists in the Hashtable, the appropriate form is retrieved from it and used.
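Steps (a) to (c) amount to a cache that is filled on demand. This is a minimal sketch, assuming Java's `Hashtable` as named above; `GlossaryTable` and the `loader` function (standing in for the glossentry reader) are hypothetical. The hash key is the target, and the value is the surface form followed by the other forms, joined by a vertical bar.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

class GlossaryTable {
    static final String SEPARATOR = "|";

    // hash key: resolved target (e.g. "glossgroup.dita#aTerm"), never the key name
    // hash value: surface form first, then the other forms, joined by SEPARATOR
    private final Map<String, String> formsByTarget = new Hashtable<>();
    // hypothetical hook standing in for the glossentry reader: target -> forms
    private final Function<String, String[]> loader;
    private int loads = 0; // how many times the loader actually ran

    GlossaryTable(Function<String, String[]> loader) {
        this.loader = loader;
    }

    /** Returns all forms for a target, parsing its glossentry only on first use. */
    String[] forms(String target) {
        String joined = formsByTarget.computeIfAbsent(target, t -> {
            loads++;
            return String.join(SEPARATOR, loader.apply(t));
        });
        return joined.split(Pattern.quote(SEPARATOR));
    }

    String surfaceForm(String target) {
        return forms(target)[0];
    }

    int loadCount() {
        return loads;
    }
}
```

Because the forms are joined with "|", a form that itself contains a vertical bar would be split incorrectly; storing a `String[]` or a small value class would avoid this, at the cost of departing from the single-string layout described above.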

Update: a new reader is needed to parse glossentries and obtain the corresponding terms and forms.
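Such a reader could be built on the JDK's DOM parser. This is a minimal sketch, assuming the glossentry markup shown in 2(a); `GlossEntryReader` and `readForms` are hypothetical names, and falling back to `glossterm` when `glossSurfaceForm` is absent is an assumption, not something this design specifies. It returns the surface form first, followed by any acronyms, matching the layout of the Hashtable value. Real DITA files carry a DOCTYPE declaration, so in practice the parser would also need a catalog-aware entity resolver.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class GlossEntryReader {
    /**
     * Returns the forms of the glossentry whose @id equals id: surface form first,
     * then each glossAcronym; returns null when no such glossentry exists.
     */
    static String[] readForms(String xml, String id) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse glossary source", e);
        }
        NodeList entries = doc.getElementsByTagName("glossentry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (!id.equals(entry.getAttribute("id"))) {
                continue;
            }
            List<String> forms = new ArrayList<>();
            // assumption: fall back to glossterm when glossSurfaceForm is absent
            String surface = firstText(entry, "glossSurfaceForm");
            forms.add(surface != null ? surface : firstText(entry, "glossterm"));
            NodeList acronyms = entry.getElementsByTagName("glossAcronym");
            for (int j = 0; j < acronyms.getLength(); j++) {
                forms.add(acronyms.item(j).getTextContent().trim());
            }
            return forms.toArray(new String[0]);
        }
        return null;
    }

    // Trimmed text of the first descendant with the given tag, or null if there is none.
    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```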

Review: After more detailed discussion, we decided to implement this in the XSLT processing stage, because XSLT is capable of it and we do not want to make too many changes to the existing code. @xml:lang is taken into consideration as an additional preference parameter during XSLT processing. If no keys are specified when referencing a glossary entry, we will try to guess which term is appropriate according to the context AND @xml:lang. The preferred language, specified as @xml:lang on <abbreviated-form> or <glossref>, is examined to see whether a term in the matching language exists. If no match is found, a term with @xml:lang set to "en" is used. If no term has @xml:lang set to "en", the first term with no @xml:lang is adopted.
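The language fallback order can be made explicit with a short sketch. It is written in Java for illustration only (the review places the real implementation in XSLT); `LangTerm` and `LangTermSelector` are hypothetical names, and the case-insensitive exact comparison of @xml:lang values is an assumption (the design does not say whether, for example, "en-US" should match "en").

```java
import java.util.List;
import java.util.Optional;

/** A candidate term and its @xml:lang value (null when the attribute is absent). */
record LangTerm(String lang, String text) {}

class LangTermSelector {
    /** Fallback order from the review: preferred language, then "en", then no @xml:lang. */
    static Optional<LangTerm> select(List<LangTerm> terms, String preferredLang) {
        // 1. a term whose @xml:lang matches the one on <abbreviated-form> or <glossref>
        if (preferredLang != null) {
            for (LangTerm t : terms) {
                if (preferredLang.equalsIgnoreCase(t.lang())) {
                    return Optional.of(t);
                }
            }
        }
        // 2. otherwise, a term with @xml:lang="en"
        for (LangTerm t : terms) {
            if ("en".equalsIgnoreCase(t.lang())) {
                return Optional.of(t);
            }
        }
        // 3. otherwise, the first term with no @xml:lang at all
        for (LangTerm t : terms) {
            if (t.lang() == null) {
                return Optional.of(t);
            }
        }
        // the design does not say what to do when none of the rules applies
        return Optional.empty();
    }
}
```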

What sections of the toolkit will be impacted by the change?

KeyRef resolution, XSLT transformation
