(Design for implementing OASIS Item 12007)

Part of Plans for DITA Open Toolkit 1.5

Plans for implementing keyref linking in the DITA Open Toolkit

 

What user need will be met by this feature?

Users can define keys with @keys on <topicref> and reference those keys with @keyref and @conkeyref.

What is the technical design for the change?

1. Parse key definitions
Key definitions are defined in ditamaps.
During the first step of preprocessing, we will parse the ditamaps and find all key definitions. This job will be done by the GenList module. A definition has three parts (key name, target and content), so each key definition should be stored as a triplet. As the spec requires, key definitions are collected in breadth-first order. There should be no duplicate key definitions in a single ditamap: when we find a key definition whose key name has already been defined, the earlier definition is used and the current one is ignored. To avoid re-parsing the ditamaps in the next step, the content of each key definition will also be stored. We can use a HashTable<String, String> for this data, combining the target and content into one string separated by a pipe character ("|"), and this table will be passed down to the other modules in preprocessing.
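A minimal sketch of the breadth-first collection, assuming a hypothetical in-memory MapNode model in place of the parsed ditamaps: maps are visited level by level, each map's key definitions in document order, and putIfAbsent keeps the first definition of each key. It uses java.util.HashMap rather than the legacy Hashtable; the stored value is the target and content joined with "|".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed ditamap: its key definitions in
// document order and the submaps it references.
class MapNode {
    final String name;
    final List<String[]> keyDefs = new ArrayList<>(); // {keyName, target, content}
    final List<MapNode> submaps = new ArrayList<>();

    MapNode(String name) {
        this.name = name;
    }
}

class KeyDefCollector {
    static final String SEP = "|";

    // Breadth-first over the map tree: every definition in a map is seen
    // before any definition in its submaps, and the first one for a key wins.
    static Map<String, String> collect(MapNode root) {
        Map<String, String> keys = new HashMap<>();
        Deque<MapNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            MapNode map = queue.poll();
            for (String[] def : map.keyDefs) {
                keys.putIfAbsent(def[0], def[1] + SEP + def[2]);
            }
            queue.addAll(map.submaps);
        }
        return keys;
    }
}
```

Decoding with value.split("\\|", 2) splits at the first pipe only, so a "|" inside the content survives, but a "|" inside a target would not.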
To help the modules that follow, the names of the DITA files that contain @keyref should also be stored, in a list named keyreflist. @conkeyref can be processed in DebugAndFilter by simply replacing it with conref. Resolving conkeyref this way requires that conreflist also include the names of the files that contain @conkeyref.

Note: Because the content of a key definition might contain @conref, we cannot simply store the content during the first parse of the ditamap files in the GenList step. The content should be read when we resolve the keyref. In the GenList module, we store only the key name, the target of the key and the location where the key is defined. The definition location is necessary because the same key name can be defined in different ditamaps, and the breadth-first method determines which one is the effective key. Recording the location lets later steps remember which definition is effective and prevents them from reading the wrong content for the key.
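A sketch of the revised table, assuming a hypothetical KeyDef class that records the name, target and defining map, and assuming the table is handed from GenList to later modules as a java.util.Properties XML file. The "target|map" value encoding is also an assumption of this sketch, not a settled format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical record of one effective key: name, target, and the map
// that defines it (so its content can later be read from the right map).
class KeyDef {
    final String name, target, sourceMap;

    KeyDef(String name, String target, String sourceMap) {
        this.name = name;
        this.target = target;
        this.sourceMap = sourceMap;
    }
}

class KeyDefStore {
    // Writes the key table as Properties XML; each value is "target|sourceMap".
    static void save(Collection<KeyDef> defs, OutputStream out) throws IOException {
        Properties props = new Properties();
        for (KeyDef d : defs) {
            props.setProperty(d.name, d.target + "|" + d.sourceMap);
        }
        props.storeToXML(out, "key definitions");
    }

    // Reads the table back, splitting each value at the first "|".
    static Map<String, KeyDef> load(InputStream in) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(in);
        Map<String, KeyDef> defs = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            String[] parts = props.getProperty(name).split("\\|", 2);
            defs.put(name, new KeyDef(name, parts[0], parts[1]));
        }
        return defs;
    }
}
```

Writing the table to a file rather than keeping it in memory means the later module does not need to run in the same JVM as GenList.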

2. Resolve keyref
We prefer to resolve @keyref in a new module, tentatively named KeyrefModule (see item 4 for details about this module). Where should this step go in preprocessing? Because keyref may pull attributes and content into its element, three modules may be affected. After considering MoveMeta, MapPull and TopicPull, we decided to insert it between MoveMeta and Mapref, for the following reasons:
MapPull: This module pulls some link metadata from a topic into the <topicref> that refers to it. If KeyrefModule ran after MapPull, the topic metadata would not be pulled in, because MapPull does not recognize the keyref attribute. Putting KeyrefModule before MapPull lets MapPull take effect for keyref as well.
TopicPull: This module pulls some link metadata from the source into the target. Note that it parses all the DITA topic files. If KeyrefModule ran after it, cross-references between topics that use keyref could not be processed, because TopicPull, like MapPull, does not recognize the keyref attribute.
MoveMetaData: This module moves all metadata recorded in the map into topics. According to the keyref spec, content in a key definition and in elements referencing the key may be combined. If keyref is used in a map and the contents are combined, metadata may be moved into a single topic two or more times. So KeyrefModule should run after MoveMetaData.
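The three placement constraints can be expressed as a check over an ordered list of module names. The step list below is hypothetical and simplified; only the constraints checked in keyrefPlacementOk come from this design, and the positions of the other modules are assumptions rather than the toolkit's actual pipeline.

```java
import java.util.Arrays;
import java.util.List;

class PreprocessOrder {
    // Hypothetical, simplified step list with KeyrefModule between MoveMeta and Mapref.
    static final List<String> STEPS = Arrays.asList(
            "GenList", "DebugAndFilter", "MoveMeta", "KeyrefModule",
            "Mapref", "MapPull", "TopicPull");

    // True if both steps are present and a runs before b.
    static boolean before(List<String> steps, String a, String b) {
        int i = steps.indexOf(a);
        int j = steps.indexOf(b);
        return i >= 0 && j >= 0 && i < j;
    }

    // The three constraints argued above: after MoveMeta, before MapPull and TopicPull.
    static boolean keyrefPlacementOk(List<String> steps) {
        return before(steps, "MoveMeta", "KeyrefModule")
                && before(steps, "KeyrefModule", "MapPull")
                && before(steps, "KeyrefModule", "TopicPull");
    }
}
```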

3. Resolve conkeyref
We put the resolution of @conkeyref into DebugAndFilter, which resolves every conkeyref and replaces it with a conref; the Conref module then resolves the conref as usual. As proposed above, the conref list already includes the files that use conkeyref, so it does not need to be updated in DebugAndFilter.
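A sketch of the conkeyref-to-conref rewrite, assuming the key table maps key names to targets and that a target pointing into a topic already has the form file.dita#topicId; the class name is hypothetical. A reference to an element inside a target without a topic id is left unresolved here, because building that conref would require reading the target file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ConkeyrefResolver {
    // Rewrites a conkeyref value ("key" or "key/elementId") into a conref value.
    static Optional<String> toConref(String conkeyref, Map<String, String> keyTargets) {
        int slash = conkeyref.indexOf('/');
        String key = slash < 0 ? conkeyref : conkeyref.substring(0, slash);
        String elementId = slash < 0 ? null : conkeyref.substring(slash + 1);
        String target = keyTargets.get(key);
        if (target == null) {
            return Optional.empty();            // undefined key: caller should warn
        }
        if (elementId == null) {
            return Optional.of(target);         // reference to the key target itself
        }
        if (!target.contains("#")) {
            return Optional.empty();            // topic id unknown: not handled in this sketch
        }
        return Optional.of(target + "/" + elementId);
    }
}
```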

4. KeyrefModule
For this iteration we only consider redirecting links; combining contents and attributes is left to a future iteration. The inputs of KeyrefModule are one HashTable containing all the key definition information and the keyref list (keyreflist) containing all the files that use @keyref. We may need a helper class, tentatively named KeyrefParser, to help this module do its work. KeyrefParser will parse each file containing keyref and resolve each keyref against the related key definition in the HashTable, depending on the form of the key reference. In a future iteration we will use the content in the HashTable, but for now link redirection uses only the target value recorded in the HashTable.
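For the redirecting-link iteration, KeyrefParser could be built as a SAX filter that rewrites attributes in startElement. The sketch below (class name hypothetical) uses org.xml.sax.helpers.AttributesImpl to replace @keyref with @href. It assumes, as in the conkeyref case, that a key/elementId reference can be formed by appending "/elementId" to a target that already names a topic.

```java
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

class KeyrefLinkRedirector {
    // Replaces @keyref with an @href pointing at the key's target.
    // Content and attribute combination are out of scope for this iteration.
    static Attributes redirect(Attributes atts, Map<String, String> keyTargets) {
        AttributesImpl result = new AttributesImpl(atts);
        int keyrefIndex = result.getIndex("keyref");
        if (keyrefIndex < 0) {
            return result;                       // nothing to resolve
        }
        String keyref = result.getValue(keyrefIndex);
        int slash = keyref.indexOf('/');
        String key = slash < 0 ? keyref : keyref.substring(0, slash);
        String target = keyTargets.get(key);
        if (target == null) {
            return result;                       // undefined key: leave element unchanged
        }
        String href = slash < 0 ? target : target + "/" + keyref.substring(slash + 1);
        result.removeAttribute(keyrefIndex);     // indices shift, so look up href afterwards
        int hrefIndex = result.getIndex("href");
        if (hrefIndex >= 0) {
            result.setValue(hrefIndex, href);    // the resolved key replaces an existing href
        } else {
            result.addAttribute("", "href", "href", "CDATA", href);
        }
        return result;
    }
}
```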

Note: The handling of key definition content is not yet ready for design; the content should be read from the map where the key is defined. We will use a reader for this job, tentatively named KeyrefReader, and design its details when the spec for keyref content is ready.

 

Further design updates for Milestone 5:

Contradiction between Items #22 and #27

The spec says:
 Item #22 a. "if the key reference element is empty, matching element content, if any, from the key definition element is used."
 Item #22 b. "If the key reference element is empty, matching element content from the key definition element is used."
From this we conclude that if the key reference is not empty, matching element content is not used. However, Item #27 seems to contradict this slightly. It says, "Content from a key reference element and a key definition element is combined following the same procedures that are used for combining metadata between maps and other maps, and between maps and topics." This suggests that the content of the key definition and the key reference should be combined whether or not the key reference has content.

In this design doc we assume that the contents are combined even when the key reference is not empty.

First, key definitions are collected by a reader called KeyrefReader; then the files that use the keyref attribute are parsed. If matching element content needs to be combined, the collected information can be used.
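A minimal sketch of how KeyrefReader might locate a key definition in its defining map, using the JDK DOM parser. The class name is hypothetical, and matching any element that carries @keys stands in for the @class-based specialization checks the toolkit would need.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class KeyrefReaderSketch {
    // Returns the first element (in document order) whose @keys lists `key`,
    // or null. @keys may hold several space-separated key names.
    static Element findKeyDefinition(String mapXml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(mapXml)));
        NodeList all = doc.getElementsByTagName("*");   // all elements, document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            for (String name : e.getAttribute("keys").trim().split("\\s+")) {
                if (name.equals(key)) {
                    return e;
                }
            }
        }
        return null;
    }
}
```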

Matching element content falls into one of two categories:
a) Matching content for key references contained in keyref or conkeyref attributes on elements which do not also carry an href or
mapref attribute (cite, dt, keyword, term, ph, indexterm, index-base, and indextermref and their specializations) is taken from the
keyword or term elements within keywords within topicmeta. If more than one keyword or term element is present, the matching content
is taken from the first element.
b) Matching content for key reference elements that carry the keyref attribute, but which do carry an href or mapref attribute
(author, data, data-about, image, link, lq, navref, publisher, source, topicref, xref, and their specializations) includes all
elements from within the key definition element that are in context within the key reference. Elements that are not in context
within the key reference element directly or after generalization are not included or are filtered out.
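For category (a), the matching content could be extracted as below. This is a sketch with a hypothetical class name; element names are matched literally, whereas a real implementation would match on @class so that specializations of keyword and term are included.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MatchingContent {
    // Category (a): text of the first keyword or term inside
    // topicmeta/keywords of a key definition, or null if there is none.
    static String firstKeywordText(Element keyDef) {
        for (Element meta : children(keyDef, "topicmeta")) {
            for (Element keywords : children(meta, "keywords")) {
                for (Node n = keywords.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n instanceof Element) {
                        String tag = ((Element) n).getTagName();
                        if (tag.equals("keyword") || tag.equals("term")) {
                            return n.getTextContent();
                        }
                    }
                }
            }
        }
        return null;
    }

    // Direct child elements of `parent` named `tag`, in document order.
    private static List<Element> children(Element parent, String tag) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(tag)) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // Convenience parser used to build a key definition element from a string.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }
}
```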

Three issues in keyref resolution:
1. keyref is used in <topicref>: 
    <topicref keyref="a">
        <topicmeta>
            <keywords>
                <keyword>efg</keyword>
            </keywords>
        </topicmeta>
    </topicref>
    and the key definition is:
    <topicref keys="a" href="a.dita">
        <topicmeta>
            <keywords>
                <keyword>abc</keyword>
            </keywords>
        </topicmeta>
    </topicref>
Because MoveMetaData runs before KeyrefModule, combining the content in KeyrefModule has no effect on a.dita (<keyword>efg</keyword> cannot be moved into a.dita). We can solve this problem by making MoveMetaData aware of keyref. Alternatively, we can put keyref resolution before MoveMetaData but not combine contents in this situation (@keyref on <topicref>).

2. keyref is used in <topicref> with an href and format="ditamap"
<topicref keyref="mytest"  href="a.ditamap" format="ditamap"/>
If mytest is defined with target a.ditamap (mytest=target(a.ditamap)), emit a warning, remove @keyref and keep @href.
If mytest is defined with target b.ditamap (mytest=target(b.ditamap)), remove href="a.ditamap" and keep @keyref.
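The two cases reduce to a small decision, sketched here with hypothetical names. Falling back to @href when the key is undefined is an assumption beyond the two cases listed, and the plain string comparison would need URI normalization against the base URI in practice.

```java
class MapKeyrefDecision {
    // USE_HREF: remove @keyref, keep @href. USE_KEYREF: remove @href, keep @keyref.
    enum Action { USE_HREF, USE_KEYREF }

    // Decision for <topicref keyref="k" href="..." format="ditamap"/>.
    static Action decide(String href, String keyTarget) {
        if (keyTarget == null) {
            return Action.USE_HREF;              // undefined key: fall back to href (assumption)
        }
        if (keyTarget.equals(href)) {
            System.err.println("Warning: keyref and href both resolve to " + href);
            return Action.USE_HREF;
        }
        return Action.USE_KEYREF;                // the key points elsewhere and takes precedence
    }
}
```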

3. keyref is used on elements other than <topicref> and its specializations
The content is combined according to Item #23, for example:
    <cite keyref="a">
        <keyword>abc</keyword>
    </cite>

Combining attributes
Attributes are combined in the same way as attributes on source and target conref elements. When attributes are combined, the attributes on a key definition element take precedence over the attributes on a key reference element. @id is neither combined nor substituted, and @keys is not combined. @format and @scope, which control the effect of @href, are substituted, as are the other attributes, including chunk, copy-to, query, search and navtitle.
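The precedence rules can be sketched with attribute maps (hypothetical class name): start from the key reference's attributes and let every key definition attribute except id and keys override them. Conref-specific details of attribute combination are not modeled here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class AttributeCombiner {
    // Attributes that are never taken from the key definition.
    private static final Set<String> NOT_COMBINED = new HashSet<>(Arrays.asList("id", "keys"));

    // Key definition attributes (href, format, scope, chunk, copy-to, ...)
    // override those on the key reference; id and keys stay as on the reference.
    static Map<String, String> combine(Map<String, String> keyRef, Map<String, String> keyDef) {
        Map<String, String> result = new LinkedHashMap<>(keyRef);
        for (Map.Entry<String, String> e : keyDef.entrySet()) {
            if (!NOT_COMBINED.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```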


Matching element content and attributes are combined by KeyrefParser. KeyrefReader collects the information of key definitions.

 

What sections of the toolkit will be impacted by the change?

GenList, DebugAndFilter, conref resolution, MoveMeta, MapPull, TopicPull.

 

See also

none

During stage 1: this table will be passed down to the other modules in preprocess.

By serializing the hashtable to a flat (or XML) file?  I hope so. There is no guarantee that the GenList module is running in the same Java VM as the Keyref module, so it can't just leave the table in memory.

Can the key's content contain references to other files (e.g., in an href)?  If so, how is the base URI for these maintained?

Stage 2: we decide to insert it between mapref and mappull.

Are we sure that mapref can't be affected by the Keyref module? I have a gut feeling that it can, especially because it has to rewrite hrefs of pulled submaps.

 

Deborah Pickett
Moldflow
Melbourne, Victoria, Australia
