Diff for Design for Item 12031 CVF support in DITA OT

Fri, 2009-05-15 13:39 by robander
Mon, 2009-05-18 01:37 by zhou.youyi
 
Revision of Mon, 2009-05-18 01:37:

Design for Item 12031 CVF support in DITA OT

Note: This is a design discussion about how to implement Controlled Values Files (CVF), a major item in the upcoming DITA 1.2 standard, within the DITA Open Toolkit. For information about this new OASIS feature, see the approved proposal at http://www.oasis-open.org/committees/download.php/26359/IssueControlledV... along with the post-approval clarifications at http://wiki.oasis-open.org/dita/scheme_map_clarifications

--------------------------------------------

CVF provides DITA adopters with a method for defining controlled values as part of their content. This approach has the following benefits.

  1. Adopters can easily define controlled values for the DITA attributes and other purposes without special technical knowledge.
  2. Partners can share the definition of controlled values so that, when building their combined content, a common set of filtering and flagging values can be applied.
  3. Tools have the option to validate controlled values for attributes against the definitions.
  4. Controlled values can be used to classify content for filtering and flagging at build time but can also scale for retrieval and traversal at runtime if sophisticated information viewers are available.

 

After inspecting the CVF proposal, we conclude that CVF entries form a tree-like structure that can be extended at any level. Values are organized hierarchically, which indicates a parent-child relationship between related values, and upward scheme extension indicates a sibling-like relationship between the original scheme and the extending scheme. It is therefore intuitive and straightforward to organize controlled values in a similar manner, by parsing the controlled values and merging them into a tree structure; that is, by merging all schemes into one final scheme.

This tree structure is designed to be generated in the GenList module. While the module parses and collects the different entries and categorizes them into different lists, the analysis of CVF values is performed at the same time. We intend to use a DOM tree as our data structure because it stores the structural information about the subject scheme documents, which is critical for extending a scheme. SAX is good at parsing documents as streams but not at structural operations.
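To illustrate why a tree model suits this task, here is a minimal Java sketch that parses a small subject scheme with the standard JDK DOM parser and collects the keys nested under an element. The class name SchemeDomSketch and its methods are hypothetical and not part of the toolkit, and the sketch matches subjectdef by element name only, which is a simplification.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class SchemeDomSketch {

    /** Parses subject scheme markup held in a string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse scheme", e);
        }
    }

    /** Collects the @keys of every subjectdef nested under parent, depth first. */
    static List<String> descendantKeys(Element parent) {
        List<String> keys = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Assumption: element name is enough to recognise a subjectdef here.
            if (n instanceof Element && "subjectdef".equals(n.getNodeName())) {
                Element child = (Element) n;
                keys.add(child.getAttribute("keys"));
                keys.addAll(descendantKeys(child));
            }
        }
        return keys;
    }
}
```

With the tree in memory, questions such as "which subjects sit below os?" become a simple walk over child nodes, which would require extra bookkeeping in a pure SAX handler.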

Our approach takes scheme maps as input sources, parses them into document tree models, and then merges them to form a complete scheme. The resulting model will be output in XML format for use in the debug-and-filter module, which uses controlled values to validate and to filter or flag DITA elements. Assume that the final output tree is T and the element currently being processed is E, with T = E = null initially. Let the map currently being parsed be M and the element that SAX has just encountered be P. We will modify the SAX logic in GenListReader to build our document tree as follows:

In startElement:

if M is a subject scheme map (its root element is subjectScheme)
    if E == null, T = E = new document root.

if P is of subjectdef type
    for each node X in E's children
        if X has the same key value as P
            E = X    // found an extension point; descend into it
    if no valid extension point found
        X = new P
        add X to E's children
        E = X    // later children of P are added under X

if P is of schemeref type
    add P's href value to waitList; merging scheme A into scheme B is
    equivalent to merging B into A, so the referenced scheme can be parsed later

if P is of enumerationdef type
    X = new P
    add X to E's children
    E = X

if P is of attributedef or elementdef or defaultSubject type:
    X = new P
    add X to E's children
    E = X

In endElement:

if E is not the document root
    E = E's parent node
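The extension-point rule in the steps above can be sketched in Java as follows. This is not the SAX-driven handler itself: it merges two trees that are already in memory, and the Subject class, mergeInto and render are hypothetical names introduced only for illustration. The rule it demonstrates is the same, though: a child whose key already exists is reused as the extension point, and any other child is appended.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model, for illustration only.
class SubjectMergeSketch {

    /** Stand-in for a subjectdef element: a key plus child subjects. */
    static final class Subject {
        final String key;
        final List<Subject> children = new ArrayList<>();

        Subject(String key, Subject... kids) {
            this.key = key;
            for (Subject k : kids) {
                children.add(k);
            }
        }

        /** Returns the direct child with the given key, or null if there is none. */
        Subject child(String wanted) {
            for (Subject c : children) {
                if (c.key.equals(wanted)) {
                    return c;
                }
            }
            return null;
        }
    }

    /** Merges every child of source into target, reusing nodes whose key already exists. */
    static void mergeInto(Subject target, Subject source) {
        for (Subject incoming : source.children) {
            Subject existing = target.child(incoming.key);
            if (existing == null) {
                // No extension point found: create the node under target.
                existing = new Subject(incoming.key);
                target.children.add(existing);
            }
            // Extension point (found or just created): merge the next level down.
            mergeInto(existing, incoming);
        }
    }

    /** Renders a tree as key(child,child,...) so the result is easy to inspect. */
    static String render(Subject s) {
        StringBuilder sb = new StringBuilder(s.key);
        if (!s.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < s.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(render(s.children.get(i)));
            }
            sb.append(')');
        }
        return sb.toString();
    }
}
```

Because each key is matched against existing children before a node is created, the resulting set of subjects does not depend on which scheme is processed first, which is why a schemeref can be deferred to the wait list.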

When gen-list finishes, a merged scheme containing all valid subject definitions will have been constructed in T. We need to write it to a persistent file for use in later modules.
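One way to persist the merged tree is the identity transform from the standard javax.xml.transform API. The sketch below is an assumption about how this could be done, not the toolkit's actual code; SchemeWriterSketch and its methods are hypothetical names. It serializes to a string so that the result is easy to inspect; passing new StreamResult(new File(...)) instead would write the same markup to disk.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only.
class SchemeWriterSketch {

    /** Builds a tiny merged scheme: subjectScheme > os > linux. */
    static Document sampleScheme() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("subjectScheme");
            doc.appendChild(root);
            Element os = doc.createElement("subjectdef");
            os.setAttribute("keys", "os");
            root.appendChild(os);
            Element linux = doc.createElement("subjectdef");
            linux.setAttribute("keys", "linux");
            os.appendChild(linux);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("cannot build sample scheme", e);
        }
    }

    /** Serializes the DOM tree to XML text using an identity transform. */
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot serialize scheme", e);
        }
    }
}
```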

Next, in the debug-filter module, the resulting document tree will be used to validate and to filter or flag topic contents. Because CVF defines a hierarchical structure of controlled values with contains/contained-by relationships, a filter operation applied to a "container" subject should also be applied to its "contained" subjects. For example, operations applied to elements with @platform="linux" should also affect elements with @platform="redhat", since redhat is a kind of linux. Thus, when we parse ditaval files, the hierarchical information defined in scheme maps needs to be taken into account. The merged scheme is used here because it contains all the information we need.

In DitaValReader, before putting a key-action pair into filterMap, we search for the attribute binding of the current @att. If a subject scheme is associated with this attribute, we add all related descendant subjects to the filter map. For example, suppose @platform is bound to the "os" subject and the ditaval defines an "exclude" action for any element with @platform="linux". We then need to search for the linux definition in the os subject and add all its values to the filter map, because redhat, suse and ubuntu are linux too:

(Assumptions are the same as above, P is the current element in SAX)
// We use a cache map CM to accelerate the attribute binding search
CM = new HashMap<String, HashMap<String, HashSet>>
for each element X in T's children
    // enumerationdef only appears as direct child of root
    if X is of enumerationdef type
        localname = null
        elementname = "*"
        for each element Y in X's children
            if Y is of elementdef type
                elementname = Y's @name
                continue
            if Y is of attributedef type
                S = CM.get(Y's @name)
                if S == null
                    put (Y's @name --> new HashMap) into CM
                localname = Y's @name
                continue
            Z = find binding in CM with key=P's @att
            if Z == null
                Z = do a BFS in T to find @keys == Y's @keyref
            if Z != null and localname == P's @att
                for each node V in {Z, Z's children}
                    if V is of subjectdef type and V's @keys == P's @val
                        for each node Q in {V, V's children}
                            put (P's @att = Q's value --> action) into filterMap
            if Z != null
                S = CM.get(localname)
                if S != null
                    A = S.get(elementname)
                    if A is not empty, then add Z into A
                    else
                        A = new HashSet(Z)
                        put (elementname --> A) into S
                else
                    S = new HashMap
                    put (elementname --> new HashSet(Z)) into S
                put (localname --> S) into CM
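The core of the procedure above, propagating a ditaval action from a subject to the subjects it contains, can be sketched in Java as follows. The sketch makes several assumptions: the subject hierarchy is given as a plain map from a subject key to its child keys, filterMap keys take the form att=value, and the walk covers all descendants rather than only one level. FilterExpansionSketch and expand are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class FilterExpansionSketch {

    /**
     * Returns filter entries for val and every subject below it.
     * childrenOf maps a subject key to the keys of its direct children.
     */
    static Map<String, String> expand(Map<String, List<String>> childrenOf,
                                      String att, String val, String action) {
        Map<String, String> filterMap = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(val);
        while (!queue.isEmpty()) {              // breadth-first walk
            String subject = queue.remove();
            String key = att + "=" + subject;   // assumed key format
            if (filterMap.containsKey(key)) {
                continue;                       // already handled
            }
            filterMap.put(key, action);
            List<String> kids = childrenOf.get(subject);
            if (kids != null) {
                queue.addAll(kids);
            }
        }
        return filterMap;
    }
}
```

For the example in the text, expanding an "exclude" action on platform=linux over an os subject that contains linux (with redhat, suse and ubuntu below it) and windows yields entries for linux, redhat, suse and ubuntu, and none for windows.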

When DitaValReader finishes, filterMap contains all possible filter actions. The cache map CM is also useful for validating property values, so we will process it and pass it to the debug writer. In DitaWriter:

S = CM.get(P's attribute name)
if S != null and S is not empty
    if S.keySet() contains "*"
        A = S.get("*")
    else if S.keySet() contains P's element name
        A = S.get(P's element name)
    if A != null and A is not empty
        for each subject tree K in A
            do BFS in K for P's attribute value
            if not found
                log a warning that the property value is invalid
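The validation step above could look roughly like this in Java. The sketch rests on stated assumptions: CM maps an attribute name to an element name ("*" for any element) and then to the keys of the bound subject trees, the hierarchy is given as a map from a subject key to its child keys, and a value counts as valid if it is found in any of the bound trees. As in the BFS described above, the root of a bound tree is itself searched. ValueValidationSketch and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
class ValueValidationSketch {

    /** Breadth-first search for value in the subject tree rooted at root. */
    static boolean contains(Map<String, List<String>> childrenOf,
                            String root, String value) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String subject = queue.remove();
            if (!seen.add(subject)) {
                continue;                       // already visited
            }
            if (subject.equals(value)) {
                return true;
            }
            List<String> kids = childrenOf.get(subject);
            if (kids != null) {
                queue.addAll(kids);
            }
        }
        return false;
    }

    /** Returns false only when the attribute is bound and the value is not defined. */
    static boolean isValid(Map<String, Map<String, Set<String>>> cm,
                           Map<String, List<String>> childrenOf,
                           String att, String elem, String value) {
        Map<String, Set<String>> bindings = cm.get(att);
        if (bindings == null || bindings.isEmpty()) {
            return true;                        // attribute is not controlled
        }
        Set<String> roots = bindings.containsKey("*")
                ? bindings.get("*") : bindings.get(elem);
        if (roots == null || roots.isEmpty()) {
            return true;                        // no binding for this element
        }
        for (String root : roots) {
            if (contains(childrenOf, root, value)) {
                return true;
            }
        }
        return false;                           // caller would log a warning
    }
}
```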

Modifications will be mainly in GenListModule.java, GenListAndMapReader.java, DitaValReader.java and DitaWriter.java.
