Diff for Design for Item 12031 CVF support in DITA OT

Fri, 2009-05-15 09:36 by zhou.youyi
Fri, 2009-05-15 09:37 by zhou.youyi
Revision of Fri, 2009-05-15 09:37:

Design for Item 12031 CVF support in DITA OT

CVF provides DITA adopters with a method for defining controlled values as part
of their content. This approach has the following benefits.
1. Adopters can easily define controlled values for the DITA attributes and
other purposes without special technical knowledge.
2. Partners can share the definition of controlled values so that, when
building their combined content, a common set of filtering and flagging
values can be applied.
3. Tools have the option to validate controlled values for attributes
against the definitions.
4. Controlled values can be used to classify content for filtering and
flagging at build time but can also scale for retrieval and traversal at
runtime if sophisticated information viewers are available.
After inspecting the CVF proposal specification, we concluded that CVF entries
form a tree-like structure that can be extended at any level. Values are
organized hierarchically, which indicates a parent-child relationship between
related values, and upward scheme extension indicates a sibling-like
relationship between the original and extended schemes. It is therefore
intuitive and straightforward to organize controlled values in a similar
manner, by parsing the controlled values and merging them into a tree
structure, that is, by merging all schemes into one final scheme.
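A minimal sketch of the merge idea above, using a hypothetical in-memory `SubjectNode` class instead of DOM (neither `SubjectNode` nor `SchemeMerger` exists in DITA-OT). It assumes each subject is identified by its key: a child of the extending scheme whose key already exists at the same level is treated as an extension point and merged recursively, and any other child is appended.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory subject node (not a DITA-OT class).
final class SubjectNode {
    final String key;
    final List<SubjectNode> children = new ArrayList<>();

    SubjectNode(String key) {
        this.key = key;
    }

    // Appends a child and returns this node, so trees can be built fluently.
    SubjectNode add(SubjectNode child) {
        children.add(child);
        return this;
    }

    SubjectNode findChild(String k) {
        for (SubjectNode c : children) {
            if (c.key.equals(k)) {
                return c;
            }
        }
        return null;
    }

    // Keys of this subtree in pre-order, useful for inspecting a merge result.
    List<String> keys() {
        List<String> out = new ArrayList<>();
        out.add(key);
        for (SubjectNode c : children) {
            out.addAll(c.keys());
        }
        return out;
    }
}

final class SchemeMerger {
    // Merges 'ext' into 'base' in place. A child of 'ext' whose key already
    // exists under 'base' is an extension point and is merged recursively;
    // any other child is appended to 'base' as a new subtree.
    static void merge(SubjectNode base, SubjectNode ext) {
        for (SubjectNode c : ext.children) {
            SubjectNode match = base.findChild(c.key);
            if (match == null) {
                base.children.add(c);
            } else {
                merge(match, c);
            }
        }
    }
}
```

Merging a scheme with os > linux > redhat and an extension with os > linux > ubuntu plus os > windows yields one os tree whose linux node holds both redhat and ubuntu, next to windows.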
This tree structure is designed to be generated in the GenList module. As the
module parses the input, collects the different entries and categorizes them
into different lists, the CVF values are analyzed at the same time.
We intend to use a DOM tree as our data structure because it stores structural
information about the subject scheme documents, which is critical for extending
a scheme. SAX is good for parsing documents as text streams but not good at
structural operations.
Our approach takes subject scheme maps as input sources, parses them into
document tree models and then merges them to form a complete scheme. The
resulting model will be output in XML format for use in the debug-and-filter
module, which uses the controlled values to validate and filter/flag DITA
elements. Let T be the final output tree and E the element currently being
processed (initially T = E = null), let M be the map currently being parsed,
and let P be the element SAX has just encountered. We will modify the SAX logic
in GenListAndMapReader to build our document tree as follows:
In startElement:
  if M is a subject scheme map (its root element is subjectScheme)
    if E == null, T = E = new document root
    if P is of subjectdef type
      for each node X in E's children
        if X has the same key value as P
          E = X  // we found an extension point; stop searching
      if no valid extension point was found
        X = new P
        add X to E's children
        E = X    // P's children will be added under X
    if P is of schemeref type
      add P's href value to waitList  // merging scheme A into scheme B is
                                      // equivalent to merging B into A, so the
                                      // referenced map can be parsed later
    if P is of enumerationdef type
      X = new P
      add X to E's children
      E = X
    if P is of attributedef, elementdef or defaultSubject type
      X = new P
      add X to E's children
      E = X
In endElement:
  // only step back up for element types that moved E down in startElement
  if P is of subjectdef, enumerationdef, attributedef, elementdef or
     defaultSubject type, and E is not the document root
    E = E's parent node
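The startElement/endElement logic above might look roughly like the following SAX handler. This is a sketch under stated assumptions: `SchemeTreeBuilder` is a hypothetical class, not the modified GenListAndMapReader; element types are recognized by element name, whereas DITA-OT would normally test the @class attribute; and maps are passed in as strings for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler that merges subject scheme maps into one DOM tree T.
final class SchemeTreeBuilder extends DefaultHandler {
    // Element types that move E down in startElement and back up in endElement.
    private static final List<String> PUSHED = Arrays.asList(
            "subjectdef", "enumerationdef", "attributedef", "elementdef", "defaultSubject");

    final Document tree = newDocument();             // T
    final List<String> waitList = new ArrayList<>(); // schemeref hrefs, parsed later
    private Element current;                         // E
    private boolean isScheme;                        // is map M a subject scheme map?
    private boolean firstElement;

    // Parses one map (passed as a string for brevity) and merges it into T.
    void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse map", e);
        }
    }

    @Override
    public void startDocument() { firstElement = true; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (firstElement) {            // the root element tells us whether M is a scheme map
            firstElement = false;
            isScheme = "subjectScheme".equals(qName);
            if (isScheme && current == null) {
                current = tree.createElement("subjectScheme");
                tree.appendChild(current);
            }
            return;
        }
        if (!isScheme) return;
        switch (qName) {
            case "subjectdef": {
                Element match = findChildByKeys(current, atts.getValue("keys"));
                current = (match != null) ? match : append(qName, atts); // extension point or new subject
                break;
            }
            case "schemeref":
                waitList.add(atts.getValue("href")); // merge order does not matter
                break;
            case "enumerationdef": case "attributedef": case "elementdef": case "defaultSubject":
                current = append(qName, atts);
                break;
            default:
                break;                     // e.g. topicmeta, navtitle: not part of T
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Step back up only for elements that moved E down in startElement.
        if (isScheme && PUSHED.contains(qName) && current != tree.getDocumentElement()) {
            current = (Element) current.getParentNode();
        }
    }

    private Element append(String name, Attributes atts) {
        Element x = tree.createElement(name);
        for (int i = 0; i < atts.getLength(); i++) {
            x.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(x);
        return x;
    }

    private static Element findChildByKeys(Element parent, String keys) {
        if (keys == null || keys.isEmpty()) return null;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && keys.equals(((Element) n).getAttribute("keys"))) return (Element) n;
        }
        return null;
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Parsing a base map that defines os > linux > redhat and then an extension map that re-declares os > linux and adds ubuntu leaves a single os subtree in T, with ubuntu appended under the existing linux node; the extension map's schemeref href ends up in waitList, and a non-scheme map leaves T untouched.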
When gen-list finishes, T contains a merged scheme with all valid subject
definitions. We need to write it out as a persistent file for use in later
modules.
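One way to persist T and reload it later is the standard JAXP identity transform plus a DocumentBuilder. The sketch below assumes a hypothetical `SchemeWriter` helper and leaves the file name and location to the caller, since the design does not specify them.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper for persisting the merged scheme T between modules.
final class SchemeWriter {
    // Creates an empty DOM document (wraps the checked JAXP exception).
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes T to 'out' as indented UTF-8 XML with an identity transform.
    static void write(Document tree, File out) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.transform(new DOMSource(tree), new StreamResult(out));
        } catch (Exception e) {
            throw new IllegalStateException("cannot write merged scheme to " + out, e);
        }
    }

    // Reloads a written scheme, e.g. at the start of the debug-and-filter step.
    static Document read(File in) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("cannot read merged scheme from " + in, e);
        }
    }
}
```

Writing the scheme as plain XML keeps it inspectable when debugging, and later modules only need a DOM parser to load it.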
Next, in the debug-and-filter module, the resulting document tree will be used
to validate and filter/flag topic content. Because CVF defines a hierarchical
structure of controlled values with contains/contained-by relationships, a
filter operation applied to a "container" subject should also apply to the
subjects it contains. For example, operations applied to elements with
@platform="linux" should also affect elements with @platform="redhat", since
redhat is a kind of linux. Thus, when we parse ditaval files, the hierarchical
information defined in the subject scheme maps must be taken into account; the
merged scheme is used here because it contains all the information we need. In
DitaValReader, before putting a key-action pair into filterMap, we search for
the attribute binding of the current @att. If a subject scheme is bound to this
attribute, we also add all related descendant subjects to filterMap. For
example, suppose @platform is bound to the "os" subject and the ditaval file
defines an "exclude" action for elements with @platform="linux". We then need
to find the linux definition in the os subject and add linux and all of its
descendants to filterMap, because redhat, suse and ubuntu are linux too:
(Assumptions are the same as above; P is the current ditaval element in SAX,
with @att, @val and @action.)
// We use a cache map CM to accelerate the attribute binding search:
// attribute name --> (element name or "*" --> set of bound subject trees)
CM = new HashMap<String, HashMap<String, HashSet>>
for each element X in T's children
  // enumerationdef only appears as a direct child of the root
  if X is of enumerationdef type
    localname = null
    elementname = "*"
    for each element Y in X's children
      if Y is of elementdef type
        elementname = Y's @name
        continue
      if Y is of attributedef type
        S = CM.get(Y's @name)
        if S == null
          put (Y's @name --> new HashMap) into CM
        localname = Y's @name
        continue
      // otherwise Y is a subjectdef whose @keyref names the bound subject tree
      Z = the subject tree already cached in CM for P's @att, if any
      if Z == null
        Z = do a BFS in T to find the subjectdef whose @keys == Y's @keyref
      if Z != null and localname == P's @att
        for each node V in {Z and all its descendants}
          if V is of subjectdef type and V's @keys == P's @val
            for each node Q in {V and all its descendants}
              put (P's @att = Q's @keys --> action) into filterMap
      if Z != null
        S = CM.get(localname)
        if S != null
          A = S.get(elementname)
          if A != null, then add Z into A
          else
            A = new HashSet containing Z
            put (elementname --> A) into S
        else
          S = new HashMap
          put (elementname --> new HashSet containing Z) into S
          put (localname --> S) into CM
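The following Java sketch shows the same idea against the merged DOM tree. It deliberately departs from the pseudocode in one respect: it fills the binding cache (here `bindings`) eagerly, once, instead of lazily per ditaval property. `FilterExpander`, `bindings`, `filterMap` and the `att=value` key format are illustrative assumptions rather than DITA-OT APIs, and elements are matched by name rather than by @class.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of the DitaValReader extension; not DITA-OT code.
final class FilterExpander {
    // CM: attribute name -> (element name or "*" -> bound subject trees).
    final Map<String, Map<String, Set<Element>>> bindings = new HashMap<>();
    final Map<String, String> filterMap = new HashMap<>(); // "att=value" -> action
    private final Element root;

    FilterExpander(Document scheme) {
        root = scheme.getDocumentElement();
        // Fill CM once from every enumerationdef (direct children of the root).
        for (Element x : children(root, "enumerationdef")) {
            String element = "*";
            String att = null;
            for (Element y : children(x, null)) {
                String tag = y.getTagName();
                if (tag.equals("elementdef")) {
                    element = y.getAttribute("name");
                } else if (tag.equals("attributedef")) {
                    att = y.getAttribute("name");
                } else if (tag.equals("subjectdef") && att != null) {
                    Element z = findByKeys(root, y.getAttribute("keyref"));
                    if (z != null) {
                        bindings.computeIfAbsent(att, k -> new HashMap<>())
                                .computeIfAbsent(element, k -> new HashSet<>()).add(z);
                    }
                }
            }
        }
    }

    // Records the ditaval action for att=val and for every subject below val.
    void addProp(String att, String val, String action) {
        filterMap.put(att + "=" + val, action);
        Map<String, Set<Element>> byElement = bindings.get(att);
        if (byElement == null) return;                 // attribute is not controlled
        for (Set<Element> trees : byElement.values()) {
            for (Element z : trees) {
                Element v = findByKeys(z, val);
                if (v == null) continue;
                List<Element> subtree = new ArrayList<>(); // V and all its descendants
                subtree.add(v);
                for (int i = 0; i < subtree.size(); i++) {
                    subtree.addAll(children(subtree.get(i), "subjectdef"));
                }
                for (Element q : subtree) {
                    filterMap.put(att + "=" + q.getAttribute("keys"), action);
                }
            }
        }
    }

    // BFS over subjectdef elements from 'start', looking for @keys == keys.
    static Element findByKeys(Element start, String keys) {
        if (keys.isEmpty()) return null;
        Deque<Element> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            Element e = queue.poll();
            if (e.getTagName().equals("subjectdef") && keys.equals(e.getAttribute("keys"))) return e;
            queue.addAll(children(e, "subjectdef"));
        }
        return null;
    }

    // Element children of 'parent', optionally filtered by tag name.
    static List<Element> children(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && (name == null || name.equals(n.getNodeName()))) out.add((Element) n);
        }
        return out;
    }

    // Parses a scheme from a string (for examples and tests).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With @platform bound to os and a ditaval "exclude" for linux, filterMap ends up with exclude entries for linux, redhat, suse and ubuntu but not for windows; a property on an unbound attribute is stored as-is.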
When DitaValReader finishes, filterMap contains all possible filter actions.
The cache map CM is also useful for validating property values, so we will
process it and pass it to the debug writer. In DitaWriter:
A = null
S = CM.get(P's attribute name)
if S != null and S is not empty
  if S.keySet() contains "*"
    A = S.get("*")
  else if S.keySet() contains P's element name
    A = S.get(P's element name)
if A != null and A is not empty
  found = false
  for each subject tree K in A
    do a BFS in K for P's attribute value; if it is found, set found = true
  if not found
    issue a warning that the property value is invalid
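A minimal Java sketch of this check, assuming the cache has the shape attribute name → (element name or "*" → set of subject trees) described above. `PropertyValidator` and its methods are hypothetical, warnings are simply printed to standard error, and, as in the pseudocode, a "*" binding is consulted before an element-specific one.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of the DitaWriter check; not DITA-OT code.
final class PropertyValidator {
    // True if 'value' is a subject in a tree bound to (element, att), or if no
    // binding applies to this attribute/element combination.
    static boolean isValid(Map<String, Map<String, Set<Element>>> cm,
                           String element, String att, String value) {
        Map<String, Set<Element>> s = cm.get(att);
        if (s == null || s.isEmpty()) return true;            // attribute not controlled
        Set<Element> a = s.containsKey("*") ? s.get("*") : s.get(element);
        if (a == null || a.isEmpty()) return true;            // not bound for this element
        for (Element k : a) {
            if (containsKeys(k, value)) return true;
        }
        return false;
    }

    // BFS over the subjectdef elements of tree k, looking for @keys == value.
    static boolean containsKeys(Element k, String value) {
        Deque<Element> queue = new ArrayDeque<>();
        queue.add(k);
        while (!queue.isEmpty()) {
            Element e = queue.poll();
            if (value.equals(e.getAttribute("keys"))) return true;
            for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && "subjectdef".equals(n.getNodeName())) {
                    queue.add((Element) n);
                }
            }
        }
        return false;
    }

    // Reports an invalid value as a warning instead of failing the build.
    static void warnIfInvalid(Map<String, Map<String, Set<Element>>> cm,
                              String element, String att, String value) {
        if (!isValid(cm, element, att, value)) {
            System.err.println("Warning: @" + att + "=\"" + value + "\" on <" + element
                    + "> is not defined in the subject scheme");
        }
    }

    // Parses a standalone subject tree from a string (for examples and tests).
    static Element parseTree(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With @platform bound to an os tree containing linux, redhat and windows, the value "redhat" passes, "solaris" would trigger a warning, and an unbound attribute such as @audience is not checked at all.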
Modifications will be mainly in GenListModule.java, GenListAndMapReader.java,
DitaValReader.java and DitaWriter.java.
