## Math Domain Requirements

The fundamental premise of DITA is to provide a framework within which
reusable content may be authored without regard to how that particular
content is rendered on a particular target. DITA strives to be target
neutral in as many ways as possible: thus at the time a topic is
written, no assumptions are made as to whether that topic will be
presented on a web page, in a book or on a poster. Content can thus be
naturally shared between web and print forms. Content can also be
reused in a different context from the one in which it was originally
authored. DITA prefers to record *meaning* over *formatting*.

A mathematical extension to DITA should serve these same goals. In
addition, the math domain should provide comprehensive support to the
advanced user while retaining a low barrier to entry for the casual
user.

## Salient characteristics of mathematical expressions

Mathematical expressions are widely recognized to fall into two major categories, given the names "content" and "presentation" by the MathML specification.

- An expression which falls into the **content** category preserves the structure of the mathematical formula. Such an expression could be regarded as a "parse tree" for a platform-independent expression language. As such, a "content" expression is actionable: it can be interpreted and acted on by applications.
- An expression which falls into the **presentation** category concentrates on the specification of how a formula should be rendered. Such an expression preserves the relative positioning and alignment of the various elements. A "presentation" expression is *not* actionable, and it would be difficult or impossible to reconstruct a "parse tree" from it.
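
To make "actionable" concrete, the following minimal sketch walks a content MathML tree and evaluates it. It is an illustration only: the `evaluate` function and the `OPERATORS` table are hypothetical names introduced here, the markup omits the MathML namespace for brevity, and only two operators are handled.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

# Content MathML for 5 * x (namespace omitted for brevity).
CONTENT = "<apply><times/><cn>5</cn><ci>x</ci></apply>"

# Hypothetical mapping from content element names to Python operations.
OPERATORS = {"plus": operator.add, "times": operator.mul}


def evaluate(node, env):
    """Recursively evaluate a tiny subset of content MathML."""
    if node.tag == "cn":      # numeric literal
        return float(node.text)
    if node.tag == "ci":      # identifier, looked up in the environment
        return env[node.text.strip()]
    if node.tag == "apply":   # first child is the operator, the rest are operands
        op, *args = list(node)
        return reduce(OPERATORS[op.tag], (evaluate(a, env) for a in args))
    raise ValueError(f"unsupported element: {node.tag}")


result = evaluate(ET.fromstring(CONTENT), {"x": 3})
print(result)
```

The equivalent presentation markup, for example `<mrow><mn>5</mn><mo>&#x2062;</mo><mi>x</mi></mrow>`, only records that a number, an operator glyph, and an identifier sit side by side, so a walker like this has no operator tree to evaluate.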

**Content** expressions leverage an extensive "standard library" of mathematical concepts defined by the OpenMath Society. **Symbols** are the atomic unit of this standard library and are defined in *content dictionaries (CDs)*.
CDs are collected into CD Groups. These CDs define everything from the
"plus" operator to integration to set operations. Expressions
formulated in terms of this standard library are portable across
applications.

The notion of a symbol in OpenMath and MathML is roughly equivalent to
a function declaration in C or an interface declaration in Java. A
symbol is a black box. The inputs and outputs of the box are defined.
Annotations are provided which describe the effect of the black box on
the data. (These are the *commented mathematical properties* and *formal mathematical properties*.) *All*
of these properties, when taken together, describe the meaning of the
symbol. However, the provision of an implementation meeting all of
these criteria is left to the application. There is no mechanism to
specify one or more expressions, *each of which satisfies all of the properties of the symbol*. As such, there is no provision in these XML grammars to build platform-neutral libraries of reusable **content** expressions which are bound to the symbol. By the same token, there is also no provision to bind **presentation** expressions to the symbol.

The term **symbol**, as used here, is somewhat
ambiguous. In the context of writing an equation to put in a document,
it means "any variable, constant, or function identifier which appears
in the formula." In the context of MathML and OpenMath, it means "any
item which has been formally defined in a Content Dictionary", which
includes operators like the plus sign in addition to the above. DITA
may add value by allowing the author to explicitly identify the symbols
which require explanation (e.g., most authors will elect *not* to
explain the plus sign, but may wish to define the variables they've
introduced as well as some constants and functions). In DITA, the term
"symbol" should be applied to anything the author wishes to define,
whether it fits the definition of a MathML/OpenMath symbol or not.

XML representations of mathematical expressions tend to be verbose. This is certainly the case with both MathML and OpenMath. These representations are much more lengthy than their equivalent in StarMath or LaTeX.
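
As a rough illustration of the size difference, the sketch below compares one formula written in LaTeX and in presentation MathML. The two strings are examples constructed here, not taken from any specification, and character count is only a crude proxy for authoring burden.

```python
import xml.etree.ElementTree as ET

# The same formula, y = 5x, in two notations (illustrative strings).
LATEX = r"y = 5x"
MATHML = (
    "<math><mrow><mi>y</mi><mo>=</mo>"
    "<mrow><mn>5</mn><mo>&#x2062;</mo><mi>x</mi></mrow>"
    "</mrow></math>"
)

ET.fromstring(MATHML)  # raises ParseError if the markup is not well-formed

ratio = len(MATHML) / len(LATEX)
print(f"LaTeX: {len(LATEX)} chars, MathML: {len(MATHML)} chars, ratio: {ratio:.1f}")
```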

To summarize:

- Mathematical expressions fall into two broad categories: content and presentation.
- A content-oriented expression is directly usable by an application.
- A presentation-oriented expression allows fine control over how the expression is rendered.
- Content-oriented and presentation-oriented expressions are related.
- Content expressions are portable.
- There is no existing portable mechanism to bind either presentation or content expressions to a symbol.
- The definition of a "symbol" by MathML and OpenMath includes several things that authors will want to exclude from a list of symbols, and excludes "identifiers" which they will probably want to include in such a list.
- XML vocabularies for mathematical expressions are typically more verbose than other text-based expression languages.

## Desired characteristics of a mathematical domain

A mathematical extension to DITA is unique among the current
domains. In all other aspects of DITA, the only possible target
audience is human. All topics are ultimately written for delivery to
human readers. In the math domain, however, the target audience might
be a computer application. For instance, a named **content** expression in a DITA file may be delivered to a spreadsheet application and used in cell formulas.

In addition, while the primary use of mathematical domain elements may be to include equations in text topics, frequent users are likely to develop formula libraries of frequently used mathematical expressions. Even infrequent users may be put off by the direct inclusion of pages of MathML into their topics. These XML grammars have a low information density and can distract from the authoring task regardless of whether the XML was generated automatically or not. Including this verbose content from an external file is desirable whether or not it is structured such that it can be referenced by other expressions.

A DITA mathematical domain should recognize the existence of content expressions even though DITA itself should not be required to perform any computations. Because DITA is structured, it could be used to associate both presentation and content expressions with symbols; for this to work, it is vital that each expression be categorized correctly as content or presentation. Because DITA is a documentation tool, it can be used to document the expressions themselves. The association and documentation both add value over a bare user-defined content dictionary.

Users may prefer to write their equations in something other than MathML.

There is a very frequent pattern surrounding the use of equations in text. The expression is typically followed by a list of definitions for each of the variables and/or constants used. This list can either be integrated into the following paragraph as prose or it can be more of a tabular format, with one symbol description per line. For example:

y = 5 * x where x is something, and y is something else.

Another frequent pattern in the authoring of documents with mathematical content is the construction of a *common*
table of symbols used in multiple equations. Such a table contains most
(or all) of the symbols used in all of the equations in the article,
paper, or chapter. The table is presented once, rather than after each
equation, in order to conserve space and emphasize that the overall
mathematical work presents a consistent terminology throughout. A
mathematical domain for DITA should support the distillation of such a
table from one or more symbol description lists.
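
The distillation step can be sketched as follows. This is one possible approach under stated assumptions: the `equations` data structure, the symbol descriptions, and the `distill_symbol_table` function are hypothetical and stand in for whatever markup the domain eventually defines.

```python
# Each equation carries its own symbol description list (placeholder data).
equations = [
    {"formula": "y = 5 * x",
     "symbols": {"x": "input value", "y": "scaled value"}},
    {"formula": "z = y + c",
     "symbols": {"y": "scaled value", "c": "offset constant"}},
]


def distill_symbol_table(equations):
    """Merge per-equation symbol lists into one table, in order of first use.

    Raises ValueError if the same symbol is described in two different ways,
    since a common table implies consistent terminology.
    """
    table = {}
    for eq in equations:
        for symbol, description in eq["symbols"].items():
            existing = table.setdefault(symbol, description)
            if existing != description:
                raise ValueError(f"inconsistent description for {symbol!r}")
    return table


table = distill_symbol_table(equations)
for symbol, description in table.items():
    print(f"{symbol}: {description}")
```

Rejecting conflicting descriptions is a design choice made for this sketch; a real processor might instead warn, or let the author pick which description wins.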

Users should be able to easily refer to a specific symbol used in any equation from the text. The means of this reference should not be verbose and should ensure that the same symbol is always presented in the same way to the reader.

To summarize:

- The simplest use of a math domain will be the direct inclusion of **presentation** math expressions in topics intended for human audiences.
- Allowing users to reference a math expression in an external file (via href= or conref=) is the simplest way to include mathematical content in a topic while avoiding distracting verbiage. This also allows users primarily interested in targeting a human audience to assemble a suite of reusable presentation expressions.
- Advanced users can create documented expression libraries which associate both content and presentation expressions with symbols defined in an OpenMath content dictionary.
- If it is easily possible, permit users to author equations via an alternative expression language.
- The math domain should capture the common pattern where the presentation of a formula is associated closely with the definition of the variables used.
- The math domain should provide tools to generate a table of symbols from a list of equations in the document (default should be all equations). Preferably, the author should be able to create more than one such table.
- Reference to symbols used in equations should be easy, and should yield consistent, author-controllable results.

## Distilled Requirements of a Math Domain

- Maintain a low barrier to entry for casual users (expression authors).
- Fully support advanced users (those who define their own symbols).
- Inclusion of math content:
  - Every expression must be correctly categorized using the type attribute (type="presentation", type="content", or type="symbol").
  - One expression per element.
  - Direct inclusion of the expression in the topic requires that the expression be written in MathML.
  - Including an expression from another DITA topic is possible via the conref= mechanism.
  - Including an expression from an external, non-DITA file is possible via the href= and format= pair.
    - The href attribute names the external file.
    - The format attribute declares the expression language ("openmath", "mathml").

- The DITA processor shall accept and render both content and presentation expressions.

- Expressions and the description of their symbols:
  - Associate the expression and symbol descriptions in one element.
  - Never render the symbol descriptions from that element.
  - Give users an element which will render a tabular set of descriptions (single equation).
  - Give users an element which will render an inline set of descriptions (single equation).
  - Allow users to select which symbols are described.
  - Allow users to provide descriptions of MathML symbols (csymbol elements) as well as identifiers (mi or ci elements).
  - Give users an element which will render a tabular set of descriptions from a set of equations.

- Association of MathML symbols and expressions:
  - Allows authors to control how the symbol is rendered in their document (may provide more than one option to select).
  - Allows authors to specify a platform-neutral, machine-readable implementation of a concept (may provide more than one option to select).
  - A math topic will be provided to perform the association.
  - Topics have a 1-to-1 relationship to symbols.
  - Topics may include zero or more content expressions.
  - Topics may include zero or more presentation expressions.
  - Documentation of the symbol itself is allowed.
  - Each expression may be individually documented.
  - Each expression is its own section.

- Symbol reference should be easy and produce consistent output.
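
The inclusion rules above can be sketched as a small checker. Everything named here is an assumption made for illustration: the `<expression>` element name, the `check_expression` function, and the reading of "one expression per element" as "exactly one of inline content, href=, or conref=". None of these are defined by DITA or by the requirements themselves.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"presentation", "content", "symbol"}
ALLOWED_FORMATS = {"openmath", "mathml"}


def check_expression(xml_text):
    """Return a list of violations for one hypothetical <expression> element."""
    elem = ET.fromstring(xml_text)
    problems = []

    # Every expression must be categorized via the type attribute.
    if elem.get("type") not in ALLOWED_TYPES:
        problems.append("type must be presentation, content, or symbol")

    # An external file needs a format naming its expression language.
    if elem.get("href") is not None and elem.get("format") not in ALLOWED_FORMATS:
        problems.append("href requires format='openmath' or format='mathml'")

    # One expression per element: inline, href=, or conref=, but only one.
    children = list(elem)
    sources = (len(children)
               + (elem.get("href") is not None)
               + (elem.get("conref") is not None))
    if sources != 1:
        problems.append("exactly one expression per element")

    # Directly included expressions must be MathML.
    for child in children:
        if child.tag != "math":
            problems.append("inline expressions must be MathML")
    return problems


inline = check_expression(
    '<expression type="content">'
    "<math><apply><plus/><ci>a</ci><ci>b</ci></apply></math>"
    "</expression>"
)
external = check_expression(
    '<expression type="content" href="area.om" format="openmath"/>'
)
invalid = check_expression('<expression href="area.tex" format="latex"/>')
print(inline, external, invalid, sep="\n")
```

The file names `area.om` and `area.tex` are placeholders. The sketch omits the MathML namespace, which a real processor would need to match on.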