About: The Semantic Technologies Conference
I'm fresh from the Semantic Technologies Conference, which was flat-out inspiring: people there are solving real problems in powerful new ways.
My colleague John Warren and I were there presenting a framework for content integration and semantic search. Our framework is built on the Eclipse platform and can consume Eclipse help plugins. Its full capabilities for faceted search, semantic linking, product versioning, scoped views, and user-selected filtering, however, are enabled by the DITA taxonomy / classification specialization. One of the sources of this initiative was an early prototype that Robert Anderson and I presented at CMS 2004, but the team has created something that goes far beyond anything we imagined back then. At the conference, the audience showed deep insight into what we're doing, and the questions about the realized framework were excellent.
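To make the classification idea concrete, here's a toy sketch in Python (the topic IDs, facets, and values are all invented, and this is emphatically not our actual implementation) of how subject classifications attached to topics can drive faceted search, scoped views, and user-selected filtering:

```python
# Toy sketch: subject classifications on topics as the basis for
# faceted search and filtering. All names here are invented.

# Each topic carries classifications along several facets,
# e.g. product version, audience, and subject area.
topics = {
    "installing": {"version": "2.0", "audience": "admin", "subject": "setup"},
    "tuning":     {"version": "2.0", "audience": "admin", "subject": "performance"},
    "first-run":  {"version": "1.0", "audience": "user",  "subject": "setup"},
}

def faceted_search(topics, **facets):
    """Return topic IDs whose classifications match every requested facet."""
    return [
        topic_id
        for topic_id, classes in topics.items()
        if all(classes.get(facet) == value for facet, value in facets.items())
    ]

# A scoped view: only version 2.0 content for administrators.
print(faceted_search(topics, version="2.0", audience="admin"))
# -> ['installing', 'tuning']
```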
The other talks at the conference focused on work in areas such as classification-based publishing, text analysis, ontological reasoning, smashups (semantic mashups), semantic wikis, integration, and all kinds of related innovations. For one pragmatic example, the Cambridge Semantics people used their Open Anzo semantic application server to republish airports and on-time arrival averages from US Census Bureau spreadsheets on the web; when a cell changed, the affected parts of the web view refreshed within seconds. That worked because the structure and semantics of the data were annotated in the spreadsheet.
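Here's a toy sketch of that last idea (plain Python, with no relation to Open Anzo's actual model or API; the cells, subjects, and values are invented): once each cell is annotated with the fact it carries, a change to one cell tells you exactly which statements, and hence which parts of a published view, are stale.

```python
# Toy sketch: cell-level semantic annotations enabling targeted refresh.
# Cell -> (subject, property) annotations, invented for illustration.
annotations = {
    "B2": ("airport:ORD", "onTimeAverage"),
    "B3": ("airport:SFO", "onTimeAverage"),
}

# The published "view": one fragment per annotated subject.
view = {"airport:ORD": "83%", "airport:SFO": "71%"}

def on_cell_change(cell, new_value):
    """Refresh only the view fragment backed by the changed cell."""
    subject, prop = annotations[cell]
    view[subject] = new_value   # update just the affected fact
    return subject              # the one fragment to re-render

changed = on_cell_change("B3", "75%")
print(f"re-render fragment for {changed}: {view[changed]}")
```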
The potential contribution of DITA grist to such mills is pretty clear:
- First, when content is tagged with specialized elements, downstream processing can recognize the special content (or, at a minimum, has a more precise basis for text analysis); the first sketch after this list shows the mechanism.
- Second, and maybe more importantly, when granular articles (aka topics) have a precise focus in subject area and treatment, the links between articles indicate meaningful relationships between the subjects those articles cover. DBpedia proved that by mining the links between Wikipedia articles into machine-processable formal definitions; the second sketch below plays with the same idea.
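On the first point: DITA specialization records each element's ancestry in its @class attribute, so a processor that has never seen your specialization can fall back to the base type, while a smarter one can key off the specialized type. A toy sketch (the sample markup is invented; the @class mechanism itself is standard DITA):

```python
# Sketch: recognizing specialized DITA content via the @class attribute.
import xml.etree.ElementTree as ET

topic = ET.fromstring("""
<topic class="- topic/topic ">
  <body class="- topic/body ">
    <p class="- topic/p ">Call
      <apiname class="- topic/keyword pr-d/apiname ">connect()</apiname>
      before sending data.</p>
  </body>
</topic>""")

def find_specialized(root, ancestor):
    """Find every element whose @class ancestry includes the given type."""
    return [
        el for el in root.iter()
        if f" {ancestor} " in el.get("class", "")
    ]

# A generic processor sees a keyword; an API-aware one sees an apiname.
for el in find_specialized(topic, "pr-d/apiname"):
    print("API name found:", el.text)   # -> API name found: connect()
```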
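And on the second point, a similarly hedged sketch of mining topic-to-topic links into subject-level relationships, DBpedia-style (all topic names, subjects, and links invented):

```python
# Sketch: lifting links between classified topics into subject relations.

# Which subject each topic covers.
subject_of = {
    "installing": "setup",
    "tuning": "performance",
    "first-run": "setup",
}

# Cross-references between topics (source -> targets).
links = {
    "installing": ["tuning", "first-run"],
    "tuning": ["installing"],
}

def mine_subject_relations(subject_of, links):
    """Turn topic-to-topic links into distinct subject-to-subject pairs."""
    relations = set()
    for source, targets in links.items():
        for target in targets:
            a, b = subject_of[source], subject_of[target]
            if a != b:   # skip links within a single subject
                relations.add((a, "relatedTo", b))
    return relations

print(mine_subject_relations(subject_of, links))
# -> {('setup', 'relatedTo', 'performance'), ('performance', 'relatedTo', 'setup')}
```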
In short, the better constructed your source content, the more you have to work with downstream. I suspect we'll be seeing some interesting blends of semantic technologies and traditional XML publishing as these communities start noticing one another.