STC 2006, day 1 - so far so good

Blogging from the conference center at STC 2006, where we just had a great keynote presentation from Vinton Cerf and Robert Kahn, two of the founders of the Internet. Afterwards I got into an interesting discussion about how to manage the uniqueness of IDs, a problem I'm still trying to understand in the context of DITA. While a globally unique identifier clearly has advantages, including persistence and reliability, it also has disadvantages, such as length and lack of scoping. Does an ID need to be globally unique if every reference is already scoped by membership in a plugin architecture, as in Eclipse?
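To make the scoping question concrete, here is a minimal Python sketch of what I mean by a scoped ID. Everything in it is an assumption for illustration: the plugin names, the file paths, and the `resolve` helper are hypothetical, and this is not how Eclipse or DITA actually resolves references.

```python
from typing import Dict

# Hypothetical registry: each plugin (the scope) owns its own ID namespace,
# so short local IDs only need to be unique within one plugin.
Registry = Dict[str, Dict[str, str]]

registry: Registry = {
    "com.example.install": {"overview": "install/overview.dita"},
    "com.example.admin": {"overview": "admin/overview.dita"},
}


def resolve(reg: Registry, scope: str, local_id: str) -> str:
    """Resolve a short local ID inside the namespace of a single plugin."""
    try:
        return reg[scope][local_id]
    except KeyError:
        raise KeyError(f"no ID {local_id!r} in scope {scope!r}") from None


# The same short ID resolves to different content depending on the scope.
install_overview = resolve(registry, "com.example.install", "overview")
admin_overview = resolve(registry, "com.example.admin", "overview")
```

The short ID stays readable, but it only means something together with its scope, which is exactly the trade-off against a long, globally unique identifier.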

One possibility is to move over time from less formal to more formal identities for the content as the need for more formal management emerges. I suspect this isn't terribly attractive to someone expecting the ID to be a permanent and persistent content identifier, though: if it shifts over time, how can it claim to be the content's identity? I'll be doing some thinking on this, hopefully in the context of our DITA 1.2 work on key-based referencing.

In the meantime, I've uploaded the presentation I'll be giving this afternoon:

Comments and feedback on either the identity issue or the presentation are welcome.

We do need the ability to address content uniquely, but a single identifier may not do the trick.

One part of the identifier finds the right topic or subject. Additional parts then capture the version, possibly the configuration of that version (which conditions are set), and even something that relates the various stages of a production and presentation pipeline.

So if the unique identifier refers to a particular unit of information at a particular time in a particular format and configuration, then there's the inverse challenge of identifying (and navigating in) a grouping that consists of a uniquely identified unit and all its relatives. If it doesn't, the "unique identifier" remains somewhat ambiguous until the remaining factors are specified.
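As a rough illustration of the multi-part identifier and the grouping problem, here is a small Python sketch. The field names (`topic`, `version`, `conditions`, `stage`) and the `group_by_topic` helper are my own assumptions for the sake of the example, not part of DITA or any existing system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class UnitId:
    """Hypothetical composite identifier for one unit of information."""
    topic: str                                 # which topic or subject
    version: str                               # which version of it
    conditions: FrozenSet[str] = frozenset()   # which conditions are set
    stage: str = "source"                      # position in the pipeline


def group_by_topic(units: Iterable[UnitId]) -> Dict[str, List[UnitId]]:
    """Collect each unit together with its relatives (same topic)."""
    families: Dict[str, List[UnitId]] = defaultdict(list)
    for unit in units:
        families[unit.topic].append(unit)
    return dict(families)


units = [
    UnitId("install", "1.0"),
    UnitId("install", "1.1", frozenset({"linux"}), stage="html"),
    UnitId("admin", "1.0"),
]
families = group_by_topic(units)
```

Each `UnitId` is unambiguous only when all of its parts are given, while `families["install"]` gathers a unit and its relatives by dropping everything except the topic part.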

The reason this works for the Internet is that there's a well-defined interface that the identified object is participating in. So we'd need to define a standard interface for a unit of information, and then we could give an identifier to each participant in that interface.

Bruce Esrig, Information Architect, Lucent Technologies