Monday, May 10, 2010

Considerations in developing a middle distance ontology

In my mind there are three essential considerations when developing a middle distance ontology:

  1. What are the entities under discussion?

  2. What constitutes the necessary attributes of these entities?

  3. Should these attributes be hidden behind opaque identifiers or should they be an integral part of the entity under consideration?

The first question, "What are the entities under discussion?", is the easiest to answer: they are the entities that you discuss when performing your activities. If something has never come up as a factor in your activities (and isn't obviously on the horizon), there is no need to consider it.

Patients, trials, compounds, assays, etc. are all important and are definitely "ready to hand" in the Heideggerian sense.

The second and third questions, "what counts as an explicit attribute?" and "which modifiers are captured by opaque identifiers?", are more subtle and domain specific.
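To make the distinction concrete, here is a minimal sketch of the two options for a single entity. The class names, field names, and the choice of molecular weight as the attribute are all hypothetical, picked only for illustration; which form is right depends entirely on what your activity needs to reason about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompoundExplicit:
    """Hypothetical entity whose attribute is an integral part of the model."""
    name: str
    molecular_weight: float  # explicit: the activity inspects and computes with this value


@dataclass(frozen=True)
class CompoundOpaque:
    """Hypothetical entity whose detail sits behind an opaque identifier."""
    name: str
    structure_id: str  # opaque: the model never looks inside this value


def same_structure(a: CompoundOpaque, b: CompoundOpaque) -> bool:
    # Equality is the only operation the model supports on an opaque identifier.
    return a.structure_id == b.structure_id
```

In the second form the model commits to nothing about what a structure is, only that two structures can be told apart, which is what truncates the detail.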

This highlights a core point about the middle distance ontology viewpoint: what's important is what matters to the activity that you are performing. If it doesn't impact what you are doing it should not be modeled in detail. Truncating the detail is what keeps the model's complexity under control.

However, there is one caveat to this "what you know is all you need to know" approach: it is critical to evaluate the likely changes to your current situation. Doing this well requires identifying the scenarios that might impact your operation in the near future, thinking them through in some detail, and using them to pressure test your decisions.

Such a scenario analysis is needed since the ontology (obviously) constitutes a deep structural commitment and any changes at this level are usually both costly and painful.

I would classify the potential changes as follows:

  • Changes in the science: These can be very unpredictable, but there are often precursors in the form of new "interesting results" in an area. Although the exact resolution of any resulting controversy may not be known, even a rough outline of how these results are structured can help highlight areas of necessary flexibility.

  • Changes in the environment (mergers etc.): do others in the field think of things similarly? If not, what are the most significant differences?

  • Changes in the business structure: are there any "nearby" functions that would require support in the face of an internal restructuring?

  • Changes in the technology: there are two parts to this:
    • Changes in the computer technology: these most likely won't impact your ontology unless you're pushing systems to their limits (which is increasingly unlikely, in my experience).

    • Changes in the technology of the systems which you are analyzing: e.g., reactions now produce ten similar but not identical compounds rather than a single compound, or photos suddenly become tagged with GPS information. Another hint is hearing the words "high throughput" in a context in which you've never heard them before.
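As an illustration of pressure testing a decision against such a scenario, the sketch below takes the reaction example and asks what happens when "one reaction, one compound" stops being true. Everything here (the `Reaction` class, its fields, and the `primary_product` helper) is a hypothetical stand-in, not a recommended schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reaction:
    """Hypothetical reaction record used only to illustrate the scenario."""
    reaction_id: str
    # Modeled as a list even if every reaction currently yields one product,
    # so the "ten similar compounds" scenario needs no structural change.
    product_ids: List[str] = field(default_factory=list)


def primary_product(reaction: Reaction) -> Optional[str]:
    """Return the product if there is exactly one, otherwise None."""
    if len(reaction.product_ids) == 1:
        return reaction.product_ids[0]
    return None


today = Reaction("R1", ["C1"])
scenario = Reaction("R2", [f"C{i}" for i in range(10)])
print(primary_product(today))
print(primary_product(scenario))
```

Any code that silently assumes a single product shows up as a `None` under the second scenario, which is exactly the kind of weak spot the scenario analysis is meant to surface before the ontology is fixed.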

I will admit that one difficulty in doing this is that it spans all architectural disciplines, from application to enterprise, but I don't see any way around it.

My next post will focus on when to hide attributes behind an opaque identifier.
