Monday, November 24, 2008

Semantic Interoperability: from Mashups to Inference

My last post looked at semantic interoperability from the standpoint of the CDISC BRIDG model. Thinking back on it, writing that post left me with more questions than I had when I started.

I have to admit to being unclear as to what is meant by “semantic interoperability,” since I have heard it used in a number of different ways depending upon the audience. (Apparently I’m not the only one: the Wikipedia entry on semantic interoperability carries the caveat "All or part of this article may be confusing or unclear.")

"Semantic interoperability" puts requirements on the data, on the models, and on the processes of using them. How we respond to those requirements implies different interpretations of what it means to be semantically interoperable.

I think that there are three basic ways of using data that are "semantically interoperable":
  • “Hands-off” data integration between designated, well curated systems -- the way in which I think the term is used most often.

  • “Hands-off” data integration between any systems sharing common identifiers, e.g., publish an interface and allow anyone to use it.

    • The dual of this is integrating with any published interface that provides the data you're looking for -- e.g., I'll use any map service, or any book information service, rather than committing to Google/Amazon (or Yahoo/BN). I think this is less common; I haven't seen it that often, and it sounds a bit sketchy.

  • Using OWL reasoners etc. for inference across systems to generate new information.

The requirements around these things are pretty different, both in data quality and in the congruence of the requisite component models.

The second way -- “hands-off” data integration between any systems sharing common identifiers -- doesn't really require any similarity of models other than around the key integration point(s). You need the name of the referent of the data, the name of the data item, and the format of the returned data, e.g., "the first president of the United States"; "date of birth" returned in ISO 8601 format. Of course, the more points that you want to make referenceable between the systems, the more the models have to match, e.g., your model of US presidents has to contain a way of dereferencing the person and that person's date of birth. The more independently developed and maintained your systems are, the sooner you want to start using RDF to give you very stable identifiers for your referents.
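A minimal sketch of this kind of integration, assuming two hypothetical systems that share nothing but an RDF-style URI for the referent (the system names, URI, and field names are all invented for illustration):

```python
from datetime import date

# Hypothetical: two independently built systems that agree only on a
# stable identifier (an RDF-style URI) for the referent.
SYSTEM_A = {  # a biography service
    "http://example.org/id/george_washington": {
        "name": "George Washington",
        "date_of_birth": date(1732, 2, 22),
    }
}
SYSTEM_B = {  # a presidency service
    "http://example.org/id/george_washington": {
        "office": "President of the United States",
        "ordinal": 1,
    }
}

def date_of_birth(uri: str) -> str:
    """Return the date of birth for a referent, in ISO 8601 format."""
    return SYSTEM_A[uri]["date_of_birth"].isoformat()

def merged_record(uri: str) -> dict:
    """Integrate the two systems purely on the shared identifier."""
    return {**SYSTEM_A.get(uri, {}), **SYSTEM_B.get(uri, {})}

uri = "http://example.org/id/george_washington"
print(date_of_birth(uri))             # 1732-02-22
print(merged_record(uri)["ordinal"])  # 1
```

Note that neither system needs to know anything about the other's model; agreement on the identifier and the return format is the whole contract.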

If the systems are required to do some curation or analysis of the data, the exported models need to match more closely so that you can derive the correct metrics to perform the analysis and understand the relationships between individual data points. A good example of this comes from Nick Malik, who points out:

    So, if you look in a database and you see a purchase order... has it been approved or not? The answer depends on the business unit that created it.
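One way to picture the problem: the same raw status code can mean different things in different business units, so a mapping layer has to normalize each unit's local codes into a shared vocabulary before any comparison is meaningful. (The unit names, codes, and meanings below are invented for illustration.)

```python
# Hypothetical: each business unit's local status code mapped into a
# shared vocabulary. The raw codes can agree while the semantics do not.
LOCAL_STATUS_MEANING = {
    ("unit_a", "A"): "approved",          # in unit A, 'A' means fully approved
    ("unit_a", "P"): "pending",
    ("unit_b", "A"): "awaiting_signoff",  # in unit B, 'A' means awaiting sign-off
    ("unit_b", "F"): "approved",
}

def normalized_status(business_unit: str, local_code: str) -> str:
    """Resolve a unit-local status code to the shared vocabulary."""
    return LOCAL_STATUS_MEANING[(business_unit, local_code)]

# Identical codes, different answers to "has it been approved?":
print(normalized_status("unit_a", "A"))  # approved
print(normalized_status("unit_b", "A"))  # awaiting_signoff
```

The point is that the mapping table itself encodes business knowledge that no amount of staring at the raw database will reveal.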

Your models can be in a number of different forms (UML, OWL, etc.) and be wildly divergent from the underlying reality, but if the delusion is shared you can achieve some synergy.

Inference, of course, requires (at least) a locally complete OWL ontology, since that's the only modelling language of the bunch that permits inference. Models also have to more closely resemble the shared "current best understanding" of reality (which is, of course, a moving target in a scientific domain), or the resulting inferences will be worthless, or at best amusing.
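To make "inference generates new information" concrete, here is a toy reduction of what a reasoner does, using a single rule (subClassOf is transitive) over invented triples; a real OWL reasoner applies many such rules, but the flavor is the same:

```python
# Hypothetical sketch of the kind of derivation a reasoner performs,
# reduced to one rule: subClassOf is transitive, so implicit triples
# can be inferred from the asserted ones.
asserted = {
    ("AdverseEvent", "subClassOf", "PerformedObservationResult"),
    ("PerformedObservationResult", "subClassOf", "Observation"),
}

def infer_subclass_closure(triples):
    """Compute the transitive closure of subClassOf by forward chaining."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, _, b) in list(closure):
            for (c, _, d) in list(closure):
                if b == c and (a, "subClassOf", d) not in closure:
                    closure.add((a, "subClassOf", d))
                    changed = True
    return closure

inferred = infer_subclass_closure(asserted) - asserted
print(inferred)  # {('AdverseEvent', 'subClassOf', 'Observation')}
```

The inferred triple was never stated anywhere; it is only as trustworthy as the asserted model it was derived from, which is the whole point about the quality bar.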

However, doing an ontology is a big deal (see The Joy of Ontology by Suzanna Lewis for a discussion). The increment of commitment that we're making here is decidedly non-trivial, especially if the domain that we are trying to model is of substantial size.

I think the clinical trial domain is a good example of substantial size. BRIDG took a long time to produce, it is still undergoing revision, and it does not allow inference. I would argue that, given the continued refinement of some of the base terms (sex and gender were recently updated), even if there were an ontology, hands-off inference is not something that lies in the near future, simply because the ground doesn't provide a sufficiently firm foundation.

Just for clarity -- this doesn't mean that turning loose an inferencing bot over a sufficiently sized test set would not yield interesting and perhaps even transformative results. It just means that the inferencing would be part of a web research project rather than a production operation.

I could be wrong on this (and gladly so), but I did live through the AI winter, and I can no longer utter the phrase "sufficiently smart compiler" without irony.

Tuesday, November 11, 2008

Semantic Interoperability: Adverse Events

When reviewing the BRIDG Release 2.0 Static Elements report.RTF, I looked in some detail at the adverse event model.

Here's a summary:

The AdverseEvent class is described as having the following connections and attributes:
  • Association link from class PerformedProductInvestigation -- a generalization link to PerformedActivity, adding an evaluationMethodCode attribute; PerformedActivity captures the duration of the activity.

  • Association link from class Subject -- the clinical subject (an entity of interest, either biological or otherwise).

  • Association link to class AEOutcomeAssessmentRelationship -- links the AE to an observation; for example, recovered/resolved, recovering/resolving, not recovered/not resolved, recovered/resolved with sequelae, fatal, or unknown.

  • Association link to class AECausalityAssessmentRelationship -- links the AE to an observation; for example, when an adverse event occurs, a physician may evaluate interventions that may have caused the adverse event.

  • Association link to class AEActionTakenRelationship -- specifies the link between an adverse event and the steps performed to address it; for example, study dose reduced, protocol treatment change, etc.

  • Generalization link to class PerformedObservationResult -- links all observations/protocol deviations etc. together with a report.

The AE itself has the attributes:
  • gradeCode
  • severityCode
  • seriousnessCode
  • occurrencePatternCode
  • unexpectedReasonCode
  • expectedIndicator
  • highlightedIndicator
  • hospitalizationRequiredIndicator
  • onsetDate
  • resolutionDate
The end result is a well-thought-out formal structure that should cover all situations and permit systems to interoperate.

In my mind, this is not the same as assuring semantic interoperability. For semantic interoperability to really occur, grade codes must be comparable across sites, hospitalization criteria must be identical (or at least commensurable), and so on. Achieving this comparability requires continuing education and harmonization efforts, constant feedback of metrics to practitioners, etc. It therefore represents a much higher bar.

This is not in any way a criticism of BRIDG. You need something like BRIDG -- a well vetted industry standard -- to even be able to begin such an attempt. However, true semantic interoperability involves not only the structure of the data, but the data itself.