I have to admit to being unclear as to what is meant by “semantic interoperability,” since I have heard it used in a number of different ways depending on the audience. (Apparently I'm not the only one: the Wikipedia entry on semantic interoperability carries the caveat "All or part of this article may be confusing or unclear.")
"Semantic interoperability" puts requirements on the data, on the models, and on the processes of using them. How we respond to those requirements implies different interpretations of what it means to be semantically interoperable.
I think that there are three basic ways of using data that are "semantically interoperable":
- “Hands-off” data integration between designated, well-curated systems; this is the way I think the term is used most often.
- “Hands-off” data integration between any systems sharing common identifiers, e.g., publish an interface and allow anyone to use it.
  - Its dual: integrating with any published interface that provides the data you're looking for, e.g., I'll use any map or any book-information service rather than specifically Google/Amazon (or Yahoo/BN). I haven't seen this very often, and I think it sounds a bit sketchy.
- Using OWL reasoners etc. for inference across systems to generate new information.
The requirements for these approaches are quite different, both in data quality and in the congruence of the requisite component models.
“Hands-off” data integration between systems sharing common identifiers doesn't really require any similarity between the models beyond the key integration point(s). You need the name of the referent of the data, the name of the data item, and the format of the returned data, e.g., "the first president of the United States", "date of birth", returned in ISO 8601 format. Of course, the more points you want to make referenceable between the systems, the more the models have to match; e.g., your model of US presidents has to contain a way of dereferencing the person and that person's date of birth. And the more independently your systems are developed and maintained, the sooner you'll want to start using RDF to give you very stable identifiers for your referents.
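The minimal contract above (a stable identifier, the name of the data item, and an agreed return format) can be sketched as follows. This is a toy illustration, not a real RDF stack; the identifier, system names, and interface function are all hypothetical.

```python
from datetime import date

# A stable, RDF-style identifier agreed upon by both sides (hypothetical URI).
SHARED_ID = "http://example.org/id/george_washington"

# "System A": an independently developed presidents database. Its internal
# model is its own business; only the shared identifier and the published
# interface matter to anyone else.
system_a = {
    SHARED_ID: {
        "role": "1st President of the United States",
        "date_of_birth": date(1732, 2, 22),
    },
}

def get_date_of_birth(identifier: str) -> str:
    """System A's published interface: date of birth, in ISO 8601 format."""
    return system_a[identifier]["date_of_birth"].isoformat()

# "System B": a consumer that knows nothing of System A's internals --
# only the referent's identifier, the item name, and the return format.
dob = get_date_of_birth(SHARED_ID)
print(dob)  # -> 1732-02-22
```

The point is that the two systems' models only have to agree at the integration point; everything behind the interface can diverge freely.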
If the systems are required to do some curation/analysis of the data, the exported models need to match more closely, so that you can derive the correct metrics for the analysis and understand the relationships between individual data points. A good example comes from Nick Malik, who points out:
> So, if you look in a database and you see a purchase order... has it been approved or not? The answer depends on the business unit that created it.
Your models can be in a number of different forms (UML, OWL, etc.) and be wildly divergent from the underlying reality, but if the delusion is shared you can achieve some synergy.
Inference, of course, requires (at least) a locally complete OWL ontology, since OWL is the only modelling language mentioned here that permits inference. The models also have to hew more closely to the shared "current best understanding" of reality (which is, of course, a moving target in a scientific domain), or the resulting inferences will be worthless, or at best amusing.
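To make "inference generates new information" concrete, here is a toy stand-in for a reasoner: it forward-chains the transitivity of `subClassOf` and propagates `type` through it. A real OWL reasoner does far more; the class names and the trial identifier are hypothetical.

```python
# Asserted triples: a tiny class hierarchy and one typed individual.
# All names are made up for illustration.
asserted = {
    ("ClinicalTrial", "subClassOf", "Study"),
    ("Study", "subClassOf", "Activity"),
    ("NCT00000102", "type", "ClinicalTrial"),
}

def infer(triples):
    """Forward-chain subClassOf transitivity and type propagation
    until no new triples appear (a fixed point)."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p1, b) in triples:
            for (c, p2, d) in triples:
                if b == c and p1 == "subClassOf" and p2 == "subClassOf":
                    new.add((a, "subClassOf", d))  # transitivity
                if b == c and p1 == "type" and p2 == "subClassOf":
                    new.add((a, "type", d))        # type propagation
        if not new <= triples:
            triples |= new
            changed = True
    return triples

# Triples that were inferred but never asserted, e.g.
# ("NCT00000102", "type", "Study").
inferred = infer(asserted) - asserted
print(sorted(inferred))
```

The interesting (and risky) part is exactly what the post argues: the inferred triples are only as trustworthy as the asserted model underneath them.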
However, building an ontology is a big deal (see The Joy of Ontology by Suzanna Lewis for a discussion). The increment of commitment we're making here is decidedly non-trivial, especially if the domain we're trying to model is of substantial size.
I think the clinical trial domain is a good example of substantial size. BRIDG took a long time to build, is still undergoing revision, and does not support inference. I would argue that, given the continued refinement of some of the base terms (sex and gender were recently updated), even if there were an ontology, hands-off inference is not something that lies in the near future, simply because the ground doesn't provide a sufficiently firm foundation.
Just for clarity: this doesn't mean that turning an inferencing bot loose on a sufficiently sized test set wouldn't yield interesting, perhaps even transformative, results. It just means that the inferencing would be part of a research project rather than a production operation.
I could be wrong about this (and gladly so), but I did live through the AI winter, and I can no longer utter the phrase "sufficiently smart compiler" without irony.