Monday, April 6, 2009

owl:sameAs is a very strong assertion

There's been an interesting discussion on the public-semweb-lifesci mailing list with the subject "blog: semantic dissonance in uniprot" which, appropriately enough, was spurred by a blog post entitled "semantic dissonance in uniprot". The post describes a UniProt entry which listed a Drosophila (fruit fly) protein sequence as having been isolated from "a young sporophyte contained within a seed."

The point is that although one doesn't find fruit fly genes in plants, following the owl:sameAs link leads directly to that conclusion. This generated a very long, fairly thoughtful, and minimally flame-based conversation on owl:sameAs and identity in general.
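To make the mechanism concrete, here is a minimal sketch in plain Python of how an owl:sameAs link lets statements about one resource flow to another. All identifiers (`ex:flyProtein`, `ex:plantRecord`, `ex:organism`, `ex:isolatedFrom`) are hypothetical stand-ins rather than real UniProt data, and the "reasoner" is a deliberately simplified toy that only substitutes subjects, not a full OWL implementation.

```python
# Toy triples; every identifier here is hypothetical, not real UniProt data.
triples = {
    ("ex:flyProtein", "ex:organism", "Drosophila melanogaster"),
    ("ex:plantRecord", "ex:isolatedFrom", "young sporophyte contained within a seed"),
    ("ex:flyProtein", "owl:sameAs", "ex:plantRecord"),
}


def same_as_groups(triples):
    """Map each resource to its identity group (owl:sameAs is symmetric and transitive)."""
    groups = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            merged = groups.get(s, {s}) | groups.get(o, {o})
            for member in merged:
                groups[member] = merged
    return groups


def entailed(triples):
    """Copy every non-sameAs statement to all members of the subject's identity group."""
    groups = same_as_groups(triples)
    result = set()
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue
        for alias in groups.get(s, {s}):
            result.add((alias, p, o))
    return result


for triple in sorted(entailed(triples)):
    print(triple)
```

Under these assumptions, the output includes the statement that `ex:flyProtein` was isolated from a young sporophyte, even though nobody asserted it directly. The single sameAs link was enough.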

As the discussion progressed, the problem with asserting identity across graphs (ontologies or systems of data developed by different organizations) was noted, e.g., (in pseudo-notation) mySystem:itemA owl:sameAs yourSystem:itemX. The issue is that the use of the terms is usually subtly (and often not so subtly) different between the two systems. This problem is especially apparent when making assertions about real objects which exist independently out in the world. For example, "gold" may have a property, but does the property belong to a single atom or to a group of gold atoms, and if the latter, what characterizes a group of the appropriate size? Given:
  • A nanotechnology view of gold (still under development)
  • A semiconductor view of gold (probably reasonably well characterized)
  • A jewelry view of gold

what are the precise boundaries of their applicability? The issue doesn't arise within a system developed solely for nanotechnology, semiconductors, or jewelry. The problems surface only when these systems are linked together.

My thought is that the difficulty centers on the extreme power of owl:sameAs, which indicates that things are identical in all contexts. However, in the physical world, not only is context everything, but context is also inherently incompletely specified.

In practice, many of us heuristically treat identity in the physical world as if it means "indistinguishable in this context," with the context being implicitly dependent upon the issue being considered. I would claim that this is the only reasonable way to proceed when reasoning in a practical manner about what is true about particular objects in the world (abstractions can obviously satisfy stronger conditions since they are abstractions -- with the context factored out to any level desired).
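One way to picture "indistinguishable in this context" is as a comparison restricted to the properties a given question cares about. The sketch below is only an illustration of that idea: the property names, their values, and the `CONTEXTS` table are all assumptions made up for the example, not part of OWL or of any system discussed on the list.

```python
# Hypothetical descriptions of "gold" in two systems; names and values are illustrative only.
jewelry_gold = {"element": "Au", "form": "bulk alloy", "color": "yellow"}
nano_gold = {"element": "Au", "form": "nanoparticle", "color": "red"}

# Each context lists the properties that matter for the question being asked.
CONTEXTS = {
    "chemical_element": {"element"},
    "appearance": {"form", "color"},
}


def same_in_context(a, b, context):
    """Return True if a and b agree on every property the context cares about."""
    relevant = CONTEXTS[context]
    return all(a.get(prop) == b.get(prop) for prop in relevant)


print(same_in_context(jewelry_gold, nano_gold, "chemical_element"))  # True
print(same_in_context(jewelry_gold, nano_gold, "appearance"))        # False
```

The two records count as "the same" for one question and different for another, which is exactly the distinction a blanket owl:sameAs cannot express. In a real system the hard part is the one this sketch skips: the contexts themselves are never completely specified.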

In the physical world, we cannot be sure that even the ability to track a particular item with unlimited precision would allow us to make statements about that item which would hold through time. For example, although we might make assertions about a particular atom (#0x177FFEAA) of gold and its behavior, some if not all of the assertions may fail under unexpected conditions, e.g., after an event that alters the structure of the nucleus (nuclear collisions, extremely high temperatures, etc.). Exhaustively specifying all of these conditions is impractical at best -- which is one of the reasons the phrase ceteris paribus has remained with us for so long.

In my own work, since I never worry about tracking individual atoms, I gravitate toward weak rather than strong assertions of identity, trying to be very attentive to context. This is very much in the spirit of the middle distance as developed in Brian Cantwell Smith's On the Origin of Objects. Smith's point is that our intuitions are well tuned to objects about our size that we interact with frequently. For data integration and architecture work (I had to get there eventually), this implies that integrating across fields that interact to some degree in the "world" is going to be more feasible than integrating across those that don't interact. The give and take of practical interaction has allowed us to identify the particular features of each item that are important in context.
