Monday, July 19, 2010

Middle Distance Ontologies: assays

This analysis of assays is the companion to my previous post on antibodies.

A core bifurcation within assays is between in-vivo and in-vitro assays. I'm entirely ignoring clinical trials etc. since they are a completely different conceptual space.

The main differences between in-vivo and in-vitro assays are that, in in-vivo assays, the measurement is more indirect and more variable, and the delta between planned and actual measurements is much greater.

In both we have
  • the system under test
  • the test response(s) being measured
  • the measurement event (with a potential planned vs. actual component to each)
  • the entity whose impact is being assessed
  • the way this entity was introduced into the system (most important for in-vivo assays)

The system under test

This captures the technology maintaining the experimental conditions, the SOP, the "target", and the readout.
It would therefore seem useful to use four opaque identifiers here:
  • one for the SOP
  • one for the particular technology or system being used for the measurement, e.g., the animal
  • one for the measurement device
  • one for the receptor/disease
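A minimal sketch of how these four identifiers might be grouped, assuming plain string identifiers. The type names, field names, and identifier values here are hypothetical and chosen only for illustration; nothing about the systems behind the identifiers is implied.

```python
from dataclasses import dataclass
from typing import NewType

# Hypothetical identifier types. Each is just a string at runtime; the
# distinct names only stop one kind of id being passed where another belongs.
SopId = NewType("SopId", str)
TestSystemId = NewType("TestSystemId", str)
DeviceId = NewType("DeviceId", str)
TargetId = NewType("TargetId", str)


@dataclass(frozen=True)
class SystemUnderTest:
    """The system under test, described only by opaque identifiers."""
    sop: SopId                  # the SOP
    test_system: TestSystemId   # the technology or system, e.g. the animal
    device: DeviceId            # the measurement device
    target: TargetId            # the receptor/disease


# Illustrative values only; real identifiers would come from the owning systems.
example = SystemUnderTest(
    sop=SopId("SOP-0001"),
    test_system=TestSystemId("SYS-0042"),
    device=DeviceId("DEV-0007"),
    target=TargetId("TGT-0133"),
)
```

Nothing in the record says what an identifier means; resolving it is left to whichever system owns it.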

The test response(s) being measured

In normal practice, response types are behind opaque identifiers, e.g., %INH, with an additional qualifier as to the response units. Middle distance thinking does nothing to change this. When it comes to derived data (see also my post on necessary attributes), there are two options, which I call:
  • the "resulted in" design (this value "resulted in" this derived result)
  • the "resulted from" design (this value "resulted from" an operation on these results), in which the transformation that calculated the value is designated by an opaque identifier

I've seen a number of systems work well in which a more basic value points to a result derived from it (a "resulted in" design). However, my preference is for the "resulted from" design, as it lets the transformation be explicit about the algorithms used and the data points that served as sources of the value. In this design the derived result points back to its source data points (via the opaque identifier), rather than forcing the source data points to designate the derived result. It also permits a many-to-many relationship rather than the many-to-one relationship forced by the "resulted in" design, albeit with an attendant increase in complexity.
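A minimal sketch of the "resulted from" design, assuming each derived result carries one link record naming its transformation and its sources. The class name, function names, and identifier values are hypothetical. The example shows the many-to-many case: one source result feeds two derived results, and each derived result has several sources.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ResultedFrom:
    """A derived result pointing back at the results it was computed from."""
    derived_id: str               # opaque id of the derived result
    transformation_id: str        # opaque id of the calculation that produced it
    source_ids: Tuple[str, ...]   # opaque ids of the source results


def sources_of(derived_id: str, links: List[ResultedFrom]) -> List[str]:
    """Walk backwards: which results did this derived result come from?"""
    return [
        source
        for link in links
        if link.derived_id == derived_id
        for source in link.source_ids
    ]


def derived_from(source_id: str, links: List[ResultedFrom]) -> List[str]:
    """Walk forwards: which derived results used this source result?"""
    return [link.derived_id for link in links if source_id in link.source_ids]


# Illustrative identifiers only. RES-R2 contributes to both derived results.
links = [
    ResultedFrom("RES-D1", "TX-MEAN", ("RES-R1", "RES-R2")),
    ResultedFrom("RES-D2", "TX-CURVE-FIT", ("RES-R2", "RES-R3")),
]

print(sources_of("RES-D1", links))
print(derived_from("RES-R2", links))
```

The source results carry no pointer of their own, so adding a new derived result never requires touching them. The cost is that the forward question ("what was derived from this value?") needs a scan or an index over the links.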

The measurement event

(The measurement event may include an indication that the actual measurement event deviated in some way from the planned measurement event.)

This one is surprisingly different when viewed from a middle distance perspective. As opposed to the techniques which I'm familiar with from either conventional transactional systems or warehousing efforts, the middle distance approach suggests two factors:
  • hiding the details of the measurement (including equipment operator, time of measurement, and deviation from plan) behind an opaque identifier
  • surfacing a flag (again an opaque identifier) to indicate whether there were any problems of significance with this measurement.
This delegates the determination of error significance (and its type) to processes more familiar with the unique characteristics of the measurement.
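A minimal sketch of a measurement event under these two factors, assuming a reserved flag value that means "no significant problem". That reserved value, the field names, and the identifiers below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical reserved flag meaning "nothing of significance went wrong".
NO_SIGNIFICANT_PROBLEM = "FLAG-NONE"


@dataclass(frozen=True)
class MeasurementEvent:
    details_id: str        # opaque id; operator, time, and deviation from plan sit behind it
    problem_flag_id: str   # opaque id assigned by whichever process judged the measurement


def without_significant_problems(events: List[MeasurementEvent]) -> List[MeasurementEvent]:
    """Keep only the events whose flag reports no significant problem."""
    return [e for e in events if e.problem_flag_id == NO_SIGNIFICANT_PROBLEM]


# Illustrative values only.
events = [
    MeasurementEvent("EVT-0001", NO_SIGNIFICANT_PROBLEM),
    MeasurementEvent("EVT-0002", "FLAG-0017"),
]

clean = without_significant_problems(events)
```

The consumer only compares flags. What "FLAG-0017" means, and why it was judged significant, stays with the process that assigned it.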

The entity whose impact is being assessed

Normal practice maps these to opaque identifiers that tie back to sample lots, be they compounds, mixtures, formulations, or natural products.

The way in which this entity was introduced into the system

In some systems this may be covered by the SOP for the response being measured. However, in more complex (in-vivo) systems it is worthwhile to call this out explicitly, since it is easy to imagine the same SOP being performed either with multiple injections or via an implantable device.

In summary, it appears that "almost all" of the detail is hidden behind opaque identifiers.
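To make that concrete, here is a sketch of what a single flattened result row might look like, assuming one string identifier per concept discussed above and a numeric measured value. All field names are hypothetical, as is the convention that opaque fields end in `_id`.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class AssayResultRow:
    # The system under test
    sop_id: str
    test_system_id: str
    device_id: str
    target_id: str
    # The test response being measured
    response_type_id: str     # e.g. the identifier behind %INH
    units_id: str
    # The measurement event
    measurement_event_id: str
    problem_flag_id: str
    # The entity whose impact is being assessed
    sample_lot_id: str
    # The way the entity was introduced into the system
    introduction_id: str
    # The one non-opaque field: the measured value itself
    value: float


def opaque_fields(row_type: type) -> List[str]:
    """Names of the fields that are opaque identifiers (by the `_id` suffix)."""
    return [f.name for f in fields(row_type) if f.name.endswith("_id")]


def visible_fields(row_type: type) -> List[str]:
    """Names of the fields that carry detail directly."""
    return [f.name for f in fields(row_type) if not f.name.endswith("_id")]
```

In this sketch ten of the eleven fields are identifiers, and only the value is carried in the row itself.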