However, in the clinical domain I've become accustomed to thinking about the data in terms of "what did we know when?", so that it is possible to reconstruct the understanding of a trial at a given point in time. This obviously requires much more extensive tracking.
I recently came across an excellent book on the topic: Temporal Data and the Relational Model by Date, Darwen and Lorentzos. It presents a detailed analysis of the issues involved in working with temporal information, using a refreshingly simple example consisting of a few tables of data about parts and their suppliers.
Many systems use begin/end dates for each row to track when the data has changed, supporting the type of use most relevant to clinical/scientific analysis. However, this technique does not support some interesting situations in the business domain. For example, p 166 of the book shows that, given an item with the attributes name, status, city, answering simple questions such as "how long has a supplier been at that address" or "how long has a supplier had that name" requires a begin/end date for each attribute. Thinking through the implications of this issue results in refactoring the model into irreducible components (aka sixth normal form), as described on p 173.
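As a rough illustration of per-attribute tracking, here is a minimal Python sketch in which each attribute keeps its own history of begin/end dates. The class name, the function name, and the supplier data are all hypothetical; this is not the book's notation or example data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class AttributeFact:
    """One value of one attribute, with its own validity interval."""
    supplier_id: str
    value: str
    begin: date
    end: Optional[date] = None  # None means "still current"


# Hypothetical data: one history per attribute, not one begin/end pair per row.
name_history = [
    AttributeFact("S1", "Smith Ltd", date(2001, 1, 1), date(2003, 6, 1)),
    AttributeFact("S1", "Smith & Co", date(2003, 6, 1)),
]
city_history = [
    AttributeFact("S1", "London", date(2001, 1, 1), date(2004, 3, 1)),
    AttributeFact("S1", "Paris", date(2004, 3, 1)),
]


def days_with_current_value(
    history: List[AttributeFact], supplier_id: str, today: date
) -> int:
    """Days since the supplier's current value of this attribute took effect."""
    for fact in history:
        if fact.supplier_id == supplier_id and fact.end is None:
            return (today - fact.begin).days
    raise LookupError(f"no current value for {supplier_id}")
```

Because the name and the city change independently, the two questions get different answers. With a single begin/end pair per row, every change to any attribute would reset the row's begin date, and neither question could be answered without comparing successive rows.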
As implied by the term sixth normal form, using the temporal behavior of the data as a design axis can have extensive implications, e.g.,
- splitting quantities out in a LIMS system
- splitting out names (especially last names!) in a system that tracks employees, etc.
This implies that it is important to consider the temporal behavior of the data even if a temporal model is not planned for the system, as it helps drive scenarios for evaluating the system's response to expected changes. For example, "high flux" items may require optimized interfaces, surface special reporting requirements, etc.
Other noteworthy topics in the book include merging intervals, e.g., the two facts that attribute A had value 3 from t1-t3 and has value 3 from t3-now should be merged into a single fact.
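A minimal sketch of this kind of merging, in Python. It assumes integer time points, half-open intervals (the end point is excluded), and `None` as a stand-in for "now". These conventions and the `coalesce` name are my own assumptions, not the book's operators.

```python
from typing import List, Optional, Tuple

# (value, time_from, time_to); time_to=None stands for "now" (open-ended).
Fact = Tuple[int, int, Optional[int]]


def coalesce(facts: List[Fact]) -> List[Fact]:
    """Merge facts with equal values whose intervals meet or overlap."""
    merged: List[Fact] = []
    # Sort by value, then start, so candidates for merging are adjacent.
    for value, start, end in sorted(facts, key=lambda f: (f[0], f[1])):
        if merged:
            last_value, last_start, last_end = merged[-1]
            touches = last_end is None or start <= last_end
            if last_value == value and touches:
                if last_end is None or end is None:
                    new_end = None  # an open-ended interval stays open
                else:
                    new_end = max(last_end, end)
                merged[-1] = (value, last_start, new_end)
                continue
        merged.append((value, start, end))
    return merged


# Value 3 from t1-t3 and value 3 from t3-now collapse into one fact.
print(coalesce([(3, 1, 3), (3, 3, None)]))  # [(3, 1, None)]
```

The result is ordered by value and then by start time rather than chronologically, and facts separated by a gap are left as separate facts.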
There is also a discussion of the time-from/time-to in the persistent store vs the time-from/time-to in the world, which, although important in developing systems requirements, doesn't appear to require analysis different in character from what is conventionally performed. My view is that world and storage times are disjoint. In scientific systems there is rarely a reason to worry about world times -- other than referencing the date upon which an operation was performed.
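To make the distinction concrete, here is a hedged Python sketch of a record that carries a world date (when the operation was performed) alongside a storage interval (when the row was current in the store). The `Measurement` class, its field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    sample_id: str
    value: float
    performed_on: date                    # world time: when the operation happened
    stored_from: datetime                 # storage time: when this row was recorded
    stored_to: Optional[datetime] = None  # storage time: when it was superseded


def known_as_of(rows: List[Measurement], when: datetime) -> List[Measurement]:
    """Rows the store held as current at the moment `when`."""
    return [
        r for r in rows
        if r.stored_from <= when and (r.stored_to is None or when < r.stored_to)
    ]


# Hypothetical data: a value recorded on 2 March and corrected on 10 March.
rows = [
    Measurement("X1", 4.1, date(2005, 3, 1),
                datetime(2005, 3, 2, 9, 0), datetime(2005, 3, 10, 14, 0)),
    Measurement("X1", 4.7, date(2005, 3, 1),
                datetime(2005, 3, 10, 14, 0)),
]
```

Querying `known_as_of(rows, datetime(2005, 3, 5))` returns the original 4.1 row, while the same query for 15 March returns the corrected 4.7 row. The world date is identical in both rows; only the storage interval answers "what did we know when?".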
Again, an interesting read, highly recommended (despite the authors' frequent exhortations on how to read the book, e.g., "are definitely meant to be read in sequence as written" (p 51) or "note carefully"; "carefully" is used an inordinately large number of times in the text).
As storage becomes cheaper, the downside of not having a temporal capability will more frequently exceed its implementation cost.