Monday, January 22, 2007

RDF vs Ontologies

The way I look at it, RDF talks about what you have; ontologies talk about what you can have. The combination of ontology and data can then be fed into various reasoning engines to tease out the implications of your data.

This is pretty scary. Given my assumption that the science advances, and that the thinking about what you can have will change over time, incorporating inferred “facts” leaves one open to fundamental system instability.

My classic example of this is “sorry, we don’t really mean one protein per gene anymore.” The ontology, and the implications drawn from inferencing upon the data, are wrecked, but the identifiers for gene, protein, transcription, etc. are unaffected.

RDF, at its most basic, gives you stable identifiers for what you have and allows the declaration of “stable” relationships between these objects. This allows you to communicate clearly about what you have and (possibly) to version it easily when the time comes to change (see below). These statements should remain valid even if the ontology in which they are thought to be embedded changes radically.
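A rough sketch of that distinction, in plain Python rather than a real RDF toolkit. Everything here is made up for illustration: the `ex:` identifiers, the `ex:sameAs` predicate, and the two "ontology" functions, which stand in for whatever a reasoning engine would actually do. The point is only that the asserted triples stay put while the inferred ones depend entirely on which ontology is in force.

```python
# Asserted data: plain (subject, predicate, object) triples.
# All identifiers are hypothetical and exist only for this example.
triples = {
    ("ex:geneA", "ex:encodes", "ex:protein1"),
    ("ex:geneA", "ex:encodes", "ex:protein2"),
    ("ex:geneB", "ex:encodes", "ex:protein3"),
}


def infer_one_protein_per_gene(data):
    """Old ontology: a gene encodes exactly one protein, so two proteins
    encoded by the same gene are inferred to be the same thing."""
    inferred = set()
    for s1, p1, o1 in data:
        for s2, p2, o2 in data:
            if p1 == p2 == "ex:encodes" and s1 == s2 and o1 != o2:
                inferred.add((o1, "ex:sameAs", o2))
    return inferred


def infer_many_proteins_per_gene(data):
    """Revised ontology: a gene may encode several proteins,
    so nothing can be concluded about protein identity."""
    return set()


old_inferred = infer_one_protein_per_gene(triples)
new_inferred = infer_many_proteins_per_gene(triples)

print(len(triples), len(old_inferred), len(new_inferred))
```

Under the old rule the two proteins of `ex:geneA` get merged by inference; under the revised rule that conclusion disappears. The three asserted triples are the same in both cases.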


Thinking at the RDF triple level also allows a low-overhead means of versioning your information, in a manner analogous to the ZFS file system and its built-in “copy-on-write” facility.

If I remember correctly, this copy-on-write is performed at the disk block level rather than at the file level. This is thought to be the basis for Apple’s “Time Machine” capability (in the next release of OS X). The system can just look back, determine which blocks were valid at a particular time, and reconstitute the file to appear as it did at that time.

Similar functionality could be made to work at the RDF triple level. Triples that changed would be “overwritten”, but the old information would still be available, stamped with valid-from and valid-to dates. It is easy to overlay some provenance information on top (similar techniques are used in data warehouses to allow clear tracking of when information was updated or corrected).
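Here is a minimal sketch of what that might look like, in plain Python with no RDF library. The class names (`VersionedTriple`, `TripleLog`), the `source` provenance field, and the sample identifiers are all hypothetical. It also assumes that a subject/predicate pair holds a single current value, which is a simplification that real RDF does not impose.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VersionedTriple:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still current"
    source: str = ""                     # simple provenance note


class TripleLog:
    """Append-only store: old values are closed off, never deleted."""

    def __init__(self):
        self.rows = []

    def assert_triple(self, s, p, o, when, source=""):
        # "Overwrite": end the validity of the current value for (s, p),
        # then append the new value as a fresh row.
        for row in self.rows:
            if row.subject == s and row.predicate == p and row.valid_to is None:
                row.valid_to = when
        self.rows.append(VersionedTriple(s, p, o, when, None, source))

    def as_of(self, when):
        # Reconstitute the set of triples that were valid at a given time.
        return {
            (r.subject, r.predicate, r.obj)
            for r in self.rows
            if r.valid_from <= when and (r.valid_to is None or when < r.valid_to)
        }


log = TripleLog()
log.assert_triple("ex:geneA", "ex:label", "old name", datetime(2006, 1, 1), "curator-1")
log.assert_triple("ex:geneA", "ex:label", "new name", datetime(2007, 1, 1), "curator-2")

print(log.as_of(datetime(2006, 6, 1)))
print(log.as_of(datetime(2007, 6, 1)))
```

Asking for the state in mid-2006 gives back the old label and asking in mid-2007 gives the new one, while both rows (and who supplied them) remain in the log.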

This turns the fine-grained structure of RDF into a feature, one that provides an advantage similar to what is seen in these file systems: the incremental disk space required for versioning can be very small. Keeping multiple versions of a document requires only about as much extra disk space as the size of the changes, independent of the document size.
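To put rough numbers on that, here is a back-of-the-envelope comparison in Python. The figures are invented, and "cost" is just a count of stored triples, ignoring timestamps, indexes, and any other overhead.

```python
def snapshot_cost(num_triples, num_versions):
    """Store a full copy of the document for every version."""
    return num_triples * num_versions


def incremental_cost(num_triples, changes_per_version):
    """Store the original once, plus only the triples changed in each later version."""
    return num_triples + sum(changes_per_version)


# Hypothetical document: 10,000 triples, then three later versions
# that change 5, 2, and 8 triples respectively.
changes = [5, 2, 8]
full = snapshot_cost(10_000, 1 + len(changes))
delta = incremental_cost(10_000, changes)

print(full, delta)  # 40000 10015
```

The incremental figure grows with the number of changed triples, not with the size of the document.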

I have to admit a certain hesitance in saying “RDF is good” (aside from the rdf represented by my initials). I have not yet implemented an RDF-based system. I plan to build one in the next month or so and will post an update.
