Wednesday, March 3, 2010

Hubs & Connectors

I recently stumbled upon the Composite Software site and was impressed by their architecture. It is a virtualized/federated solution that reminds me of the Hub/Connector system that I had proposed as a data integration model for the drug discovery/cheminformatics space.

The advantages of such an architecture over a conventional data warehouse include:

  • There is no requirement to perform a complete mapping of the data. You can focus on the particular problem at hand and on only the mappings required to solve it. This is especially important when the data structure and mapping rules are in a state of flux for part of the system, because the high-flux areas can simply be avoided until they are needed.

  • The target data store need not have a structure capable of holding all of the data simultaneously. For example, a target table that holds all of your CDISC SDTM SUPPQUAL values could require upward of 1000 columns, reaching the column limits of many common relational databases. On the other hand, the structure needed for an incremental data set would be an order of magnitude smaller.

  • Only the data of interest is accessed/moved. In systems that only analyze a small set of the data at a time, server size can be reduced substantially.

  • Data need not be moved to a central repository, minimizing duplicative storage space.
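To make the idea concrete, here is a minimal sketch of a hub with two connectors. Everything in it is an assumption made for illustration: the `Hub` and `Connector` classes, the field names, and the sample rows are hypothetical and do not describe Composite Software's product or any particular implementation. It only shows that each connector maps the fields it has been asked to expose, and that a query pulls just those fields from the sources it names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Connector:
    """Hypothetical connector: wraps one source and maps only requested fields."""
    name: str
    fetch: Callable[[], Iterable[Record]]  # reads rows from the source on demand
    mapping: Dict[str, str]                # hub field name -> source field name

    def query(self, fields: List[str]) -> List[Record]:
        missing = [f for f in fields if f not in self.mapping]
        if missing:
            raise KeyError(f"{self.name} has no mapping for {missing}")
        # Source fields without a mapping are never touched.
        return [{f: row.get(self.mapping[f]) for f in fields} for row in self.fetch()]


@dataclass
class Hub:
    """Hypothetical hub: routes a query to the named connectors, stores nothing."""
    connectors: Dict[str, Connector] = field(default_factory=dict)

    def register(self, connector: Connector) -> None:
        self.connectors[connector.name] = connector

    def query(self, fields: List[str], sources: List[str]) -> List[Record]:
        results: List[Record] = []
        for name in sources:
            for record in self.connectors[name].query(fields):
                results.append({"source": name, **record})
        return results


# Made-up sample data standing in for two source systems.
assay_rows = [
    {"cmpd": "C-1", "ic50_nm": 12.0, "new_readout": "x"},
    {"cmpd": "C-2", "ic50_nm": 340.0, "new_readout": "y"},
]
registry_rows = [{"compound_id": "C-1", "mol_weight": 180.2}]

hub = Hub()
hub.register(Connector("assays", lambda: assay_rows,
                       {"compound": "cmpd", "ic50": "ic50_nm"}))
hub.register(Connector("registry", lambda: registry_rows,
                       {"compound": "compound_id", "mw": "mol_weight"}))

potency = hub.query(["compound", "ic50"], ["assays"])
compounds = hub.query(["compound"], ["assays", "registry"])
```

Note that the `new_readout` column in the assay source has no mapping at all, yet both queries still work. That is the point of the first bullet: an unmapped (or still changing) part of a source does not block the queries that do not need it.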

Of course, there are disadvantages:

  • A warehouse allows the precalculation of complex results, imposing little operational delay in retrieving these results.

  • Warehouses can be more easily structured to handle analyses which involve large portions of the dataset.
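The precalculation trade-off can be illustrated with a small sketch. The function names and the sample rows below are hypothetical, chosen only for this example. The warehouse-style function computes a summary once at load time, so later reads are plain lookups; the federated-style function goes back to the source and computes the same value on every request.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def build_summary(rows: Iterable[Row]) -> Dict[str, float]:
    """Warehouse style: compute per-compound mean IC50 once, at load time."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        grouped[row["compound"]].append(row["ic50"])
    return {compound: mean(values) for compound, values in grouped.items()}


def mean_on_demand(fetch: Callable[[], Iterable[Row]], compound: str) -> float:
    """Federated style: read the source and compute at request time."""
    values = [row["ic50"] for row in fetch() if row["compound"] == compound]
    return mean(values)


# Made-up sample data.
rows = [
    {"compound": "C-1", "ic50": 10.0},
    {"compound": "C-1", "ic50": 20.0},
    {"compound": "C-2", "ic50": 5.0},
]

summary = build_summary(rows)                         # paid once, up front
precomputed = summary["C-1"]                          # cheap lookup afterwards
computed_now = mean_on_demand(lambda: rows, "C-1")    # paid on every request
```

Both paths give the same answer; the difference is when the cost is paid and whether the whole dataset has to be scanned to answer a single question.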

In scientific domains, it isn't uncommon for new assays, results, etc. to break your current mappings. A virtualized approach minimizes the impact of these problems upon your system and is certainly something to look at if this sounds like your situation.
