Monday, March 29, 2010

Architects As Service Providers

This paper by Roland Faber of Siemens Healthcare recently appeared in IEEE Software. It argues pointedly that architecture is more effective when structured as a service that provides value by interacting closely with project developers, rather than as a function that produces documentation for the projects to follow. It even advocates that architects do some hands-on coding in the projects (mirabile dictu).

It would be impossible for me to agree more, including the part about hands-on coding, my fondness for which is pretty obvious from my blog posts.

The article posits that this close, ongoing interaction is a good way of assuring both that projects understand what the architects are trying to accomplish with the architecture and that the architects develop an appreciation for the practical issues involved in building working software. There is also a side effect of this kind of interaction that the article doesn't mention: its value in preventing the obsolescence of the person doing the architecture.

The standard scenario in this industry is that a person spends a number of years learning their craft and refining their practice, at which point, if they're good, they become an architect or manager and stop coding. The architect's (or manager's) skills stay relevant for a few years (five seems to be a recurring number), after which they become a pointy-headed character in a Dilbert cartoon.

As an industry we should be learning from this anti-pattern.

It seems to me a truism that coding keeps you grounded and current, and keeps this Dilbertization from happening. Certainly, at some point other duties (e.g., architecture or management) require that you take yourself off the critical coding path; otherwise the success of the project is put in jeopardy. But new technologies and core utilities that are not on a critical-path timeline are all fair game.

I strongly believe in this approach, and this is the first article I've seen that reflects my personal practice.

Monday, March 15, 2010

Low Level Virtual Machine (LLVM)

LLVM, as described in this article on AppleInsider, stands for Low Level Virtual Machine. It is an open-source project that is used and partly supported by Apple.

One of the most interesting things about LLVM is a quote at the bottom of page 1 of the article:
Apple also uses LLVM in the OpenGL stack in Leopard, leveraging its virtual machine concept of common IR to emulate OpenGL hardware features on Macs that lack the actual silicon to interpret that code. Code is instead interpreted or JIT on the CPU.

This approach makes it very likely that developers will use the hardware-optimized instructions. Most other approaches impose significant costs on developers, e.g., the need to write additional code to cover every possible hardware configuration. With LLVM there is no coding penalty, so using the optimized routines becomes a no-brainer, resulting in faster code for people with beefier hardware (who also tend to be the ones most worried about performance) and usable code for everyone else.

As background the article points to a presentation by Chris Lattner, but I prefer his paper with Vikram Adve, "LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation," because it talks in terms I can understand (like "Static Single Assignment").

So here's what's cool: LLVM eats a code representation that is very amenable to optimization and analysis. It optimizes that input and emits machine code (potentially tuned for the actual hardware that will run it), decorated to allow low-overhead runtime profiling.
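Purely as an illustration (this uses the third-party llvmlite Python binding and an example function of my own, neither of which appears in the article or the paper), here is roughly what that input looks like: a tiny function built directly in LLVM's SSA-form intermediate representation, which the optimizer and the target-specific code generators then consume.

```python
# Build a small function in LLVM IR via llvmlite (a Python binding to LLVM).
# The function and names here are made up for illustration.
from llvmlite import ir

int32 = ir.IntType(32)
module = ir.Module(name="example")

# Roughly: int add_squares(int a, int b) { return a*a + b*b; }
fnty = ir.FunctionType(int32, [int32, int32])
func = ir.Function(module, fnty, name="add_squares")
builder = ir.IRBuilder(func.append_basic_block(name="entry"))

a, b = func.args
aa = builder.mul(a, a, name="aa")   # each value is assigned exactly once (SSA)
bb = builder.mul(b, b, name="bb")
builder.ret(builder.add(aa, bb, name="sum"))

# The textual IR printed here is the machine-independent form that LLVM
# optimizes and then lowers to native (or JIT-compiled) machine code.
print(module)
```

The point of the exercise is that nothing in the IR above cares what CPU it will eventually run on; that decision, and any hardware-specific tuning, is deferred to the code generator or the JIT.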



This approach permits repeated optimization based on recent run-time data rather than generalized heuristics; it is reminiscent of "hot spot" JIT compilation, but with larger scope and less immediacy.

I'd be remiss not to mention the strategic implications of this: it allows Apple to radically shift hardware configurations while restricting the software impact to a relatively small chunk of code (cf. the iPad).

Update 21 Aug 2010: Just noticed LLVM got a SIGPLAN award -- well deserved!

Wednesday, March 3, 2010

Hubs & Connectors

I recently stumbled upon the Composite Software site and was impressed by their architecture. It is a virtualized/federated solution that reminds me of the Hub/Connector system I had proposed as a data integration model for the drug discovery/cheminformatics space.

The advantages of such an architecture over a conventional data warehouse include:

  • There is no requirement to perform a complete mapping of the data. This allows focus on solutions that address the particular problem at hand and only the mappings required to solve it. Such a focus is especially important when the data structures and mapping rules are in a state of flux for part of the system, since the high-flux areas can simply be avoided.

  • The target data store need not have a structure capable of holding all of the data simultaneously. For example, a target table that would hold all of your CDISC SDTM SUPPQUAL values could require upward of 1,000 columns, reaching the limits of many common relational databases, whereas the solution for an incremental data set would be an order of magnitude smaller.

  • Only the data of interest is accessed/moved. In systems that only analyze a small subset of the data at a time, server size can be reduced substantially.

  • Data need not be moved to a central repository, minimizing duplicative storage space.

Of course, there are disadvantages:

  • A warehouse allows the precalculation of complex results, imposing little operational delay when retrieving them.

  • Warehouses can be more easily structured to handle analyses that involve large portions of the dataset.

In scientific domains, it isn't uncommon for new assays, results, etc. to break your current mappings. A virtualized approach minimizes the impact of these problems upon your system and is certainly something to look at if this sounds like your situation.
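To make the hub/connector idea concrete, here is a minimal Python sketch. The Connector and hub_join names, and the assay/registry sources, are hypothetical illustrations of the pattern, not anything from Composite Software's product.

```python
# Minimal hub/connector sketch: each connector pulls only the requested
# columns and rows from its own source; the hub joins the partial results.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Connector:
    """Wraps one data source and exposes a filtered, column-limited view."""
    name: str
    rows: List[Dict]  # stand-in for a real source (database, service, file)

    def fetch(self, columns: List[str],
              predicate: Callable[[Dict], bool]) -> List[Dict]:
        # Only the data of interest is accessed/moved.
        return [{c: r[c] for c in columns} for r in self.rows if predicate(r)]


def hub_join(left: List[Dict], right: List[Dict], key: str) -> List[Dict]:
    """Join two partial result sets on a shared key, inside the hub."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical sources: an assay-results store and a compound registry.
assays = Connector("assays", [
    {"compound_id": 1, "assay": "IC50", "value": 42.0},
    {"compound_id": 2, "assay": "IC50", "value": 7.5},
])
registry = Connector("registry", [
    {"compound_id": 1, "smiles": "CCO"},
    {"compound_id": 2, "smiles": "c1ccccc1"},
])

# Map and join only what the question at hand needs; no warehouse-wide schema.
potent = assays.fetch(["compound_id", "value"], lambda r: r["value"] < 10)
structures = registry.fetch(["compound_id", "smiles"], lambda r: True)
print(hub_join(potent, structures, "compound_id"))
```

The design point is that the filters are pushed down to each source, and the mapping effort is limited to the keys and columns the question at hand actually requires.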