Thursday, August 14, 2008

Optimization: Premature and otherwise

My post on structuring database tables made me think again about when (and how) to optimize one's code/design. The caveats about premature optimization are well known and well considered, as are the reasons for not following them slavishly.

In my mind, the core questions are "what are we trying to optimize?" and "who cares?"

My (obvious?) claim is that we should only spend time optimizing things that have an impact on the high-level goals for the project. The reasons for performing an optimization should be articulated and evaluated in this framework. Driving optimizations by focusing on the top-level goals sounds obvious, but the tradeoffs are difficult to make in practice.

There are legitimate tensions about which framework is relevant to the analysis. That is, it is all well and good to speak of "strategic business goals," etc., but if the product is unusable, the long-term strategy doesn't matter. A strategic focus simply assures that long-term impacts are also evaluated when an optimization is considered, e.g., an optimization that greatly increases execution efficiency may or may not be appropriate if it increases the complexity of product installation and setup.

For example, the "narrow table" approach that I've been advocating is designed to support deep changes in the business processes and science over the life of the system, without necessitating deep changes in the data model. The "strategic horizon" for such a project is > 10 years (that is, a total system life of > 10 years), with the expectation that the fundamental data model reflected in the narrow tables will be relatively stable during that time. Even in a rapid prototyping environment with one or more iterations shipping each quarter, the essence of the data model should be fairly stable, since fundamental changes to the data model induce data migration efforts that distract from product improvements.
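To make the shape of a narrow table concrete, here is a minimal sketch. The table and column names are hypothetical (they are not from any system discussed here), and SQLite is used purely for illustration. The point is that each fact is a row of (entity, attribute, value), so a new kind of measurement is a data change rather than a schema change.

```python
import sqlite3

# A minimal sketch of a "narrow" table: one row per (entity, attribute, value).
# Names are hypothetical; SQLite is used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observation (
        entity_id   INTEGER NOT NULL,   -- the thing being described
        attribute   TEXT    NOT NULL,   -- e.g. 'weight_kg', 'ph', 'assay_result'
        value       TEXT    NOT NULL,   -- stored generically; typed views can cast
        recorded_at TEXT    NOT NULL    -- ISO-8601 timestamp
    )
""")

# Adding a brand-new kind of measurement requires no ALTER TABLE,
# only new rows -- which is the point of the narrow design.
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        (1, "weight_kg",    "72.5",     "2008-08-14T09:00:00"),
        (1, "ph",           "7.2",      "2008-08-14T09:00:00"),
        (2, "assay_result", "positive", "2008-08-14T10:30:00"),
    ],
)
```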

The question is: are there inefficiencies caused by the narrow table approach that will make it overly difficult to achieve a usable product in the short term? My current intuition is to go with the narrow table approach and let either materialized views (which, to my knowledge, are most readily available in Oracle), special database jobs, or a data grid provide the optimizations.
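As a rough sketch of what that read-side optimization looks like, the narrow rows can be pivoted into a wide, query-friendly shape. In Oracle that pivot could sit behind a materialized view or be refreshed by a scheduled job; in the sketch below an ordinary SQLite view stands in for it, and all names are again hypothetical.

```python
import sqlite3

# Sketch: derive a wide, read-optimized shape from the narrow table.
# A materialized view or a periodic refresh job would keep this current
# in a production database; a plain view stands in for it here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (entity_id INTEGER, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [(1, "weight_kg", "72.5"), (1, "ph", "7.2")],
)

conn.execute("""
    CREATE VIEW observation_wide AS
    SELECT entity_id,
           MAX(CASE WHEN attribute = 'weight_kg' THEN value END) AS weight_kg,
           MAX(CASE WHEN attribute = 'ph'        THEN value END) AS ph
    FROM observation
    GROUP BY entity_id
""")

print(conn.execute("SELECT * FROM observation_wide").fetchall())
```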

That said, one of my personal rules of optimization (based on some experience) is that your intuitions are almost always wrong: if something is taking too long, it is usually worthwhile to benchmark it even if you "know the cause of the problem" (unless the putative fix takes substantially less time than the benchmark would). The corollary is that if you're worried about something taking too long but think you "should be OK," set up a test suite at scale to validate your intuitions, so that you can monitor performance as development proceeds.
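A minimal version of such an at-scale check, again as a hypothetical sketch: load a representative number of narrow rows and time the query you are worried about, instead of trusting intuition. The row count, schema, and query below are placeholders to be replaced with your own.

```python
import sqlite3
import time

# Sketch of a "test at scale": populate a representative volume of data,
# then time the query of interest and track the number over time.
N_ENTITIES = 100_000  # placeholder scale; use a realistic figure for your system

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (entity_id INTEGER, attribute TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    ((i, "weight_kg", str(70 + i % 30)) for i in range(N_ENTITIES)),
)
conn.commit()

start = time.perf_counter()
rows = conn.execute(
    "SELECT entity_id, value FROM observation WHERE attribute = ?", ("weight_kg",)
).fetchall()
elapsed = time.perf_counter() - start
print(f"{len(rows)} rows in {elapsed:.3f}s")  # monitor this as development proceeds
```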

This principle holds even when the bottleneck is development time: you need to determine whether the problem lies with the language, the developer, the user/developer interaction, or the developer's manager (e.g., providing more interrupt-driven activity than the developer can handle). As always, measurement is the key.
