Wednesday, May 14, 2008

Modularity & Hygiene

This post (and the one to follow) discusses the issue of hygiene in code libraries.
It is prompted by experiences that I’ve had lately in using (primarily xml/jsf) libraries and the pernicious errors that can be introduced by either mixing different libraries or not using them exactly as designed.

A little background: the term hygienic comes from the Lisp community. The short form definition: hygienic code is code that doesn’t have undue interactions with its calling environment.
A lack of hygiene manifests itself in a couple of ways:
  1. Modules trip over each other: the “mix and match” promise of modularity and widget sets is violated
    • modules work at one point in the development cycle but break after the addition of an apparently unrelated piece of code
    • mixing components from different widget sets requires much care and tweaking, if it can be done at all.
  2. Modules have an implicit order in which they must be called; there is no well defined way to kick off either a new “top level process” or a child process that has its own (unshared) context.

Note: this turned into a pretty long post, so I will cover the second item in a subsequent post.

I’m not claiming that hygienic code is always necessary or even that “hygienic code” == “good code.” Situations vary, and in some cases (e.g., device drivers, OS kernels) being hygienic may not be worth it. Also, as I describe below, I don’t think that completely hygienic systems are currently possible, partly due to the limitations of xml.

However, in most situations, the more hygienic the better.

And now on to the discussion.

1 Modules trip over each other
The first issue concerns the problems encountered when modules end up tripping over each other, i.e., incorporating functionality from one set of modules breaks existing functionality. I have seen this mostly in the JavaScript space (see this post on integrating UI widget sets in seam).

Although I’m working primarily in jboss/seam for this project, it is not a seam issue per se. If anything, the conventions seam uses for its generated code help to alleviate these issues.

As far as I can tell, this “tripping” arises from an inability to cleanly nest scope in the environment. For example, in xml it is hard to insulate oneself from what goes on in the xml around you, as exemplified by the inability to nest comments in an xml file. This deficiency, coupled with the fact that many of the widget sets “compile to xml/html”, has arguably had the side effect of diminishing concern for hygienic operation within the development community. Combine this with the silent failure aesthetics of JavaScript and you can produce results that are truly painful to debug.
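To make the nested-comment point concrete, here is a minimal sketch in Python using the standard library parser. The widget markup and the element names (widget, page, label) are made up for illustration. It shows that a fragment which is fine on its own can no longer be wrapped in a comment once it carries a comment of its own.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A hypothetical widget fragment that carries its own comment.
widget = "<widget><!-- inner note --><label>ok</label></widget>"

# Try to "comment out" the widget by wrapping it in an outer comment.
wrapped = "<page><!-- disabled: " + widget + " --></page>"

widget_ok = parses(widget)    # the fragment alone is well formed
wrapped_ok = parses(wrapped)  # the wrapped version is rejected by the parser

print(widget_ok, wrapped_ok)
```

The outer page has no way to say “treat everything in here as opaque”; what the inner fragment contains changes what the surrounding markup means.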

The contrast with Lisp macros is striking. (Lisp macros are a radically different idea from C macros; Hall has a very nice page on the differences, plus some nice simple examples of problems and how to avoid them.) Lisp macros represent the most successful instantiation of code-generating behavior that I know of. They work because of Lisp’s ability to generate new variable names and then bind incoming values to them so they can be used freely. Achieving a similar result without language support for system-wide unique item naming is (very) hard.
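The variable capture problem, and the way fresh names avoid it, can be sketched outside of Lisp. The following Python is only an imitation: gensym, swap_naive, swap_hygienic and run are hypothetical helpers written for this example, and the “macro” is plain string templating, not a real macro system. The naive generator uses a fixed temporary name and silently breaks when the caller happens to own a variable with that name.

```python
import itertools

_counter = itertools.count()


def gensym(prefix: str = "g") -> str:
    """Return a fresh name. A rough stand-in for Lisp's gensym (hypothetical)."""
    return f"__{prefix}_{next(_counter)}"


def swap_naive(a: str, b: str) -> str:
    # Fixed temporary name: captures any caller variable called "tmp".
    return f"tmp = {a}\n{a} = {b}\n{b} = tmp\n"


def swap_hygienic(a: str, b: str) -> str:
    # Fresh temporary name on every expansion.
    t = gensym("tmp")
    return f"{t} = {a}\n{a} = {b}\n{b} = {t}\n"


def run(code: str, env: dict) -> dict:
    """Execute generated code with env as its local namespace."""
    exec(code, {}, env)
    return env


# The caller's variables happen to include one named "tmp".
naive = run(swap_naive("tmp", "other"), {"tmp": 1, "other": 2})
hygienic = run(swap_hygienic("tmp", "other"), {"tmp": 1, "other": 2})

print(naive["tmp"], naive["other"])        # both end up 2: the value 1 is lost
print(hygienic["tmp"], hygienic["other"])  # 2 and 1: a correct swap
```

Note the limit of the imitation: the generated name is still an ordinary string, so uniqueness rests on a naming convention. A Lisp gensym returns an uninterned symbol that no user code can name by accident, which is exactly the language support the paragraph above refers to.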

The seam/richfaces framework does much to try to minimize this problem. For example, if you look at the source of the page that seam sends to the browser, you will see a lot of html of the form id="competition:tagtDecoration:j_id70", which is a nice try at preventing variable collisions. However, without enforced namespace encapsulation or the use of a system-wide symbol-generating facility (e.g., gensym), variable capture is still possible. I also have not been able to find documentation on when new bindings are generated, which probably means I’ll have to look at the source someday.
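Here is a small sketch of why prefixing by convention helps but does not guarantee anything. It is not how seam or richfaces actually build their ids (I have not read that source); client_id and find_duplicates are hypothetical helpers, and the "results" container is invented for the example.

```python
def client_id(*path: str) -> str:
    """Join container names into a colon-separated id (illustrative only)."""
    return ":".join(path)


def find_duplicates(ids):
    """Return the set of ids that occur more than once."""
    seen, dupes = set(), set()
    for i in ids:
        if i in seen:
            dupes.add(i)
        seen.add(i)
    return dupes


# Two components with the same local id, nested in different containers.
generated = [
    client_id("competition", "tagtDecoration", "j_id70"),
    client_id("results", "tagtDecoration", "j_id70"),
]
no_clash = find_duplicates(generated)

# Nothing stops hand-written markup or another library from reusing a full id.
hand_written = "competition:tagtDecoration:j_id70"
clash = find_duplicates(generated + [hand_written])

print(no_clash)  # empty: the prefixes keep the generated ids apart
print(clash)     # contains the reused id
```

The prefixes only protect code that follows the convention; anything else on the page can still collide.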

These issues usually occur more often in scripting languages (for the sake of argument I’m including xml, html, and xhtml as scripting languages) than in compiled ones, because compiled languages normally restrict these “global” environment accesses to compile time. Run-time access to environment variables in compiled languages is more difficult, and inherently has to address multi-core/multi-processor/multi-system issues. The end result is that the “external environment” is generally harder to get at in compiled languages and is relegated to “systems level” utilities.

I’m hoping that the next big thing in scripting languages revolves around scoping and environment, giving one the ability to specify a particular type of environment for code to run in, set up a safe context for it to run in, etc.

Part II will be covered in my next post.
