Thursday, May 29, 2008

Modularity & Hygiene II

A similar but distinct situation involves the environment expected by a module when it is activated:

  • What needs to be set up for the called function
  • What is expected to be unperturbed from a “pristine” environment -- and the definition of that pristine environment.

This places restrictions on how a module can be used, e.g., the inability to initiate workflows within a page flow in seam.

These restrictions result from an implicit dependency upon the configuration of the calling environment. Since the appropriate configuration is assured if the module was called as planned, there is little checking to verify that the environmental assumptions have been met. Once the nested calling paradigm has been adopted, design choices are biased towards implicit configurations that cannot be easily set without perturbing the calling environment, since B (and A for that matter) can still reference the external environment.

I’ve found it useful to think of this as the difference between a linear and a nested calling environment: a “linear environment” passes parameters in as a single object that contains the required values, while the nested approach sets up one or more global variables for access by the called functionality.



In the “linear” illustration nothing in the external environment is perturbed, side effects are minimal, and any arguments can be copied, modified and passed on to the next module in a sanitary fashion (with the usual caveats around shared “stream-like” objects).

Admittedly there are times when the nested approach makes the most sense, usually for “stream-like” variables, e.g., an initialization step to read in a configuration file, initialize connections, etc. The problem arises when there is no way to spawn a new configuration (or initialize one if you’re called in a different context). Short of such an idealized situation, it would be useful, at a minimum, to be able to detect that you’re being called in the wrong context. When it is difficult for a module to decide if it is being called in the correct context (or if it elides the context check), it is hard, if not impossible, to provide easy-to-use modules.

Note: I think that the rails trick of overriding the const_missing hook is arguably in this space. The magic underlying this functionality wasn’t easy (for me) to find, and it surfaced the capability in such a way that it couldn’t be used for my purposes, since it had no introspection capability and only covered the case of missing-constant errors raised during system operation.

Note: discovering this also helped answer one aspect of the environment that I had previously found opaque: “why does my new code seem to be loaded in some cases but not in others?” The answer is that the new code (class definitions etc.) was retrieved only if the class had not been previously loaded. If it had been previously loaded, the missing-constant handling would not be triggered and the class would not be reloaded.
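
The load-on-missing behavior is easier to see in a toy model. The following is only an analogy written in Python, not rails code: `LazyRegistry` and the `source` dictionary are invented for the example, with `source` standing in for class files on disk.

```python
class LazyRegistry(dict):
    """Loads a definition only when its name is missing (toy analogy)."""

    def __init__(self, loader):
        super().__init__()
        self._loader = loader

    def __missing__(self, name):
        # Called by dict lookup only when the key is absent.
        value = self._loader(name)
        self[name] = value  # cache it, so later lookups never miss
        return value

source = {"Widget": "v1"}  # stands in for the class file on disk
registry = LazyRegistry(lambda name: source[name])

first = registry["Widget"]   # miss: the loader runs and "v1" is cached
source["Widget"] = "v2"      # the file is edited
second = registry["Widget"]  # hit: no miss occurs, so the edit is not seen
del registry["Widget"]       # forget the cached definition
third = registry["Widget"]   # miss again: the edited version is loaded
```

The second lookup returns the old definition because nothing was missing, which matches the "loaded in some cases but not in others" experience described above.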

At some level I have no problem with this implicit ‘environment’ structuring as long as there are ways to determine what environment you’re in and have the ability to spin up a new environment appropriately configured as necessary.

It’s also important to distinguish this from environment set-up and configuration using well-defined files/APIs. That is a perfectly reasonable and necessary practice for setting up such things as database access, service locations, etc. in configuration files that are managed by an “environmental processor”.

Such configurations are usually:
  • one-shots done at startup, or in response to a specific reconfiguration request
  • accessed through a well-defined API
  • ‘stream-like’ in nature

Again, these are issues that I’m experiencing as I spin up my application. I’m not claiming that solving them is either easy or practical, only that it is desirable.

Wednesday, May 14, 2008

Modularity & Hygiene

This post (and the one to follow) discusses the issues of hygiene in coding libraries.
It is prompted by experiences that I’ve had lately in using (primarily xml/jsf) libraries and the pernicious errors that can be introduced by either mixing different libraries or not using them exactly as designed.

A little background: the term hygienic comes from the Lisp community. The short form definition: hygienic code is code that doesn’t have undue interactions with its calling environment.
A lack of hygiene manifests itself in a couple of ways:
  1. Modules trip over each other: the “mix and match” promise of modularity and widget sets is violated.
    • Modules work at one point in the development cycle but break after the addition of an apparently unrelated piece of code.
    • Mixing components from different widget sets requires much care and tweaking, if it can be done at all.
  2. Modules have an implicit order in which they must be called; there is no well defined way to kick off either a new “top level process” or a child process that has its own (unshared) context.

Note: this turned into a pretty long post, so I will cover the second item in a subsequent post.

I’m not claiming that hygienic code is always necessary or even that “hygienic code” == “good code.” Situations vary, and in some cases (e.g., device drivers, OS kernels) being hygienic may not be worth it. Also, as I describe below, I don’t think that completely hygienic systems are currently possible, partly due to the limitations of xml.

However, in most situations, the more hygienic the better.

And now on to the discussion.

1 Modules trip over each other
The first issue concerns the problems encountered when modules end up tripping over each other, i.e., incorporating functionality from one set of modules breaks existing functionality. I have seen this mostly in the javascript space (see this post on integrating UI widget sets in seam).

Although I’m working primarily in jboss/seam for this project, it is not a seam issue per se. If anything, the conventions seam uses for its generated code help to alleviate these issues.

As far as I can tell, this “tripping” arises from an inability to cleanly nest scope in the environment. For example, in xml it is hard to insulate oneself from what goes on in the xml around you, as exemplified by the inability to nest comments in an xml file. This deficiency, coupled with the fact that many of the widget sets “compile to xml/html”, has arguably had the side effect of diminishing concern for hygienic operation within the development community. Combine this with the silent-failure aesthetics of JavaScript and you can produce results that are truly painful to debug.

The contrast with Lisp macros is striking. (Lisp macros are a radically different idea from C macros; Hall has a very nice page on the differences, plus some nice simple examples of problems and how to avoid them.) Lisp macros represent the most successful instantiation of code-generating behavior that I know of. Lisp macros work because of Lisp’s ability to generate new variable names and then bind incoming values to them so they can be used freely. Achieving a similar result without language support for system-wide unique item naming is (very) hard.
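
To show what goes wrong without generated names, here is a small sketch that mimics the problem with Python source generation and `exec`. It is only an imitation: the `gensym` below is a hypothetical helper built on a counter, so its names are unique by convention only, whereas Lisp's gensym returns symbols that cannot collide with anything the programmer typed.

```python
import itertools

_counter = itertools.count()

def gensym(prefix="g"):
    """Return a name not handed out before (counter-based, by convention only)."""
    return f"__{prefix}_{next(_counter)}"

def swap_unhygienic(a, b):
    # Generated code always uses the fixed temporary name "tmp".
    return f"tmp = {a}\n{a} = {b}\n{b} = tmp\n"

def swap_hygienic(a, b):
    # Generated code uses a fresh temporary name on every expansion.
    t = gensym("tmp")
    return f"{t} = {a}\n{a} = {b}\n{b} = {t}\n"

# The caller happens to have its own variable called "tmp".
captured = {"tmp": 1, "x": 2}
exec(swap_unhygienic("tmp", "x"), captured)
# The generated "tmp" collided with the caller's "tmp": the value 1 is lost.

clean = {"tmp": 1, "x": 2}
exec(swap_hygienic("tmp", "x"), clean)
# The fresh name did not collide, so the two values really were swapped.
```

After running this, `captured` holds 2 for both `tmp` and `x`, while `clean` holds `tmp == 2` and `x == 1`. The unhygienic generator works for every pair of names except the one it happens to use internally, which is exactly the "works until an unrelated piece of code is added" failure described earlier.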

The seam/richfaces framework does much to try to minimize this problem. For example, if you look at the source of the page that seam sends to the browser you will see a lot of html of the form id=”competition:tagtDecoration:j_id70”, which is a nice try at preventing variable collisions etc. However, without enforced namespace encapsulation or the use of a system-wide symbol-generating facility (e.g., gensym), variable capture is still possible. I also have not been able to find documentation on when new bindings are generated etc., which probably means I’ll have to look at the source someday.

These issues tend to occur more often in scripting languages (for the sake of argument I’m including xml, html, xhtml as scripting languages) than in compiled ones, because compiled languages normally restrict these “global” environment accesses to compile time. Run-time access to environment variables in compiled languages is more difficult, and inherently has to address multi-core/multi-processor/multi-system issues. The end result is that the “external environment” is generally harder to get at in compiled languages and is relegated to “systems level” utilities.

I’m hoping that the next big thing in scripting languages revolves around scoping and environment, giving one the ability to specify a particular type of environment for code to run in, set up a safe context for it to run in, etc.

Part II will be covered in my next post.