Tuesday, February 26, 2008

Extensible System: Core Software Requirements

In my last post I promised to detail the software requirements that an extensible system for discovery data shares with other Web 2.0 systems -- here they are:

Workflow Orientation: Support for a complete workflow beyond that which is offered by JSF navigation. This workflow must allow the orchestration of multiple events without requiring additional user interaction. Supported workflows may involve sequenced, conditional interaction with multiple back-end systems (which will usually exist on multiple platforms). For sanity and maintainability, the workflow language should be BPEL (or a slight extension) and should provide the ability to extend predicates and actions using a well-designed and well-understood language (Java, C#, etc.).

Integrable (mashable) data: The data stored by the application (and the results of any analyses performed on the data) are available for repurposing in other applications. Repurposing should be supported in a fine-grained manner so as to put as few restrictions as possible upon their use.

This has two implications: the first is stable identifiers; the second is RESTful interfaces, which allow data to be retrieved by referencing a static URL.
Note: RESTful interfaces have a number of nice side effects: the Seam *From trick mentioned below would be much more difficult without a RESTful interface. Additionally, rapid prototyping and development of web pages is much faster if one can directly access a ‘deep’ page without having to manually negotiate multiple precursor pages.
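
To make the pairing of stable identifiers and static URLs concrete, here is a minimal Java sketch. The base address, the resource type "assay", and the ResourceLink class are all hypothetical names chosen for illustration; the point is only that one identifier maps to exactly one URL and back again.

```java
import java.net.URI;

// Hypothetical sketch: a stable identifier maps to exactly one static URL,
// and that URL can be parsed back into the same identifier.
final class ResourceLink {
    // Assumed base address; not taken from any real system.
    private static final String BASE = "https://example.org/data/";

    final String type; // illustrative resource type, e.g. "assay"
    final String id;   // stable identifier that never changes once issued

    ResourceLink(String type, String id) {
        this.type = type;
        this.id = id;
    }

    // Build the static URL that other applications can bookmark or embed.
    // Assumes type and id contain only URL-safe characters.
    URI toUri() {
        return URI.create(BASE + type + "/" + id);
    }

    // Recover the identifier from a URL produced by toUri().
    static ResourceLink fromUri(URI uri) {
        String text = uri.toString();
        if (!text.startsWith(BASE)) {
            throw new IllegalArgumentException("Not a data URL: " + uri);
        }
        String[] parts = text.substring(BASE.length()).split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected <type>/<id>: " + uri);
        }
        return new ResourceLink(parts[0], parts[1]);
    }
}
```

Because the URL carries everything needed to find the record, a mashup (or a developer testing a ‘deep’ page) can go straight to it without any session state.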

Event queues: Good workflow/system interaction is facilitated by message queues for guaranteed message delivery. Queuing systems also provide good interface points for logging and analysis tools.

Rules engines: In its most general form, a rules engine is a piece of code that evaluates a set of antecedent-consequent pairs, e.g., "if antecedent then consequent." Given this abstract definition, rules engines need to be distributed at a number of places within the product. I see four distinct areas, each with its own role:

1: Display within a page, e.g., should a particular element be displayed (corresponding to the ‘rendered’ predicate in JSF)? Rules involve the availability of data appropriate for display and authentication/authorization restrictions.

2: Predicates involving page flow (JSF, REST, etc.). Rules determine which page gets displayed next.
The JBoss Seam pages have a very nice convention in which the presence of a *From attribute/value pair allows an editing action, upon completion, to return to the page from which it was launched. Here is an example using peopleFrom:
peopleFrom ? ‘PeopleList’ : peopleFrom}.xhtml”

This will return to the PeopleList page when the editing action has been completed.

3: Security rules for CRUD operations. Rules involve accessing and modifying data.

4: Back-end BPEL operations, e.g., if the request has been outstanding for more than a week then notify customer support. Rules involve the overall operation of the system.
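
The abstract definition above (a set of antecedent-consequent pairs) can be sketched in a few lines of Java. This is only an illustration under my own assumptions: RuleEngine, PendingRequest, and the rule name are hypothetical, and a production rules engine would add priorities, conflict resolution, and an external rule language.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch: a rules engine as a list of antecedent-consequent pairs.
final class RuleEngine<C> {
    private static final class Rule<C> {
        final String name;
        final Predicate<C> antecedent; // the "if" part
        final Consumer<C> consequent;  // the "then" part

        Rule(String name, Predicate<C> antecedent, Consumer<C> consequent) {
            this.name = name;
            this.antecedent = antecedent;
            this.consequent = consequent;
        }
    }

    private final List<Rule<C>> rules = new ArrayList<>();

    // Register one rule; returns this engine so calls can be chained.
    RuleEngine<C> when(String name, Predicate<C> antecedent, Consumer<C> consequent) {
        rules.add(new Rule<>(name, antecedent, consequent));
        return this;
    }

    // Evaluate every rule in registration order against the context.
    // Returns the names of the rules that fired.
    List<String> evaluate(C context) {
        List<String> fired = new ArrayList<>();
        for (Rule<C> rule : rules) {
            if (rule.antecedent.test(context)) {
                rule.consequent.accept(context);
                fired.add(rule.name);
            }
        }
        return fired;
    }
}

// Illustrative context for the back-end example: how long a request has waited.
final class PendingRequest {
    final int daysOutstanding;
    boolean supportNotified = false;

    PendingRequest(int daysOutstanding) {
        this.daysOutstanding = daysOutstanding;
    }
}
```

The "outstanding for more than a week" rule then becomes `engine.when("notify-support", r -> r.daysOutstanding > 7, r -> r.supportNotified = true)`. The same engine shape could serve the display, page-flow, and security cases by changing the context type.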

Logging: Effective debugging of complex systems requires the ability to gather an integrated log for each activity in the chain of events that produces a given result; supporting this requirement in an operational setting requires that all relevant logs can be time-aligned and assembled into a single report for analysis.
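
As a rough sketch of what "time-aligned and assembled into a single report" might mean in code, the Java below filters several logs by a shared activity identifier and sorts the result by timestamp. The LogEntry fields and the idea of a per-activity correlation id are my assumptions, and the sketch presumes the servers' clocks are synchronized.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical log record; the field names are assumptions for illustration.
final class LogEntry {
    final Instant timestamp; // assumed to come from synchronized clocks
    final String source;     // e.g. "web", "queue", "workflow"
    final String activityId; // correlation id shared by every step of one activity
    final String message;

    LogEntry(Instant timestamp, String source, String activityId, String message) {
        this.timestamp = timestamp;
        this.source = source;
        this.activityId = activityId;
        this.message = message;
    }
}

final class LogAssembler {
    // Collect the entries for one activity from several logs and order them by time.
    static List<LogEntry> assemble(String activityId, List<List<LogEntry>> logs) {
        List<LogEntry> report = new ArrayList<>();
        for (List<LogEntry> log : logs) {
            for (LogEntry entry : log) {
                if (entry.activityId.equals(activityId)) {
                    report.add(entry);
                }
            }
        }
        // List.sort is stable, so entries with equal timestamps keep their input order.
        report.sort(Comparator.comparing((LogEntry e) -> e.timestamp));
        return report;
    }
}
```

The hard part in practice is getting every component to record the same activity identifier; once that is in place, the assembly step itself is simple.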

Monitoring and management: Business rules should be capable of being extended to monitoring system operation (server load, queue depth, latency, etc.), allowing the system to be ‘self monitoring.’ Using one common tool for both business and monitoring rules lets the largest number of people understand how it operates.

In addition, interfaces should be provided to allow information to be updated (recached) without bouncing the server. JMX is a reasonable example (see also).
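
For readers who have not used JMX, here is a minimal sketch of a recache operation exposed as a standard MBean. The javax.management calls are the standard JDK API; the CacheAdmin and ReferenceCache names, the object name "example.discovery:type=ReferenceCache", and the reload counter are hypothetical stand-ins for whatever a real system would cache.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class CacheAdmin {
    // Standard MBean convention: interface name = implementation name + "MBean".
    public interface ReferenceCacheMBean {
        int getReloadCount(); // exposed as the read-only attribute "ReloadCount"
        void reload();        // exposed as the operation "reload"
    }

    public static final class ReferenceCache implements ReferenceCacheMBean {
        private final AtomicInteger reloads = new AtomicInteger();

        @Override
        public int getReloadCount() {
            return reloads.get();
        }

        @Override
        public void reload() {
            // A real system would re-read its source data here.
            reloads.incrementAndGet();
        }
    }

    // Hypothetical object name, chosen for illustration only.
    static final String NAME = "example.discovery:type=ReferenceCache";

    // Register the cache with the platform MBean server, replacing any earlier instance.
    static ObjectName register(ReferenceCache cache) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(NAME);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(cache, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // What a management console does when an operator triggers "reload".
    static void reloadViaJmx(ObjectName name) {
        try {
            ManagementFactory.getPlatformMBeanServer()
                    .invoke(name, "reload", new Object[0], new String[0]);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once registered, the reload operation should be reachable from a JMX console such as jconsole, so an operator can refresh the cache while the server keeps running.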

Security: Obviously any enterprise system must provide for some level of security, minimally with LDAP support and, hopefully, with out-of-the-box support for OpenID and SAFE. In practice, I would caution against making security/access overly fine-grained, since it must support people changing their roles in the organization, changes in business processes, etc. The more fine-grained your access model, the more thought is required to get it right and the greater the probability of getting it wrong.

I have personally found it useful to distinguish reading, writing, and editing data, opening up the reading and dissemination of the information while restricting writing and editing data to specific tools provided for specific stages of the process.

For example, given a standard lab workflow for data collection, analysis, upload and “publication” (to the persons requesting the tests and then the company at large): there is one tool for collecting, analyzing and uploading the data; there is a second set of tools for integrating and viewing the data in a larger context; and there may be a third set of tools for curating and editing data which has been found discrepant.

This rounds out the software requirements for a practical production system. Although these requirements appear (and are) extensive, most, if not all, of them appear in a number of enterprise-level toolkits. As I said at the beginning of this post: there are clear best practices. A PowerPoint that covers both these posts is available.
