Wednesday, March 26, 2008

Taxonomies, Ontologies and the Semantic Web

A couple of weeks ago I attended C-SHALS 2008 (Conference on Semantics in Healthcare and Life Sciences). One aspect of it that I found striking was the number of people who conflated taxonomies with ontologies -- my initial reaction was to post a remark about the confusion and highlight the distinctions (see this for a short set of descriptions of these and related terms).

I’ve instead come to view this conflation as reflecting the pragmatic bias of these systems: if the difference between taxonomies and ontologies isn’t apparent to you, the difference doesn’t matter for what you are trying to do (modulo the assumption that the speakers were competent, but that did appear to be the case). The implication is that such systems require no significant machine-based inference across organizations. Significant inference, in this context, would involve something beyond the use of term matching to gather locally related terms/individuals (local vis a vis the terms being matched). Note: although I categorize this as being ‘non-significant’, that’s only from the standpoint of inference -- these systems do cover most of the Business Intelligence/analysis use cases being implemented today.

As you might expect, given this characterization, these presentations involved the aggregation of data from multiple sites, using RDF or taxonomies such as Snomed to link data between sites. This is a good thing -- as I’ve mentioned a number of times, having stable identifiers across systems is the key to integration. The systems presented demonstrated that useful integration is possible even when the same terms (e.g., the same Snomed terms) have slightly different meanings in the different organizations (see How Doctors Think for an anecdotal study of physicians classifying patients).
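As a rough illustration of what 'linking by stable identifier' amounts to, here is a minimal sketch in plain Java. It is not taken from any of the systems presented: the class names, the site names and the term identifiers are all made up, and a real system would use RDF URIs or actual Snomed codes as the shared key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every name here is an assumption made for illustration.
final class TermLinker {
    private TermLinker() {}

    /** One observation from one site, tagged with a shared term identifier. */
    static final class SiteRecord {
        final String site;
        final String termId;
        final String payload;

        SiteRecord(String site, String termId, String payload) {
            this.site = site;
            this.termId = termId;
            this.payload = payload;
        }
    }

    /**
     * Groups records from all sites by their shared term identifier.
     * This is plain term matching: no inference, and no check that the
     * sites mean exactly the same thing by the same identifier.
     */
    static Map<String, List<SiteRecord>> linkByTerm(List<SiteRecord> records) {
        Map<String, List<SiteRecord>> linked = new LinkedHashMap<>();
        for (SiteRecord r : records) {
            linked.computeIfAbsent(r.termId, k -> new ArrayList<>()).add(r);
        }
        return linked;
    }
}
```

The grouping accepts the noise of slightly different local meanings in exchange for more data per term, which is the trade these systems appear to be making.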

This is an interesting result: although fully vetted, 100% one-to-one mappings would obviously be preferable, in these systems the value of more data outweighs the penalty imposed by increased noise. Rough quick integration is proving more valuable than detailed integration requiring a thorough analysis of all systems used -- probably because the difference between ‘rough, quick’ and ‘thorough, slow’ is measured in months, if not years.

This is related to a discussion at the conference on the contrast between developing ‘problem specific’ ontologies vs. ‘general use’ ontologies. That is: does taking the time to ‘get it right’ add any value? This is roughly equivalent to the old AI scruffy vs. neat distinction.

Although I wouldn’t go so far as to claim that a general purpose ontology is impossible (at least in some limited domain), I am skeptical that it can be achieved. My concern centers around the fact that when you are constructing a general use ontology it is hard to know where to stop e.g., given a small molecule bioactive compound you should represent the formula and chirality, but what about the (possibly fractional) salt form? or the formulation? what about radioisotopes and their decay rates? subnuclear particles etc. I understand pragmatic stopping points for modeling these issues, but I don’t know how to determine principled ones.

It’s reassuring to see a number of researchers finding pragmatically useful parts of the semantic web, without the need for perfect definitions/ontologies. This, to me, is the take-home message: there are a number of useful tools and techniques in the semantic web space, so don’t be put off by the thought of merging ontologies and developing a grand unified theory of everything.

Wednesday, March 12, 2008

Porting a Ruby on Rails Application to jboss seam

I did finally port my Ruby on Rails application to jboss seam.

The capsule summary is that it took longer than expected (not particularly unusual for software), looks better than it ever did but still needs some performance tuning.

Some specifics

image display
I went with the seam graphicImage tag (note xmlns:s=”
<s:graphicImage value="#{artwork.thumb}" rendered="#{not empty artwork.thumb}">
    <s:transformImageSize height="50" maintainRatio="true"/>
</s:graphicImage>

which I found to be slow -- much slower than using a normal html <img width="x"/> tag. (Update: this is due to the use of the 'transformImageSize' tag -- rdf 24 March 2008)

In the RoR code I conditionally chose one of two versions of the image tag
<%= if (@artwork.width_inches > @artwork.height_inches)
image_string = "< id =""> + "\" width=100/>"
else
image_string = "< id =""> + "\" height=100/>"
end %>

which rendered much faster.

The s:graphicImage tag doesn’t appear to be intended for rendering up to 20 images on a page -- my next revision will include the equivalent of the RoR code.
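For what it's worth, the Java side of that equivalent could look something like the sketch below. This is only an assumption about the shape it might take: the class and method names are invented, and the src value is a placeholder for whatever URL actually serves the thumbnail.

```java
// Hypothetical helper: class name, method name and src values are assumptions.
final class ThumbnailTag {
    private ThumbnailTag() {}

    /**
     * Builds a plain img tag that constrains only the longer side, leaving
     * the browser to scale the other side and keep the aspect ratio.
     * Note: src is inserted as-is, so it must already be safe for HTML.
     */
    static String imgTag(String src, double widthInches, double heightInches, int sizePx) {
        // Landscape artwork gets a fixed width, everything else a fixed height.
        String dimension = (widthInches > heightInches) ? "width" : "height";
        return "<img src=\"" + src + "\" " + dimension + "=\"" + sizePx + "\"/>";
    }
}
```

A backing bean could expose the result to the page, which sidesteps the server-side resizing done by transformImageSize at the cost of sending the full thumbnail to the browser.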

data display
I was easily able to get data tables with nicely alternating row colors by adding the following to theme.css (I did it here, since the ‘alternating colors’ should vary with the theme)
.table-even { background-color: #ffffff; } .table-odd { background-color: #eeeeee; }

and then adding the following line to the *List.xhtml files
I did have some minor problems getting it to work since I had overwritten a class that applied to each cell and had given it a background color. I forgot that this cell class would take precedence, but was able to figure it all out with Firebug, an indispensable tool!

seam generator
The generator provided an invaluable starting point and some really nice features e.g., it creates tables with sortable columns for the list view. The table defaults to displaying the ID of nested objects, but it was trivial to change it to display something more appropriate while maintaining the expected sort behavior.

ajax suggestions
My goal was to consistently work within the framework. This occasionally put me at a level of abstraction above where I was comfortable (the operational metaphor being “trying to do X while wearing thick gloves”). This caused me to have a much more difficult time than expected doing a drop down suggestion menu e.g.,

<s:decorate id="roleDecoration" template="layout/edit.xhtml">
<ui:define name="label">role</ui:define>
< value="#{peopleHome.instance.roles}">mmediate="true">

since I was having a bit of a problem finding the exact ‘magic location’ for placing the “” tag relative to the tag, i.e., it all ‘just works’ if you have everything placed ‘just right’.

Being at a higher level of abstraction also forced some patterns that I found difficult to work around. For example, in this trial system an Artwork object has a framed? attribute backed by a boolean. The behavior that I wanted in the listing ‘query by example’ code was to either
return framed artworks if the framed? checkbox is checked, or
return all artworks if the framed? checkbox is not checked.
However, I could come up with neither a way to have elements on the restriction list take multiple parameters nor a way to return different restriction lists depending upon the query.

my notes at that point say:
if you do this
List restrictionList;
if ((this.artwork != null)
        && (this.artwork.getShipable() != null)
        && this.artwork.getShipable().booleanValue()) {
    restrictionList = Arrays.asList(SHIPABLE_RESTRICTIONS);
} else {
    restrictionList = Arrays.asList(RESTRICTIONS);
}

You break the transaction model

I was able to fix this by adding this line to the RESTRICTIONS
artwork.framed in (true, #{ artworkList.artwork.framed})
which obviously will not generalize beyond the boolean case.
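To spell out why the 'in' trick gives the behavior I wanted, here is a small plain-Java simulation of the restriction. It does not use Seam or Hibernate at all; the class and method names are made up, and it only mimics the set-membership test the restriction performs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of "artwork.framed in (true, <checkbox value>)".
final class FramedRestriction {
    private FramedRestriction() {}

    /**
     * Returns true if an artwork with the given framed flag would pass
     * the restriction for the given checkbox state.
     */
    static boolean matches(boolean artworkFramed, boolean checkboxChecked) {
        // Checked:   allowed = (true, true)  -> only framed artworks pass.
        // Unchecked: allowed = (true, false) -> every artwork passes.
        List<Boolean> allowed = Arrays.asList(Boolean.TRUE, Boolean.valueOf(checkboxChecked));
        return allowed.contains(Boolean.valueOf(artworkFramed));
    }
}
```

The constant 'true' is what makes the unchecked case collapse to 'everything', and that only works because a boolean has exactly two values.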


I found that the Hibernate documentation was very useful (when I took the time to read it in detail aka RTFM).
The expression language used is described reasonably well here

When I moved to a new version of RichFaces to get suggestion boxes working, it broke other minor portions of the page layouts. Although not a big deal, I found it disconcerting. Building this rich functionality in the browser is cool and all, but it feels fragile and is causing me to think about trying out Flex (or AIR, its latest incarnation).

unit testing
I used HtmlUnit since it tests the complete end-to-end interaction.
Although I appreciate the ability to do faster, more thorough testing via mocks, I found that they gave me yet another thing to configure and wouldn’t give me the full end-to-end coverage that I was looking for.

I think that jboss/seam will likely prove useful. I have one other application that I’m building as a precursor to the extensible discovery system. My biggest area of concern is the ability to do a good UI in this space, which might prompt me to investigate the AIR/Flex framework(s) at some point in the near future.