<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-4522919003955435321</id><updated>2012-01-24T16:28:05.763-08:00</updated><category term='data integration'/><category term='xml'/><category term='ontologies'/><category term='hibernate'/><category term='BioIT World'/><category term='jsf'/><category term='RDF'/><category term='hygienic software'/><category term='mysql'/><category term='workflow'/><category term='debugging'/><category term='seam'/><category term='CDISC'/><category term='semantic web'/><category term='jbpm'/><category term='discovery informatics'/><category term='lisp macros'/><category term='RubyOnRails'/><category term='temporal'/><category term='hibernate annotations'/><category term='Amazon Web Services'/><category term='microformats'/><category term='applications'/><category term='iPhone'/><category term='Ruby'/><category term='software'/><category term='richfaces'/><category term='jboss'/><category term='drupal'/><category term='OWL'/><category term='tagging'/><category term='iOS'/><category term='mashup'/><category term='architecture'/><category term='mashupcamp3'/><category term='startups'/><category term='database'/><category term='scientific'/><title type='text'>Software and Architecture</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/'/><link rel='hub' 
href='http://pubsubhubbub.appspot.com/'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>76</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8988527796361630566</id><published>2012-01-24T16:28:00.001-08:00</published><updated>2012-01-24T16:28:05.794-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='startups'/><title type='text'>The Lean Startup</title><content type='html'>&lt;p&gt;I just finished reading Eric Ries' &lt;em&gt;&lt;a href="http://www.amazon.com/Lean-Startup-Entrepreneurs-Continuous-Innovation/dp/0307887898/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1327065831&amp;amp;sr=1-1"&gt;The  Lean Startup&lt;/a&gt;, &lt;/em&gt; and am happy to say  that it was much better than I expected.&lt;/p&gt;&lt;p&gt;When I heard about the book I thought that it only applied to cloud based web startups: the kind of shops that could readily perform &lt;a href="http://en.wikipedia.org/wiki/A/B_testing"&gt;A/B testing&lt;/a&gt; over a weekend, an approach pioneered by Google early on.&lt;/p&gt;&lt;p&gt;My reaction to this was predictable: that's fine when you're making simple changes to web pages, but what if you're doing some heavy technical lifting, e.g., new technology (&lt;a href="http://en.wikipedia.org/wiki/Watson_(computer)"&gt;Watson&lt;/a&gt;), hard technology with deep foundational requirements (&lt;a href="http://www.dropbox.com/"&gt;Dropbox&lt;/a&gt;), etc.&lt;/p&gt;&lt;p&gt;Ries describes something more subtle. 
He develops a general methodology that allows quantitative evaluation of assumptions, as quickly as possible, with the least possible amount of effort.&lt;/p&gt;&lt;p&gt;Dropbox is one of his most striking examples:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;The challenge was that it was impossible to demonstrate the working software in prototype form. The product required that they overcome significant technical hurdles; it also had an online service component that required high reliability and availability. To avoid the risk of waking up after years of development with a product nobody wanted, Drew did something unexpectedly easy: he made a video.&lt;/p&gt;&lt;/blockquote&gt;&lt;p&gt;The goal of the video was to &lt;em&gt;validate the assumption that people would actually be interested in such a product&lt;/em&gt;. Of course, in the Dropbox case the interest was there, but the book is filled with examples in which it wasn't.&lt;/p&gt;&lt;p&gt;Ries' core approach consists of a large-scale feedback loop, shown below (taken from &lt;a href="http://lean.st/principles/build-measure-learn"&gt;here&lt;/a&gt;):&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt;&lt;img style="border: 0px initial initial;" src="http://lean.st/images/startup-feedback-loop1.png" border="0" alt="startup-feedback-loop1.png" /&gt;&lt;/p&gt;&lt;p&gt;My two core takeaways are:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;MVP&lt;/strong&gt; -- &lt;em&gt;the minimum viable product&lt;/em&gt;, and&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Measure&lt;/strong&gt; -- &lt;em&gt;the importance of 
making clear, tangible predictions ahead of time.&lt;/em&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Both ideas are important in any product that has a customer focus, whether the customers are online consumers or members of an internal department.&lt;/p&gt;&lt;p&gt;They are reminiscent of the techniques my group used years ago to measure uptake of product features by internal departments: if we deployed a page with a particular group in mind, we could see if they continued to use it a few weeks out. If not, we'd visit them and ask what the problem was. There wasn't any need to wait for feature requests (or to hear in a budget meeting that you weren't providing any value). We sought them out and fixed the problem, if at all possible. Monitoring at page level allowed us to make these judgments for small product increments.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;MVP&lt;/strong&gt; is simply getting something in front of people as quickly as possible -- it doesn't need to be working, doesn't need to scale, doesn't need to be fully polished, but it does need to give a feel as to why someone would want to incorporate the product into their life.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Measure&lt;/strong&gt; is more generally important. Ries is perfectly willing to start with measurable goals that seem almost trivial. The idea is that, in short order, these goals can eliminate easy workarounds, myths of low-hanging fruit, etc.&lt;/p&gt;&lt;p&gt;In one of his examples, he aimed for a revenue of a few &lt;em&gt;hundred&lt;/em&gt; dollars a month to start. A few hundred a month sounds trivial, and certainly isn't sustainable, but after a few months he couldn't even do that.&lt;/p&gt;&lt;p&gt;The point is that if his goal had been the few &lt;em&gt;million&lt;/em&gt; dollars a year he'd need for true sustainability, it would have taken him years to realize that he was on the wrong track. 
This is because, realistically, even in the best case, the million-dollar goal would require a few years to achieve, pushing feedback out to years rather than months.&lt;/p&gt;&lt;p&gt;&lt;em&gt;The Lean Startup&lt;/em&gt; doesn't just focus on startups: it includes a number of examples from groups within large non-software companies (Procter and Gamble) and established software companies (Intuit), showing that &lt;em&gt;startup&lt;/em&gt; is more an attitude than a corporate structure.&lt;/p&gt;&lt;p&gt;I highly recommend this book. I consider it not so much a &lt;em&gt;Lean Startup&lt;/em&gt; book as an &lt;em&gt;optimized effort&lt;/em&gt; book: how to decide whether your efforts are wasting your time or actually taking you in the direction that you want to go.&lt;/p&gt;&lt;p&gt;&lt;em&gt;To be clear: &lt;/em&gt; I don't think that this means that data measured at a fine-grained level is the only way of gathering feedback; e.g., it's hard to incrementally identify the best design for a large website (cf. &lt;a href="http://stopdesign.com/archive/2009/03/20/goodbye-google.html"&gt;Goodbye, Google&lt;/a&gt;). However, even in these cases, an MVP is important. 
It is vital to ground your ideas with your target audience, sooner rather than later (even if it's with a short video of a simulation of your idealized goal).&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8988527796361630566?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8988527796361630566/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8988527796361630566' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8988527796361630566'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8988527796361630566'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2012/01/lean-startup.html' title='The Lean Startup'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-117284259915412470</id><published>2012-01-17T09:45:00.001-08:00</published><updated>2012-01-17T09:45:03.364-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Ruby'/><title type='text'>Watir</title><content type='html'>&lt;p&gt;I know I'm late to this, but I'll share anyway:&lt;br /&gt; &lt;/P&gt;&lt;P&gt;If you're actively developing or supporting  anything with a web front end, take a look at &lt;a href="http://watir.com/"&gt;watir&lt;/a&gt; for automated smoke and regression 
tests. The site has a "watir in 5 minutes" page, but it probably takes more like &lt;strong&gt;3&lt;/strong&gt;; it's that well designed.&lt;/p&gt;&lt;strong&gt;Watir&lt;/strong&gt; stands for &lt;em&gt;Web Application Testing in Ruby&lt;/em&gt;. It is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Open source&lt;/li&gt;&lt;li&gt;Simple to install: just a Ruby library (gem)&lt;/li&gt;&lt;li&gt;Backed by all of Ruby, which is available when writing test scripts&lt;/li&gt;&lt;li&gt;Well regarded&lt;/li&gt;&lt;li&gt;Widely used. Here's a partial list:&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh4.ggpht.com/-2RpmeKRTLmY/TxRUEJXzpJI/AAAAAAAAAKU/IWWzMDIlXIw/Watir_users.png?imgmax=800" border="0" alt="Some Watir Users" width="148" height="110" /&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Watir Limitations&lt;/strong&gt;&lt;/p&gt;&lt;UL&gt;&lt;li&gt;“W3C” web only!&lt;UL&gt;&lt;li&gt;No plug-ins (ActiveX, Flash)&lt;/li&gt;&lt;/UL&gt;&lt;/li&gt;&lt;li&gt;No recording&lt;UL&gt;&lt;li&gt;Recording tools are available, but not part of the core effort&lt;/li&gt;&lt;/UL&gt;&lt;/li&gt;&lt;/UL&gt;&lt;strong&gt;Watir Advantages&lt;/strong&gt;&lt;UL&gt;&lt;LI&gt;Widely used&lt;/LI&gt;&lt;LI&gt;Very easy to use&lt;/LI&gt;&lt;LI&gt;Programmer friendly&lt;/LI&gt;&lt;LI&gt;Ruby is sane&lt;/LI&gt;&lt;LI&gt;Terse without being obtuse&lt;/LI&gt;&lt;/UL&gt;&lt;strong&gt;Interactive Development&lt;/strong&gt;&lt;UL&gt;&lt;LI&gt;You can debug scripts interactively using &lt;em&gt;irb&lt;/em&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;For example, this screenshot shows a section of a browser window and an irb window about to execute a method to click on the browser's edit button:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-681L2EGUYKk/TxWzmxNOCmI/AAAAAAAAAKg/Q86RrdFvy5w/Screen%252520Shot%2525202012-01-16%252520at%2525201.30.17%252520PM.jpg?imgmax=800" alt="Screen Shot 2012 01 16 at 1 30 17 PM" border="0" width="350" height="225" /&gt;&lt;br /&gt;&lt;br 
/&gt;resulting in:&lt;br /&gt;&lt;br /&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/-4abCwdLCmJw/TxWzneowc3I/AAAAAAAAAKo/IdNY06GY1R4/Screen%252520Shot%2525202012-01-16%252520at%2525201.30.43%252520PM.jpg?imgmax=800" alt="Screen Shot 2012 01 16 at 1 30 43 PM" border="0" width="350" height="257" /&gt;&lt;br /&gt;&lt;strong&gt;Page elements are referenced by&lt;br /&gt;&lt;/strong&gt;&lt;UL&gt;&lt;LI&gt;Type and id, displayed text, link destination, etc.&lt;UL&gt;&lt;LI&gt;@browser.select_list(:id, 'StatusSelect')&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;or index&lt;UL&gt;&lt;LI&gt;@browser.buttons[3]&lt;/LI&gt;&lt;LI&gt;Index is obviously not the preferred approach, but it is occasionally necessary, e.g., for pages with four identical “submit” buttons&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;p&gt;On a &lt;em&gt;maintainability&lt;/em&gt; note: I recently had to change a script from Firefox (which I'm increasingly becoming disenchanted with) to Chrome. The change only required altering 2 lines of code:&lt;/p&gt;&lt;blockquote&gt;&lt;p&gt;#require 'firewatir'&lt;/p&gt;&lt;p&gt;require "watir-webdriver"&lt;/p&gt;&lt;p&gt;and&lt;/p&gt;&lt;p&gt;#    @browser = Watir::Browser.new&lt;/p&gt;&lt;p&gt;@browser = Watir::Browser.new(:chrome)&lt;/p&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;However, I did find the behavior a bit different between Chrome and Firefox: if a button wasn't visible on the page, Chrome wouldn't click it. I never had an issue with this in Firefox. 
&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; upon further investigation, it appears that this may be the &lt;a href="http://groups.google.com/group/watir-general/browse_thread/thread/388a4183b892dd6a?pli=1"&gt;behavior of the newer versions of Watir&lt;/a&gt;, rather than a Chrome compatibility issue.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-117284259915412470?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/117284259915412470/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=117284259915412470' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/117284259915412470'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/117284259915412470'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2012/01/watir.html' title='Watir'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-2RpmeKRTLmY/TxRUEJXzpJI/AAAAAAAAAKU/IWWzMDIlXIw/s72-c/Watir_users.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-5275445549732061043</id><published>2011-10-23T07:43:00.001-07:00</published><updated>2011-10-23T07:43:01.805-07:00</updated><title type='text'>Building and Submitting iOS Apps built using user 
created static libraries</title><content type='html'>&lt;p&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;Here are my notes on getting an iOS app ready for submission to Apple’s App Store, which included an interesting “welcome back to header files” experience. Resolving all the issues required a fair amount of searching and futzing, so I thought I’d put together a synopsis and share:&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;strong&gt;Step 1: Compiling a project referencing a static library I created.&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;div&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;strong&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;Background: I’m building a suite of apps centered on PDF file display, using the &lt;a href="https://github.com/vfr/Reader"&gt;vfr pdf reader framework&lt;/a&gt; -- the developer, Julius Oklamcak, characterizes it as a basic framework, but I found it both robust and easily configurable to fit my needs.&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;The first task was to build a library consisting of the vfr pdf reader code so that the code would be factored appropriately. 
Using this approach, any bugs showing up in the pdf reader require only one code change to fix all the books.&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;I used &lt;a href="http://blog.carbonfive.com/2011/04/04/using-open-source-static-libraries-in-xcode-4/"&gt;CarbonFive’s blog post on building and using static libraries&lt;/a&gt; to perform the setup.&lt;/span&gt;&lt;/p&gt;&lt;/strong&gt;&lt;strong&gt;&lt;p style="font: normal normal normal 12px/normal Helvetica; display: inline !important; margin: 0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;It took a bit of tweaking to get the library incorporated/recognized by the dependent projects, &lt;em&gt;and, sadly, one of those tweaks came back to bite me later.&lt;/em&gt;&lt;/span&gt;&lt;/p&gt;&lt;/strong&gt;&lt;strong&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;The primary change was to put all the projects in a single Xcode workspace so that the library was readily available to the dependent project (apps). 
Workspace creation, followed by putting the library in the frameworks folder and making the .h files public, made the .h files visible to the dependent project.&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;After these changes, the dependent project could compile and run, both on the simulator and on my test device.&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt; &lt;/p&gt;&lt;div&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;strong&gt;Step 2: Archiving the Project&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;strong&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;However, when I went to archive the app (a precursor to submission), the necessary .h files suddenly could not be found.&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;There didn’t seem to be any “build level” way around this -- meaning that I couldn’t find a way to eliminate this error that just entailed changing some configuration files. 
The only solution I could find was to move copies of (or links to) the actual header files being referenced.&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;Upon further consideration, this appears to be an artifact of the build environment/Objective-C language, since even the foundation frameworks are packed with their header files (see below).&lt;/span&gt;&lt;/p&gt;&lt;div&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh6.ggpht.com/-fa-Cdbl3OTo/TqQk9csAbbI/AAAAAAAAAJ0/bWL8SltSsWI/frameworkContents.jpg?imgmax=800" border="0" alt="FrameworkContents" width="300" height="284" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;strong&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;strong&gt;Step 3: Validating the Project&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;At this point the archive could be created without errors, but when I tried to validate and submit I got the following:&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 13.0px 0.0px; 
font: 14.0px Arial;"&gt;&lt;span style="letter-spacing: 0px; font-family: 'Courier New'; font-size: 12px;"&gt;&lt;strong&gt;"[projectname] does not contain a single–bundle application or contains multiple products. Please select another archive, or adjust your scheme to create a single–bundle application."&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;This was a quick issue to fix -- the answers appearing on stack overflow&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;a href="http://stackoverflow.com/questions/5206536/archiving-project-in-xcode-incorrectly-creates-multi-application-bundle"&gt;Archiving project in XCode incorrectly creates multi-application bundle&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;The fix required changing the file visibility (back) from &lt;em&gt;public&lt;/em&gt; to &lt;em&gt;project&lt;/em&gt;. 
Changing file visibility is most easily accomplished by moving the .h files to the appropriate part of the &lt;em&gt;Copy Headers &lt;/em&gt;section of the &lt;em&gt;Build Phases&lt;/em&gt; tab for the target -- I will admit that this approach was advocated in the CarbonFive blog post, but it didn’t appeal to my (Java tuned) sensibilities.&lt;/span&gt;&lt;/p&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh3.ggpht.com/-IgdpSh-UilQ/TqQn8aEZDWI/AAAAAAAAAKA/0WbeZR-_oEA/h_file_movement.jpg?imgmax=800" border="0" alt="H file movement" width="300" height="235" /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;strong&gt;Step 4: Submitting the Project&lt;/strong&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;The fun wasn’t quite over, since an attempt to submit the app resulted in the response: &lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;No suitable application records were found.&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;After setting up the signing appropriately per Apple’s guidelines &lt;a 
href="http://developer.apple.com/ios/manage/distribution/index.action"&gt;&lt;span style="text-decoration: underline; letter-spacing: 0.0px; color: #0000ad;"&gt;http://developer.apple.com/ios/manage/distribution/index.action&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;-- somewhat obscurely located here&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh3.ggpht.com/-Zwt7csQOFXY/TqQn80RrqgI/AAAAAAAAAKI/k8_XR_DWhIM/apple_directons.jpg?imgmax=800" border="0" alt="Apple directons" width="300" height="105" /&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;(BTW, am I the only one finding it frustrating that Apple’s documentation hasn’t been fully refreshed to reflect the Xcode 4 UI?)&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;and reading &lt;a href="http://stackoverflow.com/questions/6858260/no-suitable-application-records-were-found"&gt;Stack Overflow again&lt;/a&gt;,&lt;/span&gt;&lt;/p&gt;&lt;/strong&gt;&lt;strong&gt;&lt;p style="font: normal normal normal 12px/normal Helvetica; display: inline !important; margin: 0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;I filled out the questionnaire and was able to submit and upload 
successfully.&lt;/span&gt;&lt;/p&gt;&lt;/strong&gt;&lt;strong&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica; min-height: 14.0px;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;/span&gt;&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Helvetica;"&gt;&lt;span style="letter-spacing: 0.0px;"&gt;Hope this short walkthrough proves useful.&lt;/span&gt;&lt;/p&gt;&lt;div&gt;&lt;span style="letter-spacing: 0.0px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-5275445549732061043?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/5275445549732061043/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=5275445549732061043' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5275445549732061043'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5275445549732061043'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2011/10/building-and-submitting-ios-apps-built.html' title='Building and Submitting iOS Apps built using user created static libraries'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh6.ggpht.com/-fa-Cdbl3OTo/TqQk9csAbbI/AAAAAAAAAJ0/bWL8SltSsWI/s72-c/frameworkContents.jpg?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4896634897383038405</id><published>2011-08-15T08:43:00.001-07:00</published><updated>2011-09-18T16:59:09.944-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='drupal'/><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><title type='text'>Commenting in html,  tpl.php etc.</title><content type='html'>&lt;p&gt;I came across a very nice commenting convention in the drupal zen theme template files:&lt;/p&gt;&lt;p&gt;They close all of the class and id marked sections with&lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;&amp;lt;!-- /.{classname} --&amp;gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt;or&lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;&amp;lt;!-- /#{idName} --&amp;gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt;e.g.,&lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;&amp;lt;tr class="genesis-page-header-row"&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt;is closed  with&lt;/p&gt;&lt;blockquote&gt;&lt;pre&gt;&amp;lt;/tr&amp;gt;&amp;lt;!-- /.genesis-page-header-row --&amp;gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt;This is nice enough in helping you to keep track of what's going on when editing the code. 
However, it is invaluable in understanding what might have gone wrong when examining pages in a "modern" inspector.&lt;/p&gt;&lt;p&gt;For example, here's how that section looks in Safari 5.1:&lt;/p&gt;&lt;p&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh6.ggpht.com/_uhpaSaKsmiM/TbmoUFJkngI/AAAAAAAAAI8/76czfP9b2sc/Drupal_image_upload.png?imgmax=800" border="0" alt="Drupal image upload" width="600" height="88" /&gt;&lt;/p&gt;&lt;p&gt;making it clear that you and the browser agree on the &amp;lt;tr&amp;gt; being terminated by this &amp;lt;/tr&amp;gt; (if you and the browser don't agree, something has probably gone awry).&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4896634897383038405?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4896634897383038405/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4896634897383038405' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4896634897383038405'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4896634897383038405'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2011/08/commenting-in-html-tplphp-etc.html' title='Commenting in html,  tpl.php etc.'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_uhpaSaKsmiM/TbmoUFJkngI/AAAAAAAAAI8/76czfP9b2sc/s72-c/Drupal_image_upload.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-9055149318371393703</id><published>2011-06-26T10:54:00.001-07:00</published><updated>2011-06-26T10:54:53.719-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='drupal'/><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><title type='text'>Debugging Drupal Access Denied</title><content type='html'>&lt;p&gt;I just finished debugging an &lt;em&gt;Access Denied&lt;/em&gt; problem in Drupal, and wanted to add a couple more tips to those in  &lt;a href="http://www.simonlane.com/site/?q=node/12"&gt;Simon Lane's &lt;em&gt;Drupal Permissions Issues: A Debugging Checklist&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;First: &lt;/strong&gt;ensure that the users unable to access the content have been given the appropriate roles.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;: compare the permissions of a role that can access the content with those of a role that can't -- the following query works in MySQL and is both quicker and more accurate than a visual check.&lt;/p&gt;&lt;blockquote&gt;&lt;code&gt;&lt;pre&gt;select permission, module from role_permission &lt;br /&gt;where rid = 5 &lt;br /&gt;and permission not in &lt;br /&gt;(select permission from role_permission where rid = 4)&lt;/pre&gt;&lt;/code&gt;&lt;br /&gt;where "5" is the id of the role that has access and "4" is the id of the role being denied access.&lt;/blockquote&gt;&lt;p&gt;&lt;strong&gt;Third, and most obvious: &lt;/strong&gt;ensure that the "inaccessible" content has been published. 
This was the source of my particular problem (doh!).&lt;/p&gt;&lt;p&gt;&lt;br /&gt;I echo the words of many posters, when I say "It would be nice if Drupal would log the reason for denying access. It would save everyone a lot of time"&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-9055149318371393703?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/9055149318371393703/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=9055149318371393703' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/9055149318371393703'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/9055149318371393703'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2011/06/debugging-drupal-access-denied.html' title='Debugging Drupal Access Denied'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-2318822339050195582</id><published>2011-04-28T10:48:00.001-07:00</published><updated>2011-04-28T10:48:02.187-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='drupal'/><title type='text'>Drupal 7.0: broken image preview</title><content type='html'>&lt;p&gt;I recently decided to use Drupal for one of my projects, since all reports are that it is both well 
designed and relatively easy to extend.&lt;/p&gt;&lt;p&gt;I'm just getting it set up locally on my MacBook Pro (10.6.7) and ran into a problem with previewing uploaded images: nothing was getting displayed for the preview, even though the images would display fine when saved.&lt;/p&gt;&lt;p&gt;The following errors were being logged.&lt;/p&gt;&lt;p&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh3.ggpht.com/_uhpaSaKsmiM/TbmoTiULIjI/AAAAAAAAAI4/OQo0d9pyCow/Access_denied.png?imgmax=800" border="0" alt="Access denied" width="600" height="49" /&gt;&lt;/p&gt;&lt;p&gt;The solution took a while to find, but was relatively simple: changing the upload location to &lt;strong&gt;Public files&lt;/strong&gt; (see below) cured the problem.&lt;/p&gt;&lt;p&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh6.ggpht.com/_uhpaSaKsmiM/TbmoUFJkngI/AAAAAAAAAI8/76czfP9b2sc/Drupal_image_upload.png?imgmax=800" border="0" alt="Drupal image upload" width="600" height="88" /&gt;&lt;/p&gt;&lt;p&gt;(Switching it back recreated the problem.) This seemed worth sharing.&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-2318822339050195582?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/2318822339050195582/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=2318822339050195582' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2318822339050195582'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2318822339050195582'/><link rel='alternate' type='text/html' 
href='http://rdfsg.blogspot.com/2011/04/drupal-70-broken-image-preview.html' title='Drupal 7.0: broken image preview'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_uhpaSaKsmiM/TbmoTiULIjI/AAAAAAAAAI4/OQo0d9pyCow/s72-c/Access_denied.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8967522623136813606</id><published>2011-04-04T19:27:00.001-07:00</published><updated>2011-04-04T19:27:55.495-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='iOS'/><title type='text'>CoreData != CoreObjects</title><content type='html'>&lt;p&gt;I ran into a situation last week that had me (proverbially) tearing my hair out.&lt;/p&gt;&lt;p&gt;I was trying to call methods on an object stored using &lt;a href="http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/CoreData/cdProgrammingGuide.html"&gt;CoreData&lt;/a&gt; and the calls just &lt;em&gt;WOULD NOT WORK.&lt;/em&gt; The reason was not obvious, and even the simplest method dispatch was failing: the debugger showed the objects as being of type &lt;code&gt;NSManagedObject&lt;/code&gt; rather than the expected &lt;code&gt;GTGItemLocationList&lt;/code&gt;, even after casting the local var as a &lt;code&gt;(GTGItemLocationList *)&lt;/code&gt;.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_uhpaSaKsmiM/TZdpYnuDu1I/AAAAAAAAAIY/Io-qIwlwycY/managed_object.jpg?imgmax=800"&gt;&lt;img style="display: block; margin-left: auto; margin-right: auto;" src="http://lh6.ggpht.com/_uhpaSaKsmiM/TZp-J8Co1rI/AAAAAAAAAIo/bQWfRN_5E70/managed_object.jpg?imgmax=800" border="0" 
alt="Managed object" width="300" height="181" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt;Rather than continue, I decided to spend some time reading &lt;a href="http://www.amazon.com/gp/product/1430233559"&gt;Pro Core Data for iOS&lt;/a&gt; since, in any case, I felt that my understanding of Core Data was suboptimal. &lt;em&gt;Pro Core Data&lt;/em&gt; is an excellent resource and greatly improved  my understanding of the Core Data framework. I decided to stop trying to treat my &lt;em&gt;GTGItemLocationList&lt;/em&gt; as an instance of the &lt;em&gt;GTGItemLocationList &lt;/em&gt; class, and instead use the NSMutableArray pattern described in the book.&lt;/p&gt;&lt;p&gt;I  went back and coded up what I needed in about an hour.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;Everything has been going well since then, and I thought I'd write this up as a cautionary tale about CoreData not providing objects in the manner expected. However, after digging a bit more, I found that this wasn't quite accurate. After all, the method which was throwing this exception was itself a method on an object stored by CoreData, and the dispatch was to the expected class/method combination. The pattern of accessing the object was different: rather than directly through an accessor, it was via an &lt;code&gt;objectEnumerator&lt;/code&gt; on &lt;code&gt;NSArray&lt;/code&gt;. The object returned using this technique was a &lt;code&gt;GTGItem&lt;/code&gt; manipulatable via conventional OO techniques as expected (and appreciated).&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_uhpaSaKsmiM/TZdqoQaL15I/AAAAAAAAAIg/hp0E9tiUfGA/working_item.png?imgmax=800"&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_uhpaSaKsmiM/TZp-Ku8SGsI/AAAAAAAAAIw/spuadAnwZFw/working_item.jpg?imgmax=800" alt="Working item" border="0" width="300" height="100" /&gt;&lt;/a&gt;&lt;p&gt;I will admit that I haven't determined what's going on here. 
I think that the moral of the story is that in exchange for CoreData's simplicity and ease of use,* you might find yourself exploring a corner case that exhibits unexpected behavior. In retrospect, the &lt;code&gt;GTGItemLocationList&lt;/code&gt; is completely superfluous. Eliminating this class would improve functionality and move me back into CoreData's core use cases, reinforcing one of my basic iOS heuristics: &lt;em&gt;If it is very hard to do, your first reaction should be "what built in function am I ignoring that would make this easy?".&lt;/em&gt;&lt;/p&gt;&lt;p&gt;For completeness: I'm not the only one who has seen this problem (see &lt;a href="http://stackoverflow.com/questions/1576241/core-data-returns-nsmanagedobject-instead-of-concrete-class-but-only-when-using"&gt;here&lt;/a&gt; and &lt;a href="http://stackoverflow.com/questions/3752471/nsfetchedresultscontroller-always-returns-nsmanagedobject-objects-instead-of-cust"&gt;here&lt;/a&gt;), but none of the suggestions cured my problem.&lt;/p&gt;&lt;p&gt;_____________________________________&lt;/p&gt;&lt;p&gt; *I mean this sincerely: &lt;em&gt;Pro Core Data&lt;/em&gt; highlighted the &lt;strong&gt;built in!&lt;/strong&gt; CoreData data migration capabilities. Given how much time the industry spends planning/managing/performing data migration projects, any capability in this space is welcome. 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8967522623136813606?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8967522623136813606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8967522623136813606' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8967522623136813606'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8967522623136813606'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2011/04/coredata-coreobjects.html' title='CoreData != CoreObjects'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_uhpaSaKsmiM/TZp-J8Co1rI/AAAAAAAAAIo/bQWfRN_5E70/s72-c/managed_object.jpg?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-7315478406442714815</id><published>2011-02-04T13:14:00.001-08:00</published><updated>2011-02-04T13:14:32.014-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><title type='text'>python, Mac,  MySQLdb "wrong architecture"</title><content type='html'>&lt;p&gt;I thought I'd post this tip since I had a few false starts following other pointers I found on the web -- I'd like to save others the wasted 
effort.&lt;/p&gt;&lt;p&gt;The error (&lt;em&gt;MySQLdb "wrong architecture"&lt;/em&gt;) IS caused by a mismatch between the architecture (32-bit or 64-bit) of your Python build and that of the OS you're running.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;If you are trying to import MySQLdb into Python and getting a "wrong architecture" error message, try the following:&lt;br /&gt;&lt;br /&gt;&lt;p&gt;First, determine the architecture of your OS by typing&lt;/p&gt;&lt;p&gt;&lt;code&gt; uname -a &lt;/code&gt;&lt;/p&gt;&lt;p&gt;into a shell (Thanks &lt;a href="http://osxdaily.com/2009/09/07/how-to-tell-if-youre-running-the-32-bit-or-64-bit-kernel-in-mac-os-x-snow-leopard/"&gt;OSXDaily&lt;/a&gt;)&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;Then install a current version of Python that matches this architecture from &lt;a href="http://www.python.org/download/releases/"&gt;python.org&lt;/a&gt;, reinstall MySQLdb and see if it works.&lt;br /&gt;&lt;br /&gt;This is all that it took for me to be up and running, and is (relatively) painless.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Note:&lt;/strong&gt; be sure that your scripts point to the newly installed version at &lt;code&gt;/usr/local/bin/python&lt;/code&gt;, rather than the system version at &lt;code&gt;/usr/bin/python&lt;/code&gt;.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-7315478406442714815?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/7315478406442714815/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=7315478406442714815' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7315478406442714815'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7315478406442714815'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2011/02/python-mac-mysqldb-architecture.html' title='python, Mac,  MySQLdb &amp;quot;wrong architecture&amp;quot;'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-5845013304120796050</id><published>2010-12-20T07:11:00.001-08:00</published><updated>2010-12-20T07:11:01.612-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Adding Structure to Data</title><content type='html'>&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times;"&gt;This post demonstrates how to add further structure to data after the initial items have been (uniquely) identified and committed to your persistent store. 
The core idea here is that once you have items uniquely identified, you can overlay a structure (or any number of structures) upon them as desired.&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times;"&gt;These structural overlays can also be made to interact as much (or as little) as necessary to address the question currently under consideration.&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times;"&gt;For example, given four cell lines (this is taken from a &lt;a href="http://www.criver.com/SiteCollectionDocuments/ds_d_oncology_research_services.pdf"&gt;brochure&lt;/a&gt; from the Charles River labs web site):&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times;"&gt; &lt;/p&gt;&lt;table border="1"&gt;&lt;tr&gt;&lt;th&gt;Cell Line&lt;/th&gt;&lt;th&gt;Species&lt;/th&gt;&lt;th&gt;Organ&lt;/th&gt;&lt;th&gt;ID&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; SW780 &lt;/td&gt;&lt;td&gt;Human &lt;/td&gt;&lt;td&gt; Bladder &lt;/td&gt;&lt;td&gt;CL-1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Hep3B &lt;/td&gt;&lt;td&gt;Human &lt;/td&gt;&lt;td&gt; Liver &lt;/td&gt;&lt;td&gt;CL-2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; B16 &lt;/td&gt;&lt;td&gt; Mouse &lt;/td&gt;&lt;td&gt; Skin &lt;/td&gt;&lt;td&gt;CL-3&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; Madison109 &lt;/td&gt;&lt;td&gt; Murine &lt;/td&gt;&lt;td&gt; Lung &lt;/td&gt;&lt;td&gt;CL-4&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times;"&gt;&lt;br /&gt;(note: not all relationships need to be specifically listed in the parent table)&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 
14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times;"&gt;This gives us the following structure:&lt;/p&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh5.ggpht.com/_uhpaSaKsmiM/TQ9x-doo-JI/AAAAAAAAAHw/SHKNB00NP00/Initial_cell_lines.png?imgmax=800" alt="Initial_cell_lines.png" border="0" width="300" height="43" /&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;Upon which we can overlay a set of relationships showing the source organ&lt;br /&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh6.ggpht.com/_uhpaSaKsmiM/TQ9x-xNTM8I/AAAAAAAAAH0/EFPFuDVzamg/Graph_organ.png?imgmax=800" alt="Graph_organ.png" border="0" width="300" height="107" /&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;Or source species &lt;/p&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_uhpaSaKsmiM/TQ9x_NW3fYI/AAAAAAAAAH4/zYO07YfqIFs/Graph_species_partial.png?imgmax=800" alt="Graph_species_partial.png" border="0" width="300" height="116" /&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;Now, let's say we add a new cell line &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;table border="1"&gt;&lt;tr&gt;&lt;th&gt;Cell Line&lt;/th&gt;&lt;th&gt;Species&lt;/th&gt;&lt;th&gt;Organ&lt;/th&gt;&lt;th&gt;ID&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; 
SW780-1 &lt;/td&gt;&lt;td&gt;Human &lt;/td&gt;&lt;td&gt; Bladder &lt;/td&gt;&lt;td&gt;CL-5&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;Giving us&lt;/p&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_uhpaSaKsmiM/TQ9x_u5RkoI/AAAAAAAAAH8/6c3DxZwz4YM/Graph_Added_cell.png?imgmax=800" alt="Graph_Added_cell.png" border="0" width="300" height="96" /&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;We may later realize that CL-5 was derived from CL-1 and just use a separate parent child relationship table to store the information &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;table border="1"&gt;&lt;tr&gt;&lt;th&gt; Parent &lt;/th&gt;&lt;th&gt; Child &lt;/th&gt;&lt;th&gt; Relationship &lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; CL-1 &lt;/td&gt;&lt;td&gt;CL-5 &lt;/td&gt;&lt;td&gt; 	"derived" &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;(note: "Root" cell lines are those that do not appear in the Child column or do not appear in this table at all) &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh3.ggpht.com/_uhpaSaKsmiM/TQ9yAJm7XsI/AAAAAAAAAIA/ws-A7TP8dug/Graph_Added_relationship.png?imgmax=800" alt="Graph_Added_relationship.png" border="0" width="300" height="96" /&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 
12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;This sort of thing can be generally extended and need not be a strict tree: &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;table border="1"&gt;&lt;tr&gt;&lt;th&gt; Parent &lt;/th&gt;&lt;th&gt; Parent Table&lt;/th&gt;&lt;th&gt; Child &lt;/th&gt;&lt;th&gt; Child Table&lt;/th&gt;&lt;th&gt; Relationship &lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; CL-1 &lt;/td&gt;&lt;td&gt; Cell_Line &lt;/td&gt;&lt;td&gt;CL-5 &lt;/td&gt;&lt;td&gt; Cell_Line &lt;/td&gt;&lt;td&gt; fusion-parent &lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; CL-2 &lt;/td&gt;&lt;td&gt; Cell_Line &lt;/td&gt;&lt;td&gt;CL-5 &lt;/td&gt;&lt;td&gt; Cell_Line &lt;/td&gt;&lt;td&gt; fusion-parent &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;Obviously the richer the relationship, the more likely you are to move to a table specifically designed to capture that information. 
&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;table border="1"&gt;&lt;tr&gt;&lt;th&gt; Mixture Component &lt;/th&gt;&lt;th&gt; Mixture Component Table&lt;/th&gt;&lt;th&gt; Mixture &lt;/th&gt;&lt;th&gt; Mixture Table&lt;/th&gt;&lt;th&gt; Amt &lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; C-1 &lt;/td&gt;&lt;td&gt; Compound &lt;/td&gt;&lt;td&gt;C-9 &lt;/td&gt;&lt;td&gt; Compound &lt;/td&gt;&lt;td&gt; 0.1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; C-1 &lt;/td&gt;&lt;td&gt; Compound &lt;/td&gt;&lt;td&gt;C-9 &lt;/td&gt;&lt;td&gt; Compound &lt;/td&gt;&lt;td&gt; 0.1&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt; R-2 &lt;/td&gt;&lt;td&gt; Reagent &lt;/td&gt;&lt;td&gt;C-9 &lt;/td&gt;&lt;td&gt; Compound &lt;/td&gt;&lt;td&gt; 0.1 &lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;As these structures build up it is easy to then interrogate the information about our available cells. &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;&lt;strong&gt;Query:&lt;/strong&gt; What mammalian cell lines do we have? 
&lt;strong&gt;Procedure:&lt;/strong&gt; Traverse from the mammalian node and collect all cell line instances&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/_uhpaSaKsmiM/TQ9yAacLpVI/AAAAAAAAAIE/TOGzQ_Y01jM/Graph_species.png?imgmax=800" alt="Graph_species.png" border="0" width="300" height="248" /&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;&lt;strong&gt;Query:&lt;/strong&gt; What cell lines are derived from CL-1? &lt;strong&gt;Procedure:&lt;/strong&gt; Find cell lines derived from CL-1, find cell lines derived from them (recursively), collect all cell line instances.&lt;/p&gt;&lt;img style="display:block; margin-left:auto; margin-right:auto;" src="http://lh4.ggpht.com/_uhpaSaKsmiM/TQ9yA6e5MSI/AAAAAAAAAII/XlPSdkFhMwc/Graph_derivation_only.png?imgmax=800" alt="Graph_derivation_only.png" border="0" width="125" height="150" /&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;The overall pattern is straightforward and can be processed with standard &lt;a href="http://en.wikipedia.org/wiki/Category:Graph_algorithms"&gt;graph algorithms&lt;/a&gt;.&lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt; &lt;/p&gt;&lt;p style="margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Times; min-height: 14.0px;"&gt;See also: &lt;a href="http://rdfsg.blogspot.com/2010/05/considerations-in-developing-middle.html"&gt;Considerations in developing a middle distance ontology&lt;/a&gt;&lt;/p&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/4522919003955435321-5845013304120796050?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/5845013304120796050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=5845013304120796050' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5845013304120796050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5845013304120796050'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/12/adding-structure-to-data.html' title='Adding Structure to Data'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_uhpaSaKsmiM/TQ9x-doo-JI/AAAAAAAAAHw/SHKNB00NP00/s72-c/Initial_cell_lines.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1745096666385216522</id><published>2010-11-08T19:32:00.001-08:00</published><updated>2010-11-08T19:32:06.132-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><title type='text'>Knowing Where Your Bytes Are</title><content type='html'>&lt;p&gt;I thought  that I'd post a pointer to &lt;a href="http://groups.google.com/group/mongodb-user/browse_thread/thread/528a94f287e9d77e?pli=1"&gt;this analysis of the foursquare/mongodb outage&lt;/a&gt; (for completeness, &lt;a 
href="http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/"&gt;here is foursquare's description of their outage&lt;/a&gt;). I'm not a foursquare user, so I didn't know anything about this outage until I was catching up on my RSS reading last week, but I think the lessons are broadly applicable.&lt;/p&gt;&lt;p&gt;The key point is that even on a modern/cutting-edge platform, &lt;em&gt;&lt;span style="font-style: normal;"&gt;low level details of how your data (and processing) map onto the underlying hardware can be the source of major headaches. &lt;/span&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;span style="font-style: normal;"&gt;In general, the number of possible details that could bring a system to its knees is too large to keep in one's head at all times. In practice, what is required is the ability to track down why the system isn't responding to your changes as expected. In the FourSquare case the question became: "why wasn't the RAM usage decreased when 5% of the data was moved to another shard?"&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;span style="font-style: normal;"&gt;The solutions are always obvious in retrospect, and the &lt;em&gt;&lt;span style="font-style: normal;"&gt;FourSquare&lt;/span&gt;&lt;/em&gt; mongodb team appears to have determined the root cause with admirable speed. 
In my mind the key to this speed was their ability to quickly get to the question&lt;/span&gt; "why didn't my RAM usage decrease"&lt;span style="font-style: normal;"&gt; from the question&lt;/span&gt; "why didn't my system speed up when I added another shard."&lt;span style="font-style: normal;"&gt;&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;span style="font-style: normal;"&gt;At some level this goes back to my post last year on the &lt;a href="http://rdfsg.blogspot.com/2009/03/osx-performance-analysis-instruments.html"&gt;Instruments performance analysis tool&lt;/a&gt;. A good substrate of these tools is critical:  if you don't have a performance meter that's pegged, in this case the swapping/paging meter, getting a handle on what's going on is difficult, to say the least.&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;&lt;p&gt;&lt;em&gt;&lt;span style="font-style: normal;"&gt; I'm glad I didn't personally experience that particular debugging session.&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1745096666385216522?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1745096666385216522/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1745096666385216522' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1745096666385216522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1745096666385216522'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/11/knowing-where-your-bytes-are.html' title='Knowing Where Your Bytes 
Are'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4252791965536878923</id><published>2010-08-30T17:44:00.001-07:00</published><updated>2010-08-30T17:44:16.441-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><title type='text'>Middle Distance Ontologies -- an Intermediate Summary</title><content type='html'>I've posted a number of design exercises using middle distance ontologies including:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;&lt;a href="http://rdfsg.blogspot.com/2010/04/ontologies-in-practice.html"&gt;Motivation&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://rdfsg.blogspot.com/2010/05/considerations-in-developing-middle.html"&gt;Design considerations&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://rdfsg.blogspot.com/2010/05/necessary-attributes-and-opaque.html"&gt;A detailed consideration of Necessary Attributes and Opaque Identifiers&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://rdfsg.blogspot.com/2010/06/middle-distance-ontologies-antibodies.html"&gt;A design for storing antibody information&lt;br /&gt;&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://rdfsg.blogspot.com/2010/07/middle-distance-ontologies-assays.html"&gt;A design for storing assay information&lt;br /&gt;&lt;/a&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;At this point I think it's worthwhile to look back and see what, if anything, is different and useful about the middle distance approach.&lt;br /&gt;&lt;br /&gt;What struck me most as I was thinking through the examples was how similar the process was to object oriented class design. 
Using opaque identifiers allows moving a clump of stuff (attributes, identifiers, functionality, behavior) to another class so that the objects in the system correspond to objects in the world. They can then be analyzed as a coherent whole (aka modularization). Pragmatically, such partitioning allows developers to become intimately familiar with how the stuff within that partition operates and evolves over time, permitting them to develop a high level of fluency in the domain.&lt;br /&gt;&lt;br /&gt;A key component of this approach proved to be the use of the criterion &lt;strong&gt;what could change&lt;/strong&gt; to drive the splitting off of the chunks of stuff. As such, it is an extension to/refinement of existing design techniques rather than being a replacement for them.&lt;br /&gt;&lt;br /&gt;So, how does middle distance thinking extend current techniques?&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Extension to DB design:&lt;/em&gt; A focus on "objects in the world," rather than just cardinality, results in a more fine grained partition of the problem space. All things which are one-to-many or many-to-one are necessarily distinct and have different potential rates of change, but sometimes things which are one-to-one are also distinct, and should be separated to accommodate future change.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Refinement of OO design:&lt;/em&gt; middle distance ontologies are problem space oriented rather than software oriented. As such, they are less concerned with factoring out the appropriate superclasses than OO designs are. This is because software design criteria that push such refactorings are "within the system" rather than "in the world." There is nothing in the middle distance approach that pushes common functionality up to a common implementation.&lt;br /&gt;&lt;br /&gt;One other aspect of the middle distance approach is that it allows you to pull attributes out of the design and hide them behind the opaque identifiers. 
This modularization allows you to change the definition of such things as &lt;em&gt;validity&lt;/em&gt; as your knowledge of what constitutes &lt;em&gt;validity&lt;/em&gt; in your problem domain changes.&lt;br /&gt;&lt;br /&gt;In summary, I think the middle distance approach is useful as a design principle, but is not a distinct design technique per se. Assessing its utility in practice must await an opportunity to apply it to a real world problem.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4252791965536878923?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4252791965536878923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4252791965536878923' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4252791965536878923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4252791965536878923'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/08/middle-distance-ontologies-intermediate.html' title='Middle Distance Ontologies -- an Intermediate Summary'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-116235422192942978</id><published>2010-08-02T13:49:00.001-07:00</published><updated>2010-08-02T13:49:49.950-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>ArchiMate/OpenGroup</title><content type='html'>Andy Siegel of &lt;a href="http://www.genzyme.com/"&gt;Genzyme&lt;/a&gt;, Hemant Virkar of &lt;a href="http://www.digitalinfuzion.com/"&gt;Digital Infuzion&lt;/a&gt;, and I gave a talk, &lt;a href="http://rdfsg.com/papers/OpenGroup_Boston2010_07.pdf"&gt;&lt;em&gt;ArchiMate as a Communication Tool in Launching an EA Effort&lt;/em&gt;&lt;/a&gt;, at the Open Group's Boston Conference. It discussed our experiences using &lt;a href="http://www.archimate.org/"&gt;ArchiMate&lt;/a&gt; as a communication tool for rolling out an EA Effort at Genzyme.&lt;br /&gt;&lt;br /&gt;As I've &lt;a href="http://rdfsg.blogspot.com/2008/12/discussing-architecture.html"&gt;mentioned before&lt;/a&gt;, I have found ArchiMate to be a practical, useful, framework. This presentation delineates some of the reasons why I came to that conclusion.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-116235422192942978?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/116235422192942978/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=116235422192942978' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/116235422192942978'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/116235422192942978'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/08/archimateopengroup.html' 
title='ArchiMate/OpenGroup'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-7383066002840890915</id><published>2010-07-19T17:32:00.001-07:00</published><updated>2010-07-19T17:32:44.678-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='scientific'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><title type='text'>Middle Distance Ontologies: assays</title><content type='html'>This analysis of assays is the companion to my previous post on &lt;a href="http://rdfsg.blogspot.com/2010/06/middle-distance-ontologies-antibodies.html"&gt;antibodies&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;A core bifurcation within assays is between &lt;em&gt;in-vivo&lt;/em&gt; and &lt;em&gt;in-vitro&lt;/em&gt; assays. I'm entirely ignoring clinical trials etc. since they are a completely different conceptual space.&lt;br /&gt;&lt;br /&gt;The main differences between &lt;em&gt;in-vivo&lt;/em&gt; and &lt;em&gt;in-vitro&lt;/em&gt; assays are that the measurement is more indirect/variable and that the delta between planned and actual measurements is much greater in the &lt;em&gt;in-vivo&lt;/em&gt; assays.&lt;br /&gt;&lt;br /&gt;In both we have&lt;UL&gt;&lt;LI&gt;the system under test&lt;/LI&gt;&lt;LI&gt;the test response(s) being measured&lt;/LI&gt;&lt;LI&gt; the measurement event (with a potential planned vs. 
actual component to each)&lt;/LI&gt; &lt;LI&gt;the entity whose  impact is being assessed&lt;/LI&gt;&lt;LI&gt;the way this entity was introduced into the system (most important for &lt;em&gt;in-vivo&lt;/em&gt; assays)&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;The &lt;strong&gt;system under test&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;This captures the technology maintaining the experimental conditions, the SOP, the "target", and the readout. &lt;br /&gt;It would therefore seem useful to use four opaque identifiers here: &lt;UL&gt;&lt;LI&gt;one for the SOP&lt;/LI&gt;&lt;LI&gt;one for the particular technology or system being used for the measurement, e.g., the animal&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;one for the measurement device&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;one for the receptor/disease&lt;/LI&gt; &lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;The &lt;strong&gt;test response(s) being measured&lt;/strong&gt;.&lt;br /&gt;&lt;br /&gt;In normal practice, response types are behind opaque identifiers, e.g., %INH, with an additional qualifier as to the response units. Middle distance thinking does nothing to change this. When it comes to derived data (see also my &lt;a href="http://rdfsg.blogspot.com/2010/05/necessary-attributes-and-opaque.html"&gt;post on necessary attributes&lt;/a&gt;),  there are two options which I call&lt;br /&gt;&lt;UL&gt; &lt;LI&gt;"resulted in" (this value "resulted in" this derived result) design.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;"resulted from" (this value "resulted from" an operation on these results) design in which the transformation that calculated the value is designated by an opaque identifier.&lt;/LI&gt; &lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;I've seen a number of systems work well in which a more basic value points to a result derived from it -- a "resulted in" design. 
However, my preference is for the "resulted from" design as it allows the transformation to be more open about the algorithms used and the data points which served as sources of the value. This design allows the result to point back to its source data points (via the opaque identifier), rather than forcing the source data points to designate the derived result. It also permits a many-to-many relationship rather than the many-to-one coerced by the "resulted in" design, albeit with an attendant increase in complexity.&lt;br /&gt;&lt;br /&gt;The &lt;strong&gt;measurement event&lt;/strong&gt;.&lt;br /&gt;&lt;br /&gt;(The measurement event may include an indication that the actual measurement event deviated in some way from the planned measurement event.)&lt;br /&gt;&lt;br /&gt;This one is surprisingly different when viewed from a middle distance perspective. As opposed to the techniques which I'm familiar with from  either conventional transactional systems or warehousing efforts, the middle distance approach suggests two factors:&lt;br /&gt;&lt;UL&gt; &lt;LI&gt;hiding the details of the measurement behind an opaque identifier (including equipment operator, time of measurement, deviation from plan)&lt;/LI&gt; &lt;LI&gt;surfacing a flag (again an opaque identifier) to indicate if there were any problems of significance with this measurement.&lt;/LI&gt; &lt;/UL&gt;This delegates the determination of error significance (and its type) to processes more familiar with the unique characteristics of the measurement.&lt;br /&gt;&lt;br /&gt;The &lt;strong&gt;entity whose impact is being assessed&lt;/strong&gt;.&lt;br /&gt;&lt;br /&gt;Normal practice maps these to opaque identifiers that tie back to sample lots, be they compounds, mixtures, formulations, or natural products.&lt;br /&gt;&lt;br /&gt;The &lt;strong&gt;way in which this entity was introduced into the system&lt;/strong&gt;. 
&lt;br /&gt;&lt;br /&gt;In some systems this may be covered by the SOP for the response being measured. However in more complex (&lt;em&gt;in-vivo&lt;/em&gt;) systems it is worthwhile to explicitly call this out, since it is easy to imagine the same SOP being performed either with multiple injections or an implantable device.&lt;br /&gt;&lt;br /&gt;In summary it appears that "almost all" of the detail is hidden behind opaque identifiers.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-7383066002840890915?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/7383066002840890915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=7383066002840890915' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7383066002840890915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7383066002840890915'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/07/middle-distance-ontologies-assays.html' title='Middle Distance Ontologies: assays'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-805080031521468295</id><published>2010-06-14T16:51:00.001-07:00</published><updated>2010-06-14T16:51:37.508-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='ontologies'/><title type='text'>Middle Distance Ontologies: antibodies</title><content type='html'>I want to push on the whole concept of "Middle Distance Ontology" a bit harder and see how it plays out -- my current plan is to concentrate on the discovery space with two entities: Assay Results and Antibodies.&lt;br /&gt;&lt;br /&gt;I'll cover antibodies in this post, assay results in the next.&lt;br /&gt;&lt;br /&gt;Now, there are many different perspectives from which to view antibodies, to name a few:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;As a biologist investigating antibody action&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;As a vendor producing antibodies to meet a specification&lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;As a pharmaceutical company procuring antibodies from a vendor to use in an assay&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For this exercise, I'll take the perspective of a pharmaceutical company storing/analyzing assay results, since it is the viewpoint I understand best. Antibodies are something that I'm not intimately familiar with, so my approach will be to generate a list of attributes and then evaluate them for inclusion/exclusion/opaque identification.&lt;br /&gt;&lt;br /&gt;Here are some antibody attributes I came up with. 
Many were taken from an interesting &lt;a href="http://www.piercenet.com/files/TR0059-Choose-second-Ab.pdf"&gt;white paper&lt;/a&gt; from Pierce Biotechnology/Thermo Scientific:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;&lt;em&gt;Basic Attributes&lt;/em&gt;: Primary/Secondary; Monoclonal/Polyclonal; Antigen; Vendor Location; Batch&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;em&gt;IgG Fragments&lt;/em&gt;: IgG Whole Molecule; Gamma Chain of IgG; Fc Fragment of IgG; F(ab′)2 Fragment of IgG &lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;em&gt;IgM Fragments:&lt;/em&gt; IgM Whole Molecule; &lt;br /&gt;Fc5μ Fragment of IgM&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;em&gt;Mu Chain of IgM&lt;/em&gt;&lt;br /&gt;Light Chains of Immunoglobulins&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;We certainly care about the primary/secondary antibody distinction. This captures both the fact that the secondary antibody was used (i.e., the primary antibody does not have a fluorescent tag (or equivalent)) and the characteristics of this secondary antibody if it appears. Interestingly, a quick search turned up a &lt;a href="http://info.med.yale.edu/genetics/ward/tavi/fi08.html"&gt;reference to tertiary antibodies&lt;/a&gt;, so the principles outlined in the &lt;a href="http://rdfsg.blogspot.com/2010/05/considerations-in-developing-middle.html"&gt;scenario analysis section&lt;/a&gt; call for us to provide for these in the design of the core ontology, even if they are unlikely to be used.&lt;br /&gt;&lt;br /&gt;In our situation, the factor that couples the primary and secondary (and tertiary) antibodies is their co-occurrence in the assay. &lt;em&gt;A priori,&lt;/em&gt; there is nothing that requires there to be anything other than the primary antibody, nor is there any necessity for the antibodies to be able to bind to each other (after all, mistakes happen). We might want to say that the primary, secondary and tertiary should (must) be able to bind to each other. 
However, it would be inappropriate to include this as part of this core ontology since we are trying to capture the run of an assay and a run may be erroneous.&lt;br /&gt;&lt;br /&gt;Therefore, for each experimental run we will have a primary antibody and perhaps one or more secondary+ antibodies. In addition, we might multiplex the experiment and run more than one antibody set per "container" per experiment. A quick search for multiple antibody assay made me think that "multiplex antibody" was the appropriate search term; it returns more than 2,000 hits, which indicates that it is possible.&lt;br /&gt;&lt;br /&gt;Antigen is something that we (likely) provide or at minimum specify to the vendor. Although it should consist of a unique sequence, understanding its meaning and role within the overall program would require the ability to support an arbitrary level of complexity. This clearly calls for an opaque identifier.&lt;br /&gt;&lt;br /&gt;At the vendor level we will need to track some vendor identifier (opaque), shipment information (again opaque) and some sort of vendor lot/group identifier (opaque). The scenario that we wish to be able to support is one in which the vendor ships multiple lots per shipment or spreads one lot across multiple shipments. 
Tracking the most fine grained vendor location as the opaque identifier at this level protects you from mergers/divestitures and new location startup issues, all of which would potentially be permanently hidden by the use of a larger grained identifier (just think if you only had ONE identifier for all of Thermo!).&lt;br /&gt;&lt;br /&gt;When it comes to the more specific characteristics of the antibodies, e.g., fragments and chains, these are not attributes that present distinctions which are important to the analysis of the results from the perspective of a pharmaceutical company.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Opaque identifiers&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Vendor facility &lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Vendor lot/shipment&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;antigen&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;Primary identifiers&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;quantity&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;monoclonal/polyclonal &lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;antibody&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;Elided Completely (stored separately)&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;antigen/antibody hierarchy&lt;/LI&gt;&lt;LI&gt;antibody fragments and chains&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;/LI&gt; &lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; &lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-805080031521468295?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/805080031521468295/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=805080031521468295' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4522919003955435321/posts/default/805080031521468295'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/805080031521468295'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/06/middle-distance-ontologies-antibodies.html' title='Middle Distance Ontologies: antibodies'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-5482432333329930063</id><published>2010-05-26T18:16:00.001-07:00</published><updated>2010-05-26T18:16:33.791-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><title type='text'>Necessary Attributes and Opaque Identifiers</title><content type='html'>At first blush, the use of &lt;em&gt;necessary attributes&lt;/em&gt; and &lt;em&gt;opaque identifiers&lt;/em&gt; looks a lot like database normalization, with necessary attributes being the columns and opaque identifiers being the foreign keys. I will admit that there are some similarities, although I maintain there are also some strong differences.&lt;br /&gt;&lt;br /&gt;The most obvious difference is that many of the attributes (columns) that you would put in a database table would not appear in the ontology. In particular, you would avoid those attributes that are dependent upon specific business processes or technology. This includes most everything that relates to a hierarchy and any inter-object relationships that are only germane to one particular scenario. 
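&lt;br /&gt;&lt;br /&gt;As a rough illustration of the distinction, here is a minimal sketch in Python. It is not taken from any real system: the names AssayResult, ConditionsId, SampleLotId, CONDITIONS_DETAIL and expand_conditions are all hypothetical, chosen only for this example. The core entity carries its necessary attributes directly and refers to everything process-specific through opaque identifiers that are looked up elsewhere, and only when needed:

```python
from dataclasses import dataclass
from typing import NewType

# Opaque identifiers (hypothetical names): the core model can store and
# compare them, but never looks inside them.
ConditionsId = NewType("ConditionsId", str)
SampleLotId = NewType("SampleLotId", str)


@dataclass(frozen=True)
class AssayResult:
    """Hypothetical core entity: necessary attributes plus opaque identifiers."""

    value: float              # necessary attribute, held directly
    units: str                # necessary attribute, held directly
    conditions: ConditionsId  # detail hidden behind an opaque identifier
    sample_lot: SampleLotId   # detail hidden behind an opaque identifier


# Process-specific detail lives outside the core model. This dictionary is
# an assumed stand-in for whatever store would really hold it.
CONDITIONS_DETAIL = {
    ConditionsId("cond-001"): {"sop": "SOP-17", "instrument": "reader-A"},
}


def expand_conditions(result):
    """Look up the detail behind the opaque identifier only on demand."""
    return CONDITIONS_DETAIL[result.conditions]


result = AssayResult(42.0, "%INH", ConditionsId("cond-001"), SampleLotId("lot-9"))
print(expand_conditions(result)["sop"])
```

In this sketch the shape of the detail held in CONDITIONS_DETAIL can change as business processes change without AssayResult itself having to change.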
&lt;br /&gt;&lt;br /&gt;Nullable columns are a mixed bag: The immediate reaction might be that if an attribute can be NULL, &lt;em&gt;a priori,&lt;/em&gt; it would not be ontologically necessary. However, pure ontological necessity can be at cross purposes with the goal of stopping our analysis at a &lt;em&gt;Middle Distance&lt;/em&gt;. A good example of something that can be NULL which would be included is the "derived from" relation. If we stop our analysis at the instrument (which is likely), some results, having been loaded from an instrument, will have no source (derived from) results. I find no compelling reason to eliminate the "derived from" attribute, since independent of business processes, most results will derive from a combination of/analysis of other results and an indication of this is necessary for determining dependencies, etc.; it is just that some results will have no antecedent.  &lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Note:&lt;/strong&gt; It is an open question at the moment whether the "results" of an average should be treated as a single result or a set of results of different types, all of which are created by the same operation. In any case, given that our analysis truncates at the instrument, even if we considered the instrument a transformation that created the result(s), there would be no result(s) that served as inputs for the transformation. &lt;br /&gt;&lt;br /&gt;By the same token, some attributes that we might think of as always being present may be elided from the model since they are hidden behind opaque identifiers.&lt;br /&gt;A good example here would be the use of geoposition rather than address. Using a &lt;em&gt;Middle Distance&lt;/em&gt; approach we would structure the location of an address as an identity preserving opaque identifier (geoposition) rather than as a set of columns containing foreign keys that reference other items (in other tables in a relational model) which hold the street/city/state/country values. 
This opaque identifier allows us to ignore all political boundaries, variations in street names etc. when designating our location. If required, these values can be derived on a "just in time" basis.&lt;br /&gt;&lt;br /&gt;This is the core &lt;em&gt;Middle Distance&lt;/em&gt; ontological question: &lt;br /&gt;&lt;blockquote&gt;Given the way we use an entity, can the entity exist without having a value for the attribute under consideration?&lt;/blockquote&gt; If the answer is &lt;strong&gt;no&lt;/strong&gt; then some flavor of the attribute must be brought forward and attached to the entity. The question of whether or not to represent this attribute as an opaque identifier has to do both with the complexity of the attribute and its variation in practice. I think that &lt;em&gt;experimental conditions&lt;/em&gt; are another paradigmatic example of something that should be hidden behind an opaque identifier since the level of detail that is important (e.g. include the instrument, SOP, lab location? etc.) changes depending upon the particular type of experiment conducted and the variability of these conditions within the organization -- any and all of which might change over time.&lt;br /&gt;&lt;br /&gt;However, the fact that an experiment will have SOME experimental conditions will be invariant.&lt;br /&gt;&lt;br /&gt;This hints at a general rule: if the information about a thing requires one or more ancillary tables/objects to represent it, it is best to wrap the information in an opaque identifier which is designed to be sufficient to disambiguate the  reference, but does not contain any detail. 
This identifier can be expanded out to a "report specific" level of detail on demand.&lt;br /&gt;&lt;br /&gt;My next post(s) will work through some examples.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-5482432333329930063?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/5482432333329930063/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=5482432333329930063' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5482432333329930063'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5482432333329930063'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/05/necessary-attributes-and-opaque.html' title='Necessary Attributes and Opaque Identifiers'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-438546462116119133</id><published>2010-05-10T17:17:00.001-07:00</published><updated>2010-05-10T17:17:54.455-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><category scheme='http://www.blogger.com/atom/ns#' term='data integration'/><title type='text'>Considerations in developing a middle distance ontology</title><content 
type='html'>In my mind there are three essential considerations when developing  a &lt;em&gt;middle distance ontology&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;&lt;OL&gt;&lt;LI&gt;What are the entities under discussion?&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;What constitutes the necessary attributes of these entities?&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Should these attributes be hidden behind opaque identifiers or should they be an integral part of the entity under consideration? &lt;/LI&gt;&lt;br /&gt;&lt;/OL&gt;&lt;br /&gt;&lt;br /&gt;The first question "What entities are under discussion?" is the easiest to answer: These are the entities that you discuss when performing your activities. If something has never come up as a factor in your activities (and isn't obviously on the horizon) there is no need to consider it.&lt;br /&gt;&lt;br /&gt;Patients, trials, compounds, assays  etc. are both important and are definitely "ready to hand" in the  Heideggerian sense.&lt;br /&gt;&lt;br /&gt;The second and third questions "what count as explicit attributes" and "what are the modifiers captured by opaque identifiers" are more subtle and domain specific.&lt;br /&gt;&lt;br /&gt;This highlights a core point about the &lt;em&gt;middle distance ontology&lt;/em&gt; viewpoint: what's important is what matters to the activity that you are performing. If it doesn't impact what you are doing it should not be modeled in detail. Truncating the detail is what keeps the model's complexity under control.&lt;br /&gt;&lt;br /&gt;However, there is one caveat to this "what you know is all you need to know" approach: it is critical to evaluate the likely potential changes to your current situation. 
Doing this well requires identifying the &lt;a href="http://www.amazon.com/Scenarios-Conversation-Kees-van-Heijden/dp/0470023686/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1272736222&amp;sr=8-1"&gt;scenarios&lt;/a&gt; that might impact your operation in the near future and thinking them through in some detail, using the scenarios to pressure test your decisions.&lt;br /&gt;&lt;br /&gt;Such a scenario analysis is needed since the ontology (obviously) constitutes a deep structural commitment and any changes at this level are usually both costly and painful.&lt;br /&gt;&lt;br /&gt;I would posit the following classification of the potential changes:&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;&lt;strong&gt;Changes in the science:&lt;/strong&gt; These can be very unpredictable, but often there are precursors consisting of some new "interesting results" in an area. Although the exact resolution of any resulting controversy may not be known, an outline of its structure can help highlight areas of necessary flexibility.&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;strong&gt;Changes in the environment:&lt;/strong&gt; (mergers etc.) Do others in the field think of things similarly? 
If not, what are the most significant differences?&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;strong&gt;Changes in the business structure:&lt;/strong&gt; are there any "nearby" functions that would require support in the face of an internal restructuring?&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;&lt;strong&gt;Changes in the technology:&lt;/strong&gt; there are two parts to this: &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;&lt;em&gt;Changes in the computer technology:&lt;/em&gt; most likely won't impact your ontology unless you're pushing systems to their limits (more and more unlikely in my experience).&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt; &lt;em&gt;Changes in the technology of the systems which you are analyzing: &lt;/em&gt;e.g., reactions now produce ten similar but not identical compounds rather than a single compound, suddenly photos become tagged with GPS information etc. Another hint is if you're starting to hear the words "high throughput" in a context in which you've never heard them before.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;/UL&gt;I will admit that a difficulty of doing this is that it spans all &lt;a href="http://rdfsg.blogspot.com/2009/08/flavors-of-architects-and-analysts.html"&gt;architectural disciplines&lt;/a&gt; from application to enterprise, but I don't see any way around it.&lt;br /&gt;&lt;br /&gt;My next post will focus on when to hide (attributes) behind an opaque identifier.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-438546462116119133?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/438546462116119133/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=438546462116119133' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/438546462116119133'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/438546462116119133'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/05/considerations-in-developing-middle.html' title='Considerations in developing a middle distance ontology'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-7127303210196569521</id><published>2010-04-26T16:56:00.001-07:00</published><updated>2010-04-26T16:56:06.644-07:00</updated><title type='text'>MacBookPro Sleeping Problem: Solved.</title><content type='html'>If you have a MacBookPro that's having trouble waking up from sleep correctly: spinning  beachball, infinitely long login check etc., you might want to try the fix suggested on &lt;a href="http://forums.macrumors.com/showthread.php?t=290091"&gt;macrumors&lt;/a&gt; by jmora71.&lt;br /&gt;&lt;br /&gt;It appears to apply primarily to systems with Solid State Disks and a fair amount of RAM (the author mentions 8G, I have 6G). &lt;br /&gt;&lt;br /&gt;Since performing the fix, I haven't had problems in over a week. 
I used to have issues daily.&lt;br /&gt;&lt;br /&gt;For various reasons I had thought the cause was &lt;a href="http://www.parallels.com/"&gt;Parallels&lt;/a&gt;, but I was happily mistaken.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-7127303210196569521?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/7127303210196569521/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=7127303210196569521' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7127303210196569521'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7127303210196569521'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/04/macbookpro-sleeping-problem-solved.html' title='MacBookPro Sleeping Problem: Solved.'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-3793019115434021456</id><published>2010-04-19T16:50:00.001-07:00</published><updated>2010-04-19T16:50:44.476-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><title type='text'>Ontologies in Practice</title><content type='html'>This is an extension of an old post &lt;a href="http://rdfsg.blogspot.com/2008/02/extensible-system-for-discovery-data.html"&gt;An Extensible System 
for Discovery Data&lt;/a&gt;. As I've been thinking more about what constitute the &lt;em&gt;Simplest Building Blocks&lt;/em&gt;, I've begun to realize that they designate something very close to an ontology in the "&lt;a href="http://www.amazon.com/Brian-Cantwell-Smith/e/B001HCZB22/ref=ntt_athr_dp_pel_1"&gt;middle distance&lt;/a&gt;." That is, it isn't about an ontology down to the fundamental constituents of matter, nor is it about specifying things in sufficient detail to adequately compare and track what's going on between organizations (see &lt;a href="http://ontology.buffalo.edu/07/POB/Smith.ppt"&gt;Barry Smith's presentation&lt;/a&gt; for a discussion of these issues), rather it is an ontology of the stuff we deal with in our day to day activities.&lt;br /&gt;&lt;br /&gt;For example, an experiment has &lt;br /&gt;&lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;A protocol consisting of:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;a set of initial conditions&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;one or more intermediate steps&lt;/LI&gt;&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;each step may have a set of operators, equipment, constituents/ingredients etc.&lt;/LI&gt;&lt;/UL&gt;&lt;/UL&gt;&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt;A result set consisting of one or more members, &lt;UL&gt;&lt;LI&gt;any of which might be invalid for one or more reasons&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;A set of analysis results&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;with methods&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;parameters&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;derived results (results that are based upon this result) &lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;supporting results (results upon which this result is based, such as calibration curves)&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;My basis for claiming that these are constituents of a "middle distance" ontology is twofold:&lt;br /&gt;&lt;OL&gt;&lt;LI&gt;Each component is ontologically necessary. 
That is, an experiment cannot exist without these components. &lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;When analyzing these constituents we do not need to go into further detail. We can hide that detail behind an &lt;a href="http://weibel-lines.typepad.com/weibelines/2006/02/identifier_ideo.html"&gt;opaque identifier&lt;/a&gt; and need not give it further meaning. This opacity allows us to stop our analysis at that point; we don't have to analyze down to the constituent quarks (or chiral forms, for that matter, if they don't have any impact on our current goals).&lt;/LI&gt;&lt;/OL&gt;&lt;br /&gt;&lt;br /&gt;In future posts I'll cover the entities and relationships that I take as being important in this &lt;em&gt;middle distance&lt;/em&gt; and how to identify them (which, like most modularization efforts, is more of an art than a science).&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-3793019115434021456?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/3793019115434021456/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=3793019115434021456' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3793019115434021456'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3793019115434021456'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/04/ontologies-in-practice.html' title='Ontologies in Practice'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-3147580360898974209</id><published>2010-03-29T17:04:00.001-07:00</published><updated>2010-03-29T17:04:54.519-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Architects As Service Providers</title><content type='html'>&lt;a href="http://www.computer.org/cms/Computer.org/ComputingNow/homepage/2010/0210a/rW_SW_ArchitectsasServiceProviders.pdf"&gt; This paper&lt;/a&gt; by Roland Faber of Siemens Healthcare recently appeared in IEEE Software. It talks pointedly about how it is more effective for architecture to be structured as a service that provides value by interacting closely with project developers rather than being structured as a function that produces documentation to be followed by the projects. It even advocates that architects perform some hands-on coding in the projects (mirabile dictu).&lt;br /&gt;&lt;br /&gt;It would be impossible for me to agree more, including the part about hands-on coding, my fondness for which is pretty obvious given my blog posts.&lt;br /&gt;&lt;br /&gt;The article posits that this close, ongoing interaction is a good way of assuring both that projects understand what the architects are trying to accomplish with the architecture and also that the architects develop an appreciation for the practical issues involved in building working software. There is also a side effect of this kind of interaction that they don't mention: its value in preventing the obsolescence of the person doing the architecture. &lt;br /&gt;&lt;br /&gt;The standard scenario in this industry is that a person spends a number of years learning their craft and refining their practice at which point, if they're good, they become an architect or manager and stop coding. 
The architect's (or manager's) skills stay relevant for a few years (five years seems to be a recurring number), after which they become a pointy-headed character in a Dilbert cartoon.&lt;br /&gt;&lt;br /&gt;As an industry we should be learning from this anti-pattern.&lt;br /&gt;&lt;br /&gt;It seems to me a truism that coding keeps you grounded, and current, and keeps this Dilbertization from happening. Certainly at some point other duties (e.g., architecture or management) require that you take yourself off of the critical coding path, otherwise the success of the project is put in jeopardy, but new technologies or core utilities that are not on a critical path timeline are all fair game. &lt;br /&gt;&lt;br /&gt;I strongly believe in this approach and this is the first article I've seen that reflects my personal practice. &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-3147580360898974209?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/3147580360898974209/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=3147580360898974209' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3147580360898974209'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3147580360898974209'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/03/architects-as-service-providers.html' title='Architects As Service Providers'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-3204589897151587782</id><published>2010-03-15T17:05:00.001-07:00</published><updated>2010-08-21T10:10:06.828-07:00</updated><title type='text'>Low Level Virtual Machine (LLVM)</title><content type='html'>LLVM, as described in this article on &lt;a href="http://www.appleinsider.com/articles/08/06/20/apples_other_open_secret_the_llvm_complier.html"&gt;AppleInsider&lt;/a&gt;, stands for Low Level Virtual Machine. It is an open source project that is used and partly supported by &lt;a href="http://developer.apple.com/Mac/library/releasenotes/DeveloperTools/RN-llvm-gcc/index.html"&gt;Apple&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One of the most interesting things about LLVM is a quote at the bottom of page 1 of the article:&lt;br /&gt;&lt;blockquote&gt;Apple also uses LLVM in the OpenGL stack in Leopard, leveraging its virtual machine concept of common IR to emulate OpenGL hardware features on Macs that lack the actual silicon to interpret that code. Code is instead interpreted or JIT on the CPU.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;This approach makes it very likely that developers will use the hardware optimized instructions. Most other approaches impose significant costs upon the developers, e.g., the need to write additional code to cover every possible hardware configuration. 
With the LLVM there is no coding penalty, therefore using the optimized routines becomes a no-brainer, resulting in faster code for people with beefier hardware (who are also those who tend to be most worried about performance) and usable code for everyone else.&lt;br /&gt;&lt;br /&gt;As background the article points to a &lt;a href="http://llvm.org/pubs/2007-03-12-BossaLLVMIntro.pdf"&gt;presentation by Chris Lattner&lt;/a&gt; but I prefer his paper with Vikram Adve &lt;a href="http://llvm.org/pubs/2004-01-30-CGO-LLVM.pdf"&gt;LLVM: a compilation framework for lifelong program analysis&lt;/a&gt; because it talks in terms I can understand (like &lt;a href="http://en.wikipedia.org/wiki/Static_single_assignment_form"&gt;"Static Single Assignment"&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;So here's what's cool: LLVM eats a code representation that is very amenable to optimization and analysis. It optimizes this input and outputs machine code (potentially tuned for the actual hardware which will run the code) decorated to allow low-overhead runtime profiling.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://lh4.ggpht.com/_x3Qv8fFpuQg/S4xrPlqZPWI/AAAAAAAAAgY/xsoxDVCbxfg/llvm.jpg"&gt;&lt;img src="http://lh4.ggpht.com/_x3Qv8fFpuQg/S4xrgwoEK9I/AAAAAAAAAgc/bnYsOGnRz7Y/blog__llvm.jpg?imgmax=800" alt="blog__llvm.jpg" border="0" height="74" width="300" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;(Click on image for larger version)&lt;br /&gt;&lt;br /&gt;This approach permits repeated optimizations based upon recent run-time data rather than generalized heuristics -- it is reminiscent of "&lt;a href="http://java.sun.com/products/hotspot/whitepaper.html"&gt;hot spot&lt;/a&gt;" with larger scope but less immediacy.&lt;br /&gt;&lt;br /&gt;I'd be remiss to not mention the strategic implications of this: it allows Apple to radically shift hardware configurations, while restricting the software impact to a relatively small chunk of code c.f. 
iPad.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Update 21 Aug 2010&lt;/span&gt;: Just noticed LLVM got a &lt;a href="http://www.acm.org/press-room/news-releases/2010/sigplan-software-award/"&gt;SIGPLAN award&lt;/a&gt; -- well deserved!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-3204589897151587782?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/3204589897151587782/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=3204589897151587782' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3204589897151587782'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3204589897151587782'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/03/low-level-virtual-machine-llvm.html' title='Low Level Virtual Machine (LLVM)'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_x3Qv8fFpuQg/S4xrgwoEK9I/AAAAAAAAAgc/bnYsOGnRz7Y/s72-c/blog__llvm.jpg?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4278756385514435206</id><published>2010-03-03T09:24:00.001-08:00</published><updated>2010-03-03T09:24:38.013-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific'/><title type='text'>Hubs &amp; Connectors</title><content type='html'>I recently stumbled upon the &lt;a href="http://www.compositesw.com/"&gt;Composite Software&lt;/a&gt; site and was impressed by their architecture. It is a virtualized/federated solution that reminds me of the &lt;a href="http://www.rdfsg.com/papers/HubConnectorMay2003.pdf"&gt;Hub/Connector system&lt;/a&gt; which I had proposed as a data integration model for the drug discovery/cheminformatics space.&lt;br /&gt;&lt;br /&gt;The advantages of such an architecture over a conventional data warehouse include: &lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;There is no requirement to perform a complete mapping of the data. This allows focus upon solutions that address the particular problem at hand and the mappings required to solve it. Such a focus is especially important when the data structure and mapping rules are in a state of flux for part of the system. It allows the &lt;em&gt;high flux&lt;/em&gt; areas to be avoided.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;The target data store need not have a structure capable of holding all of the data simultaneously. For example, a target table that would hold all of your &lt;a href="http://www.cdisc.org/sdtm"&gt;CDISC SDTM&lt;/a&gt; SUPPQUAL values could require upward of 1000 columns, reaching the limits of many common relational databases. On the other hand, the solution for an incremental data set would be an order of magnitude smaller.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Only the data of interest is accessed/moved. 
In systems that only analyze a small set of the data at a time, server size can be reduced substantially.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Data need not be moved to a central repository, minimizing duplicative storage space.&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;Of course there are disadvantages:&lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;A warehouse allows the precalculation of complex results, imposing little operational delay in retrieving these results.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Warehouses can be more easily structured to handle analyses which involve large portions of the dataset.&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;In scientific domains, it isn't uncommon for new assays, results, etc. to break your current mappings. A virtualized approach minimizes the impact of these problems upon your system and is certainly something to look at if this sounds like your situation.&lt;br /&gt; &lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4278756385514435206?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4278756385514435206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4278756385514435206' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4278756385514435206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4278756385514435206'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/03/hubs-connectors.html' title='Hubs &amp;amp; Connectors'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6602741357460924822</id><published>2010-02-15T18:01:00.001-08:00</published><updated>2010-02-15T18:04:00.579-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>Seam 2.2 + Jboss 5.1: a resolution</title><content type='html'>A few weeks ago I &lt;a href="http://rdfsg.blogspot.com/2009/11/seam-jboss-50-jboss-51.html"&gt;posted&lt;/a&gt; about my difficulties migrating to new versions of jboss and seam.&lt;br /&gt;&lt;br /&gt; I was finally able to resolve these difficulties by following these steps:&lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;I first took the suggestion of &lt;a href="http://seamframework.org/Community/GetRestrictionsInSeam21"&gt;this post&lt;/a&gt; on seamframework.org and  used seam-gen to generate a basic seam application that worked with my existing database.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;I then replaced all the libraries with the ones from the newly generated application.&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt;As a penultimate step I followed the procedure outlined in the &lt;a href="http://anonsvn.jboss.org/repos/seam/tags/JBoss_Seam_2_1_0_GA/seam21migration.txt"&gt;migration guide&lt;/a&gt;.&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;I was now able to search for differences between the new and the old version.&lt;br /&gt;The primary differences were in the &lt;code&gt;*List.java&lt;/code&gt; classes&lt;br /&gt;&lt;br /&gt;--the new version of the code followed the pattern:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;    private static final String[] RESTRICTIONS = {"lower(shows.name) like concat(lower(#{showsList.shows.name}),'%')",};&lt;br /&gt;    private 
static final String EJBQL = "select shows from Shows shows";&lt;br /&gt;    @SuppressWarnings("unchecked")&lt;br /&gt;    public ShowsList() {&lt;br /&gt;        setEjbql(EJBQL);&lt;br /&gt;&lt;strong&gt;        setRestrictionExpressionStrings(Arrays.asList(RESTRICTIONS));&lt;br /&gt;&lt;/strong&gt;        setMaxResults(25);&lt;br /&gt;    }&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;--- the pattern in the old version was:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;	private static final String[] RESTRICTIONS = {"lower(shows.name) like concat(lower(#{showsList.shows.name}),'%')",};&lt;br /&gt;	private Shows shows = new Shows();&lt;br /&gt;	@Override&lt;br /&gt;	public String getEjbql() {&lt;br /&gt;		return "select shows from Shows shows";&lt;br /&gt;	}&lt;br /&gt;	@Override&lt;br /&gt;	public Integer getMaxResults() {&lt;br /&gt;		return 25;&lt;br /&gt;	}&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The primary difference is that the new version registers the restrictions through the &lt;code&gt;setRestrictionExpressionStrings&lt;/code&gt; method. 
As one of the commenters on the seamframeworks post mentioned: &lt;em&gt;it would have been good to have discussed this in the migration document&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;Not a tremendously big deal but, as I said previously, disappointing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6602741357460924822?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6602741357460924822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6602741357460924822' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6602741357460924822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6602741357460924822'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/02/seam-22-jboss-51-resolution.html' title='Seam 2.2 + Jboss 5.1: a resolution'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-2620956046716894576</id><published>2010-02-01T17:53:00.001-08:00</published><updated>2010-02-01T17:53:37.879-08:00</updated><title type='text'>popViewControllerAnimated &amp; message sent to deallocated instance</title><content type='html'>This is just an informational posting -- it is so odd I thought I'd share.&lt;br /&gt;&lt;br /&gt;If you have any of the following:&lt;br 
/&gt;&lt;UL&gt;&lt;code&gt;&lt;br /&gt;&lt;LI&gt;controllerDidChangeContent&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;didChangeSection&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;didChangeObject&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;controllerWillChangeContent&lt;/LI&gt;&lt;/code&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;in the same class as a save that does a&lt;br /&gt;&lt;code&gt;&lt;strong&gt;popViewControllerAnimated&lt;/strong&gt;&lt;/code&gt;&lt;br /&gt;when the save completes you will get a&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;strong&gt;controllerWillChangeContent:]: message sent to deallocated instance &lt;/strong&gt;(insert your address here)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;I'm not exactly sure how this happens&lt;br /&gt;but essentially &lt;br /&gt;these methods hold onto the old copy of &lt;strong&gt;&lt;em&gt;self&lt;/em&gt;&lt;/strong&gt; in some manner.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Here's an example:&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(gdb) p self&lt;br /&gt;$1 = (GroupItemSelectViewController *) &lt;strong&gt;0x3a34f70&lt;/strong&gt;&lt;br /&gt;Current language:  auto; currently objective-c&lt;br /&gt;(gdb) c&lt;br /&gt;Continuing.&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Now pop back to the old screen and then do some editing:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;2010-01-05 20:34:36.052 GoodToGo[62202:207] (GroupItemSelectViewController:didSelectRowAtIndexPath) setting Item to YES &lt;br /&gt;2010-01-05 20:34:37.960 GoodToGo[62202:207] (GroupController:numberOfRowsInSection) rows: 3&lt;br /&gt;2010-01-05 20:34:39.902 GoodToGo[62202:207] (GroupController:numberOfRowsInSection) rows: 2&lt;br /&gt;2010-01-05 20:34:42.690 GoodToGo[62202:207] (GroupController:numberOfRowsInSection) rows: 3&lt;br /&gt;2010-01-05 20:34:43.561 GoodToGo[62202:207] (didSelectRowAtIndexPath) Selecting row 0&lt;br /&gt;2010-01-05 20:34:43.563 GoodToGo[62202:207] (GroupItemSelectViewController:viewDidLoad) fetchedItemCount:5, holderCount: 5&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What is the 
current value of self?&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;(gdb) p self&lt;br /&gt;$2 = (GroupItemSelectViewController *) &lt;strong&gt;0x3c84820&lt;/strong&gt;&lt;br /&gt;(gdb) c&lt;br /&gt;2010-01-05 20:34:51.690 GoodToGo[62202:207] (GroupItemSelectViewController:didSelectRowAtIndexPath): setting Item to NO &lt;br /&gt;Continuing.&lt;br /&gt;2010-01-05 20:34:53.546 GoodToGo[62202:207] (GroupController:numberOfRowsInSection) rows: 3&lt;br /&gt;2010-01-05 20:34:53.547 GoodToGo[62202:207] *** -[GroupItemSelectViewController controllerWillChangeContent:]: message sent to deallocated instance &lt;strong&gt;0x3a34f70&lt;/strong&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;As you can see &lt;strong&gt;0x3a34f70&lt;/strong&gt; is the previous value of self, not its current value.&lt;br /&gt;&lt;p&gt;&lt;br /&gt;I do consider this a bug in the framework.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-2620956046716894576?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/2620956046716894576/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=2620956046716894576' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2620956046716894576'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2620956046716894576'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2010/02/popviewcontrolleranimated-message-sent.html' title='popViewControllerAnimated &amp;amp; message sent to deallocated 
instance'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-7840867892533207297</id><published>2009-11-09T17:21:00.001-08:00</published><updated>2009-11-09T17:21:17.092-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>Seam + {jboss 5.0 | jboss 5.1} = ?</title><content type='html'>About a month ago I completed the upgrade of all my Macs to &lt;a href="http://www.apple.com/macosx/"&gt;Snow Leopard&lt;/a&gt;. This generally went smoothly -- not a surprise for a release that has been characterized as "more refinement than upgrade."&lt;br /&gt;&lt;br /&gt;One exception was a jboss 4.2.2/seam 2.1 application that would hang when generating a list view of objects that included images. My initial reaction was that this provided an opportunity to upgrade to jboss 5.x and partake of whatever enhancements that offered.&lt;br /&gt;&lt;br /&gt;This proved to be a task that ended in frustration. I spent ~40 hours on it and eventually gave up. I fell back to the earlier version and upgraded to jboss 4.2.3, which solved the problem (I think it was related to using Java 1.6). 
&lt;br /&gt;&lt;br /&gt;I thought I'd share some of my experiences, just in case someone else finds them useful:&lt;br /&gt;&lt;br /&gt;The first glitch was that the version of seam I was running didn't appear to work with jboss 5.x, so I upgraded to version 2.2.&lt;br /&gt;&lt;br /&gt;The attendant upgrades caused me to change some of the DB mappings&lt;br /&gt;e.g.,&lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;&lt;code&gt;change blob annotations (MySQL-specific)&lt;br /&gt;From: @Column(name = "data", length = 8000000)&lt;br /&gt;To: @Column(name = "data", length = 8000000, columnDefinition = "mediumblob")&lt;/code&gt;&lt;/pre&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;I also tried switching to &lt;a href="http://www.jboss.com/products/devstudio/"&gt;jboss developer studio&lt;/a&gt; to see if that would help me uncover the problem -- this had no real impact.&lt;br /&gt;&lt;br /&gt;The core symptom was that nothing was coming back from pages that generated a list of items in the DB and no Hibernate queries showed up in the back end stream.&lt;br /&gt;&lt;br /&gt;I eventually tried to go into one of the more "internal" pages&lt;br /&gt; &lt;code&gt;http://localhost:8080/webtwo/Competition.seam?htmlTitle=competition&amp;competitionId=2&amp;cid=6&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;(a real advantage of Seam's &lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer"&gt;REST interface&lt;/a&gt;) and finally saw a Hibernate query on the background stream with the warning:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;code&gt;WARN  [Param] could not create converter for: competitionId&lt;br /&gt;javax.el.PropertyNotFoundException: Target Unreachable, identifier ‘competitionHome’ resolved to NULL&lt;/code&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;This warning was similar to the error I was getting that the &lt;code&gt;authenticate method resolved to NULL.&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;What appeared to be happening is that the seam annotations weren't 
being processed correctly (specifically &lt;code&gt;@Name("competitionHome")&lt;/code&gt; ).&lt;br /&gt;&lt;br /&gt;After searching on this error I found this link&lt;br /&gt;&lt;a href="http://www.seamframework.org/Documentation/WhatHappensWhenYouDeploySeamAppInJBoss5"&gt;http://www.seamframework.org/Documentation/WhatHappensWhenYouDeploySeamAppInJBoss5&lt;/a&gt;&lt;br /&gt;which made me think that things are basically broken.&lt;br /&gt;&lt;br /&gt;As I said, rolling back to the original code, the problems went away in jboss 4.2.3. However, I must admit that I'm surprised that the issue exists in the newer versions of seam/jboss&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;strong&gt;Jboss&lt;/strong&gt;&lt;br /&gt;5.0.0.GA	Stable	104 MB	2008-12-05	LGPL	 134971	Download	Notes&lt;br /&gt;5.1.0.GA	Stable	130 MB	2009-05-23	LGPL	 181731	Download	Notes&lt;br /&gt;&lt;strong&gt;Seam&lt;/strong&gt;&lt;br /&gt;JBoss Seam 2.2	2.2.0.GA	Production	111 MB	30.07.2009	LGPL	Notes	Download&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I know that this is "unsupported" code, but I would still think that there would be better testing. After all, 5.1 was out in May of this year and Seam 2.2 claims&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Seam 2.2 examples target JBoss Application Server 5.1.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Now, I do realize I could have been anywhere from 1 minute to 1 month away from a solution for this problem (if anyone has a solution, I'd be more than happy to try it), but I have two closing thoughts:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Allocate more time than you might have expected towards making the transition&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;It would have been appreciated if the various teams involved paid more attention to migration tools (even documentation) and/or backward compatibility. 
My various searches trying to solve this problem turned up a lot of people having obscure issues with the transition: this is not the way to encourage wide uptake of a tool set.&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-7840867892533207297?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/7840867892533207297/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=7840867892533207297' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7840867892533207297'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7840867892533207297'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/11/seam-jboss-50-jboss-51.html' title='Seam + {jboss 5.0 | jboss 5.1} = ?'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8307647561159855495</id><published>2009-10-20T06:17:00.001-07:00</published><updated>2009-10-20T06:17:18.510-07:00</updated><title type='text'>XCode 3.x</title><content type='html'>I've been coming up to speed on iPhone development and thought I'd share some of my experiences.&lt;br /&gt;&lt;br /&gt;The first is that &lt;em&gt;&lt;a 
href="http://www.amazon.com/Beginning-iPhone-Development-Exploring-SDK/dp/1430224592/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1254423636&amp;sr=8-1"&gt;Beginning iPhone 3 Development&lt;/a&gt;&lt;/em&gt; is a very useful starting point. I tried a couple of other resources but finally settled on this. I do like books better than video when learning a new environment but this book also has the advantage of being up-to-date and accurate. I hate trying to learn an environment/language when the examples are wrong! &lt;em&gt;Beginning iPhone 3 Development&lt;/em&gt; employed a technical reviewer who worked through and verified all of the examples. Shouldn't every book like this have a technical reviewer? -- the world would be a better place.&lt;br /&gt;&lt;br /&gt;A few observations on the development environment, which, although reasonable, is a bit more primitive than netbeans or eclipse, especially around refactoring&lt;br /&gt;e.g., &lt;br /&gt;renaming a class/header file doesn't rename all of the imports throughout the project, and there isn't a "Refactoring" capability that I've found that does this.&lt;br /&gt;&lt;br /&gt;The C aspects certainly harken back to an earlier era, e.g., one has to define a function in a .m file and declare it in a .h file for it to work correctly. 
At least Code Sense minimizes the chances for mistyping in this case.&lt;br /&gt;&lt;br /&gt;XCode only allows you to view the interface specification (the .xib file) via the interface builder -- however, it is useful to realize that the .xib file is really an xml file that can be viewed in a normal text editor e.g., &lt;a href="http://www.gnu.org/software/emacs/"&gt;emacs&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Misspellings count and don't seem to generate errors:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Surprised that Code Sense doesn't prompt when overriding methods from the superclasses, which causes the classic "why wasn't this method called" debugging session&lt;/LI&gt; &lt;br /&gt;&lt;br /&gt;&lt;LI&gt;Similarly the compiler doesn't tell you if you're calling undefined methods&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;Misspelling an accessor e.g., &lt;br /&gt;childController.tltle = prez.name;&lt;br /&gt;vs&lt;br /&gt; childController.title = prez.name;&lt;br /&gt;gets the error: &lt;code&gt;request for member ‘tltle’ in something not a structure or a union&lt;/code&gt;&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;Also can't believe that there isn’t enough introspection, so you still have to write these by hand:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;#pragma mark NSCoding&lt;br /&gt;-(void)encodeWithCoder:(NSCoder *)encoder{&lt;br /&gt;	[encoder encodeObject:field1 forKey:kField1Key];&lt;br /&gt;	[encoder encodeObject:field2 forKey:kField2Key];&lt;br /&gt;	[encoder encodeObject:field3 forKey:kField3Key];&lt;br /&gt;	[encoder encodeObject:field4 forKey:kField4Key];&lt;br /&gt;}&lt;br /&gt;-(id)initWithCoder:(NSCoder *)decoder{&lt;br /&gt;	if(self = [super init]){&lt;br /&gt;		self.field1 = [decoder decodeObjectForKey:kField1Key];&lt;br /&gt;		self.field2 = [decoder decodeObjectForKey:kField2Key];&lt;br /&gt;		self.field3 = [decoder decodeObjectForKey:kField3Key];&lt;br /&gt;		self.field4 = [decoder decodeObjectForKey:kField4Key];&lt;br /&gt;	}&lt;br /&gt;	
return self;&lt;br /&gt;}&lt;br /&gt;#pragma mark -&lt;br /&gt;#pragma mark NSCopying&lt;br /&gt;-(id)copyWithZone:(NSZone *)zone{&lt;br /&gt;	FourLines *copy = [[[self class] allocWithZone:zone] init];&lt;br /&gt;	copy.field1 = [[self.field1 copyWithZone:zone] autorelease];&lt;br /&gt;	copy.field2 = [[self.field2 copyWithZone:zone] autorelease];&lt;br /&gt;	copy.field3 = [[self.field3 copyWithZone:zone] autorelease];&lt;br /&gt;	copy.field4 = [[self.field4 copyWithZone:zone] autorelease];&lt;br /&gt;	return copy;&lt;br /&gt;}&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;h2&gt;On The Bright Side &lt;br /&gt;&lt;/h2&gt;&lt;br /&gt;@synthesize obviates a lot of useless typing.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://macdevelopertips.com/objective-c/objective-c-categories.html"&gt;Categories&lt;/a&gt; seem cool and I plan to explore them further. Categories let you add methods to an existing class -- the source code of the existing class is not required.&lt;br /&gt;&lt;br /&gt;From the XCode 3.1 doc, categories:&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Provide a simple way of grouping related methods. 
Similar methods defined in different classes can be kept together in the same source file.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Simplify the management of a large class when several developers contribute to the class definition.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Let you achieve some of the benefits of incremental compilation for a very large class.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Can help improve locality of reference for commonly used methods.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Enable you to configure a class differently for separate applications, without having to maintain different versions of the same source code.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Can be used to declare informal protocols.&lt;br /&gt;See “Informal Protocols,” as discussed under “Declaring Interfaces for Others to Implement.”&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;The doc also contains a suitable caveat:&lt;br /&gt;&lt;blockquote&gt;Although the language currently allows you to use a category to override methods the class inherits, or even methods declared in the class interface, you are strongly discouraged from using this functionality. 
A category is not a substitute for a subclass.&lt;/blockquote&gt; &lt;br /&gt;&lt;br /&gt;That is, categories are powerful and can blow your foot off, the "power tool" version of shooting yourself in the foot, if you're not careful.&lt;br /&gt;&lt;br /&gt;You can schedule actions to happen in the future and then cancel them when superseded by a subsequent user action.&lt;br /&gt;&lt;code&gt;&lt;pre&gt;[NSObject cancelPreviousPerformRequestsWithTarget:self selector:@selector(singleTap) object:nil];&lt;br /&gt;			[self performSelector:@selector(doubleTap) withObject:nil afterDelay:.4];&lt;/pre&gt;&lt;/code&gt;&lt;br /&gt;It is very nice to have that capability just "built in."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8307647561159855495?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8307647561159855495/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8307647561159855495' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8307647561159855495'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8307647561159855495'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/10/xcode-3x.html' title='XCode 3.x'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-886156095452704582</id><published>2009-09-14T17:17:00.001-07:00</published><updated>2009-09-14T17:17:48.877-07:00</updated><title type='text'>Patterns in Network Architecture</title><content type='html'>I recently finished reading &lt;a href="http://www.amazon.com/Patterns-Network-Architecture-Return-Fundamentals/dp/0132252422/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1252342315&amp;sr=8-1"&gt;Patterns in Network Architecture&lt;/a&gt; by &lt;a href="http://en.wikipedia.org/wiki/John_Day_(computer_scientist)"&gt;John Day&lt;/a&gt;. It's an attempt to rethink network architectures and polish up "the unfinished demo" that is the internet. &lt;br /&gt;&lt;br /&gt;Now, I'm not a network guy, so I can't evaluate the quality of his proposed solutions in any detail, but I liked his thought process and found it a useful read for anyone interested in a good example of thinking through a hard problem and coming up with a disciplined "minimal covering" solution. &lt;br /&gt;&lt;br /&gt;Day focuses upon discovering the appropriate layers and layer structures necessary for communication. He works up from interprocess communication on a single machine to processes communicating across multiple machines.&lt;br /&gt;&lt;br /&gt;The implications of this analysis are interesting in and of themselves and closely resemble structures seen in other systems. 
His metaphors are primarily in terms of name lookup and binding, using compilers and operating systems as examples (I have to admit that this only feels partially correct to me: I think the full problem is more akin to providing the data/instructions to a processor and therefore needs to include the mapping from a "memory location" to an actual address accessible by the chip's execution unit e.g., it should take into account caching, &lt;a href="http://en.wikipedia.org/wiki/Translation_lookaside_buffer"&gt;TLBs&lt;/a&gt; et al.).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The first and foremost conclusion in Day's opinion is that there is one layer that provides interprocess communication and it replicates. That is, the structure of each network layer is the same, but the policies and optimizations differ depending upon the particulars of what the layer is connected to. Every layer has three parts: data transfer, IPC (Interprocess Communication) control and IPC management -- where control is short cycle management.&lt;br /&gt;&lt;br /&gt;In his words:&lt;br /&gt;&lt;br /&gt; &lt;span&gt;"Layers have two major properties that are of interest to us: abstraction and scaling (i.e., divide and conquer). Layers hide the operation of the internal mechanisms from the users of the mechanisms and segregate and aggregate traffic. But most important, they provide an abstraction of the layers below"&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;When moving from a shared to a distributed environment the core new functionality is an error- and flow-control protocol (EFCP). This protocol replaces the shared memory mechanisms of the single system to ensure reliability and to provide flow control in the environment of communication between two systems. An EFCP PM (protocol machine) is a task of the IPC process. 
Although in theory such a process could be included even when communication is on a shared processor, in practice this communication is so reliable as to make it redundant.&lt;br /&gt;&lt;br /&gt;I think that the core insight/technique was to frame communication from the network perspective as being from application to application and not as interacting with the network e.g., in the figure below (6-15 from the book) communication is conceptualized as being &lt;em&gt;across,&lt;/em&gt; that is, between applications at the same layer, rather than down through the network and back up to the other application. &lt;br /&gt;&lt;br /&gt;The application concerns itself with developing a shared state with its partner application. The N-1 layer provides an abstract aggregated API to support the application's view of the communication and performs whatever aggregation and abstraction is necessary to develop a shared state with the N-1 application on the other side. It then hands off details to the N-2 layer, which gets them to the N-2 layer on the other side, etc.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh6.ggpht.com/_uhpaSaKsmiM/SqVLTjCKqxI/AAAAAAAAAGw/6gBOXQOqMAQ/blog__6-15.jpg?imgmax=800" alt="blog__6-15.jpg" border="0" width="300" height="185" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Yes, it is just encapsulation all over again, but as we all know, finding the right thing to encapsulate and doing it in a practical way takes a lot of work. 
&lt;br /&gt;&lt;br /&gt;I'm eliding a number of the other key findings of the book such as &lt;UL&gt;&lt;LI&gt;The observation that an address only needs to be unique within a (distributed) application layer&lt;/LI&gt;&lt;LI&gt; A connection is made only after authentication has been obtained and the connection authorized etc.&lt;/LI&gt; &lt;/UL&gt;&lt;br /&gt;All are developed in a thoughtful way showing deep insight into the problem.&lt;br /&gt;&lt;br /&gt;Although not the easiest read for someone without a strong networking background, it is an interesting and useful exercise to watch someone so well versed go through the process so thoughtfully.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-886156095452704582?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/886156095452704582/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=886156095452704582' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/886156095452704582'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/886156095452704582'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/09/patterns-in-network-architecture.html' title='Patterns in Network Architecture'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh6.ggpht.com/_uhpaSaKsmiM/SqVLTjCKqxI/AAAAAAAAAGw/6gBOXQOqMAQ/s72-c/blog__6-15.jpg?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6804773551904665488</id><published>2009-08-24T18:00:00.001-07:00</published><updated>2009-08-24T18:00:22.947-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Flavors of Architects and Analysts</title><content type='html'>I was recently involved in a discussion on the difference between architects and business analysts and decided to put together my thoughts on the subject.&lt;br /&gt;&lt;br /&gt;Here they are: for each category (“Architect”, “Business Analyst”) there are a number of sub-categories.&lt;br /&gt;&lt;br /&gt;I normally think of three levels of Architecture:&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Application – addresses evolution and delivery of a single application (small set of highly related functions), activities include:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Partitioning functionality within an application.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Developing best practices.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Assuring flexibility to meet current and immediate business needs.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Platform – addresses evolution and delivery of multiple applications for a particular business area (applications grouped by functionality/user community), activities include: &lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;Assuring a commonality of results.&lt;/LI&gt; &lt;LI&gt;Providing for fine-grained interoperability.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Developing frameworks that allow multiple applications to ship on a common substrate.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Building in flexibility to meet business developments on the planning horizon (this year/next year goals) for moderate-sized groups within the company (~100 people)&lt;/LI&gt;&lt;LI&gt; Assuring that a 
substrate achieving these goals is in place so the applications can pick it up at the appropriate time.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Enterprise: &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Identify core data elements and services that will be important over the strategic timeframe.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Assure that there is an appropriate mix of flexibility/capabilities to meet strategic goals e.g., &lt;UL&gt;&lt;LI&gt;If acquisition of companies is a strategic goal, methods for rapidly merging personnel, purchasing, and operational information systems are important.&lt;/LI&gt; &lt;LI&gt;If acquiring products is a strategic goal, capturing data about supply and delivery chains etc. is important.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;On the business analyst side I similarly think of three levels:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Department Level: &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;What are the processes involved in performing a function, including &lt;em&gt;as is&lt;/em&gt; and &lt;em&gt;to be&lt;/em&gt; states?&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Division Level: &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;What is the external business goal that is being addressed?&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt;Is this the right way to address it?&lt;/LI&gt;&lt;LI&gt;Should functions be merged/refactored?&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Corporate/Strategic: &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;What are the strategic business differentiators going to be in the business that we want to be in 3-5 years from now?&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;What must the business look like to support them?&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;At all levels there should be some time spent to look at potential inflection points that might radically change the structure of delivery and build in flexibility to address that potential, e.g., &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;For 
architecture think outsourcing, software as a service, location-aware computing, etc.&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;For business think increased competition in product acquisition, competition from generic products, regulatory/legal landscape.&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;How these various functions are actually assigned to people depends a lot upon the scale of the problem, the level of risk/uncertainty, the talents of the people involved, and the flexibility of the organization. At one extreme, a star performer building upon a solid platform architecture (in the sense used above) can be a combination business analyst/application-architect/developer for a system serving 50+ users in a non-validated environment.&lt;br /&gt;&lt;br /&gt;I think it important that business analysts are able to understand the business processes and vocabulary well enough so that there is a good transmission of information between those expressing the needs and the analyst. This implies a greater stickiness between the analyst and the user community than is necessary for the architecture or project management functions.&lt;br /&gt;&lt;br /&gt;Similarly there are commonalities in the level of abstraction used at each architecture level (Application, Platform and Enterprise) that imply that levels are stickier than business areas or technology. 
&lt;br /&gt;&lt;br /&gt;In theory, project management is more transferable, but the stickiness here revolves around legal requirements, diversity of end use (geography, user types), and system novelty.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6804773551904665488?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6804773551904665488/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6804773551904665488' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6804773551904665488'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6804773551904665488'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/08/flavors-of-architects-and-analysts.html' title='Flavors of Architects and Analysts'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1248582057492028913</id><published>2009-07-26T07:18:00.001-07:00</published><updated>2009-07-26T07:18:09.566-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Open Source as an Architectural Driver</title><content type='html'>&lt;a href="http://www.washingtonmonthly.com/features/2009/0907.longman.html"&gt;Phillip Longman's post about open source products in 
healthcare&lt;/a&gt; (specifically, the VA's "health IT system") talks about how the Midland Memorial Hospital's installation of the VA system went well because it was easy to use, and it was easy to use because it was open source.&lt;br /&gt;&lt;br /&gt;Well, the first one wasn't a surprise. Well-designed, easy to use software is, well, easy to use. The fact that this leads to successful system uptake/user adoption should be no more surprising than the fact that people like their iPhones. The second factor, &lt;em&gt;being easy to use because you are open source&lt;/em&gt;, is a bit of a mental speed bump: &lt;strong&gt;Easy to use open source? &lt;/strong&gt;Well, maybe if you are a developer. The article states that the ease of use stemmed from its ease of modification. Now I can't comment on that since I am unfamiliar with the product and have never been involved in a hospital-centric system.&lt;br /&gt;&lt;br /&gt;However, open source and ease of modification? Yes, that fits. A successful open source project, by definition, must be relatively easy to modify: an interested developer should be able to jump in, modify the code and stand up a running test build in short order. Otherwise the project won't attract enough attention to survive. More importantly, I think a system that is easier to modify will leap ahead in functionality even if it starts out behind. &lt;br /&gt;&lt;br /&gt;This is one of the reasons we've seen such useful build/test tools come out of the open source community e.g., ant/junit/maven etc. 
All open source projects need tools like these to succeed since they are critical in situations where you cannot afford a dedicated buildmeister or QA organization e.g., you're a developer modifying the code to satisfy your needs.&lt;br /&gt;&lt;br /&gt;Similarly a clean, modular, layered structure is going to be favored, and codependencies (&lt;strong&gt;A&lt;/strong&gt; depends upon &lt;strong&gt;B&lt;/strong&gt;, &lt;strong&gt;B&lt;/strong&gt; depends upon &lt;strong&gt;A&lt;/strong&gt;) are going to be rejected, since they require understanding two pieces of code and their interrelationship to perform a successful modification.&lt;br /&gt;&lt;br /&gt;Both of these issues can be more easily compensated for within a "closed source" shop, since revenue-generating projects employing full time personnel can invest the time and discipline to keep things working, even if the software has a few points of poor structure. In addition, if the points of "bad architecture" are manageable there is little incentive to fix the problem, since it might easily cost more to fix than it's worth, given the costs of running a full-time development team.&lt;br /&gt;&lt;br /&gt;One of the strongest examples I think we've seen of this is the EJB vs. Hibernate controversy with the resultant "conciliation" of EJB3. Hibernate, being open source, could give developers what they needed rather than what they "should want" and it won due to its simplicity and speed.&lt;br /&gt;&lt;br /&gt;This argues for a bias towards an open source style, even in a closed source system. For example, can your consultants (internal or external) add/modify deep system functionality? 
Designing your system in a way that supports such modifications will make the architecture better and help keep the product fresh.&lt;br /&gt;&lt;br /&gt;Again, why this works for hospital systems is beyond me, unless there has been an undocumented rash of coding by doctors and nurses.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1248582057492028913?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1248582057492028913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1248582057492028913' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1248582057492028913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1248582057492028913'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/07/open-source-as-architectural-driver.html' title='Open Source as an Architectural Driver'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4722817001276152623</id><published>2009-06-30T14:45:00.001-07:00</published><updated>2009-06-30T14:45:15.113-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='iPhone'/><title type='text'>iPhone: changing the way we think</title><content type='html'>I'm struck by how the iPhone has changed the way we think 
about what can be done with software based devices assisting us as &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/Heideggerian_terminology"&gt;beings-in-the-world&lt;/a&gt;&lt;/em&gt;. I'm doing a Heidegger reference here because the iPhone is more than just &lt;a href="http://en.wikipedia.org/wiki/Ubiquitous_computing"&gt;ubiquitous computing&lt;/a&gt;: a device always at my side that could answer those important questions like:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Is there good coffee close by?&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;What's the weather going to be like later?&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;How old was Kennedy when he was elected?&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;Although it certainly is that, it has become a lot more, changing both the economics of software delivery and what it means for software to be delivered. &lt;br /&gt;&lt;br /&gt;It's not just that there are a billion apps (or so it seems) in the app store, but the economics of iPhone software is such that a small gaming company can do a novel game e.g., &lt;a href="http://zenbound.com/"&gt;tying rope around wooden blocks&lt;/a&gt;, get traction with it and make money. That didn't sound that amazing until I read an &lt;a href="http://www.gamasutra.com/php-bin/news_index.php?story=23785"&gt;interview&lt;/a&gt; with the developers in &lt;a href="http://www.gamasutra.com/"&gt;Gamasutra&lt;/a&gt; that reminded me how hard it was to make money in computer games pre-iPhone. Not that it is easy now, but compared to the stories I heard when I attended a few &lt;a href="http://www.gdconf.com/"&gt;Game Developer conferences&lt;/a&gt; earlier in the decade, it is trivial. 
Let's just say that the economics of doing a platform (Xbox, PlayStation, Wii) or PC based game were daunting, to say the least, and the likelihood of getting paid for your game was minimal, even if the game was successful.&lt;br /&gt;&lt;br /&gt;The core of the iPhone's difference is as a platform that is easy to use, location aware and ready-to-hand -- more like a hammer than a computer. &lt;br /&gt;&lt;br /&gt;As a platform it is sufficiently distinct that it is also affecting the way we think about delivering &lt;a href="http://chip.org/platform"&gt;healthcare&lt;/a&gt;. Looking at this list highlights core features that are "new," not "new" in the sense of being completely unheard of, but new in the sense of being practically available for use by the overwhelming bulk of the user community -- sort of like the difference between having a generator kit/knowing about electricity and having an electric grid that you can plug your device into.&lt;br /&gt;&lt;br /&gt;As a user-assistant, the iPhone allows me to fully exploit the affordances of my current location. I can see where I am on a map, look at overhead imagery of my current neighborhood to see if there is something that I want to photograph, and, if there is, use a small application to grab the geo coordinates so I can later tag the photos I took with my (non-GPS enabled) camera.&lt;br /&gt;&lt;br /&gt;The end result is something that is always with you, knows who you are, knows where you are, has connectivity both up (3G/internet) and down (bluetooth to local devices) while providing a simple effective mechanism to easily add functionality in small increments. &lt;br /&gt;&lt;br /&gt;I think this makes it the biggest game changer since the rollout of the internet to the general public. 
However, I also realize that this means that it is time to code up a small test application for the iPhone.&lt;br /&gt;&lt;br /&gt;PS: I don't have any experience with the &lt;a href="http://www.google.com/mobile/android/"&gt;Google android platform&lt;/a&gt; or the &lt;a href="http://www.palm.com/us/products/phones/pre/index.html"&gt;Palm Pre&lt;/a&gt;; these observations may apply equally as well to them.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4722817001276152623?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4722817001276152623/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4722817001276152623' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4722817001276152623'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4722817001276152623'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/06/iphone-changing-way-we-think.html' title='iPhone: changing the way we think'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-5448813411722545646</id><published>2009-06-01T18:37:00.001-07:00</published><updated>2009-06-01T18:37:13.404-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data 
integration'/><title type='text'>Linked Data</title><content type='html'>Finally, thanks to a discussion with &lt;a href="http://eneumann.org/"&gt;Eric Neumann&lt;/a&gt; a few weeks ago, I'm beginning to understand what &lt;a href="http://linkeddata.org/"&gt;Linked Data&lt;/a&gt; is all about. First, a caveat: although I credit Eric for helping me see how linked data fits into what I'm doing, the following interpretation is strictly my own, as are any errors of omission, commission or orthogonality. That said, I think my view is supported by the &lt;a href="http://www.w3.org/DesignIssues/LinkedData.html"&gt;Design Issues document&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The short story is that linked data provides stable identifiers for stuff (a more abstract form of things). These stable identifiers then allow you to say things about this (particular) stuff without necessarily making a strong &lt;a href="http://rdfsg.blogspot.com/2009/04/owlsameas-is-very-strong-assertion.html"&gt;ontological&lt;/a&gt; commitment. &lt;br /&gt;&lt;br /&gt;I like this. It provides for interoperability and integration. It does not provide any inference guarantees, which is fine by me, and something that I have been advocating for a while. The Linked Data site also has links to a number of datasets which publish stable identifiers for useful stuff. 
The site also gives examples of how to publish your own data.&lt;br /&gt;&lt;br /&gt;Hopefully &lt;a href="http://www.data.gov/"&gt;data.gov&lt;/a&gt; will provide its data in this form in the near future.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-5448813411722545646?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/5448813411722545646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=5448813411722545646' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5448813411722545646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5448813411722545646'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/06/linked-data.html' title='Linked Data'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-7630833637176229851</id><published>2009-05-17T18:42:00.001-07:00</published><updated>2009-05-18T12:33:23.879-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>Wolfram Alpha</title><content type='html'>Wolfram Alpha is supposed to be launching in the next few days and has been getting a lot of publicity. 
For background, here's a link to a short &lt;a href="http://www.youtube.com/watch?v=hYhLsQPHNas"&gt;YouTube demo of Wolfram Alpha&lt;/a&gt; and a &lt;a href="http://www.nytimes.com/2009/05/11/technology/internet/11search.html"&gt;NY Times&lt;/a&gt; article; Doug Lenat also has a nice &lt;a href="http://blog.cyc.com/2009/03/wolfram-alpha.html"&gt;post on his impressions&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;From what I can see (and I don't have access), even though it doesn't live up to some of the early hype, it achieves a very interesting result: it allows retrieval of general computable information using a simple &lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/Natural_language_processing"&gt;natural language processing&lt;/a&gt;&lt;/em&gt; (&lt;strong&gt;NLP&lt;/strong&gt;) interface. &lt;br /&gt;&lt;br /&gt;This allows for analysis similar to that permitted by a data warehouse, but within a different design space. The design goals of Wolfram Alpha, unlike those of a data warehouse, preclude prestructuring the data in &lt;a href="http://en.wikipedia.org/wiki/Data_mart"&gt;marts&lt;/a&gt; to allow rapid querying of the data in relatively well defined ways. However, similar to the mart/warehouse situation, you must still provide a speedy response to the quantitative queries to prevent users from drifting away while waiting for an answer.&lt;br /&gt;&lt;br /&gt;The question is: how is this done? Rumors on the net indicate that the underlying data is an &lt;a href="http://thenoisychannel.com/2009/03/19/wolfram-alpha-second-hand-impressions/"&gt;RDF triple store&lt;/a&gt;, which makes a lot of sense since RDF triples constitute a &lt;a href="http://rdfsg.blogspot.com/2008/07/structuring-database-tables.html"&gt;vertical&lt;/a&gt;, model-free storage approach. In operation, I imagine that the queries provide nice entry points for initiating a &lt;a href="http://en.wikipedia.org/wiki/Spreading_activation"&gt;spreading-activation&lt;/a&gt; fan-out process on the graph. 
When the activations intersect you can proceed to roll back up to the initiation points suggested by the query, clustering in a bottom-up data-driven fashion along the way. The clustering also affords a natural  way to structure the data for presentation to the user.&lt;br /&gt;&lt;br /&gt;Although I'll admit that this is just an educated guess as to the mechanism, it does suggest an interesting set of technologies involving fast linking and roll up of data for ad-hoc queries without requiring a lot of effort to tune the data to a specific query.&lt;br /&gt;&lt;br /&gt;Generating a set of vetted and annotated data is a different problem, but hopefully would not require a significantly greater level of effort than the ETL portion of current warehousing efforts.&lt;br /&gt;&lt;br /&gt;Wolfram Alpha therefore constitutes another factor leading me to be more vertical in my storage designs. In the  coming months, I'm hoping to run some benchmarks on production hardware/datasets so as to ground the practicality of this approach and then get permission to publish the results.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Update 18 May 2009:&lt;/strong&gt; I did try Wolfram Alpha today and it failed on my first try "age distribution of England vs UK,"  not so much from any idiosyncrasies in parsing my query, but because it appears to be  encoded with the identity "England == UK." 
This just goes to show how important it is to be spot-on with your identity information aka "synonym tables are easy, antonym tables on the other hand......aren't."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-7630833637176229851?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/7630833637176229851/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=7630833637176229851' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7630833637176229851'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7630833637176229851'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/05/wolfram-alpha.html' title='Wolfram Alpha'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6861647410631050147</id><published>2009-05-06T17:08:00.001-07:00</published><updated>2009-05-06T17:08:58.115-07:00</updated><title type='text'>launchd</title><content type='html'>I just upgraded to a new laptop (driven mostly by the need for more RAM -- hopefully 6G will be adequate for a couple of years). 
It got me thinking: even though it's great that the Mac will copy all of your old apps over effortlessly to your new machine, it also happily copies all your old unused cruft over to your new machine, and that's not so great.  &lt;br /&gt;&lt;br /&gt;So, in the spirit of good hygiene (and H1N1 preparedness), I decided to open up the Console and look to see what I might find.  I discovered that I had a couple of launchd jobs that referenced executables which didn't exist on my system any more, e.g., Carbon Copy Cloner. &lt;br /&gt;&lt;br /&gt;I have been able to rid myself of all the launchd issues by cleaning up the LaunchDaemons and LaunchAgents folders under the Library folder, but I still haven't been able to rid myself of all of these:&lt;br /&gt;&lt;pre&gt;&lt;code&gt;/Applications/Safari.app/Contents/MacOS/Safari[54428]: Warning: accessing obsolete X509Anchors.&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;This is even after searching the web a couple of times. I think the problem starts up after I open an article from &lt;a href="http://www.newsfirerss.com/"&gt;NewsFire&lt;/a&gt;, but I'm not completely sure. 
This is definitely a space in which I believe &lt;em&gt;correlation is not causality.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;If anyone has any ideas on how to fix this, I'd appreciate it.&lt;br /&gt;&lt;br /&gt;BTW it is really nice to have a built in tool like Console: it is simple and effective with just that little bit of extra functionality (string filtering) that makes all the difference in usability.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6861647410631050147?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6861647410631050147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6861647410631050147' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6861647410631050147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6861647410631050147'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/05/launchd.html' title='launchd'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4458585381251565688</id><published>2009-04-20T15:12:00.001-07:00</published><updated>2009-04-20T15:12:25.083-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><title type='text'>Java Concurrency</title><content type='html'>A predictable 
side effect of having (way too) many years of experience in Java is that certain "new" features escape your notice. This is particularly true if the IDEs don't pressure you into changing your previously successful and still functional patterns (the way they do with generics).&lt;br /&gt;&lt;br /&gt;I realized this when reading &lt;a href="http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1240155526&amp;sr=8-1"&gt;Java Concurrency in Practice&lt;/a&gt;. It's a very good book -- I can't say it really opened my eyes on concurrency since I had done some work on multi-master VME based real-time systems years ago, but it is spot on, well written, and a nice refresher. In addition, it made me aware of the thread/concurrency capabilities available in newer versions of Java, such as &lt;a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html"&gt;ThreadPoolExecutor&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I recently built a file crawler/hash-calculator/storage system as part of my &lt;a href="http://rdfsg.blogspot.com/2008/12/name-that-data.html"&gt;namedData&lt;/a&gt; work using an &lt;a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ArrayBlockingQueue.html"&gt;ArrayBlockingQueue&lt;/a&gt; and explicitly created threads. ThreadPoolExecutor appeared to allow an easier approach with cleaner shutdown/interrupt semantics.&lt;br /&gt;&lt;br /&gt;Java Tips has a clear &lt;a href="http://www.java-tips.org/java-se-tips/java.util.concurrent/pooling-threads-to-execute-short.html"&gt;example&lt;/a&gt; -- the primary change that I would make to this example is to size the thread pool based upon the &lt;a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Runtime.html#availableProcessors()"&gt;number of processors available&lt;/a&gt; (on my laptop this returns the number of cores).&lt;br /&gt;&lt;br /&gt;It took me less than an hour to make this change, test the code, etc. 
The final product is a lot cleaner, has better shutdown behavior, and even feels like it runs faster. Definitely the right way to go.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4458585381251565688?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4458585381251565688/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4458585381251565688' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4458585381251565688'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4458585381251565688'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/04/java-concurrency.html' title='Java Concurrency'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6884284460948142678</id><published>2009-04-06T17:53:00.001-07:00</published><updated>2009-04-06T17:53:32.333-07:00</updated><title type='text'>owl:sameAs is a very strong assertion</title><content type='html'>There's been an interesting discussion on the  public-semweb-lifesci mailing list with the subject "&lt;a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/2009Mar/0066.html"&gt;blog: semantic dissonance in uniprot&lt;/a&gt;" which, appropriately enough, was spurred by a blogpost entitled &lt;a 
href="http://i9606.blogspot.com/2009/02/semantic-dissonance-in-uniprot.html"&gt;semantic dissonance in uniprot&lt;/a&gt;. This post talks about a &lt;a href="http://www.uniprot.org/"&gt;uniprot&lt;/a&gt; entry which listed a Drosophila (fruit fly) protein sequence as having been isolated from "a young &lt;a href="http://en.wikipedia.org/wiki/Sporophyte"&gt;sporophyte&lt;/a&gt; contained within a seed." &lt;br /&gt;&lt;br /&gt;The point being that although one doesn't find fruit fly genes in plants, following the &lt;code&gt;owl:sameAs&lt;/code&gt; link leads directly to that conclusion. This generated a very long, fairly thoughtful and minimally flame-based &lt;a href="http://www.w3.org/Search/Mail/Public/search?type-index=public-semweb-lifesci&amp;index-type=t&amp;keywords=semantic%20dissonance%20in%20uniprot&amp;search=Search&amp;resultsperpage=100&amp;sortby=date&amp;page=1"&gt;conversation on &lt;code&gt;owl:sameAs&lt;/code&gt; and identity in general&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As the discussion progressed, the problem with associating identity across graphs (ontologies/systems of data developed by different organizations) was noted, e.g., (in pseudo-notation) &lt;strong&gt;&lt;code&gt;mySystem:itemA owl:sameAs yourSystem:itemX&lt;/code&gt;&lt;/strong&gt;, the issue being that the use of the terms is usually subtly (and often not so subtly) different between the two systems. This problem is especially apparent when making assertions about real objects which exist independently out in the world. For example: "gold" may have a property, but does the property belong to a single atom or to a group of gold atoms, and if the latter, what characterizes a group of the appropriate size? 
Given:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;A nanotechnology view of gold (still under development)&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;A semiconductor view of gold (probably reasonably well characterized)&lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;A jewelry view of gold&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;what are the precise boundaries of their applicability? The issue doesn't arise in a system developed for nanotechnology, semiconductors, or jewelry alone. The problems surface only when these systems are linked together.&lt;br /&gt;&lt;br /&gt;My thought is that the difficulty centers around the extreme power of &lt;code&gt;owl:sameAs&lt;/code&gt;, which indicates that things are identical in all contexts. However, in the physical world not only is context everything, but context is also inherently incompletely specified. &lt;br /&gt;&lt;br /&gt;In practice many of us heuristically treat identity in the physical world as if it means &lt;em&gt;indistinguishable in this context&lt;/em&gt;, with the context being implicitly dependent upon the issue being considered. I would claim that this is the only reasonable way to proceed when reasoning in a practical manner about what is true about particular objects in the world (abstractions can obviously satisfy stronger conditions since they are abstractions -- with the context factored out to any level desired).&lt;br /&gt;&lt;br /&gt;In the physical world, we cannot be sure that even the ability to track a particular item with unlimited precision would allow us to make statements about that item which would hold through time. For example, although we might make assertions about a particular atom (#0x177FFEAA) of gold and its behavior, some if not all of the assertions may fail under unexpected conditions, e.g., after an event that alters the structure of the nucleus (nuclear collisions, extremely high temperatures etc.). 
Exhaustively specifying all of these conditions is impractical at best -- which is one of the reasons the phrase &lt;a href="http://en.wikipedia.org/wiki/Ceteris_paribus"&gt;&lt;em&gt;ceteris paribus&lt;/em&gt;&lt;/a&gt; has remained with us for so long.&lt;br /&gt;&lt;br /&gt;In my own work, since I never worry about tracking individual atoms, I gravitate toward weak rather than strong assertions of identity, trying to be very attentive to context. This is very much in the spirit of the &lt;em&gt;middle distance&lt;/em&gt; as developed in Brian Cantwell Smith's &lt;a href="http://www.amazon.com/Origin-Objects-Bradford-Books/dp/0262692090/ref=sr_1_2?ie=UTF8&amp;s=books&amp;qid=1238705921&amp;sr=8-2"&gt;On The Origin of Objects&lt;/a&gt;.  Smith's point is that our intuitions are well tuned to objects about our size that we interact with frequently. In data integration and architecture work (I had to get there eventually) it implies that integrating across fields that interact to some degree in the "world" is going to be more feasible than integrating across those that don't interact. 
The give and take of the practical interaction has allowed us to identify the particular features of each item that are important in context.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6884284460948142678?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6884284460948142678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6884284460948142678' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6884284460948142678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6884284460948142678'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/04/owlsameas-is-very-strong-assertion.html' title='owl:sameAs is a very strong assertion'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8198218463232690636</id><published>2009-03-23T18:03:00.001-07:00</published><updated>2009-03-23T18:03:04.823-07:00</updated><title type='text'>OSX Performance Analysis: Instruments</title><content type='html'>I started working with OSX's &lt;a href="http://developer.apple.com/documentation/DeveloperTools/Conceptual/InstrumentsUserGuide/Introduction/Introduction.html"&gt;Instruments&lt;/a&gt; performance analysis tool, partly out of curiosity and partly because I had just fixed a 
performance problem in an application using an ad hoc &lt;em&gt;a priori&lt;/em&gt; analysis. It happened to solve the problem, but I have enough experience with performance issues to know that the &lt;em&gt;a priori&lt;/em&gt; guess is often wrong.&lt;br /&gt;&lt;br /&gt;Instruments is closely related to &lt;a href="http://en.wikipedia.org/wiki/DTrace"&gt;DTrace&lt;/a&gt; and shares a lot of its core attributes. The key attributes are that it is low overhead and works with (almost) anything running on your system (OSX apparently has the capability for some applications to turn off monitoring for security/DRM reasons).&lt;br /&gt;&lt;br /&gt;There's a lot to like here: you can easily get it up and going on your system and the analysis section is very user friendly:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;a href="http://lh4.ggpht.com/_x3Qv8fFpuQg/Sb12SCVKTJI/AAAAAAAAANc/ERSxh7S7r5I/instruments_screen.jpg?imgmax=800" alt="instruments_screen.jpg"&gt;&lt;img src="http://lh6.ggpht.com/_x3Qv8fFpuQg/Sb12LP0QKfI/AAAAAAAAANY/Z8AabyYJxe8/blog__instruments_screen.jpg?imgmax=800" alt="blog__instruments_screen.jpg" border="0" width="300" height="211" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Especially nice features include: &lt;UL&gt;&lt;LI&gt;Low overhead: the peak CPU usage I saw for the tool was ~ 16%&lt;/LI&gt;&lt;LI&gt;The ability to display exactly what is going on under the &lt;strong&gt;read head&lt;/strong&gt; (the upside down triangle above the graph)&lt;/LI&gt;&lt;LI&gt; Being able to display parameters that you didn't think of turning on during the run. All parameters are captured. The selection only impacts the display -- a godsend for anyone who has had to rerun a test because they forgot to capture a parameter&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;That said, I couldn't get any particular instrument to focus only on the process specified. 
As you can see, all of the instruments capture all of the activity, even though they were set to focus on different processes. Additionally, the "default action" kept resetting whenever I dragged a new instrument onto the display.&lt;br /&gt;&lt;br /&gt;It is still a very worthwhile tool, but if anyone has any tips as to how to get around these issues, I'd appreciate it.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8198218463232690636?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8198218463232690636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8198218463232690636' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8198218463232690636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8198218463232690636'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/03/osx-performance-analysis-instruments.html' title='OSX Performance Analysis: Instruments'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_x3Qv8fFpuQg/Sb12LP0QKfI/AAAAAAAAANY/Z8AabyYJxe8/s72-c/blog__instruments_screen.jpg?imgmax=800' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8772317947446322965</id><published>2009-03-04T18:26:00.001-08:00</published><updated>2009-03-04T18:26:56.702-08:00</updated><title type='text'>Kindle</title><content type='html'>It's a bit off topic, but I thought I'd point out how useful a &lt;a href="http://www.amazon.com/Kindle-Amazons-Wireless-Reading-Generation/dp/B00154JDAI/ref=amb_link_83624371_1?pf_rd_m=ATVPDKIKX0DER&amp;pf_rd_s=center-1&amp;pf_rd_r=0N6H9WJA3W78N3122NF5&amp;pf_rd_t=101&amp;pf_rd_p=469942651&amp;pf_rd_i=507846"&gt;Kindle&lt;/a&gt; can be for consulting. You can carry at least 500 reference books on it (and who needs more than 490 anyhow?). It is also very light and easy to read.&lt;br /&gt;&lt;br /&gt;I do have a couple of qualms. It has a page-oriented display (no scrolling), no touchscreen, and no spatial indexing, e.g., &lt;em&gt;the top side of the right page half way in&lt;/em&gt;, but other than that it's a win.&lt;br /&gt;&lt;br /&gt;An important note on utility: &lt;a href="http://oreilly.com/ebooks/"&gt;O'Reilly&lt;/a&gt; e-books can be read on the Kindle. The truly great thing about O'Reilly's e-books is that you get both the Kindle compatible Mobipocket files and the more aesthetically pleasing PDF files (for me, aesthetics matter--even in a SQL guide). &lt;br /&gt;&lt;br /&gt;You can mix and match reading and reference between the formats depending upon your preference. Thankfully the files aren't copy protected. 
Thanks,  O'Reilly, this is a very nice touch.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8772317947446322965?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8772317947446322965/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8772317947446322965' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8772317947446322965'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8772317947446322965'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/03/kindle.html' title='Kindle'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1803918442617895088</id><published>2009-02-17T08:56:00.001-08:00</published><updated>2009-02-17T08:56:59.404-08:00</updated><title type='text'>Seambay modifications to access Seam Annotations </title><content type='html'>This post extends my last one about &lt;a href="http://rdfsg.blogspot.com/2009/02/seam-from-command-line.html"&gt;accessing Seam from the command line&lt;/a&gt;. 
Here I describe the transition from using &lt;code&gt;EntityManager&lt;/code&gt; to using &lt;code&gt;EntityHome&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;The first thing I did was to create a new folder for the web service sources, which meant that I had to add this directory to the build.xml file and add all of the libraries to the compile path in NetBeans (both of which are obvious, but both of which I always forget to do).&lt;br /&gt;&lt;br /&gt;The next step was to make my action work like an .xhtml page and interact with a &lt;em&gt;home&lt;/em&gt; object rather than directly with the &lt;code&gt;EntityManager&lt;/code&gt;,&lt;br /&gt;going from:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;      &lt;br /&gt;if (fileData == null) {&lt;br /&gt;            fileData = new FileData();&lt;br /&gt;            &lt;em&gt;// various actions on fileData&lt;/em&gt;&lt;br /&gt;            entityManager.persist(fileData);&lt;br /&gt;        }&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;to&lt;br /&gt;&lt;pre&gt;&lt;code&gt;       &lt;br /&gt;if (fileData == null) {&lt;br /&gt;            fileDataHome.create();&lt;br /&gt;            fileDataHome.persist(); //side effect of creating the defined instance&lt;br /&gt;            fileData = fileDataHome.getDefinedInstance();&lt;br /&gt;            &lt;em&gt;// various actions on fileData&lt;/em&gt;&lt;br /&gt;            fileDataHome.update();&lt;br /&gt;        }&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;which also required adding these lines to components.xml:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;code&gt;xmlns:transaction="http://jboss.com/products/seam/transaction"   &lt;br /&gt;....    
&lt;br /&gt;&amp;lt;transaction:ejb-transaction/&amp;gt;  &lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;None of this was particularly difficult, and I was up and running in an hour or so.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1803918442617895088?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1803918442617895088/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1803918442617895088' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1803918442617895088'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1803918442617895088'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/02/seambay-modifications-to-access-seam.html' title='Seambay modifications to access Seam Annotations '/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8844746419218783778</id><published>2009-02-02T17:36:00.001-08:00</published><updated>2009-02-02T17:36:41.806-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>Seam From a Command Line</title><content type='html'>I recently wanted to access some Seam-derived functionality from a command-line Java program 
(something that I could run via cron). I ran into a few minor problems and thought I'd share their solutions.&lt;br /&gt;&lt;br /&gt;The first problem was that Seam annotations like &lt;code&gt;@Logger&lt;/code&gt; won't work. I guess it isn’t that surprising, but the JBoss Seam annotations are not (easily) available to a command-line program, since the portions of the framework that enable these annotations are designed to operate within a server.&lt;br /&gt;&lt;br /&gt;This was disappointing. The &lt;code&gt;@Logger&lt;/code&gt; annotation is really useful, but I couldn't come up with a way to get it going.&lt;br /&gt;&lt;br /&gt;This pushed me into wanting to use web services as much as possible to take advantage of other annotations that I had built into my system, e.g., the ability to automatically stamp an object with &lt;em&gt;time modified&lt;/em&gt; and &lt;em&gt;time created&lt;/em&gt; to support &lt;a href="http://rdfsg.blogspot.com/2008/07/temporal-data.html"&gt;temporal data operations&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I did not find the Seam documentation about accessing Seam web services particularly clear (especially when using NetBeans), so I turned to the &lt;a href="http://www.netbeans.org/kb/55/websvc-jax-ws.html"&gt;NetBeans tutorial&lt;/a&gt; and was quickly up and running with the seambay example. 
&lt;br /&gt;&lt;br /&gt;Note:&lt;br /&gt;&lt;UL&gt;The WSDL for the seambay example is found at (assuming your server is local &lt;strong&gt;and&lt;/strong&gt; you have deployed the seambay example) http://localhost:8080/jboss-seam-bay-jboss-seam-bay/AuctionService?wsdl&lt;br /&gt;&lt;br /&gt;An overview of all services at the host (again, assuming your server is local) appears at&lt;br /&gt;http://localhost:8080/jbossws/services.&lt;/UL&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8844746419218783778?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8844746419218783778/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8844746419218783778' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8844746419218783778'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8844746419218783778'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/02/seam-from-command-line.html' title='Seam From a Command Line'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8861242408980788293</id><published>2009-01-05T18:10:00.001-08:00</published><updated>2009-01-05T18:10:19.448-08:00</updated><title type='text'>Data Deduplication</title><content type='html'>IEEE Computer 
recently published a short &lt;a href="http://www.computer.org/portal/cms_docs_computer/computer/homepage/Dec08/r12intr.pdf"&gt;survey&lt;/a&gt; on data deduplication. Conceptually &lt;a href="http://en.wikipedia.org/wiki/Deduplication"&gt;deduplication&lt;/a&gt; is isomorphic to the &lt;a href="http://rdfsg.blogspot.com/2008/12/name-that-data.html"&gt;named data &lt;/a&gt; approach I posted about a few weeks ago.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt; The vendors discussed include &lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://www.emc.com/products/detail/software/avamar.htm"&gt;EMC&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://www.exagrid.com/"&gt;ExaGrid&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://www.falconstor.com/en/pages/index.cfm?pn=Deduplication&amp;bhcp=1"&gt;FalconStor&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://www.necam.com/HYDRAstor/HYDRAstorWorks.cfm"&gt;NEC DataRedux&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://www.quantum.com/Solutions/datadeduplication/Index.aspx"&gt;Quantum DXi&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://www.sepaton.com/news/news_item.php?news_id=124"&gt;Sepaton DeltaStor&lt;/a&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;a href="http://www.symantec.com/business/resources/articles/article.jsp?aid=power_of_disk"&gt;Symantec&lt;/a&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;div&gt;Since I'm not sure how long the pdf of the article will be available, I'm posting this as a follow up to my previous post.&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt; My preference would be to see these capabilities built right into the internet/operating systems rather than separate utility servers.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8861242408980788293?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8861242408980788293/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8861242408980788293' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8861242408980788293'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8861242408980788293'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2009/01/data-deduplication.html' title='Data Deduplication'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6726992157996770995</id><published>2008-12-23T18:20:00.001-08:00</published><updated>2008-12-23T18:20:07.415-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Discussing Architecture</title><content type='html'>If you're like me, you are forever grappling with finding the right format for discussing architecture with end users.&lt;br /&gt;&lt;br /&gt;There are the standard architectural diagrams, but they really don't capture: &lt;UL&gt;&lt;LI&gt;How the end users relate to the system&lt;/LI&gt;&lt;LI&gt; How all of the components are strung together to support the overall business processes&lt;/LI&gt; &lt;LI&gt;What artifacts (data, documents, etc.) 
are produced&lt;/LI&gt;  &lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;I think there may finally be a "better mousetrap" available in &lt;a href="http://www.archimate.org"&gt;ArchiMate&lt;/a&gt;, which has a suitably small number of patterns/best practices that allow you to capture what's important in a large system at the appropriate level of detail. The most important aspect of ArchiMate is that it is a mixed-mode modeling language with a distinctive, simple shape for each type of artifact.&lt;br /&gt;&lt;br /&gt;ArchiMate allows for &lt;a href="http://www.archimate.org/en/about_archimate/"&gt;several different&lt;/a&gt; &lt;em&gt;types of things&lt;/em&gt;: Services, Processes, Organization, Products, Information, Infrastructure, Applications, and Functions.&lt;br /&gt;&lt;br /&gt;As yet I am not completely clear on the breakdown of these categories, and to be honest I'm not sure that it really matters. ArchiMate is a communication tool. If it helps you communicate, it has served its purpose. If your way of breaking down the architecture is slightly different from their recommendations (which I &lt;strong&gt;strongly&lt;/strong&gt; feel could use more examples) it may have no more impact than if your class structure for a domain is slightly different from someone else's. Don't get me wrong: I'm a big fan of standards when they have been properly vetted and heavily used, but until they hit that point it is important to be flexible in using them. Flexibility allows a standard to grow and cover a sufficient portion of the domain; otherwise it will wither from lack of use.&lt;br /&gt;&lt;br /&gt;The presentation format that speaks to me the most is the layered diagram as shown on p. 11 (figure 12) of the &lt;a href="http://www.via-nova-architectura.org/magazine/magazine/enterprise-architecture-development-and-mode.html"&gt;Enterprise Architecture Development and Modelling&lt;/a&gt; paper. 
See below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_uhpaSaKsmiM/SU7dEo460oI/AAAAAAAAAFQ/vm-L-kxaPHE/2007%20Lankhorst_P11.jpg"&gt;&lt;img src="http://lh4.ggpht.com/_uhpaSaKsmiM/SU7cn6X2hcI/AAAAAAAAAFE/6seiuSeyL8o/example_archimate_small.jpg?imgmax=800" alt="example_archimate_small.jpg" border="0" width="212" height="300" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here's my simplified take for a clinical trial system used by physicians and patients.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_uhpaSaKsmiM/SU7czaMgmrI/AAAAAAAAAFM/ej-OnUOwF9Q/archimate.jpg"&gt;&lt;img src="http://lh5.ggpht.com/_uhpaSaKsmiM/SU7ctobUpyI/AAAAAAAAAFI/h7g9fd_QpaI/clinical_archimate_small.jpg?imgmax=800" alt="clinical_archimate_small.jpg" border="0" width="227" height="300" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;What I like about this format is that all of the elements on a particular layer are at the same level of abstraction and are easily placed in relationship to the other few things that are at that same abstraction level. Simultaneously, you can see what supports (and is supported by) a particular component. Each perspective (which appears on the same diagram) only requires concentrating on a small number of things at a time and can easily be held in your short-term memory.&lt;br /&gt;&lt;br /&gt;One of the key things about the layers is that they distinguish externally available from internally consumed data interfaces -- especially highlighting those that cross abstraction boundaries. These external interfaces are distinguished from those that support multiple applications at the same level in the stack. The latter "external" interfaces (which are internal to a particular level of abstraction) are more easily altered since they are more tightly coupled organizationally. 
This nicely foregrounds the implication that care should be taken in designing the external, level crossing APIs and lifecycles since changes to them will be harder to coordinate given the diversity of the interested parties.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6726992157996770995?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6726992157996770995/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6726992157996770995' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6726992157996770995'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6726992157996770995'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/12/discussing-architecture.html' title='Discussing Architecture'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_uhpaSaKsmiM/SU7cn6X2hcI/AAAAAAAAAFE/6seiuSeyL8o/s72-c/example_archimate_small.jpg?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8867007854969451661</id><published>2008-12-08T17:28:00.001-08:00</published><updated>2008-12-08T17:28:23.554-08:00</updated><title type='text'>Name that data</title><content type='html'>There is an interesting trend that I 
feel has the potential to fundamentally shift the way we think about how data is used in networking and applications.&lt;br /&gt;&lt;br /&gt;It involves attaching unique names to data, e.g., referring to the data by its SHA-1 hash value. Unique identifiers allow a number of things. For example, they allow you to retrieve the data from the network without worrying where it resides on the network, as described in &lt;br /&gt;&lt;a href="http://video.google.com/videoplay?docid=-6972678839686672840&amp;hl=en"&gt;A New Way to look at Networking&lt;/a&gt; in which &lt;a href="http://en.wikipedia.org/wiki/Van_Jacobson"&gt;Van Jacobson&lt;/a&gt; discusses breaking out the content of the page from the page itself. &lt;br /&gt;&lt;p&gt;At 42:47 in the talk there's the section on &lt;br /&gt;&lt;em&gt;Dissemination networking&lt;/em&gt; &lt;blockquote&gt;in which &lt;br /&gt;data is requested, by name, using any and all means available (IP, VPN tunnels, zeroconf addresses, multicast, proxies, etc.).&lt;br /&gt;Anything that hears the request and has a valid copy of the data can respond. The returned data is signed and optionally secured, so its integrity and association with the name can be validated.&lt;/blockquote&gt;&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;Rather than getting all of the data from a particular location, e.g., as specified by a URL, we get the identifier for the content of the data and then let the network supply the data from the location "nearest" (using the appropriate distance metric) to its use.&lt;br /&gt;&lt;br /&gt;A similar idea exists in &lt;a href="http://en.wikipedia.org/wiki/ZFS"&gt;ZFS'&lt;/a&gt; implementation of a copy-on-write transactional model:&lt;br /&gt;&lt;blockquote&gt;ZFS uses a copy-on-write transactional object model. All block pointers within the filesystem contain a 256-bit checksum of the target block which is verified when the block is read. 
Blocks containing active data are never overwritten in place; instead, a new block is allocated, modified data is written to it, and then any metadata blocks referencing it are similarly read, reallocated, and written.&lt;/blockquote&gt;&lt;br /&gt;This design allows small changes in large files to be reflected by changing &lt;em&gt;only the blocks that have been altered&lt;/em&gt; rather than by rewriting the whole file to a new location. This simplifies backup procedures, reduces R/W bandwidth requirements, etc. Part of what's significant here is that we're dealing with abstractions of the data, e.g., checksums, rather than the data itself. If we use cryptographic hash functions rather than checksums, the ideas become isomorphic, i.e., &lt;em&gt;get me this block from wherever it is&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;This has a number of potentially interesting applications depending on the granularity of the "named data". A simple example is answering the obvious question: "am I working with the copy of the file that was emailed to me last Friday, or an older version?" Even with coarse-grained naming it would be possible to create mashups of music already on a user's computer -- just transmitting offsets into and segment durations of existing content gets past &lt;a href="http://en.wikipedia.org/wiki/Digital_rights_management"&gt;DRM&lt;/a&gt; issues entirely (morally if not practically -- since much DRM-protected media is encoded).&lt;br /&gt;&lt;br /&gt;Uniquely naming the content is not just about networking, nor is it just about data, but is really about the cross product of the two: the data and its location/retrieval. 
Data location and retrieval is most of what is involved in computing: unless data is being actively processed by a computational unit (in this case, I'm talking about the integer and floating point units on the chip) the rest of "computation" is about the retrieval and storage of data, e.g., do I put this in a frame buffer, in the cloud or in a shredder?&lt;br /&gt;&lt;br /&gt;Imagine an ecosystem of data that represents everything that you own/care about -- you could partition this data into multiple overlapping categories based on any number of attributes such as:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;&lt;strong&gt;&lt;em&gt;temporary:&lt;/em&gt;&lt;/strong&gt; make no copies&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;strong&gt;&lt;em&gt;pieces of a larger whole:&lt;/em&gt;&lt;/strong&gt; applications would minimize the number of named datasets that they change when they perform updates, e.g., editing out that first minute from your video won't change every block of the video.&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;&lt;b&gt;&lt;i&gt;number of independently survivable copies required&lt;/i&gt;&lt;/b&gt;, with what longevity?&lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;&lt;b&gt;&lt;i&gt;coupling the data to geographic position:&lt;/i&gt;&lt;/b&gt; How closely should the data follow me as I move around the planet? Can it stay where I put it, or should it go where I go and be available for high-bandwidth processing?&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;These are just my first-pass ideas. 
This concept of divorcing the data from its location and higher-level structural organization opens up the potential for a whole new set of applications which provide enhanced user functionality by pushing a lot of these data management, replication and caching issues deep into the infrastructure.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8867007854969451661?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8867007854969451661/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8867007854969451661' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8867007854969451661'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8867007854969451661'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/12/name-that-data.html' title='Name that data'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-5447325946630154298</id><published>2008-11-24T17:33:00.001-08:00</published><updated>2008-11-24T17:33:35.223-08:00</updated><title type='text'>Semantic Interoperability: from Mashups to Inference</title><content type='html'>My &lt;a href="http://rdfsg.blogspot.com/2008/11/semantic-interoperability-adverse.html"&gt;last post&lt;/a&gt; looked at semantic 
interoperability from the standpoint of the &lt;a href="http://bridgmodel.org/"&gt;CDISC BRIDG model&lt;/a&gt;. Thinking back on it, I found that writing it left me with more questions than I had before I started.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I have to admit to being unclear as to what is meant by “semantic interoperability,” since I have heard it used in a number of different ways depending upon the audience. (apparently I’m not the only one: the &lt;a href="http://en.wikipedia.org/wiki/Semantic_interoperability"&gt;wikipedia entry on semantic interoperability&lt;/a&gt; has the caveat &lt;b&gt;&lt;i&gt;All or part of this article may be confusing or unclear&lt;/i&gt;&lt;/b&gt;.) &lt;br /&gt;&lt;br /&gt;"Semantic interoperability" puts requirements on the data, on the models, and on the processes of using them. How we respond to those requirements implies different interpretations of what it means to be &lt;em&gt;semantically interoperable.&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;I think that there are three basic ways of using data that are "semantically interoperable". &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;“hands-off” data integration between designated well curated systems--this is the way in which I think it is used most often.&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt;“hands-off” data integration between any systems sharing common identifiers e.g., publish an interface and allow anyone to use it.&lt;/LI&gt;&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;The dual use of integration with any published interface that provides the data that you're looking for - which I think is less common e.g., I'll use any map, or any book information service rather than Google/Amazon (or Yahoo/BN). &lt;em&gt;I haven't seen this that often and I think it sounds a bit sketchy.&lt;/em&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;&lt;LI&gt;Using OWL reasoners etc. 
for inference across systems to generate new information.&lt;/LI&gt;&lt;/UL&gt;&lt;br /&gt;The requirements around these things are pretty different, both in data quality and in the congruence of the requisite component models.&lt;br /&gt;&lt;br /&gt;The first &lt;em&gt;“hands-off” data integration between any systems sharing common identifiers&lt;/em&gt; doesn't really require any similarity of models other than around the key integration point(s). You need the &lt;em&gt;&lt;strong&gt;name of the referent&lt;/strong&gt;&lt;/em&gt; of the data, the &lt;em&gt;&lt;strong&gt;name of the data item&lt;/strong&gt;&lt;/em&gt; and the &lt;em&gt;&lt;strong&gt;format of the returned data&lt;/strong&gt;&lt;/em&gt; e.g., &lt;em&gt;"The first president of the United States"&lt;/em&gt;; &lt;em&gt;"date of birth"&lt;/em&gt; returned in &lt;em&gt;ISO 8601&lt;/em&gt; format. Of course the more points that you want to make referenceable between the systems, the more the models have to match e.g., your model of US presidents has to contain a way of dereferencing the person and that person's date of birth. The more independently developed and maintained your systems are, the more quickly you want to start using RDF to give you very stable identifiers for your referents. &lt;br /&gt;&lt;br /&gt;If the systems are required to do some curation/analysis of the data, the exported models need to match more closely so that you can derive the correct metrics to perform the analysis and understand the relationships between individual data points. A good example of this comes from &lt;a href="http://blogs.msdn.com/nickmalik/archive/2007/10/19/soa-in-the-coordination-model.aspx"&gt;Nick Malik&lt;/a&gt; who points out &lt;blockquote&gt;So, if you look in a database and you see a purchase order... has it been approved or not?  The answer depends on the business unit that created it.&lt;/blockquote&gt;&lt;br /&gt;Your models can be in a number of different forms (UML, OWL, etc.) 
and be wildly divergent from the underlying reality, but if the delusion is shared you can achieve some synergy. &lt;br /&gt;&lt;br /&gt;Inference of course requires (at least) a locally full up OWL ontology since that's the only modelling language that permits inference. Models also have to more closely resemble the shared "current best understanding" of reality (which is of course a moving target in a scientific domain) or the resulting inferences will be worthless, or at best amusing.&lt;br /&gt;&lt;br /&gt;However, doing an ontology is a big deal (see &lt;a href="http://bioontology.org/wiki/images/f/f0/OntologyJoy.ppt"&gt;The Joy of Ontology&lt;/a&gt; by Suzanna Lewis for a discussion). The increment of commitment that we're making here is decidedly non-trivial, especially if the domain that we are trying to model is of substantial size. &lt;br /&gt;&lt;br /&gt;I think the clinical trial domain is a good example of &lt;strong&gt;substantial size&lt;/strong&gt;. BRIDG took a long time to do, it is still undergoing revision and does not allow inference. I would argue that  given the continued refinement of some of the base terms (&lt;a href="https://cabig.nci.nih.gov/workspaces/VCDE/Data_Standards/"&gt;sex and gender were recently updated&lt;/a&gt;), even if there was an ontology, hands-off inference is not something that lies in the near future, simply because the ground doesn't provide a sufficiently firm foundation. &lt;br /&gt;&lt;br /&gt;Just for clarity -- this doesn't mean that turning loose an inferencing bot over a sufficiently sized test set would not yield interesting and perhaps even transformative results. 
It just means that the inferencing would be part of a web research project rather than a production operation.&lt;br /&gt;&lt;br /&gt;I could be wrong on this (and gladly so), but I did live through the AI winter and I can no longer utter the phrase "sufficiently smart compiler" without irony.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-5447325946630154298?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/5447325946630154298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=5447325946630154298' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5447325946630154298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5447325946630154298'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/11/semantic-interoperability-from-mashups.html' title='Semantic Interoperability: from Mashups to Inference'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1475837687485740497</id><published>2008-11-11T10:28:00.001-08:00</published><updated>2008-11-11T10:28:31.036-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data integration'/><category scheme='http://www.blogger.com/atom/ns#' term='CDISC'/><title type='text'>Semantic Interoperability: 
Adverse Events</title><content type='html'>When reviewing the &lt;em&gt;Bridg Release 2.0 Static Elements report.RTF&lt;/em&gt;, I looked in some detail at the adverse event model.&lt;br /&gt;&lt;br /&gt;Here's a summary:&lt;br /&gt;&lt;br /&gt;The &lt;code&gt;AdverseEvent&lt;/code&gt; class is described as having the following connections and attributes.&lt;br /&gt;&lt;br /&gt;Connections &lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;Association link from class &lt;code&gt;PerformedProductInvestigation&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Association link from class &lt;code&gt;Subject&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Association link to class &lt;code&gt;AEOutcomeAssessmentRelationship&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;Association link to class &lt;code&gt;AECausalityAssessmentRelationship&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;Association link to class &lt;code&gt;AEActionTakenRelationship&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;Generalization link to class &lt;code&gt;PerformedObservationResult&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Association link from class &lt;code&gt;PerformedProductInvestigation&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Generalization of &lt;code&gt;PerformedActivity&lt;/code&gt; adding an &lt;code&gt;evaluationMethodCode&lt;/code&gt; attribute; &lt;code&gt;PerformedActivity&lt;/code&gt; captures the duration of the activity&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Association link from class &lt;code&gt;Subject&lt;/code&gt; -- the clinical subject (An entity of interest, either biological or otherwise.)&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Association link to class &lt;code&gt;AEOutcomeAssessmentRelationship&lt;/code&gt;&lt;br /&gt;links the AE to an observation&lt;br /&gt;&lt;div&gt;For example, recovered/resolved, recovering/resolving, not recovered/not resolved, recovered/resolved with &lt;br /&gt;sequelae, fatal or 
unknown&lt;/div&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Association link to class &lt;code&gt;AECausalityAssessmentRelationship&lt;/code&gt;&lt;br /&gt;links the AE to an observation.&lt;div&gt; For example, when an adverse event occurs, a physician may evaluate interventions that may have caused the &lt;br /&gt;adverse event.&lt;/div&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Association link to class &lt;code&gt;AEActionTakenRelationship&lt;/code&gt;.&lt;br /&gt;Specifies the link between an adverse event and the steps performed to address it.&lt;br /&gt;&lt;div&gt;For example, study dose reduced, protocol treatment change, etc.&lt;br /&gt;&lt;/div&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;Generalization link to class &lt;code&gt;PerformedObservationResult&lt;/code&gt;&lt;br /&gt;links all observations, protocol deviations, etc. together with a report.&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;The AE itself has the attributes:&lt;br /&gt;&lt;UL&gt;&lt;LI&gt; &lt;code&gt;gradeCode&lt;/code&gt; &lt;br /&gt;&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt;&lt;code&gt;severityCode&lt;/code&gt; &lt;br /&gt;&lt;/LI&gt;&lt;LI&gt; &lt;code&gt;seriousnessCode&lt;/code&gt; &lt;br /&gt;&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt;&lt;code&gt;occurrencePatternCode&lt;/code&gt; &lt;br /&gt;&lt;/LI&gt;&lt;LI&gt; &lt;code&gt;unexpectedReasonCode&lt;/code&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt; &lt;code&gt;expectedIndicator&lt;/code&gt;&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt; &lt;code&gt;highlightedIndicator&lt;/code&gt; &lt;br /&gt;&lt;/LI&gt;&lt;LI&gt; &lt;code&gt;hospitalizationRequiredIndicator&lt;/code&gt; &lt;/LI&gt;&lt;br /&gt;&lt;LI&gt; &lt;code&gt;onsetDate&lt;/code&gt; &lt;br /&gt;&lt;/LI&gt;&lt;LI&gt; &lt;code&gt;resolutionDate&lt;/code&gt; &lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;The end result is a structure that has a formal relationship that is well thought out, should cover all situations and permits systems to interoperate. 
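&lt;br /&gt;&lt;br /&gt;As a rough illustration, the attribute list above can be sketched as a simple record. This is only a sketch under stated assumptions, not the BRIDG definition: the attribute names are copied from the list, while the Python types, the &lt;code&gt;None&lt;/code&gt; defaults and the sample values are assumptions made for illustration.&lt;br /&gt;
&lt;pre&gt;
```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class AdverseEvent:
    """Illustrative record only: names follow the attribute list, types are assumed."""

    # Coded attributes (assumed here to be plain strings)
    gradeCode: Optional[str] = None
    severityCode: Optional[str] = None
    seriousnessCode: Optional[str] = None
    occurrencePatternCode: Optional[str] = None
    unexpectedReasonCode: Optional[str] = None
    # Indicators (assumed here to be booleans)
    expectedIndicator: Optional[bool] = None
    highlightedIndicator: Optional[bool] = None
    hospitalizationRequiredIndicator: Optional[bool] = None
    # Dates
    onsetDate: Optional[date] = None
    resolutionDate: Optional[date] = None


# Hypothetical sample values, chosen only to show the shape of a record
ae = AdverseEvent(
    gradeCode="3",
    hospitalizationRequiredIndicator=True,
    onsetDate=date(2008, 11, 1),
)

# List the attribute names carried by the record
attribute_names = [f.name for f in fields(AdverseEvent)]
print(attribute_names)
```
&lt;/pre&gt;
The associations (for example to &lt;code&gt;Subject&lt;/code&gt; or &lt;code&gt;AEOutcomeAssessmentRelationship&lt;/code&gt;) are deliberately left out of this sketch; it covers only the attributes of the AE itself.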
&lt;br /&gt;&lt;br /&gt;In my mind, this is not the same as assuring &lt;strong&gt;semantic interoperability&lt;/strong&gt;. For semantic interoperability to really occur, the grade codes must be comparable across sites, hospitalization criteria must be identical (or at least commensurable), and so on. Achieving this comparability requires continuing education, harmonization efforts, and constant feedback of metrics to practitioners. It therefore represents a much higher bar.&lt;br /&gt;&lt;br /&gt;This is not in any way a criticism of BRIDG. You need something like BRIDG -- a well-vetted industry standard -- to even be able to begin such an attempt. However, true semantic interoperability involves not only the structure of the data, but the data itself.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1475837687485740497?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1475837687485740497/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1475837687485740497' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1475837687485740497'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1475837687485740497'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/11/semantic-interoperability-adverse.html' title='Semantic Interoperability: Adverse Events'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4179442749587789714</id><published>2008-10-27T18:52:00.001-07:00</published><updated>2008-10-27T18:52:02.425-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='data integration'/><category scheme='http://www.blogger.com/atom/ns#' term='CDISC'/><title type='text'>CDISC/BRIDG</title><content type='html'>Last month I attended the Boston Area CDISC Users group meeting (BACUN). All of the presentations were interesting and useful. However, I found that the one by Lisa Chatterjee on BRIDG stood out as particularly informative.&lt;br /&gt;&lt;br /&gt;The  &lt;a href="http://bridgmodel.org/"&gt;BRIDG&lt;/a&gt; Domain Analysis Model is a representation of protocol-driven biomedical/clinical research.&lt;br /&gt;&lt;br /&gt;One of the goals of the effort is Semantic Interoperability - I don't think that this means that "following the model" guarantees semantic interoperability, but rather that BRIDG constitutes a starting point from which a semantically commensurable system can be built. The BRIDG team appears to view the model as a foundation for other more problem-specific representations (CDISC/HL7 etc.). 
The idea is that if you can map BRIDG &lt;-&gt; HL7 and BRIDG &lt;-&gt; CDISC, the HL7 &lt;-&gt; CDISC mapping is (relatively) straightforward.&lt;br /&gt;&lt;br /&gt;There is no question that BRIDG represents an excellent starting place for using data in an interoperable fashion.&lt;br /&gt;All in all it shows a &lt;strong&gt;&lt;em&gt;very inclusive approach&lt;/em&gt;&lt;/strong&gt;&lt;br /&gt; -- and a surprising openness to modifying the model to ameliorate difficulties encountered in use.&lt;br /&gt;&lt;br /&gt;The core modeling language is UML and spreadsheets are used to track much of the mapping (there already is a draft version of a spreadsheet that maps the BRIDG R2.0 model to RIM2.18).&lt;br /&gt;&lt;br /&gt;From the presentation:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/rdf541/SP4Ggce1c2I/AAAAAAAAAE8/HsLh1IPT0PI/Use_of_bridge.png"&gt;&lt;img src="http://lh4.ggpht.com/rdf541/SP4IzPMhWTI/AAAAAAAAAFA/djExRoKkg4M/bridg_slide_sm.jpg?imgmax=800" alt="bridg_slide_sm.jpg" border="0" width="300" height="222" /&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I have to admit that I haven't examined the model in complete detail. 
However, from what I've seen almost everything that you need for the target domain is there and the level of abstraction feels right: low enough to be relatively easy to implement, but high enough so that you don't get wedged into a corner from the get-go.&lt;br /&gt;&lt;br /&gt;I did look in a bit of detail at the adverse event model, which is represented in the &lt;em&gt;Bridg Release 2.0 Static Elements report.RTF&lt;/em&gt;.&lt;br /&gt;&lt;br /&gt;What we see in &lt;b&gt;&lt;i&gt;Figure 5 : View 4 - Adverse Event&lt;/i&gt;&lt;/b&gt; is ~ 90% of the complete domain model with a number of classes added in support of recording adverse events and tracking their eventual analysis/resolution.&lt;br /&gt;&lt;br /&gt;Since adverse events raise some issues about semantic interoperability that I want to talk about in detail, I will cover them in my next post.&lt;br /&gt;&lt;br /&gt;BTW on my Mac, the only application that could open the .rtf file with the figures was &lt;a href="http://www.openoffice.org/"&gt;OpenOffice 3&lt;/a&gt;.  
MS word 2004 elided the figures.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4179442749587789714?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4179442749587789714/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4179442749587789714' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4179442749587789714'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4179442749587789714'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/10/cdiscbridg.html' title='CDISC/BRIDG'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/rdf541/SP4IzPMhWTI/AAAAAAAAAFA/djExRoKkg4M/s72-c/bridg_slide_sm.jpg?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8358273264837594624</id><published>2008-10-13T20:24:00.001-07:00</published><updated>2008-10-13T20:24:19.592-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><title type='text'>Seam: the ftl advantage</title><content type='html'>I finished the first pass schema for the &lt;a href="http://rdfsg.blogspot.com/2008/02/extensible-system-for-discovery-data.html"&gt;flexible drug discovery framework&lt;/a&gt; and 
pointed the most recent (&lt;a href="https://sourceforge.net/project/showfiles.php?group_id=22866&amp;package_id=163777&amp;release_id=602455"&gt;2.02&lt;/a&gt;) GA release of &lt;a href="http://www.jboss.com/products/seam"&gt;Seam&lt;/a&gt; at the database. &lt;br /&gt;&lt;br /&gt;My hope was that it would produce pages with the latest &lt;a href="http://rdfsg.blogspot.com/2008/07/richfacesdatatablecolumn-sortby.html"&gt;ajax-friendly table sorting headers&lt;/a&gt; -- sadly it didn't. In addition, this version retains the practice of generating pages that use labels and column headers wired to a particular language (English) rather than allowing the headers to be language dependent messages, even though the pages fully support localization (via language specific message_x.properties files in the resources directory).&lt;br /&gt;&lt;br /&gt;I was in the process of fixing these issues manually via emacs using &lt;a href="http://xahlee.org/emacs/find_replace_inter.html"&gt;these instructions&lt;/a&gt; (I find that emacs makes it easier to select the files that I want to edit -- NetBeans picks up too many files and deselecting 70 or so "extra" files is too much work). While making these changes, I was discussing a new system with one of my clients and recommended that they consider an open source solution since they could change it (or hire somebody to change it) if at some point they encountered something that they didn't like. I thought this better than the alternative of waiting for a potentially unresponsive vendor to pay attention to their problem.&lt;br /&gt;&lt;br /&gt;As I was saying this, I thought to myself: "Seam is open source, maybe I should try to fix the code rather than edit the result." I'm usually hesitant to go down this path. Sometimes it works, but my &lt;em&gt;a priori &lt;/em&gt;estimate is that the process of understanding the code structure, getting the build environment running etc. 
costs a day (or more) before anything productive comes out the other end.&lt;br /&gt;&lt;br /&gt;However, as a proof of principle,  I thought I should give it a try and see how it went.&lt;br /&gt;&lt;br /&gt;I'm very happy to report that given a combination of good architecture and good tool selection on the part of the Seam team, making these modifications was almost trivial. &lt;br /&gt;&lt;br /&gt;Let me explain why, and encourage you to "do this at home."&lt;br /&gt;The core reason is that the &lt;a href="http://docs.jboss.org/tools/2.0.0.GA/seam/en/html/generate_entities.html"&gt;seam generate-entities&lt;/a&gt; command operates in a number of phases, one of which generates the .xml and .xhtml files using &lt;a href="http://freemarker.org/"&gt;freemarker template&lt;/a&gt; files (&lt;code&gt;.ftl&lt;/code&gt;). This allows the required changes to be made without either looking at the java involved or developing any understanding of the calling structure, etc.&lt;br /&gt;&lt;br /&gt;The &lt;code&gt;.ftl&lt;/code&gt; files (in &lt;code&gt;./seam-gen/view/&lt;/code&gt;) are pretty much self-documenting (which is handy, given the level of documentation included in the files) and very easy to change -- errors thrown by the freemarker engine are clear and easy to work with.&lt;br /&gt;&lt;br /&gt;A couple of (minor) caveats &lt;br /&gt;&lt;UL&gt;&lt;LI&gt;The &lt;code&gt;.java&lt;/code&gt; files in the &lt;code&gt;src/action&lt;/code&gt; will need to be removed between runs of &lt;code&gt;seam generate-entities.&lt;/code&gt;&lt;br /&gt;&lt;/LI&gt;&lt;LI&gt;I think it is a good idea to point the seam generator at a different target directory (specified by &lt;code&gt;./seam-gen/build.properties&lt;/code&gt; -- "&lt;code&gt;workspace.home&lt;/code&gt;") while debugging so that you don't accidentally overwrite edits that you've already made (oddly I thought of this BEFORE I ran the generator)&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;In summary, if you 
find yourself making a lot of changes to Seam-generated files, change the .ftl files instead; it can be &lt;b&gt;&lt;i&gt;much, much easier.&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8358273264837594624?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8358273264837594624/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8358273264837594624' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8358273264837594624'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8358273264837594624'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/10/seam-ftl-advantage.html' title='Seam: the ftl advantage'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1615780743855561545</id><published>2008-09-29T07:21:00.001-07:00</published><updated>2008-09-29T07:21:56.996-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><title type='text'>Suddenly mysql won't start</title><content type='html'>&lt;em&gt;Suddenly&lt;/em&gt; is a bit of an exaggeration. This happened on my desktop -- I use mysql on my laptop almost daily, but only once every few months on my desktop. 
They are similar environments: Intel-based Macs running a fully patched Mac OS X 10.5. The mysql on the desktop was transferred from my previous PowerPC-based desktop, but has been used a few times since the transfer.&lt;br /&gt;&lt;br /&gt;In any case, the normal startup action&lt;br /&gt;&lt;code&gt; sudo /usr/local/mysql/bin/mysqld_safe&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;failed with the following output.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;code&gt;Starting mysqld daemon with databases from /usr/local/mysql/data&lt;br /&gt;&lt;br /&gt;/usr/local/mysql/bin/mysqld_safe: line 395: /usr/local/var/: Is a directory&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;/usr/local/mysql/bin/mysqld_safe: line 401: /usr/local/var/: Is a directory&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;STOPPING server from pid file /usr/local/mysql/data/rdf-8-Tower.local.pid&lt;br /&gt;&lt;br /&gt;tee: /usr/local/var/: Is a directory&lt;br /&gt;&lt;br /&gt;080910 15:39:05  mysqld ended&lt;br /&gt;&lt;br /&gt;tee: /usr/local/var/: Is a directory&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;At this point I said to myself "well this hasn't been upgraded in a while, I should upgrade mysql" &lt;strong&gt;!!bad idea!!&lt;/strong&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The upgrade didn't solve the problem. 
After tracing through the script in more detail I did a &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;code&gt;sudo ./my_print_defaults&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt; &lt;br /&gt;which reminded me that &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;code&gt;Default options are read from the following files in the given order:&lt;br /&gt;&lt;br /&gt;/etc/my.cnf /usr/local/mysql/etc/my.cnf ~/.my.cnf&lt;/code&gt; &lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;/etc/my.cnf contained&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;code&gt;[mysqld]&lt;br /&gt;&lt;br /&gt;log = /usr/local/var/mysqlLOG.log&lt;br /&gt;&lt;br /&gt;#If no specific storage engine/table type is defined in an SQL-Create statement the default type will be used.&lt;br /&gt;&lt;br /&gt;default-storage-engine=myisam&lt;br /&gt;&lt;br /&gt;max_allowed_packet = 16M&lt;br /&gt;&lt;br /&gt;#Enter a name for the error log file. Otherwise a default name will be used.&lt;br /&gt;&lt;br /&gt;log-error=/usr/local/var/&lt;br /&gt;&lt;br /&gt;#Enter a name for the slow query log. Otherwise a default name will be used.&lt;br /&gt;&lt;br /&gt;log-slow-queries=/usr/local/var/&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;/blockquote&gt;&lt;br /&gt;Which I changed to &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;code&gt;[mysqld]&lt;br /&gt;&lt;br /&gt;log = /usr/local/var/mysqlLOG.log&lt;br /&gt;&lt;br /&gt;#If no specific storage engine/table type is defined in an SQL-Create statement the default type will be used.&lt;br /&gt;&lt;br /&gt;default-storage-engine=myisam&lt;br /&gt;&lt;br /&gt;max_allowed_packet = 16M&lt;br /&gt;&lt;br /&gt;#Enter a name for the error log file. Otherwise a default name will be used.&lt;br /&gt;&lt;br /&gt;log-error=/usr/local/mysql_ERROR_LOG.log&lt;br /&gt;&lt;br /&gt;#Enter a name for the slow query log. 
Otherwise a default name will be used.&lt;br /&gt;&lt;br /&gt;log-slow-queries=/usr/local/var/mysql_SLOW_l_LOG.log&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt; /usr/local/mysql/etc/my.cnf&lt;br /&gt;&lt;/code&gt;didn't exist, nor did&lt;br /&gt;&lt;code&gt;~/.my.cnf&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;This allowed the db to start but I couldn't log in with any of the user accounts normally available (including root). It appears that using the &lt;code&gt;mysql-5.0.67-osx10.5-x86.dmg&lt;/code&gt; file to update caused the db user information to get hosed.&lt;/p&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Fortunately I'm pretty neurotic about backups and so I easily recovered just by copying /usr/local/* from my last backup. &lt;br /&gt;&lt;br/&gt;&lt;br /&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; this backup was to a separate disk performed using &lt;a href="http://www.shirt-pocket.com/SuperDuper/SuperDuperDescription.html"&gt;SuperDuper&lt;/a&gt; -- &lt;a href="http://www.apple.com/macosx/features/timemachine.html"&gt;TimeMachine&lt;/a&gt; isn't going to get files in /usr (which helps explain why the disk space used by TimeMachine is smaller than I expected). 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1615780743855561545?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1615780743855561545/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1615780743855561545' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1615780743855561545'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1615780743855561545'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/09/suddenly-mysql-won-start.html' title='Suddenly mysql won&amp;#39;t start'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-2948085440037304221</id><published>2008-09-15T17:41:00.001-07:00</published><updated>2008-09-15T17:45:49.419-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='Amazon Web Services'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>Seam on Amazon EC2</title><content type='html'>I just completed putting up a demo of my seam work on &lt;a href="http://www.amazon.com/gp/browse.html?node=3435361"&gt;Amazon Web Services &lt;/a&gt; EC2 service. 
I primarily did this to ground my advocacy of EC2 as a good option for small biotechs that may need occasional bursts of compute power but have neither the cash to buy adequate servers for peak compute load nor the staff to maintain them.&lt;br /&gt;&lt;br /&gt;I thought that putting up my jboss/seam/mysql demo would also be sufficiently non-trivial to give me a good feel of what it is like.&lt;br /&gt;&lt;br /&gt;There are similarities between EC2 and other virtualization options (EC2&lt;a href="http://www.virtualization.info/2006/08/amazon-launches-xen-powered-virtual.html"&gt; is based on XEN&lt;/a&gt; after all).&lt;br /&gt;&lt;br /&gt;The core differences in my mind revolve around having S3 as a backing store. Since S3 is on Amazon's servers you need to pay more attention to security keys etc.&lt;br /&gt;&lt;br /&gt;My recommendation is to go through the &lt;a href="http://docs.amazonwebservices.com/AWSEC2/2008-02-01/GettingStartedGuide/"&gt;Getting Started Guide&lt;/a&gt; -- even to the point of saving a modified image. This will assure that you have the proper accounts set up both on EC2 and S3 and you have a bucket set up on S3 for storing your image.&lt;br /&gt;&lt;br /&gt;I found it easiest to create a bucket using the &lt;a href="http://aws.krugle.com/kse/projects/gMHDggx#2"&gt;Python&lt;/a&gt; examples -- even though I have never done much more than a "hello world" program in python (yes one that consists of &lt;br /&gt;&lt;code&gt;#!/usr/bin/env python&lt;br /&gt;print "Hello World"&lt;/code&gt;). 
The python code is the most self-contained and required the fewest downloads of ancillary libraries in my environment (Apple OSX 10.5.4).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Building and saving an image takes a while -- I would recommend doing a reboot of your virtual machine to make sure that all of the changes take hold (boot processes start as designed etc.).&lt;br /&gt;&lt;br /&gt;Not to overstate the obvious, but the image that you start with has a tremendous impact on the time it takes you to get up and running. I settled upon an &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1523&amp;categoryID=101"&gt;image&lt;/a&gt; that already had mysql5, jboss 4.2.2 installed, and it made things much easier. In general I didn't feel that the images were particularly well documented. Not being that familiar with Fedora I thought that the difference between the Fedora-core-4 and fedora-core-8 was how many CPU cores they were optimized for, not the revision number. 
My initial foray with the fedora-core-4 image stopped when I realized that it had mysql4 rather than mysql 5.&lt;br /&gt;&lt;br /&gt;All in all the experience wasn't too bad (I don't think that I could ever call one of these experiences "good" -- if it were good I would just be able to click a "do it" button that would do exactly what I wanted), and would have been much better if I hadn't "lost my keys".&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-2948085440037304221?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/2948085440037304221/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=2948085440037304221' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2948085440037304221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2948085440037304221'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/09/seam-on-amazon-ec2.html' title='Seam on Amazon EC2'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-3341570637991809339</id><published>2008-08-29T06:37:00.001-07:00</published><updated>2008-08-29T06:54:18.149-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' 
term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='data integration'/><title type='text'>TOGAF and Evaluating Architectures</title><content type='html'>The old adage is that you always have an enterprise architecture even if you never designed one -- the point being to encourage an organization to spend the time to design one. This is all well and good, but given an ongoing enterprise, what's the best way to determine what enterprise architecture you have, where you want it to go and, most importantly, how to get there?&lt;br /&gt;&lt;br /&gt;For various reasons I've started looking at this issue again and have just refamiliarized myself with &lt;a href="http://www.opengroup.org/togaf/"&gt;TOGAF&lt;/a&gt; (The Open Group's Architecture Framework). I had forgotten how much I liked it: it is pragmatic, highly tailorable and focused on open cross-organizational solutions. I'm not going to do a detailed analysis of TOGAF vs. other frameworks -- it's not really my interest as I'm definitely in &lt;a href="http://en.wikipedia.org/wiki/Satisficing"&gt;satisficing&lt;/a&gt; mode here. What follows is my (long) elevator pitch of the TOGAF take home message.&lt;br /&gt;&lt;br /&gt;The top level graphic of the TOGAF process captures the flavor pretty well&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/rdf541/SJxf55DXheI/AAAAAAAAAD0/EU6UDrSVdvc/togaf_dev_cycle.jpg"&gt;&lt;img src="http://lh5.ggpht.com/rdf541/SLf-k7Hq5HI/AAAAAAAAAEU/pK8F01Nlv-8/blog__togaf_dev_cycle.jpg?imgmax=800" alt="blog__togaf_dev_cycle.jpg" border="0" width="270" height="300" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The preliminary phase is key but very easy to overlook. 
TOGAF suggests that this phase consists of defining the overall objectives and scope&lt;br /&gt;&lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;Define Objectives&lt;br /&gt;&lt;UL&gt;&lt;LI&gt;Assure that everyone who will be involved in or benefit from this approach is committed to the success of the architectural process&lt;/LI&gt; &lt;br /&gt;&lt;LI&gt;Define the architecture principles that will inform the constraints on any architecture work &lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt; Define the "architecture footprint" for the organization -- the people responsible for performing architecture work, where they are located, and their responsibilities &lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt; &lt;br /&gt;Define Scope and Assumptions&lt;br /&gt;&lt;UL&gt;&lt;br /&gt;&lt;LI&gt;The business units that are involved&lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;The level of detail to be defined &lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;The specific architecture domains to be covered (Business, Data, Applications, Technology) &lt;br /&gt;&lt;/LI&gt;&lt;br /&gt;&lt;LI&gt;The time horizon that should be addressed by the architecture. &lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;&lt;br /&gt;What I find attractive about this whole approach is its focus on getting buy-in from the key players in the organization, defining their roles and developing a shared set of expectations around what's going to be done as part of the architecture effort. The preliminary steps give the team some initial criteria for driving the architectural vision, but then TOGAF immediately requires them to ground it as supporting the needs of the business users. I find this grounding critical; often the business thinks that architecture efforts are worthless and often they are right because the architectural model hasn't been grounded in the business process. 
&lt;b&gt;&lt;i&gt;Note: in this case "business process" and "user's scientific process" are equivalent.&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The key to the success of an architecture effort is to address current pain points as they will be reflected in the business processes that will be in place when the architecture rolls out. Sorry if the tense of the last sentence was a bit torqued. What I mean is that the architecture needs to hit a mark to support business operations as they will be in the future, not as they are now, and that some of the problems that are being experienced now will only be exacerbated by these planned changes.&lt;br /&gt;&lt;br /&gt;The dialog with the Business Unit leader sounds something like &lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;We're planning to do more collaborations in the future, but with our current collaborations we have a terrible time registering new users and tracking responses to our questions about the data. However, if we put an architecture in place which uses our new authentication mechanism that supports &lt;a href="http://openid.net/"&gt;OpenId&lt;/a&gt; it will radically simplify the process of adding new users. &lt;br /&gt;&lt;p&gt;&lt;br /&gt;In addition, if we use vendor X's implementation of the &lt;a href="http://www.cdisc.org/publications/Benefits_lifeSciencesIndustryArchitecture.pdf"&gt;Life Sciences Industry Architecture&lt;/a&gt;, queries will be automatically tracked. &lt;p&gt;&lt;br /&gt;Our ability to handle more collaborations on the back end is increased as our new system allows us to share extra capacity across multiple business units, thereby sharing the cost of reserve capacity to meet any unanticipated surges in demand. 
&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Such a "political/operational" model for rolling out an architectural analysis implies that everyone who contributes to the effort should get something out of it (this is a goal, but the closer you can come to meeting the goal, the more self-organizing the system becomes).&lt;br /&gt;&lt;br /&gt;As you proceed around the TOGAF loop, you pick and choose what makes sense given the decisions made previously (which of course you are always free to revisit), and analyze it to the level of depth that is appropriate.&lt;br /&gt;&lt;br /&gt;Think of TOGAF as providing a (partial) checklist of processes to use and things to consider that help you reach the end state of &lt;br /&gt;&lt;a href="http://www.opengroup.org/"&gt;Boundaryless information flow (tm)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/rdf541/SJxqkaYoFFI/AAAAAAAAAD4/Sgm0acobIQg/Brokerage_applications.jpg"&gt;&lt;img src="http://lh4.ggpht.com/rdf541/SLf_VqDaIaI/AAAAAAAAAEY/x6o1NkDU32U/blog__Brokerage_applications.jpg?imgmax=800" alt="blog__Brokerage_applications.jpg" border="0" width="300" height="182" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I think of it as being similar in spirit to the way the Software Engineering Institute's &lt;a href="http://www.sei.cmu.edu/publications/documents/93.reports/93.tr.006.html"&gt;Risk Management Taxonomy&lt;/a&gt; provides a comprehensive checklist of things to consider when undertaking a project -- it keeps you from forgetting something that would be obvious in retrospect. &lt;br /&gt;&lt;br /&gt;An aside:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;My favorite quote from one of their &lt;a href="http://www.sei.cmu.edu/risk/index.html"&gt;pages&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Another company is developing a flight control system. 
During system integration testing the flight control system becomes unstable because processing of the control function is not quick enough during a specific maneuver sequence.&lt;br /&gt;&lt;br /&gt;The instability of the system is not a risk since the event is a certainty - it is a problem.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-3341570637991809339?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/3341570637991809339/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=3341570637991809339' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3341570637991809339'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3341570637991809339'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/08/togaf-and-evaluating-architectures.html' title='TOGAF and Evaluating Architectures'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/rdf541/SLf-k7Hq5HI/AAAAAAAAAEU/pK8F01Nlv-8/s72-c/blog__togaf_dev_cycle.jpg?imgmax=800' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-5093993942492073349</id><published>2008-08-14T16:11:00.001-07:00</published><updated>2008-08-14T16:11:43.326-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Optimization: Premature and otherwise</title><content type='html'>My post on &lt;a href="http://rdfsg.blogspot.com/2008/07/structuring-database-tables.html"&gt;structuring database tables&lt;/a&gt; made me think again about when (and how) to optimize one's code/design. The caveats about &lt;a href="http://en.wikiquote.org/wiki/Donald_Knuth"&gt;premature optimization&lt;/a&gt; are well known and well considered, as are the reasons for &lt;a href="http://www.acm.org/ubiquity/views/v7i24_fallacy.html"&gt;not following them slavishly.&lt;/a&gt; &lt;br /&gt;&lt;br /&gt;In my mind the core questions involve "what are we trying to optimize" and "who cares". &lt;br /&gt;&lt;br /&gt;My (obvious?) claim is that we should only spend time optimizing things that have impact upon the high level goals for the project. The reasons for performing an optimization should be articulated and evaluated in this framework. Driving optimizations by focusing on the top level goals sounds obvious but the tradeoffs are difficult to make in practice. &lt;br /&gt;&lt;br /&gt;There are legitimate tensions about the proper framework relevant to the analysis. That is, it is all well and good to speak of "strategic business goals," etc. but if the product is unusable, the long term strategy doesn't matter. 
A strategic focus simply assures that long term impacts are also evaluated when an optimization is considered, e.g., an optimization that greatly increases execution efficiency may or may not be appropriate if it increases the complexity of product installation and set up.&lt;br /&gt;&lt;br /&gt;For example, the &lt;a href="http://rdfsg.blogspot.com/2008/02/extensible-system-for-discovery-data.html"&gt;"narrow table"&lt;/a&gt; approach that I've been advocating is designed to support deep changes in the business processes and science over the life of the system, without necessitating deep changes in the data model. The "strategic horizon" for such a project is more than 10 years (that is, a total system life of more than 10 years) with the expectation that the fundamental data model reflected in the &lt;em&gt;narrow tables&lt;/em&gt; will be relatively stable during that time. Even in a rapid prototyping environment with one or more iterations shipping each quarter, the essence of the data model should be fairly stable since fundamental changes to the data model induce data migration efforts which distract from product improvements.&lt;br /&gt;&lt;br /&gt;The question is: are there inefficiencies caused by the narrow table approach that will make it overly difficult to achieve a usable product in the short term? 
My current intuition is to go with the narrow table approach and let either &lt;a href="http://en.wikipedia.org/wiki/Materialized_view"&gt;materialized views&lt;/a&gt; (which to my knowledge are most easily obtained in Oracle), special database jobs, or a &lt;a href="http://wiki.tangosol.com/display/COH/Oracle+Coherence+Knowledge+Base+Home"&gt;data grid&lt;/a&gt; provide the optimizations.&lt;br /&gt;&lt;br /&gt;That said, one of my personal rules of optimization (based on some &lt;a href="http://portal.acm.org/citation.cfm?id=242621&amp;dl=GUIDE&amp;coll=GUIDE&amp;CFID=81417610&amp;CFTOKEN=49243683"&gt;experience&lt;/a&gt;) is that your intuitions are almost always wrong -- if something is taking too long it is usually worthwhile to benchmark it even if you "know the cause of the problem" (unless the putative fix takes substantially less time than doing the benchmark). The corollary is that if you're worried about something taking too long, but think that you "should be OK", set up a test suite at scale to validate your intuitions going forward, so that you can monitor performance as development proceeds.&lt;br /&gt;&lt;br /&gt;This principle holds even if the bottleneck is development time: you need to determine if the problem is with the language, the developer, the user/developer interaction, or the developer's manager (e.g., providing more interrupt driven activity than the developer can handle). 
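&lt;br /&gt;&lt;br /&gt;For the runtime case, a rough sketch of the kind of at-scale check described above might look like the following. It is an illustration under assumptions rather than a recipe: it uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in for a real database server, and the table name &lt;code&gt;narrow&lt;/code&gt;, the helper names and the data sizes are all hypothetical.&lt;br /&gt;&lt;br /&gt;

```python
import sqlite3
import time


def make_narrow_table(n_objects, attrs_per_object):
    """Build a synthetic narrow table at the requested scale (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE narrow (obj INTEGER, attr TEXT, val TEXT)")
    rows = (
        (obj, "attr{}".format(a), "v{}".format(obj * a))
        for obj in range(n_objects)
        for a in range(attrs_per_object)
    )
    conn.executemany("INSERT INTO narrow VALUES (?, ?, ?)", rows)
    return conn


def time_query(conn, sql, params=(), repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


# Measure first, then apply the putative fix (here, an index) and measure again.
conn = make_narrow_table(n_objects=2000, attrs_per_object=10)
query = "SELECT obj, val FROM narrow WHERE attr = ?"
before = time_query(conn, query, ("attr3",))
conn.execute("CREATE INDEX idx_attr ON narrow (attr)")
after = time_query(conn, query, ("attr3",))
print("no index: {:.6f}s, with index: {:.6f}s".format(before, after))
```

The absolute numbers mean little on their own; the value lies in rerunning the same script at realistic scale as development proceeds and watching the trend.&lt;br /&gt;&lt;br /&gt;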
As always, measurement is the key.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-5093993942492073349?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/5093993942492073349/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=5093993942492073349' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5093993942492073349'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5093993942492073349'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/08/optimization-premature-and-otherwise.html' title='Optimization: Premature and otherwise'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6797878584856861473</id><published>2008-07-30T09:42:00.001-07:00</published><updated>2008-07-30T09:42:47.437-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='richfaces'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><category scheme='http://www.blogger.com/atom/ns#' term='applications'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>richfaces:dataTable:column sortBy</title><content 
type='html'>&lt;strong&gt;&lt;em&gt;sortBy&lt;/em&gt;&lt;/strong&gt; is a new &lt;code&gt;richfaces&lt;/code&gt; capability that I mentioned &lt;a href="http://rdfsg.blogspot.com/2008/06/de-novo-project-retrospective.html"&gt;previously&lt;/a&gt;. I've started to incorporate it into my system and what follows are some tips/notes/frustrations.&lt;br /&gt;&lt;br /&gt;If the sorting glyphs appear, but nothing happens, the dataTable needs to be surrounded by &lt;code&gt;&amp;lt;h:form&amp;gt; &amp;lt;/h:form&amp;gt;&lt;/code&gt; -- sorting doesn't work outside of a &lt;code&gt;form&lt;/code&gt; context.&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/rdf541/SICaVJtI-MI/AAAAAAAAADQ/qLa7h_Ym2vw/SortingGlyph.jpg?imgmax=800" alt="SortingGlyph.jpg" border="0" width="183" height="135" /&gt;&lt;/div&gt;&lt;br /&gt;If the glyphs do not appear, &lt;code&gt;sortBy&lt;/code&gt; may not have been given a valid attribute for sorting. This may be the result of a simple typo, e.g., &lt;code&gt;sortBy="{competition.id}"&lt;/code&gt; rather than &lt;code&gt;sortBy="&lt;strong&gt;&lt;font color="red"&gt;#&lt;/font&gt;&lt;/strong&gt;{competition.id}"&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;I had some very strange behavior in NetBeans/SeaMonkey with this facility. 
For example, if I clicked on the sort glyph for id&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh4.ggpht.com/rdf541/SICbyFqP0nI/AAAAAAAAADU/kOXLnC9Kp0o/glyph.jpg?imgmax=800" alt="glyph.jpg" border="0" width="83" height="84" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;I got&lt;br /&gt;&lt;div style="text-align:center;"&gt;&lt;img src="http://lh5.ggpht.com/rdf541/SICciPIUbXI/AAAAAAAAADY/lUXrg3IJU0c/Sorted_header.jpg?imgmax=800" alt="Sorted_header.jpg" border="0" width="202" height="83" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;YES the header of the column changed to "&lt;strong&gt;comp_ID form ss&lt;/strong&gt;" -- even though "&lt;strong&gt;comp_ID form ss&lt;/strong&gt;"  no longer appeared in the file. &lt;br /&gt;&lt;br /&gt;It used to be there, but I had removed it and performed a "build" in NetBeans (rather than a "clean and build"). I'm not sure about the underlying cause of this, but in my mind the effect straddles the boundary between disconcerting and amusing. The bottom line is that if things start acting strangely do a "clean and build".&lt;br /&gt;&lt;br /&gt;All in  all &lt;strong&gt;&lt;em&gt;sortBy&lt;/em&gt;&lt;/strong&gt; is a real step forward: it works in ajax tabPanels and simplifies the xhtml code a great deal. 
Despite &lt;a href="https://jira.jboss.org/jira/browse/RF-2915"&gt;this warning&lt;/a&gt;, sorting has been working for me as expected.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6797878584856861473?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6797878584856861473/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6797878584856861473' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6797878584856861473'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6797878584856861473'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/07/richfacesdatatablecolumn-sortby.html' title='richfaces:dataTable:column sortBy'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/rdf541/SICaVJtI-MI/AAAAAAAAADQ/qLa7h_Ym2vw/s72-c/SortingGlyph.jpg?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4287614056755801372</id><published>2008-07-16T04:51:00.001-07:00</published><updated>2008-07-16T04:51:55.901-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category 
scheme='http://www.blogger.com/atom/ns#' term='database'/><title type='text'>Structuring Database Tables</title><content type='html'>Continuing with the &lt;em&gt;&lt;strong&gt;what should the storage actually look like&lt;/strong&gt; &lt;/em&gt;line of my &lt;a href="http://rdfsg.blogspot.com/2008/07/temporal-data.html"&gt;last post&lt;/a&gt;, I just read a paper entitled &lt;a href="http://www.vldb.org/conf/2001/P149.pdf"&gt;&lt;strong&gt;Storage and Querying of E-Commerce Data&lt;/strong&gt;&lt;/a&gt; by &lt;em&gt;Agrawal, Somani, and Xu&lt;/em&gt;. This paper discusses the trade-offs between storing data in "wide" (up to 1K column) tables versus storing the data in sets of narrow (vertical) tables. The data under consideration consists of sparsely populated data sets (lots of null values); varying definitions of "sparse" are used to generate the results. &lt;br /&gt;&lt;br /&gt;Their measurements clearly support the overall performance advantage of the narrow table approach despite the complexity of reassembling the data into a "wide" form, when (and if) it is required. Their work has been &lt;a href="http://citeseerx.ist.psu.edu/showciting?cid=1033878&amp;start=10"&gt;picked up a bit&lt;/a&gt; by the &lt;a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS"&gt;column database&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Semantic_Web"&gt;semantic web&lt;/a&gt; crowds, but not to the extent that one would expect. I think that this reflects the fact that the datasets were fairly small (1000 cols, 20k rows).&lt;br /&gt;&lt;br /&gt;A couple of notes about the results: they achieved improved performance in the vertical representation despite the fact that they represented everything as an &lt;code&gt;object, key, value&lt;/code&gt; triple, and built a translation layer to shield the user from the vertical representation. 
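&lt;br /&gt;&lt;br /&gt;To make the vertical layout concrete, here is a minimal sketch of my own (not the paper's code, and using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; rather than the database used in the paper). The table name &lt;code&gt;vertical&lt;/code&gt;, its column names and the sample data are hypothetical; the point is only to show triples being stored narrowly and reassembled into a wide row on demand, roughly what a translation layer has to do.&lt;br /&gt;&lt;br /&gt;

```python
import sqlite3

# Hypothetical sparse data: each row is an (object, attribute, value) triple.
TRIPLES = [
    ("item1", "color", "red"),
    ("item1", "weight", "10"),
    ("item2", "color", "blue"),
    ("item3", "voltage", "220"),
]


def build_vertical_store(triples):
    """Load the triples into an in-memory narrow table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertical (obj TEXT, attr TEXT, val TEXT)")
    conn.executemany("INSERT INTO vertical VALUES (?, ?, ?)", triples)
    return conn


def wide_view(conn, attrs):
    """Reassemble one wide row per object; unset attributes come back as None.

    Attribute names are interpolated straight into the SQL text, so this is
    only suitable for trusted, hard-coded names such as the ones below.
    """
    columns = ", ".join(
        "MAX(CASE WHEN attr = '{0}' THEN val END) AS {0}".format(a) for a in attrs
    )
    sql = "SELECT obj, {} FROM vertical GROUP BY obj ORDER BY obj".format(columns)
    return conn.execute(sql).fetchall()


conn = build_vertical_store(TRIPLES)
rows = wide_view(conn, ["color", "weight", "voltage"])
for row in rows:
    print(row)
```

Attributes that were never stored take up no rows in the narrow table and simply come back as nulls in the wide form.&lt;br /&gt;&lt;br /&gt;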
The triple store was built on &lt;a href="http://www-306.ibm.com/software/data/db2/"&gt;DB2&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Although these findings are both intriguing and encouraging (from the standpoint of wanting to break information up into its &lt;a href="http://rdfsg.blogspot.com/2008/02/extensible-system-for-discovery-data.html"&gt;primal entities&lt;/a&gt;), I wonder about the scaling behavior of a system structured in this way as it radically increases the number of rows per table. After all, all systems have limits (e.g., &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/limits.html"&gt;MySQL&lt;/a&gt;, &lt;a href="http://www.postgresql.org/about/"&gt;Postgres&lt;/a&gt;, &lt;a href="http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/limits003.htm"&gt;Oracle&lt;/a&gt;, &lt;a href="http://msdn.microsoft.com/en-us/library/ms143432.aspx"&gt;SQL Server&lt;/a&gt;), and more importantly they have optimal operating regions, aka "sweet spots". A few years ago when the largest table in one of my systems reached 100 million (wide) rows, running even simple queries against that table was painful (which I'm sure could have been alleviated with some clever tuning -- but it was neither the tallest pole in the tent, nor the squeakiest wheel on the cart).&lt;br /&gt;&lt;br /&gt;My concern has to do with the risks of getting outside of the "sweet spot" of the systems upon which FDD is being constructed -- I remember back when one of my former employers switched database vendors (a non-trivial project to say the least). With vendor A, we were the customer with the largest DB in their installed base (aka outside the sweet spot). With vendor B, we were a "moderately large" installation, but certainly not in the top 100 (aka inside the sweet spot). The number of bugs that we encountered in vendor B's database was substantially lower. 
I assume that this was because the bugs had already been stumbled upon by the bleeding edge users and fixed by the vendor by the time we would have encountered them.&lt;br /&gt;&lt;br /&gt;That to me is the prime reason to try to stay in the sweet spot. If you don't, you're the one finding the new bugs and either fixing them, paying for them to be fixed, or hoping for the vendor to fix them (and developing a deep understanding of where you are on the vendor's list of priority customers).&lt;br /&gt;&lt;br /&gt;More on this in my next post.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4287614056755801372?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4287614056755801372/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4287614056755801372' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4287614056755801372'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4287614056755801372'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/07/structuring-database-tables.html' title='Structuring Database Tables'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-821040223573503881</id><published>2008-07-06T18:06:00.001-07:00</published><updated>2008-07-08T09:41:24.205-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='discovery informatics'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific'/><category scheme='http://www.blogger.com/atom/ns#' term='applications'/><category scheme='http://www.blogger.com/atom/ns#' term='database'/><category scheme='http://www.blogger.com/atom/ns#' term='temporal'/><title type='text'>Temporal Data</title><content type='html'>As part of fleshing out the design for the Flexible Drug Discovery (FDD) platform, I'm deciding upon the level of support for temporal data. The simplest decision, based upon the existing "webtwo" infrastructure, would be to have an &lt;em&gt;insert_time&lt;/em&gt; and &lt;em&gt;update_time&lt;/em&gt; attribute for each item. I think that these times are the bare minimum required to understand/debug system operation.&lt;br /&gt;&lt;br /&gt;However, in the clinical domain I've become accustomed to thinking about the data in terms of &lt;em&gt;&lt;strong&gt;what did we know when?&lt;/strong&gt;&lt;/em&gt; so that it is possible to reconstruct &lt;em&gt;&lt;strong&gt;the understanding of a trial at a given point in time&lt;/strong&gt;&lt;/em&gt;. This obviously requires much more extensive tracking.&lt;br /&gt;&lt;br /&gt;I recently came across an excellent book on the topic: &lt;a href="http://www.amazon.com/Temporal-Relational-Kaufmann-Management-Systems/dp/1558608559/ref=pd_bbs_sr_1?ie=UTF8&amp;s=books&amp;qid=1215378261&amp;sr=8-1"&gt; Temporal Data and the Relational Model&lt;/a&gt; by &lt;em&gt;Date, Darwin and Lorentzos&lt;/em&gt;. 
It presents a detailed analysis of the issues involved in working with temporal information using a refreshingly simple example consisting of a few tables of data about parts and their suppliers.&lt;br /&gt;&lt;br /&gt;Many systems use &lt;code&gt;begin&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; dates for each row to track when the data has changed, supporting the type of use most relevant to clinical/scientific analysis. However, this technique does not support some interesting situations in the business domain. For example, p. 166 of the book shows that, given an item with the attributes &lt;code&gt;name, status, city&lt;/code&gt;, answering simple questions such as "how long has a supplier been at that address", or "how long has a supplier had that name" requires a begin/end date for each attribute. Thinking through the implications of this issue results in refactoring the model into &lt;em&gt;irreducible&lt;/em&gt; components (aka &lt;strong&gt;&lt;em&gt;sixth normal form&lt;/em&gt;&lt;/strong&gt;), as described on p. 173.&lt;br /&gt;&lt;br /&gt;As implied by the term &lt;strong&gt;&lt;em&gt;sixth normal form&lt;/em&gt;&lt;/strong&gt;, using the temporal behavior of the data as a design axis can have extensive implications, e.g., &lt;UL&gt;&lt;LI&gt;splitting quantities out in a LIMS system&lt;/LI&gt; &lt;LI&gt;splitting out names (especially last names!) in a system that tracks employees, etc. 
&lt;/LI&gt;&lt;br /&gt;&lt;/UL&gt;&lt;br /&gt;This implies that it is important to consider the temporal behavior of the data even if a temporal model is not planned for the system, as it helps drive scenarios for evaluating the system's response to expected changes, e.g., "high flux" items may require optimized interfaces, surface special reporting requirements, etc.&lt;br /&gt;&lt;br /&gt;Other noteworthy topics in the book include &lt;strong&gt;&lt;em&gt;merging intervals&lt;/em&gt;&lt;/strong&gt;, e.g., the two facts that attribute &lt;strong&gt;A&lt;/strong&gt; had value 3 from &lt;code&gt;t1-t3&lt;/code&gt; and has value 3 from &lt;code&gt;t3-now&lt;/code&gt; should be merged into a single fact.&lt;br /&gt;&lt;br /&gt;There is also a discussion of the &lt;code&gt;time-from/time-to&lt;/code&gt; &lt;em&gt;in the persistent store&lt;/em&gt;, vs the &lt;code&gt;time-from/time-to&lt;/code&gt; &lt;em&gt;in the world&lt;/em&gt;, which, although important in developing system requirements, doesn't appear to require analysis different in character from what is conventionally performed. My view is that &lt;em&gt;world&lt;/em&gt; and &lt;em&gt;storage&lt;/em&gt; times are disjoint. 
In scientific systems there is rarely a reason to worry about world times -- other than referencing the date upon which an operation was performed.&lt;br /&gt;&lt;br /&gt;Again, an interesting read, highly recommended (despite their frequent exhortations on how to read the book e.g., "are definitely meant to be read in sequence as written (p51)" or "note carefully" (carefully is used an inordinately large number of times in the &lt;a href="http://books.google.com/books?id=grTubz0fjSEC&amp;q=carefully#search"&gt;text&lt;/a&gt;)).&lt;br /&gt;&lt;br /&gt;As storage becomes cheaper, the downside of not having a temporal capability will more frequently exceed its implementation cost.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-821040223573503881?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/821040223573503881/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=821040223573503881' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/821040223573503881'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/821040223573503881'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/07/temporal-data.html' title='Temporal Data'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1539863881409796139</id><published>2008-06-16T09:04:00.001-07:00</published><updated>2008-06-27T20:59:15.322-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>de novo project retrospective</title><content type='html'>I’m doing the finishing touches on my “webtwo” application and thought I'd post a quick &lt;a href="http://www.developer.com/design/article.php/3637441"&gt;&lt;em&gt;post mortem&lt;/em&gt;&lt;/a&gt; on the project.&lt;br /&gt;&lt;br /&gt;The core set of functionality was straightforward:&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Components/Capabilities&lt;/h4&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Users with system defined logins (not relying on database logins) and system specific roles&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Competitions consisting of user submissions. Each submission can have multiple submission items consisting of text or images.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Everything can be tagged and commented upon.&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Tags can be reused and applied to any object in the system.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Here are my observations&lt;/h4&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Ratings&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;My initial design had ratings as a separate class/table. I decided to push ratings down into both &lt;strong&gt;tag&lt;/strong&gt; and &lt;strong&gt;user_text&lt;/strong&gt; (the generic data type that holds comments) since, upon further consideration, I thought that the semantics of the term “rating” was different in each case. 
If the rating is attached to a comment, both the rating and the comment refer to some other item, and the comment explains the rating. On the other hand, if the rating refers to a tag, it indicates how strongly the item reflects the tag term.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Persistence&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Even though tagging and commenting (adding user text) are distinct functionalities, much of the persistence operation is the same. I therefore packed persistence into a single class which persists tags and text in distinct methods. At some point I may factor this into three classes: one for the common persistence core and two for the specific tagging and commenting functionality, but this single class is adequate for the current purpose.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Images&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;As mentioned &lt;a href="http://rdfsg.blogspot.com/2008/03/porting-ruby-on-rails-application-to.html"&gt;previously&lt;/a&gt;, my initial design allowed fairly large scale image uploads with dynamic resizing for page display. I have since switched to a more conventional architecture in which standard image sizes are generated and stored upon upload. 
This is a big performance win, and also reduces storage requirements.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Seam&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;Seam is a platform that is evolving in an encouraging way: when I wanted to add sorting to a table displayed within an Ajax tab, I could not see how to do it, but upon further investigation it appears to be supported in an &lt;a href="http://livedemo.exadel.com/richfaces-demo/richfaces/sortingFeature.jsf?c=sorting"&gt;upcoming release&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;For example, in this view I can sort&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://lh6.ggpht.com/rdf541/SFaIx_Mg5CI/AAAAAAAAADI/uM2_lNr2c0E/Sortable.jpg?imgmax=800" alt="Sortable.jpg" border="0" height="197" width="509" /&gt;&lt;/div&gt;&lt;br /&gt;while in this I cannot&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://lh3.ggpht.com/rdf541/SFaJO9xp6-I/AAAAAAAAADM/VkvC6dLPGbc/NotSortable.jpg?imgmax=800" alt="NotSortable.jpg" border="0" height="350" width="451" /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Type X&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;For this system I had separate "type tables" for each type category, e.g., &lt;strong&gt;user_text_type&lt;/strong&gt; holds the allowable types of &lt;strong&gt;user_text&lt;/strong&gt; such as &lt;em&gt;comment&lt;/em&gt;, &lt;em&gt;description&lt;/em&gt;, etc. As I consider building larger systems with similar functionality, I plan to have an &lt;strong&gt;item_type&lt;/strong&gt; table which holds the categories for every table that requires typing. This table would follow a pattern similar to that used for &lt;strong&gt;user_text&lt;/strong&gt; (comments, descriptions) and &lt;strong&gt;tags&lt;/strong&gt;, holding the &lt;em&gt;table_name&lt;/em&gt; and the &lt;em&gt;id&lt;/em&gt; of the referenced object. 
This will simplify understanding the data model/code and facilitate building curation tools.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;&amp;amp;Rest&lt;/em&gt;&lt;br /&gt;&lt;br /&gt;One additional note about the Seam framework: it sometimes feels like an &lt;strong&gt;&lt;em&gt;&lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer"&gt;&amp;amp;rest&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt; interface, since it is easy to bookmark access to a particular object, etc. However, it does &lt;strong&gt;not&lt;/strong&gt; do so without &lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer#REST.27s_central_principle:_resources"&gt;“seeing past”&lt;/a&gt; its own request: &lt;em&gt;parameters which are not specified for "pass through" in the .page.xml files are stripped out.&lt;/em&gt; It is easy enough to add them -- all that is required is to specify a &lt;code&gt;&amp;lt;param name="parameterForPassthrough"/&amp;gt;&lt;/code&gt; in the .page.xml for each parameter that should be passed through. This annotation also needs to be in every intervening page.&lt;br /&gt;&lt;br /&gt;I'm not particularly bothered by this; I'm just pointing it out. On the plus side it forces standard parameter naming so that each page's logic can operate upon standard variable names (assuming all of the necessary parameters have already been incorporated into the .page.xml files) and it prevents the URLs from becoming unwieldy. The downside is that &lt;em&gt;all of the intervening pages&lt;/em&gt; do have to be modified if a new parameter is required. On balance it seems to be a reasonable decision.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Note: &lt;/strong&gt;You may ask "when would this apply?". One example: you want to comment upon an item and return to it when the comment is complete. 
In this case the &lt;em&gt;item/item_id&lt;/em&gt; would need to be passed through the &lt;strong&gt;&lt;em&gt;&lt;code&gt;comment-editing-page&lt;/code&gt;&lt;/em&gt;&lt;/strong&gt; to the &lt;strong&gt;&lt;em&gt;&lt;code&gt;comment-verification/completion-page&lt;/code&gt;&lt;/em&gt;&lt;/strong&gt; so that the "&lt;em&gt;done/comment-complete&lt;/em&gt;" button can return to the right location.&lt;br /&gt;&lt;br /&gt;The only real regret that I have about the project is that I didn't know about the &lt;a href="http://developer.yahoo.com/ypatterns/"&gt;Yahoo design patterns&lt;/a&gt; earlier.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1539863881409796139?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1539863881409796139/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1539863881409796139' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1539863881409796139'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1539863881409796139'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/06/de-novo-project-retrospective.html' title='de novo project retrospective'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh6.ggpht.com/rdf541/SFaIx_Mg5CI/AAAAAAAAADI/uM2_lNr2c0E/s72-c/Sortable.jpg?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-9036978744187547896</id><published>2008-06-13T14:40:00.001-07:00</published><updated>2008-06-13T14:40:14.180-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='applications'/><title type='text'>Yahoo Design Patterns</title><content type='html'>When visiting Yahoo's site for &lt;a href="http://www.omnigroup.com/applications/OmniGraffle/"&gt;OmniGraffle&lt;/a&gt; &lt;a href="http://developer.yahoo.com/ypatterns/wireframes/"&gt;stencils&lt;/a&gt;, I found that Yahoo also provides a nice set of &lt;a href="http://developer.yahoo.com/ypatterns/"&gt;design patterns&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;These Web/Web2.0 patterns are clear, concise and have pointers to how they are used within Yahoo. &lt;em&gt;&lt;strong&gt;Note:&lt;/strong&gt;&lt;/em&gt; this doesn't constitute a strong endorsement on my part. My approach to patterns is to broadly survey existing patterns with the goal of settling on one, or worst case to modify/create one using the results of the survey.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;BTW&lt;/strong&gt; OmniGraffle is an amazing drawing program -- I've been using simple/mid-level drawing programs for a while and its interface is in a different league entirely. It does things that I haven't seen before, hadn't thought of previously, but are totally obvious to use and highly useful. 
If you're on a Mac you owe it to yourself to give it a spin.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-9036978744187547896?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/9036978744187547896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=9036978744187547896' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/9036978744187547896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/9036978744187547896'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/06/yahoo-design-patterns.html' title='Yahoo Design Patterns'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4478987200746437689</id><published>2008-06-06T06:57:00.001-07:00</published><updated>2008-06-10T12:22:30.345-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hibernate'/><category scheme='http://www.blogger.com/atom/ns#' term='mysql'/><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><title type='text'>debugging hibernate/mysql systems</title><content type='html'>Just a quick note, on debugging hibernate/mysql systems.&lt;br /&gt;&lt;br /&gt;In my ‘webtwo’ system I had the following 
constraint in the db:&lt;br /&gt;&lt;br /&gt;&lt;code&gt;        constraint fk_submission_item_image foreign key (image_id) references image_data(id) ON DELETE cascade,                                    &lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;p&gt;However, image_data didn’t exist.&lt;/p&gt;&lt;br /&gt;&lt;br /&gt;During operation it caused the error&lt;br /&gt;&lt;pre&gt;&lt;code&gt;&lt;br /&gt;18:34:56,009 WARN  [JDBCExceptionReporter] SQL Error: &lt;br /&gt;	1452, SQLState: 23000&lt;br /&gt;18:34:56,009 ERROR [JDBCExceptionReporter] Cannot add &lt;br /&gt;	or update a child row: a foreign key constraint &lt;br /&gt;	fails (`webtwo_dev/submission_item`, CONSTRAINT &lt;br /&gt;	`fk_submission_item_image` FOREIGN KEY (`image_id`) &lt;br /&gt;	REFERENCES `image_data` (`id`) ON DELETE CASCADE)&lt;br /&gt;18:34:56,009 ERROR [AbstractFlushingEventListener] &lt;br /&gt;	Could not synchronize database state with session&lt;br /&gt;	org.hibernate.exception.ConstraintViolationException: &lt;br /&gt;	Could not execute JDBC batch update&lt;br /&gt;        at org.hibernate.exception.SQLStateConverter.convert&lt;br /&gt;		(SQLStateConverter.java:71)&lt;br /&gt;        at org.hibernate.exception.JDBCExceptionHelper.convert&lt;br /&gt;		(JDBCExceptionHelper.java:43)&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I thought this indicated a problem with my Hibernate/java mapping since I’ve been working with the database for a few months without any errors being thrown by MySQL during db creation. I also thought that I had successfully populated all the tables.&lt;br /&gt;&lt;br /&gt;In retrospect this wasn’t the case: I had not populated the tables and it is also not surprising that MySQL didn’t signal an error since I “SET FOREIGN_KEY_CHECKS = 0; “ during table creation  -- so, my bad. &lt;br /&gt;&lt;br /&gt;The moral is that any unpopulated table may be fundamentally misspecified. 
&lt;br /&gt;&lt;br /&gt;This experience reinforces my heuristic: populate all the tables via an initial data population script, at least for testing purposes. Realistically, this requires two load scripts: one to populate the data required for operation (i.e., roles, menu values, etc.), and a second to do a test load to verify database integrity.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-4478987200746437689?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4478987200746437689/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4478987200746437689' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4478987200746437689'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4478987200746437689'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/06/debugging-hibernatemysql-systems.html' title='debugging hibernate/mysql systems'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-780326475973000914</id><published>2008-05-29T06:24:00.000-07:00</published><updated>2008-12-09T01:37:46.762-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jbpm'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category 
scheme='http://www.blogger.com/atom/ns#' term='hygienic software'/><title type='text'>Modularity &amp; Hygiene II</title><content type='html'>A similar, but distinct situation involves the environment expected by a module when it is activated:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;What needs to be set up for the called function&lt;/li&gt;&lt;li&gt;What is expected to be unperturbed from a “pristine” environment -- and the definition of that pristine environment. &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;This places restrictions upon use, e.g., &lt;a href="http://rdfsg.blogspot.com/2008/04/jboss-seam-jbpm.html"&gt;the inability to initiate workflows within a page flow for seam&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;These restrictions result from an implicit dependency upon the configuration of the calling environment. Since the appropriate configuration is assured if the module was called as planned, there is little checking to verify that the environmental assumptions have been met. Once the nested calling paradigm has been adopted, design choices are biased towards implicit configurations that cannot be easily set without perturbing the calling environment, since B (and A for that matter) can still reference the external environment.&lt;br /&gt;&lt;br /&gt;I’ve found it useful to think of this as the difference between a linear and a nested calling environment: a “linear environment” would pass parameters in as a single object which contains the required values, while the nested approach sets up one or more global variables for access by the called functionality.&lt;br /&gt;&lt;br /&gt;Linear&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_uhpaSaKsmiM/SCshe9IlqQI/AAAAAAAAAB0/yu6LgJMSOfc/s1600-h/linear.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_uhpaSaKsmiM/SCshe9IlqQI/AAAAAAAAAB0/yu6LgJMSOfc/s320/linear.jpg" alt="" 
id="BLOGGER_PHOTO_ID_5200287010419747074" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Nested&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_uhpaSaKsmiM/SCshfNIlqRI/AAAAAAAAAB8/OqMO4-DUp3k/s1600-h/nested.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_uhpaSaKsmiM/SCshfNIlqRI/AAAAAAAAAB8/OqMO4-DUp3k/s320/nested.jpg" alt="" id="BLOGGER_PHOTO_ID_5200287014714714386" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In the “linear” illustration nothing in the external environment is perturbed, side effects are minimal and any arguments could be copied, modified and passed onto the next module in a sanitary fashion (with the usual caveats around shared “stream-like” objects).&lt;br /&gt;&lt;br /&gt;Admittedly there are times when the nested approach makes the most sense, usually for “stream-like” variables, e.g., an initialization step to read in a configuration file, initialize connections etc. The problem arises when there is no way to spawn a new configuration (or initialize one if you’re called in a different context). Short of such an idealized situation, it would be useful, at a minimum, to be able to detect that you’re being called in the wrong context. When it is difficult for a module to decide if it is being called in the correct context (or if it elides the context check), it is hard, if not impossible, to provide easy to use modules.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Note: I think that the rails trick of &lt;a href="http://rdfsg.blogspot.com/2007_04_01_archive.html"&gt;overloading the const-missing exception handler&lt;/a&gt; is arguably in this space. The magic underlying this functionality wasn’t easy (for me) to find and it surfaced the capability in such a way that it couldn’t be used for my purposes, since it had no introspection capability and only covered the case of exceptions generated during system operation. 
Note: discovering this also helped answer one aspect of the environment that I had previously found opaque: “why does my new code seem to be loaded in some cases but not in others?” The answer is that the new code (class definitions etc.) was retrieved only if the class had not been previously loaded. If it had been previously loaded, the const-missing exception would not be generated and the class would not be reloaded.&lt;br /&gt;&lt;br /&gt;At some level I have no problem with this implicit ‘environment’ structuring as long as there are ways to determine what environment you’re in and to spin up a new, appropriately configured environment as necessary.&lt;br /&gt;&lt;br /&gt;It’s also important to distinguish this from environment set-up and configuration using well defined files/APIs, which is a perfectly reasonable and necessary practice for setting up such things as database access, service locations etc. in configuration files that are managed by an “environmental processor”.&lt;br /&gt;&lt;br /&gt;Such configurations are usually&lt;br /&gt;&lt;ul&gt;&lt;li&gt;one-shots done at startup, or in response to a specific reconfiguration request&lt;/li&gt;&lt;li&gt;accessed through a well defined API&lt;/li&gt;&lt;li&gt;‘stream-like’ in nature&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Again these are issues that I’m experiencing as I spin up my application. 
I’m not claiming that their solution is either easy or practical, only that it is desirable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-780326475973000914?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/780326475973000914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=780326475973000914' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/780326475973000914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/780326475973000914'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/05/modularity-hygiene-ii.html' title='Modularity &amp; Hygiene II'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_uhpaSaKsmiM/SCshe9IlqQI/AAAAAAAAAB0/yu6LgJMSOfc/s72-c/linear.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-2889047491555439327</id><published>2008-05-14T10:08:00.000-07:00</published><updated>2008-05-14T10:21:04.282-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xml'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='lisp macros'/><category scheme='http://www.blogger.com/atom/ns#' 
term='jsf'/><category scheme='http://www.blogger.com/atom/ns#' term='hygienic software'/><title type='text'>Modularity &amp; Hygiene</title><content type='html'>This post (and the one to follow) discusses the issues of hygiene in coding libraries.&lt;br /&gt;It is prompted by experiences that I’ve had lately in using (primarily xml/jsf) libraries and the pernicious errors that can be introduced by either mixing different libraries or not using them exactly as designed.&lt;br /&gt;&lt;br /&gt;A little background: the term hygienic comes from the &lt;a href="http://en.wikipedia.org/wiki/Hygienic_macro"&gt;Lisp community&lt;/a&gt;. The short form definition: hygienic code is code that doesn’t have undue interactions with its calling environment.&lt;br /&gt;A lack of hygiene manifests itself in a couple of ways:&lt;br /&gt;&lt;ol&gt;&lt;li&gt; Modules trip over each other: the “mix and match” promise of modularity and widget sets is violated&lt;/li&gt;&lt;ul&gt;&lt;li&gt;modules work at one point in the development cycle but break after the addition of an apparently unrelated piece of code&lt;/li&gt;&lt;li&gt;mixing components from different widget sets requires much care and tweaking, if it can be done at all.&lt;/li&gt;&lt;/ul&gt;&lt;li&gt; Modules have an implicit order in which they must be called; there is no well defined way to kick off either a new “top level process” or a child process that has its own (unshared) context.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;Note: this turned into a pretty long post, so I will cover the second item in a subsequent post.&lt;br /&gt;&lt;br /&gt;I’m not claiming that hygienic code is always necessary or even that “hygienic code” == “good code.” Situations vary, and in some cases (e.g., device drivers, OS kernels) being hygienic may not be worth it. 
Also, as I describe below, I don’t think that completely hygienic systems are currently possible, partly due to the limitations of xml.&lt;br /&gt;&lt;br /&gt;However, in most situations, the more hygienic the better.&lt;br /&gt;&lt;br /&gt;And now on to the discussion.&lt;br /&gt;&lt;br /&gt;1 Modules trip over each other&lt;br /&gt;The first issue concerns the problems encountered when modules end up tripping over each other, i.e., incorporating functionality from one set of modules breaks existing functionality. I have seen this mostly in the javascript space (see this post on &lt;a href="http://rdfsg.blogspot.com/2008/01/ror-seam-part-1.html"&gt;integrating UI widget sets in seam&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Although I’m working primarily in jboss/seam for this project, it is not a seam issue per se. If anything, the conventions seam uses for its generated code help to alleviate these issues.&lt;br /&gt;&lt;br /&gt;As far as I can tell, this “tripping” arises from an inability to cleanly nest scope in the environment. For example, in xml it is hard to insulate oneself from what goes on in the xml around you, as exemplified by the inability to nest comments in an xml file. This deficiency, coupled with the fact that many of the widget sets “compile to xml/html”, has arguably had the side effect of diminishing concern for hygienic operation within the development community. Combine this with the silent failure aesthetics of JavaScript and you can produce results that are truly painful to debug.&lt;br /&gt;&lt;br /&gt;The contrast with Lisp macros (a radically different idea from C macros; Hall has a very nice &lt;a href="http://www.apl.jhu.edu/%7Ehall/Lisp-Notes/Macros.html"&gt;page&lt;/a&gt; on the differences, plus some nice simple examples of problems and how to avoid them) is striking. Lisp macros represent the most successful instantiation of code generating behavior that I know of. 
Lisp macros work because of Lisp’s ability to generate new variable names and then bind incoming values to them so they can be used freely. Achieving a similar result without language support for system-wide unique item naming is (very) hard.&lt;br /&gt;&lt;br /&gt;The seam/richfaces framework does much to try to minimize this problem, e.g., if you look at the source of the page that seam sends to the browser you will see a lot of html of the form id="competition:tagtDecoration:j_id70", which is a nice try at preventing variable collisions etc. However, without enforced namespace encapsulation or the use of a system-wide symbol generating facility (e.g., &lt;a href="http://www.lisp.org/HyperSpec/Body/fun_gensym.html"&gt;gensym&lt;/a&gt;) variable capture is still possible. I also have not been able to find documentation on when new bindings are generated etc., which probably means I’ll have to look at the source someday.&lt;br /&gt;&lt;br /&gt;These issues usually occur more often in scripting languages (for the sake of argument I’m including xml, html, xhtml as scripting languages) than in compiled ones because compiled languages normally restrict these “global” environment accesses to compile time. Run time access in compiled languages to environment variables is more difficult, and inherently has to address multi-core/multi-processor/multi-systems issues. 
The end result is that the “external environment” is generally harder to get at in compiled languages and is relegated to “systems level” utilities.&lt;br /&gt;&lt;br /&gt;I’m hoping that the next big thing in scripting languages revolves around scoping and environment giving one the ability to specify a particular type of environment for code to run in, set up a safe context for it to run in, etc.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Part II will be covered in my next post&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-2889047491555439327?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/2889047491555439327/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=2889047491555439327' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2889047491555439327'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2889047491555439327'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/05/modularity-hygiene.html' title='Modularity &amp; Hygiene'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1228121294840949944</id><published>2008-04-26T10:20:00.000-07:00</published><updated>2008-05-01T19:25:15.474-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='hibernate'/><category scheme='http://www.blogger.com/atom/ns#' term='hibernate annotations'/><title type='text'>Hibernate &amp; Java /Hibernate Annotations</title><content type='html'>Just a quick post on Hibernate&lt;span style="font-family:courier new;"&gt; ‘@Where’&lt;/span&gt; annotation usage and a note on some odd behavior I’ve seen using Hibernate in Java (&lt;span style="font-weight: bold; font-style: italic;"&gt;my environment:&lt;/span&gt; NetBeans IDE 6.0 (Build 200711261600); Java: 1.5.0_13; Java HotSpot(TM) Client VM 1.5.0_13-119; System: Mac OS X version 10.5.2 running on i386; jboss: 4.2.2.GA; seam: 2.0.0.GA).&lt;br /&gt;&lt;br /&gt;First, annotations:&lt;br /&gt;I had a hard time finding a “pasteable” explanation of the use of &lt;span style="font-family:courier new;"&gt;@Where&lt;/span&gt;.&lt;br /&gt;Now that I’ve found how to do it, I thought I’d share.&lt;br /&gt;&lt;br /&gt;@OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY, mappedBy = "itemId")&lt;br /&gt;&lt;div  style="font-family:courier new;"&gt;&lt;pre&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;@Where&lt;/span&gt;(clause = "item_table = 'competition'")&lt;br /&gt;public Set&amp;lt;ItemTagRating&amp;gt; getItemTagRatings() {&lt;br /&gt;    return this.itemTagRatings;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;public void setItemTagRatings(Set&amp;lt;ItemTagRating&amp;gt; itemTagRatings) {&lt;br /&gt;    this.itemTagRatings = itemTagRatings;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;The effect of this is to set up an additional condition upon &lt;span style="font-family:courier new;"&gt;TagRatings&lt;/span&gt;, so 
that only those &lt;span style="font-family:courier new;"&gt;TagRatings&lt;/span&gt; that have the &lt;span style="font-family:courier new;"&gt;item_table&lt;/span&gt; value &lt;span style="font-family:courier new;"&gt;= ‘competition’ &lt;/span&gt;are retrieved (&lt;span style="font-family:courier new;"&gt;item_table &lt;/span&gt;is the name of the column in the db).&lt;br /&gt;&lt;/div&gt;&lt;itemtagrating&gt;&lt;itemtagrating&gt;&lt;br /&gt;&lt;br /&gt;The second item is odd. I cannot determine the underlying cause, so I’m just sharing the symptom and a workaround.&lt;br /&gt;&lt;br /&gt;The loop below would only execute once (and occasionally I would get a &lt;span style="font-family:courier new;"&gt;javax.faces.FacesException&lt;/span&gt; with the message: &lt;span style="font-family:courier new;"&gt;“#{competitionHome.persist}: java.util.ConcurrentModificationException”&lt;/span&gt;).&lt;br /&gt;&lt;br /&gt;The concurrent modification exception was not a particular surprise, but the silent (no visible exception) execution of the loop a single time befuddles me. If @SuppressWarnings weren’t a compile time operation that’s where I’d place the blame, but...  
(BTW I have no doubt that I'm doing something deeply wrong in the &lt;span style="font-weight: bold; font-style: italic;"&gt;non-working loop&lt;/span&gt; below; my goal is to highlight the relatively opaque effect of the error.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Note:&lt;/span&gt; I put ‘working’ in quotes further down since the query string shown will not work as written; it required some changes because #{newName} did not evaluate to the appropriate value.&lt;br /&gt;&lt;br /&gt;The loop&lt;br /&gt;&lt;br /&gt;&lt;/itemtagrating&gt;&lt;/itemtagrating&gt;&lt;pre&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;for (String newName : newTagNames) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  @SuppressWarnings("unchecked")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  List&amp;lt;Tag&amp;gt; tags = (List&amp;lt;Tag&amp;gt;) em.createQuery("select t from Tag t where lower(t.tagName) = #{newName}").setMaxResults(pageSize).setFirstResult(page * pageSize).getResultList();&lt;br /&gt;&lt;br /&gt;  if ((tags == null) || (tags.size() == 0)) {&lt;br /&gt;    newTag = new Tag(newName, user);&lt;br /&gt;    em.persist(newTag);&lt;br /&gt;    tagAssist.useTag(newTag);&lt;br /&gt;  } else {&lt;br /&gt;    tagAssist.useTag(tags.get(0));&lt;br /&gt;  }&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Breaking the loop up into multiple loops worked. 
The &lt;span style="font-weight: bold; font-style: italic;"&gt;‘working’&lt;/span&gt; loop(s)&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;for (String newName : newTagNames) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  ArrayList&amp;lt;Tag&amp;gt; tags = null;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  try {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    Query q = em.createQuery("select t from Tag t where lower(t.tagName) = #{newName} ");&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    q.setMaxResults(pageSize).setFirstResult(page * pageSize);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    tags = (ArrayList&amp;lt;Tag&amp;gt;) q.getResultList();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  } catch (IllegalStateException e) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    log.error("error getting tags: " + e);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    e.printStackTrace();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  } catch (IllegalArgumentException e) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    log.error("error getting tags: " + e);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    e.printStackTrace();&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  if ((tags == null) || (tags.size() == 0)) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    Tag newTag = new Tag(newName, currentUser);&lt;/span&gt;&lt;br /&gt;&lt;span 
style="font-family:courier new;"&gt;    newTags.add(newTag);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  }&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;for (Tag newTag : newTags) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  em.persist(newTag);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;for (Tag newTag : newTags) {&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  tagAssist.useTag(newTag);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1228121294840949944?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1228121294840949944/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1228121294840949944' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1228121294840949944'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1228121294840949944'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/04/hibernate-java-hibernate-annotations.html' title='Hibernate &amp; Java /Hibernate 
Annotations'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8787045969977669107</id><published>2008-04-11T13:30:00.000-07:00</published><updated>2008-04-11T17:10:04.958-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='jbpm'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='workflow'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>jboss, seam, jbpm</title><content type='html'>I’ve been working in the jboss/seam framework for a few months now and recently tried to add  jbpm workflow functionality to a project. In attempting to do this, I’ve hit some rough patches that have caused me to place the jbpm parts of the project on hold as they were not “mission critical” to this prototype.&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Note that these remarks are in the context of a seam 2.0/jboss 4.2x environment (jboss 4.2+ is required by seam 2+), running on OSX 10.5. The jboss/seam/jbpm combination appears to be the source of the problems,  OSX 10.5 appears irrelevant.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Seam offers two ways to utilize jbpm in your applications: the first is pageflow; the second is what the documentation terms an &lt;a href="http://docs.jboss.org/seam/1.2.1.GA/reference/en/html/jbpm.html"&gt;overarching business process&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The seam &lt;a href="http://docs.jboss.com/seam/latest/reference/en/html/jbpm.html#d0e5311"&gt;pageflow model&lt;/a&gt; is, as one would expect, a way to map desired page to page transitions using an xml file. 
It is described as being similar to other pageflow definition languages such as &lt;a href="http://static.springframework.org/spring-webflow/docs/1.0-ea/reference/introduction.html"&gt;spring web flow&lt;/a&gt;, but built with a conversation model in mind. The idea is that conversations allow better support for arbitrary user navigation, e.g., hitting the back button. I cannot comment on that claim, but having used pageflows a bit, I’ve found them adequately expressive and useful.&lt;br /&gt;&lt;br /&gt;The difficulties that I have had center on integrating an overarching business process into my seam application. These business processes may include asynchronous server and user processes executing over long periods of time, e.g., weeks. A simple example is the verification of user identity in a web-facing system: a user requests an account on the system and activates the account by responding to an email sent by the system.&lt;br /&gt;&lt;br /&gt;The business process requires the system to send an email to the user following the initial request, then validate the response or, in the absence of a response, either retire the request or send another reminder.&lt;br /&gt;&lt;br /&gt;The first issue I had in incorporating this workflow was the classic “who’s on top” problem, since I found that pageflows could not initiate workflows.&lt;br /&gt;&lt;br /&gt;My prototyping started with a pageflow, as pageflows are the natural framework for handling the request for a new user account.  The pageflow structure makes it easy to verify that the requested username, password, etc. satisfy system requirements. 
If they don’t, the application can be configured to just sit on the request page and not proceed to subsequent pages.&lt;br /&gt;&lt;br /&gt;For example, by enclosing the page redirect in a &lt;span style="font-family:courier new;"&gt;&amp;lt;rule&amp;gt;&lt;/span&gt; construct, the return of a null from the &lt;span style="font-family:courier new;"&gt;userHome.persist&lt;/span&gt; method keeps the application on the page (there is a protocol, not shown, that allows the application to post error messages on the page).&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;code&gt;    &amp;lt;navigation from-action="#{userHome.persist}"&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;        &amp;lt;rule&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;            &amp;lt;end-conversation/&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;            &amp;lt;redirect view-id="/User.xhtml"/&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;        &amp;lt;/rule&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;code&gt;    &amp;lt;/navigation&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;However, as mentioned above, a pageflow cannot initiate a workflow, so it is necessary to enclose the pageflow within a workflow to achieve the desired result. This entails changing the preceding page so that it kicks off a jbpm workflow rather than simply going to the next page.&lt;br /&gt;&lt;br /&gt;This is the point at which I realized that I would need tools for monitoring and debugging workflows. The jbpm &lt;a href="http://www.redhat.com/docs/manuals/jboss/jboss-soa-4.2/html/JBPM_Users_Guide/ch01s04.html"&gt;console&lt;/a&gt; appears key to this. 
Disconcertingly, I could not get the console to run with jboss 4.2.2.GA. This appears to be a general problem. In my case, the instructions on the &lt;a href="http://wiki.jboss.org/wiki/DeployJbpm3.2WebAppUnderJBoss4.2.x"&gt;wiki&lt;/a&gt; enabled me to get my workflow initiated without it throwing exceptions, but I still was unable to get the console working.&lt;br /&gt;&lt;br /&gt;One suggested path was to build the console from source. However, I could not get it to build with the level of effort I was willing to expend. After reading posts such as &lt;a href="http://www.jboss.com/index.html?module=bb&amp;amp;op=viewtopic&amp;amp;p=4126627#4126627"&gt;this&lt;/a&gt;, I concluded that building from source may not be the most productive experience.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The bottom line is that you’re in an odd space working with seam 2.0 -- it needs jboss 4.2x, but some of the other redhat/jboss tools don’t yet support jboss 4.2. I’m a bit surprised that this is the case:  &lt;a href="http://sourceforge.net/project/showfiles.php?group_id=22866&amp;amp;package_id=163777"&gt;seam 2.0 GA was released 2007-11-01 06:49&lt;/a&gt; and &lt;a href="http://www.jboss.org/jbossas/downloads/"&gt;jboss 4.2.2 was released 2007-05-11&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Since I expect this situation to eventually be rectified and, as I mentioned, workflow was not “mission critical”, I decided to wait for a working console application to be posted on the jboss.org site.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;In fairness&lt;/span&gt;, I’m working from the community “minimally supported” version -- the situation could be radically different in the supported versions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8787045969977669107?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8787045969977669107/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8787045969977669107' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8787045969977669107'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8787045969977669107'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/04/jboss-seam-jbpm.html' title='jboss, seam, jbpm'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1679617752272253339</id><published>2008-03-26T08:18:00.000-07:00</published><updated>2008-03-26T08:25:41.271-07:00</updated><title type='text'>Taxonomies, Ontologies and the Semantic Web</title><content type='html'>A couple of weeks ago I attended &lt;a href="http://www.iscb.org/cshals2008/"&gt;C-SHALS 2008&lt;/a&gt; (Conference on Semantics in Healthcare and Life Sciences), one aspect of it that I found striking was the number of people who conflated taxonomies with ontologies -- my initial reaction was to want to post  a remark about the confusion and highlight the distinctions (see &lt;a href="http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html#sect-taxonomies"&gt;this&lt;/a&gt;&lt;a href="http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html#sect-taxonomies"&gt; &lt;/a&gt;for a short set of descriptions of these and related terms).&lt;br /&gt;&lt;br /&gt;I’ve instead 
come to view this conflation as reflecting the pragmatic bias of these systems:  if the difference between taxonomies and ontologies isn’t apparent to you, the difference doesn’t matter for what you are trying to do (modulo the assumption that the speakers were competent, but that did appear to be the case). The implication is that such systems require no significant machine-based inference across organizations. Significant inference, in this context, would involve something beyond the use of term matching to gather locally related terms/individuals (local vis-a-vis the terms being matched).  Note: although I categorize this as being ‘non-significant’, that’s only from the standpoint of inference -- these systems do cover most of the Business Intelligence/analysis use cases being implemented today.&lt;br /&gt;&lt;br /&gt;As you might expect, given this characterization, these presentations involved the aggregation of data from multiple sites, using &lt;a href="http://www.w3.org/RDF/"&gt;RDF&lt;/a&gt; or taxonomies such as &lt;a href="http://www.cap.org/apps/cap.portal?_nfpb=true&amp;amp;cntvwrPtlt_actionOverride=%2Fportlets%2FcontentViewer%2Fshow&amp;amp;_windowLabel=cntvwrPtlt&amp;amp;cntvwrPtlt%7BactionForm.contentReference%7D=snomed%2Fsnomed_ct.html&amp;amp;_state=maximized&amp;amp;_pageLabel=cntvwr"&gt;Snomed&lt;/a&gt; to link data between sites. This is a good thing -- as I’ve mentioned a number of times, having &lt;a href="http://rdfsg.blogspot.com/2007/01/aspects-of-platform-architecture-part-1.html"&gt;stable identifiers across systems is the key to integration&lt;/a&gt;. The systems presented demonstrated that useful integration is possible even when the same terms, e.g., the same Snomed terms, have slightly different meanings in the different organizations.  
(see &lt;a href="http://www.amazon.com/How-Doctors-Think-Jerome-Groopman/dp/0547053649/ref=pd_bbs_sr_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1206206630&amp;amp;sr=8-1"&gt;How Doctors Think&lt;/a&gt; for an anecdotal study of physicians classifying patients).&lt;br /&gt;&lt;br /&gt;This is an interesting result: although fully vetted, 100% one-to-one mappings would obviously be preferable, in these systems the value of more data outweighs the penalty imposed by increased noise. Rough quick integration is proving more valuable than detailed integration requiring a thorough analysis of all systems used -- probably because the difference between ‘rough, quick’ and ‘thorough, slow’ is measured in months, if  not years.&lt;br /&gt;&lt;br /&gt;This is related to a discussion at the conference on the contrast between developing ‘problem specific’ ontologies vs. ‘general use’ ontologies. That is: does taking the time to ‘get it right’ add any value? This is roughly equivalent to the old AI &lt;a href="http://en.wikipedia.org/wiki/Neats_vs._scruffies"&gt;scruffy vs. neat&lt;/a&gt; distinction.&lt;br /&gt;&lt;br /&gt;Although I wouldn’t go so far as to claim that a general purpose ontology is impossible (at least in some limited domain),  I am skeptical that it can be achieved.  My concern centers around the fact that when you are constructing a general use ontology it is hard to know where to stop e.g.,  given a small molecule bioactive compound you should represent the formula and chirality, but what about the (possibly fractional) salt form? or the formulation? what about radioisotopes and their decay rates? subnuclear particles etc. I understand pragmatic stopping points for modeling these issues, but I don’t know how to determine principled ones.&lt;br /&gt;&lt;br /&gt;It’s reassuring to see a number of researchers finding pragmatically useful parts of the semantic web, without the need for perfect definitions/ontologies. 
This, to me is the take-home message: there are a number of useful tools and techniques in the semantic web space, don’t be put off by the thought of merging ontologies and developing a grand unified theory of everything.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1679617752272253339?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1679617752272253339/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1679617752272253339' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1679617752272253339'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1679617752272253339'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/03/taxonomies-ontologies-and-semantic-web.html' title='Taxonomies, Ontologies and the Semantic Web'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-5043966796064371544</id><published>2008-03-12T07:19:00.000-07:00</published><updated>2008-04-16T10:54:02.521-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='RubyOnRails'/><category scheme='http://www.blogger.com/atom/ns#' 
term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>Porting a Ruby on Rails Application to jboss seam</title><content type='html'>I did finally port my Ruby on Rails application to jboss seam.&lt;br /&gt;&lt;br /&gt;The capsule summary is that it took longer than expected (not particularly unusual for software) and looks better than it ever did, but still needs some performance tuning.&lt;br /&gt;&lt;br /&gt;Some specifics:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;image display&lt;/span&gt;&lt;br /&gt;I went with the seam graphicImage tag (note xmlns:s="http://jboss.com/products/seam/taglib")&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;s:graphicImage value="#{artwork.thumb}" rendered="#{not empty artwork.thumb}"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;s:transformImageSize height="50" maintainRatio="true"/&amp;gt;&lt;br /&gt;&amp;lt;/s:graphicImage&amp;gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;br /&gt;which I found to be slow -- much slower than doing a normal html &lt;span style="font-family:courier new;"&gt;&amp;lt;img width="x"/&amp;gt;&lt;/span&gt; tag. 
(update -- this is due to the use of the '&lt;span style="font-style: italic; font-family: courier new;"&gt;transformImageSize&lt;/span&gt;' tag -- rdf 24 March 2008)&lt;br /&gt;&lt;br /&gt;In the RoR code I conditionally chose one of two versions of the image tag&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        &lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-family:courier new;"&gt;&lt; %= &lt;/span&gt;&lt;span style="font-family:courier new;"&gt;if(@artwork.width_inches &gt; @artwork.height_inches)&lt;br /&gt;then&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;image_string = “&lt; id =""&gt; @artwork.id) + “\” width=100/&gt;”&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;else&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;image_string = “&lt; id =""&gt; @artwork.id) + “\” height=100/&gt;”&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;end&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;image_string&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;%&gt;&lt;/span&gt;  &lt;/blockquote&gt;&lt;br /&gt;which rendered much faster.&lt;br /&gt;&lt;br /&gt;The&lt;span style="font-family:courier new;"&gt; s:graphicImage&lt;/span&gt; tag doesn’t appear to be intended for rendering up to 20 images on a page -- my next revision will include the equivalent of the RoR code&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;data display&lt;/span&gt;&lt;br /&gt;I was easily able to get data tables with nicely alternating row colors by adding the following  to theme.css (I did it here, since the ‘alternating colors’ should vary with the theme)&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="font-family:courier new;"&gt;.table-even { &lt;/span&gt; &lt;span style="font-family:courier new;"&gt;    background-color: #ffffff;&lt;/span&gt; &lt;span style="font-family:courier 
new;"&gt;}&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;.table-odd {&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;    background-color: #eeeeee;&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;/blockquote&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;br /&gt;and then adding the following line to the *List.xhtml files&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;blockquote&gt;rowClasses=”table-even,table-odd”&lt;/blockquote&gt;&lt;/span&gt;I did have some minor problems getting it to work since I had overwritten a class that applied to each cell and had given it a background color. I forgot that this cell class would take precedence, but was able to figure it all out with &lt;a href="http://www.getfirebug.com/"&gt;Firebug&lt;/a&gt;, an indispensable tool!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;seam generator&lt;/span&gt;&lt;br /&gt;The generator provided an invaluable starting point and some really nice features e.g.,  it creates tables with sortable columns for the list view. The table defaults to displaying the ID of nested objects, but it was trivial to change it to display something more appropriate while maintaining the expected sort behavior.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;ajax suggestions&lt;/span&gt;&lt;br /&gt;My goal was to consistently work within the framework. This occasionally put me at a level of abstraction above which I was comfortable (the operational metaphor being “trying to do X while wearing thick gloves”). 
This caused me to have a much more difficult time doing a drop down suggestion menu than expected, e.g.,&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;s:decorate id="roleDecoration" template="layout/edit.xhtml"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;ui:define name="label"&amp;gt;role&amp;lt;/ui:define&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;h:selectOneMenu value="#{peopleHome.instance.roles}" immediate="true"&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;s:selectItems value="#{rolesList}" label="#{roles.description}" var="roles"/&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;s:convertEntity/&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;lt;/h:selectOneMenu&amp;gt;&lt;br /&gt;&amp;lt;/s:decorate&amp;gt;&lt;/span&gt;&lt;/blockquote&gt; since I was having a bit of a problem finding the exact ‘magic location’ for placing the &lt;span style="font-family:courier new;"&gt;&amp;lt;s:convertEntity/&amp;gt;&lt;/span&gt; tag relative to the &lt;span style="font-family:courier new;"&gt;&amp;lt;s:selectItems&amp;gt;&lt;/span&gt; tag, aka it all &lt;span style="font-weight: bold; font-style: italic;"&gt;‘just works’&lt;/span&gt; if you have everything placed &lt;span style="font-weight: bold; font-style: italic;"&gt;‘just right’&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Being at a higher level of abstraction also forced some patterns that I found difficult to work around. For example, in this trail system an &lt;span style="font-style: italic;"&gt;Artwork&lt;/span&gt; object has a &lt;span style="font-style: italic;"&gt;framed?&lt;/span&gt; attribute backed by a boolean. 
The behavior that I wanted in the listing ‘query by example’ code was either&lt;br /&gt;&lt;blockquote&gt;to return framed artworks if the &lt;span style="font-style: italic;"&gt;framed? &lt;/span&gt;checkbox is checked&lt;/blockquote&gt;or&lt;br /&gt;&lt;blockquote&gt;to return all artworks if the &lt;span style="font-style: italic;"&gt;framed? &lt;/span&gt;checkbox is not checked.&lt;/blockquote&gt;However, I could come up with neither a way to have elements on the restriction list take multiple parameters nor a way to return different restriction lists depending upon the query.&lt;br /&gt;&lt;br /&gt;My notes at that point say:&lt;br /&gt;&lt;blockquote&gt;if you do this&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;List&amp;lt;String&amp;gt; restrictionList;&lt;br /&gt;if ((this.artwork != null)&lt;br /&gt;&amp;amp;&amp;amp; (this.artwork.getShipable() != null)&lt;br /&gt;&amp;amp;&amp;amp; this.artwork.getShipable().booleanValue()) {&lt;br /&gt;&amp;nbsp;&amp;nbsp;restrictionList = Arrays.asList(SHIPABLE_RESTRICTIONS);&lt;br /&gt;} else {&lt;br /&gt;&amp;nbsp;&amp;nbsp;restrictionList = Arrays.asList(RESTRICTIONS);&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You break the transaction model&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;I was able to fix this by adding this line to the RESTRICTIONS&lt;br /&gt;“&lt;span style="font-family:courier new;"&gt;artwork.framed in (true, #{artworkList.artwork.framed})&lt;/span&gt;”&lt;br /&gt;which obviously will not generalize beyond the boolean case. Checked, this becomes in (true, true) and returns only framed artworks; unchecked, it becomes in (true, false) and matches every artwork.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;hibernate&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I found that the Hibernate documentation was very useful (when I took the time to 
read it in detail, aka RTFM).&lt;br /&gt;The expression language used is described reasonably well &lt;a href="http://developers.sun.com/docs/jscreator/help/2update1/jsp-jsfel/jsf_expression_language_intro.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;richfaces&lt;/span&gt;&lt;br /&gt;When I moved to a new version of richfaces to get suggestion boxes working, it broke other minor portions of the page layouts. Although not a big deal, I found it disconcerting. Building this rich functionality in the browser is cool and all, but it feels fragile and is causing me to think about trying out flex (or air, its latest incarnation).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;unit testing&lt;/span&gt;&lt;br /&gt;I used &lt;a href="http://htmlunit.sourceforge.net/"&gt;HTMLUnit&lt;/a&gt; since it tests the complete end-to-end interaction.&lt;br /&gt;Although I appreciate the ability to do faster, more thorough testing via &lt;a href="http://dev2dev.bea.com/pub/a/2005/10/mock_ejbs.html"&gt;mocks&lt;/a&gt;, I found that they gave me yet another thing to configure and wouldn’t give me the full end-to-end functionality that I was looking for.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Summary&lt;/span&gt;&lt;br /&gt;I think that jboss/seam will likely prove useful. 
I have one other application that I’m building as a precursor to the &lt;a href="http://rdfsg.blogspot.com/2008/02/extensible-system-for-discovery-data.html"&gt;extensible discovery system&lt;/a&gt; My biggest area of concern is the ability to do a  good UI in this space which might prompt me to investigate the &lt;a href="http://www.adobe.com/products/air/"&gt;air&lt;/a&gt;/&lt;a href="http://flex.org/"&gt;flex&lt;/a&gt; framework(s) at some point in the near future.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-5043966796064371544?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/5043966796064371544/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=5043966796064371544' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5043966796064371544'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/5043966796064371544'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/03/porting-ruby-on-rails-application-to.html' title='Porting a Ruby on Rails Application to jboss seam'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-4428928306388961718</id><published>2008-02-26T06:17:00.000-08:00</published><updated>2008-03-12T13:18:45.175-07:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='discovery informatics'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><title type='text'>Extensible System: Core Software Requirements</title><content type='html'>In my last post I promised to detail the software requirements that an &lt;a href="http://rdfsg.blogspot.com/2008/02/extensible-system-for-discovery-data.html"&gt;extensible system for discovery data&lt;/a&gt; shares with other Web 2.0 systems -- here they are:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Workflow Orientation: &lt;/span&gt;Support for a complete workflow beyond that offered by &lt;a href="http://www.jsftutorials.net/jsf-navigation-by-examples.html"&gt;JSF navigation&lt;/a&gt;. This workflow must allow the orchestration of multiple events without requiring additional user interaction. Supported workflows may involve sequenced, conditional interaction with multiple back end systems (obviously existing on multiple platforms). For sanity and maintainability, the workflow language should be &lt;a href="http://en.wikipedia.org/wiki/BPEL"&gt;BPEL&lt;/a&gt; (or a slight extension) and should provide the ability to extend predicates and actions using a well designed and understood language (Java, C#, etc.).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Integrateable&lt;/span&gt; (&lt;a href="http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29"&gt;mashable&lt;/a&gt;) data: The data stored by the application (and the results of any analyses performed on the data) are available for repurposing in other applications.  
Repurposing should be supported in a fine-grained manner so as to put as few restrictions as possible upon its use.&lt;br /&gt;&lt;br /&gt;This has two implications: the first is &lt;a href="http://rdfsg.blogspot.com/2007/01/aspects-of-platform-architecture-part-1.html"&gt;stable identifiers&lt;/a&gt;; the second is &lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer"&gt;restful interfaces&lt;/a&gt;, which allow data to be retrieved by referencing a static URL.&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Note:&lt;/span&gt; Restful interfaces have a number of nice side effects: the seam *From trick mentioned below would be much more difficult without a restful interface. Additionally, rapid prototyping/development of web pages is much faster if one can directly access a ‘deep’ page without having to manually negotiate multiple precursor pages.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Event queues: &lt;/span&gt;Good workflow/system interaction is facilitated by message queues for guaranteed message delivery. Queuing systems also provide good interface points for logging and &lt;a href="http://www.hermesjms.com/confluence/display/HJMS/Home"&gt;analysis tools&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Rules engines:&lt;/span&gt; In its most general form, a rules engine is a piece of code that evaluates a set of antecedent-consequent pairs, e.g., if antecedent then consequent.  Given this abstract definition, rules engines need to be distributed at a number of places within the product.  I see four distinct areas, each with its own role:&lt;br /&gt;&lt;br /&gt;1: Display within a page, e.g., whether a particular element should be displayed, corresponding to the ‘rendered’ predicate in JSF. 
&lt;span style="font-style: italic;"&gt;Rules involve availability of data appropriate for display and authentication/authorization restrictions.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;2: Predicates involving page flow (JSF, REST, etc.). &lt;span style="font-style: italic;"&gt;Rules involve what page gets displayed next.&lt;/span&gt;&lt;br /&gt;The jboss seam pages have a very nice convention in which the presence of a *From attribute/value pair allows an editing action, upon completion, to return to the page from which it was launched. Here is an example using &lt;span style="font-weight: bold;"&gt;peopleFrom&lt;/span&gt;:&lt;br /&gt;&lt;blockquote style="font-family: courier new;"&gt;&amp;lt;s:button view="/#{empty &lt;span style="font-weight: bold;"&gt;peopleFrom&lt;/span&gt; ? 'PeopleList' : &lt;span style="font-weight: bold;"&gt;peopleFrom&lt;/span&gt;}.xhtml"&lt;br /&gt;                     id="done"&lt;br /&gt;                     value="Done"/&amp;gt;&lt;/blockquote&gt;&lt;br /&gt;This returns to the page named by peopleFrom when the editing action has been completed, or to the PeopleList page if peopleFrom is empty.&lt;br /&gt;3: Security rules for CRUD operations. &lt;span style="font-style: italic;"&gt;Rules involve accessing and modifying data.&lt;/span&gt;&lt;br /&gt;4: Back end BPEL operations, e.g., if the request has been outstanding for more than a week then notify customer support. 
&lt;span style="font-style: italic;"&gt;Rules involve the overall operation of the system.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Logging:&lt;/span&gt; Effective debugging of complex systems requires the ability to gather an integrated log for each activity in the chain of events that produces a given result; supporting this requirement in an operational setting requires that all relevant logs can be time-aligned and assembled into a single report for analysis.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Monitoring and management:&lt;/span&gt; Business rules should be capable of being extended to monitoring system operation (server load, queue depth, latency, etc.), allowing the system to be ‘self monitoring.’ The use of a common tool permits the maximal number of people to understand its operation.&lt;br /&gt;&lt;br /&gt;In addition, interfaces should be provided to allow information to be updated (recached) without bouncing the server.  &lt;a href="http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/"&gt;JMX&lt;/a&gt; is a reasonable example; see &lt;a href="http://www-128.ibm.com/developerworks/java/library/j-jtp09196/index.html?ca=drs"&gt;also&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Security:&lt;/span&gt; Obviously any enterprise system must provide for some level of security, minimally with LDAP support, hopefully with out-of-the-box support for OpenID and SAFE. In practice, I would caution against making security/access overly fine grained since it must support people changing their roles in the organization, changes in business processes, etc.  
The more fine grained your access model the more thought is required to get it right and the greater the probability of getting it wrong.&lt;br /&gt;&lt;br /&gt;I have personally found it useful to distinguish reading, writing, and editing data, opening up the reading and dissemination of the information while restricting writing and editing data to specific tools provided for specific stages of the process.&lt;br /&gt;&lt;br /&gt;For example,  given a standard lab workflow for data collection, analysis, upload and “publication” (to the persons requesting the tests and then the company at large): there is one tool for collecting, analyzing and uploading the data; there is a second set of tools for integrating and viewing the data in a larger context; and there may be a third set of tools for curating and editing data which has been found discrepant.&lt;br /&gt;&lt;br /&gt;This rounds out the software requirements for a practical production system. Although these requirements appear (and are) extensive most, if not all, of them appear in a number of enterprise level toolkits. As I said at the beginning of this post: there are clear best practices. 
&lt;a href="http://www.rdfsg.com-a.googlepages.com/RDFSG_Extensible_tool.pdf"&gt;A Powerpoint that covers both these posts is available&lt;/a&gt;.</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/4428928306388961718/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=4428928306388961718' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4428928306388961718'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/4428928306388961718'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/02/extensible-system-core-software.html' title='Extensible System: Core Software Requirements'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-7600091034934521872</id><published>2008-02-11T12:09:00.000-08:00</published><updated>2008-12-09T01:37:47.066-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='discovery informatics'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' 
term='jboss'/><title type='text'>An Extensible System for Discovery Data</title><content type='html'>I’ve been thinking about how to make discovery informatics tools significantly more flexible, extensible and perhaps even more maintainable than they currently are. I don’t think of this as being a software problem per se (at least as it involves any new thinking on my part). However imperfectly, the required underlying software capabilities already exist. That is, although we certainly need systems that are more configurable, workflow oriented and flexible in their ability to mix elements on a page (via mashups, etc.), and most scientific software is unquestionably deficient in these areas, the state of the art in software development in 2008 affords a clear set of proven practices to satisfy these needs.&lt;br /&gt;&lt;br /&gt;The key issue for scientific software is that the science which we are trying to support changes over time in two significant albeit essentially different ways.&lt;br /&gt;&lt;br /&gt;The first is that the areas of the business that we need to support/integrate with expand over time, e.g., starting at in-vitro testing and moving (in both directions) to support synthesis and in-vivo testing, perhaps eventually into clinical trials. It is inevitable that over time the business changes, the strategy changes, the organizational structure changes -- in any case, once-solid (organizational) boundaries become permeable.  
These changes in the business have a strong impact on the design of scientific software adding anything from the need to support radioisotopes and formulations to tracking freeze/thaw cycles.&lt;br /&gt;&lt;br /&gt;The second, and more interesting axis of change is that the &lt;span style="font-weight: bold; font-style: italic;"&gt;science itself changes&lt;/span&gt;: new tests come on board that have a radically different cardinality&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The nature of the data changes: test results start as point averages, evolve into  time series and then become two dimensional vectors e.g., %INH, FLIPR, Imaging;&lt;/li&gt;&lt;li&gt;Our understanding of the biology changes e.g., one protein -&gt; one gene;&lt;/li&gt;&lt;li&gt; Our ability to simplify the problem (a simplification that may have been unconscious) has been shattered e.g., genetic variations of targets, protein pathways, post translational modifications begin to impact the data which we are gathering today. &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;A common solution is to build the system to support the most complex case. However this doesn’t work if&lt;br /&gt;&lt;ul&gt;&lt;li&gt;We don’t know the most complex case.&lt;/li&gt;&lt;li&gt;The most complex case is a superset of all possible cases only one of which would ever occur in a particular system&lt;/li&gt;&lt;li&gt;The group currently being supported can’t supply the information that will enable the distinctions at a later date -- so the information collected is all the same. &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;In my mind the best way to solve this is to modularize the domain into its &lt;span style="font-weight: bold; font-style: italic;"&gt;simplest building blocks&lt;/span&gt; and then build up the necessary complexity using these building blocks. I will admit that this is what we think we’ve been doing all these years, but I don’t think it’s true. 
I challenge you to look at the actual tables (or objects) in your systems and ask yourself whether all of the columns (attributes) are strictly necessary in all (or even most) situations, or whether they reflect the diffusion of business processes into our base design.  My proposal is simply the following: for each building block we scale back the attributes for each entity to the absolutely necessary minimum, paying special attention to items which could potentially change the cardinality.&lt;br /&gt;&lt;br /&gt;The temptation in doing this type of analysis is to start at the “most fundamental” level and work your way up. The problem I have with doing things that way is that I find the prospect of describing a mouse in terms of its constituent quarks to be both daunting and without obvious value. My current approach is therefore to start at a well grounded middle level (similar to the “&lt;a href="http://cognet.mit.edu/library/books/view?isbn=0262193639"&gt;middle distance&lt;/a&gt;” ) and stub out to one level beyond the current need.&lt;br /&gt;&lt;br /&gt;The goal is to be able to support multiple overlapping hierarchies so that in different situations we can classify assays by technology, gene, both or neither.&lt;br /&gt;&lt;br /&gt;The diagrams below indicate the kind of modeling that I envision&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_uhpaSaKsmiM/R7CsOB3bf6I/AAAAAAAAABs/TdhzZSOrebI/s1600-h/Extensible_test_item.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_uhpaSaKsmiM/R7CsOB3bf6I/AAAAAAAAABs/TdhzZSOrebI/s320/Extensible_test_item.jpg" alt="" id="BLOGGER_PHOTO_ID_5165818129613029282" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The characteristics of such a “primal entity” include the &lt;span style="font-style: italic;"&gt;absence of any bp (business process) foreign 
keys&lt;/span&gt; (direct references) to entities related solely as an aspect of the business process. These &lt;span style="font-style: italic;"&gt;business process &lt;/span&gt;relationships are moved into relationship tables particular to the situation at hand and reflect the particular hierarchy of the business process currently being addressed. The foreign keys which are allowed under this no bp foreign key constraint involve items that must be present for &lt;span style="font-weight: bold; font-style: italic;"&gt;any entity of this sort&lt;/span&gt;, e.g., all synthesis batches must have components, methods of synthesis (even if ‘random’), an operator (even if ‘unknown’) and units of measure; in the frames that most of us work in, these can be taken as relatively fixed.&lt;br /&gt;&lt;br /&gt;The advantage here is critical: your system, and more importantly your data, can survive in the face of a significant unforeseen change in the scientific or business environment. The reason for this is straightforward: the model is now capable of supporting multiple conflicting hierarchies of relationships, so the introduction of a new, conflicting hierarchy doesn’t break your model. 
This works in a manner similar to the way in which &lt;a href="http://rdfsg.blogspot.com/2007/05/rdfrails-on-rubyforge.html"&gt;an RDF export of your system &lt;/a&gt;can assist you in modeling its ontology, since it can support multiple, potentially conflicting ontologies.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;My next post will focus on components of the software architecture.</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/7600091034934521872/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=7600091034934521872' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7600091034934521872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/7600091034934521872'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/02/extensible-system-for-discovery-data.html' title='An Extensible System for Discovery Data'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_uhpaSaKsmiM/R7CsOB3bf6I/AAAAAAAAABs/TdhzZSOrebI/s72-c/Extensible_test_item.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8316271018764417154</id><published>2008-01-24T11:15:00.000-08:00</published><updated>2008-03-12T13:17:27.451-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='RubyOnRails'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>RoR =&gt; Seam Part 1</title><content type='html'>This is a short initial post about my first jboss seam project which is to move one of my small, but non-trivial Ruby-on-Rails applications to Seam.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;A couple of preliminary notes:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;I used seam-gen to generate the initial code/application structure. It gives you a good start -- the pages have a nice minimal and professional look to them and none of the default designs/layouts flows are crazy. All in all a nice job.&lt;br /&gt;&lt;br /&gt;My environment Mac OSX 10.5.1,  jboss 4.2.2 GA, seam 2.0.0 GA, mysql 5.0.45&lt;/blockquote&gt;I had a couple missteps/odd experiences that I wanted to share. They both revolved around trying to include some AJAX capabilities on a page&lt;br /&gt;&lt;br /&gt;One page of the application is for the creation of shows which have titles, curators, institutions and artworks. The curators, institutions and artworks are in separate tables with the ‘shows’ table holding foreign keys. 
Title is a required field.&lt;br /&gt;&lt;br /&gt;In AJAXifying the curator selection the onkeyup actionListener was not being called -- I was getting this error message in the background stream&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;blockquote&gt;14:07:00,243 INFO  [lifecycle] WARNING: FacesMessage(s) have been enqueued, but may not have been displayed.sourceId=&lt;span style="color: rgb(255, 0, 0);"&gt;shows:aName&lt;/span&gt; Decoration:name[severity=(ERROR 2), summary=(value is required), detail=(value is required)]&lt;/blockquote&gt;&lt;/span&gt;The referenced  &lt;span style="color: rgb(255, 0, 0);"&gt;aName&lt;/span&gt; section of the page was not involved in either end of the AJAX update. After struggling with this for a few days I realized that Name is a required field -- the framework  was checking this requirement first and giving me an error. The combination of the error in validation + being in the middle of an ajax call resulted in neither the error being posted to the page nor the onkeyup action being called.&lt;br /&gt;&lt;br /&gt;I worked around this by initializing the name on the page with a&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;blockquote&gt; #{empty showsHome.instance.name ? showsHome.instance.setName('A New Show') : showsHome.instance.name} &lt;/blockquote&gt;&lt;/span&gt; I find the behavior odd and of course the workaround doesn’t cover the situation in which the user clears the title field and then jumps  to the curator field.&lt;br /&gt;&lt;br /&gt;Staying with the topic of AJAX support: keep in mind that the richfaces toolkit provides a lot of ajax functionality in a straightforward manner. 
For example, when looking for an ‘autocomplete’ functionality I mistakenly went down the path of trying to integrate a Yahoo UI component -- nothing against the YUI toolkit, but the exact functionality I was looking for was in richfaces &amp;amp; required a lot less work.&lt;br /&gt;&lt;br /&gt;Using richfaces with seam isn’t as obvious as it should be since most of the examples that you’ll currently find are for Seam version 1.x prior to the richfaces mindmeld that occurred in Seam 2.0. &lt;span style="font-weight: bold; font-style: italic;"&gt;Caveat:&lt;/span&gt; be sure to use richfaces 3.1.3+ since I found some of the most important &lt;span style="font-family:courier new;"&gt;rich:suggestionbox&lt;/span&gt; features broken in earlier versions.</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8316271018764417154/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8316271018764417154' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8316271018764417154'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8316271018764417154'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/01/ror-seam-part-1.html' title='RoR =&gt; Seam Part 1'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-2599939572657590676</id><published>2008-01-11T07:20:00.001-08:00</published><updated>2008-03-12T13:07:34.702-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='discovery informatics'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>Hello Seam</title><content type='html'>A note on getting a ‘hello world’ reverse engineered application up and running in jboss seam&lt;br /&gt;&lt;br /&gt;For this test I am moving two applications to seam. The first is an existing Ruby on Rails application that I’m using as a reference and will be replicating/improving the RoR functionality. The second is a bare bones web 2.0 application for which I have only recently completed the table design and for which I have  a minimal set of use cases. I plan to use this as a substrate for building workflow-based/“Web 2.0 enabled” applications in a number of areas including drug discovery&lt;br /&gt;&lt;br /&gt;In preparation,  I thought I’d take an infrastructure upgrade move to Leopard and Eclipse/Europa (+ for reasons that will be apparent Netbeans 6.0). This may not have been the best idea since I encountered some eclipse/Leopard incompatibilities that were eventually resolved using the technique described &lt;a href="http://www.jboss.com/index.html?module=bb&amp;amp;op=viewtopic&amp;amp;p=4113746#4113746"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The first step was to install jboss, seam (I initially went with the 1.2.1 GA version since the 2.0.0 GA had too many 0’s in it for me to be comfortable) and run the demos -- this was a very enjoyable process. 
The installation was easy and the demos were polished.&lt;br /&gt;&lt;br /&gt;The next step was to begin reverse engineering. With Eclipse the best result was had by first using ‘seam new-project’ to create a bare project and then import the project into eclipse (using the file-&gt;new-&gt;project as recommended). I then performed a ‘seam generate-entities’ to get all the hibernate code in place and then do a refresh of the files. This was advantageous since it allowed me to disambiguate Eclipse project import problems (which I didn’t have) from the Hibernate mapping problems (which I did have).&lt;br /&gt;&lt;br /&gt;In eclipse the only way I could find to get the project to run on the jboss app server was to right click-&gt;’jboss tools’-&gt;’add struts capabilities’ and then add it in from the project side right-click-&gt;’run as’-&gt;’open run dialog’. There’s probably some other way to do this but I couldn’t find it quickly.&lt;br /&gt;&lt;br /&gt;I tried to open the project in NetBeans since the seam doc indicated that you could ‘just’ open the project up in NetBeans without any special actions. This was true, but NetBeans just did not appear to understand the file layout of the 1.2.1 seam-gen project, so I dropped that path (at least until I started using seam 2.0).&lt;br /&gt;&lt;br /&gt;I did have some problems getting the reverse engineered project to run. After a bit (maybe more) of investigation I determined that the problems revolved around the fact that both projects had blobs (one has blobs of different sizes) going against a mysql back end.&lt;br /&gt;&lt;br /&gt;Seam structures the project in such a way that the first page won’t come up unless hibernate can successfully map the database. The default hibernate mysql/blob mapping is ‘tinyblob’ which didn’t match my applications’ blob datatypes of blob and mediumblob. 
In addition, Hibernate won’t take the mysql data types and it is necessary to specify the &lt;a href="http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html_"&gt;appropriate lengths&lt;/a&gt;; when I did this everything came up fine.&lt;br /&gt;&lt;br /&gt;I have to admit that I was a bit thrown off by the ‘first page error’ since I didn’t expect that mapping to be invoked until I started to hit those pages. This might stem from the fact that there is a navigation bar on the top of the first page that takes you to pages that map to the db, or it might be standard Hibernate behavior; I have to admit that I’m not familiar enough with Hibernate to be sure.&lt;br /&gt;&lt;br /&gt;In any case, I was up and running and started performing some minimal customization.&lt;br /&gt;&lt;br /&gt;Seam’s default behavior for displaying blobs is to show a hex string of the first few bytes -- not a bad default as it allows you to see that it is accessing binary data and the data is different for each item. 
However it’s not the final behavior that I wanted.&lt;br /&gt;&lt;br /&gt;When I tried to add some images to the pages, e.g., &lt;span style="font-family:courier new;"&gt;&amp;lt;s:graphicImage value="img/hydrant.jpg"/&amp;gt;&lt;/span&gt;, I ran into warnings of this type:&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;WARN  [HtmlRenderKitImpl] Unsupported component-family/renderer-type: org.jboss.seam.ui.UIGraphicImage/javax.faces.Image&lt;/span&gt;&lt;br /&gt;This appeared to be a known bug in seam 1.2 that was fixed in 1.3 -- but as near as I can tell 2.0 was the release that followed 1.2.&lt;br /&gt;&lt;br /&gt;Upgrading to 2.0 (including the attendant jboss upgrade) solved this issue but broke some of the Eclipse tools; specifically, the jboss tools HTML editor was unable to find any of the &lt;span style="font-family:courier new;"&gt;@Name &lt;/span&gt;references (I’ve seen this noted on some of the jboss forums, so I’m not alone on this).&lt;br /&gt;&lt;br /&gt;Since I hate to go forward with an IDE showing me a ton of warnings I decided to look at NetBeans again, since the 2.0 version of seam has a different file structure from seam 1.x. Happily, this has made the file structure much more NetBeans friendly.&lt;br /&gt;&lt;br /&gt;I think the only thing I had to change to get it working in NetBeans was to set the deploy project to ‘restart’. I’m now up and running and in the process of modifying things to work as I want.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;One side note:&lt;/span&gt; I find that I’m liking NetBeans more than expected. It’s a nice tool; noteworthy is its integration with subversion, which makes it trivial to revert to earlier versions in the repository. In addition to repository saves, NetBeans graciously keeps track of all ‘local’ saves to the file system. 
This means that whenever I get a file to a point where I can’t figure out what’s going on anymore, I can easily recover -- a very nice touch that I make use of all too often.&lt;br /&gt;&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/2599939572657590676/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=2599939572657590676' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2599939572657590676'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/2599939572657590676'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2008/01/hello-seam.html' title='Hello Seam'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6718486683521859035</id><published>2007-12-22T10:45:00.001-08:00</published><updated>2007-12-31T10:59:08.689-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='seam'/><category scheme='http://www.blogger.com/atom/ns#' term='jboss'/><title type='text'>jboss Seam?</title><content type='html'>Lately I have felt the need to get familiar with a new toolset for 
“industrial strength” applications.  There are a couple of factors driving this:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;1. The methods that I currently use to build Web 2.0 applications revolve around RoR. However, RoR doesn’t have a strong user community in my customer base (at least on deployed applications). In addition, I’m much more experienced with the Java stack and would be more confident that my proposed web 2.0 (and web 3.0) capabilities would work in production if I knew how to build them using a Java based/oriented toolkit.&lt;br /&gt;&lt;br /&gt;2. In thinking about a ‘next generation’ flexible product for the pharmaceutical/discovery space (more on this in a future post) I feel that one of the key components is a good workflow engine. (see FootNote). For a workflow engine to be effective in a scientific environment  it is important to be able to drop down into a standard language to perform any special case processing that might be required. This is more important in a scientific setting than in a general business application since a portion of the flow will likely depend upon an algorithmic analysis of the data -- using algorithms that have been newly defined to analyze this dataset. This requires a solid, well defined language suitable for general use, rather than an ad hoc scripting language possibly designed by someone without a background in designing computer languages. My early heuristic in this area was that if Guy Steele hadn’t written on it or had not been involved in it you shouldn’t use it. This heuristic doesn’t appear as useful as it once was e.g., I have heard good things about C#. 
However, if you read through some of the language specs that Steele’s been involved with, you see what I mean: functionality that is essentially &lt;a href="http://en.wikipedia.org/wiki/Turing_complete"&gt;Turing Complete&lt;/a&gt;, well thought out exception handling and a clear explanation of operation order, especially around object and class instantiation.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Given these core requirements an obvious candidate is the jboss seam toolkit. Seam has one other interesting feature: the concept of a conversation to handle user context. From the &lt;a href="http://docs.jboss.com/seam/2.0.0.GA/reference/en/html/tutorial.html#d0e1568"&gt;doc&lt;/a&gt;:   &lt;span style="font-style: italic;"&gt;But suppose we suddenly discover a system requirement that says that a user is allowed to have multiple concurrent conversations, halfway through the development of the system. &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Conversations appear to allow the support of users who want to have multiple browser windows open to look at different result sets, perform different activities, etc. This is definitely a ‘nice to have’ for scientific work -- I have found that users often want to look at and drill down on multiple data sets for comparison and analysis. A similar capability was developed in my group at Millennium by Vlado and David (two very talented guys) with their PageDataServlets framework. It is satisfying to note that PageDataServlets were developed back in 1999 and are still in use. Although it isn’t something that you would use in a new product (1999 was a while ago), it’s a testimony to their quality and depth that they are still in use and provide a capability that you see in very few web sites today.&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;Seam is open source and has some unusual characteristics for an open-source project:&lt;br /&gt;1. 
Attractive Demos that work ‘out of the box’ (&amp;amp; I’m running an Intel Mac with Leopard, so that is saying something)&lt;br /&gt;2. Extensive documentation&lt;br /&gt;&lt;br /&gt;It also has the requisite active user and developer community, i.e., it has enough momentum that it won’t die soon.&lt;br /&gt;&lt;br /&gt;So as a test, I’m going to try to get a RoR project ported to seam and also develop a de novo web 2.0 project in it -- more in my next post.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;FootNote:&lt;/span&gt; I have been saying that workflow tools are important for a while -- my group shipped a workflow-based application built upon BEA’s &lt;a href="http://edocs.bea.com/wlintegration/v2_0/processintegrator/index.htm"&gt;process integrator&lt;/a&gt; platform back around the turn of the century. The workflow space has taken much longer to mature than I expected. The standards groups appear to have split at one point and merged again. BPEL is starting to be a standard feature in product brochures, so I’m hoping that it is not premature to start to use it again.</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6718486683521859035/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6718486683521859035' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6718486683521859035'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6718486683521859035'/><link rel='alternate' type='text/html' 
href='http://rdfsg.blogspot.com/2007/12/jboss-seam.html' title='jboss Seam?'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-886495663574734614</id><published>2007-06-20T12:27:00.000-07:00</published><updated>2007-06-25T11:00:52.425-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='RubyOnRails'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>Using Semantic Tools with real data</title><content type='html'>In one of my previous posts I mentioned the question “what do you gain from having the instance data present and available for processing by semantic web tools?” The core idea is that this would simplify the detection of modeling errors and highlight domain misunderstandings, but I couldn’t be sure until I had tried it with data from a real system.&lt;br /&gt;&lt;br /&gt;I’ve finally been able to review the results from one such system and have convinced myself that it is useful.&lt;br /&gt;&lt;br /&gt;A bit of context -- doing this in a way that I considered valid required data from a system that I understood well. 
The best available system was an “art submissions” application that tracks available artwork and its attributes.&lt;br /&gt;&lt;br /&gt;The data was extracted using my &lt;a href="http://rubyforge.org/projects/rdf-rails/"&gt;rdf_rails&lt;/a&gt; utility, which generates RDF and OWL files from a RoR application.&lt;br /&gt;&lt;br /&gt;After trying this out for a bit, I have come to the conclusion that there is utility here.&lt;br /&gt;&lt;br /&gt;The utility can be best demonstrated with a simple example (all examples are in &lt;a href="http://www.racer-systems.com/"&gt;RacerPro&lt;/a&gt;, see my previous post)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;(retrieve (?x )   (and (?x Artwork) (neg (?x  shipable-art)) (neg (?x  unshipable-art))))&lt;/span&gt;*&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This query retrieves all artworks that are neither shippable nor unshippable.&lt;br /&gt;&lt;br /&gt;Since the shippable-art &amp;amp; unshippable-art concepts were intended to completely cover the Artwork category, any non-null result indicates an error in either domain modeling/understanding or data cleanliness, either of which should be detected prior to rolling out the model and the application that embodies it.&lt;br /&gt;&lt;br /&gt;I admit that dichotomous coverings are a simple case, but I think that this one clearly demonstrates the value of the approach.&lt;br /&gt;&lt;br /&gt;While on the topic of semantic web tools, it would be remiss of me not to mention the &lt;a href="http://spreadsheets.google.com/pub?key=pGFSSSZMgQNxIJUCX6VO3Ww"&gt;swtools spreadsheet&lt;/a&gt;, which is a handy and thorough catalog of available semantic web tools.&lt;br /&gt;&lt;br /&gt;Published by &lt;a href="http://www.mkbergman.com/?page_id=325"&gt;Michael K. 
Bergman&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-family:verdana;"&gt;Notes:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;* Per the RacerPro documentation, neg is a unary constructor that implements negation as failure (NAF); its argument is a query body.&lt;br /&gt;&lt;br /&gt;The concepts shippable-art and unshippable-art are defined as follows:&lt;br /&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;br /&gt;(define-concept shipable-art&lt;br /&gt;(or (boolean= Artwork_framed #T)&lt;br /&gt;(and&lt;br /&gt;(some Artwork_weight_pounds   (&lt;&gt; racer-internal%has-real-value  10) )&lt;br /&gt;(some Artwork_width_inches (&gt; racer-internal%has-real-value  10) )&lt;br /&gt;(some Artwork_height_inches (&gt; racer-internal%has-real-value  10) )&lt;br /&gt;(some Artwork_depth_inches (&gt; racer-internal%has-real-value  10) )&lt;br /&gt;))&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-886495663574734614?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/886495663574734614/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=886495663574734614' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/886495663574734614'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/886495663574734614'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/06/using-semantic-tools-with-real-data.html' title='Using Semantic Tools with real 
data'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-3385965740518869677</id><published>2007-06-18T11:50:00.000-07:00</published><updated>2007-06-18T11:55:27.280-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='OWL'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>RacerPro</title><content type='html'>I’ve been doing some ontology analysis using &lt;a href="http://www.racer-systems.com/"&gt;RacerPro&lt;/a&gt;. I decided to try out RacerPro primarily because my current modeling project uses numeric relationships to determine class membership. 
My quick scan of the available tools determined that RacerPro offered the best support in this space (any suggestions about other tools supporting such operations would be appreciated).&lt;br /&gt;&lt;br /&gt;RacerPro supports the use of numerics in two ways.&lt;br /&gt;&lt;br /&gt;The first is as a simple query, e.g.,&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;(retrieve (?x) (?x  (and (&amp;lt;=  Artwork_width_inches 10) (&amp;lt;=  Artwork_height_inches 10))))&lt;/span&gt;&lt;br /&gt;;; i.e., retrieve every object for which the role (property) Artwork_width_inches &amp;lt;= 10 and Artwork_height_inches &amp;lt;= 10&lt;br /&gt;&lt;br /&gt;The second is as a class/concept:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;(define-concept can-ship &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;(and &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;(some Artwork_width_inches (&amp;lt;= racer-internal%has-real-value 10))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;(some Artwork_height_inches (&amp;lt;= racer-internal%has-real-value 10))))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is essentially the same, only this time defining a concept rather than issuing a query.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;It is fair to say that the &lt;span style="font-style: italic;"&gt;racer-internal%has-real-value&lt;/span&gt; term did not leap out from the documentation. 
However the Racer technical support was both very accurate and extremely responsive, getting me up and running pretty quickly.&lt;br /&gt;&lt;br /&gt;All in all, I’m happy with the product.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-3385965740518869677?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/3385965740518869677/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=3385965740518869677' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3385965740518869677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3385965740518869677'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/06/racerpro.html' title='RacerPro'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-3738380341748934435</id><published>2007-05-29T07:13:00.000-07:00</published><updated>2007-05-29T07:16:34.133-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='OWL'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='RubyOnRails'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>rdf_rails on 
rubyforge</title><content type='html'>I just posted a project, &lt;a href="http://rdf-rails.rubyforge.org/"&gt;rdf-rails&lt;/a&gt;, up on RubyForge that will take a RoR application and convert it to OWL/RDF (including all of the instance data)&lt;br /&gt;&lt;br /&gt;This is very much an  alpha level project -- I only had one RoR application available for testing.&lt;br /&gt;&lt;br /&gt;If you’d like to give it a quick spin I’d appreciate any feedback.&lt;br /&gt;&lt;br /&gt;I’m interested in two types of feedback&lt;br /&gt;&lt;br /&gt;The first is the conventional software testing: bugs, missing features, inadequate documentation etc.&lt;br /&gt;&lt;br /&gt;The second revolves around the utility of the function -- what do you gain from having the instance data present and available for processing by semantic web tools?  My postulate is that this would allow a more thorough understanding of the data constituting the domain. An application could then be initially deployed with minimal constraints while the domain is still under exploration. Then, as experience is gained and the system has been populated with data, intuitions about the ontology of the domain could be validated against the data.&lt;br /&gt;&lt;br /&gt;I haven’t  fully convinced myself of this as yet. It still appears possible, but the compelling example hasn’t yet surfaced. 
I would like to hear any (positive or negative)   experiences that you’ve had in this area.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-3738380341748934435?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/3738380341748934435/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=3738380341748934435' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3738380341748934435'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/3738380341748934435'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/05/rdfrails-on-rubyforge.html' title='rdf_rails on rubyforge'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-8451009550810834164</id><published>2007-05-02T16:37:00.000-07:00</published><updated>2007-05-09T10:04:20.434-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='BioIT World'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='CDISC'/><title type='text'>Architecture and Spreadsheets</title><content type='html'>I gave a presentation May 1st at BioIT World 2007 as part of a panel&lt;span 
style="font-weight: bold; font-style: italic;"&gt; Interoperability an&lt;/span&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;d Standards: Progress towards an Industry Architecture&lt;/span&gt; in which I  develop an example of the evolution and increasing scope of a spreadsheet to show the ramifications of software engineering principles, software architecture and eventually industry architecture upon bringing timely data to a cell in your spreadsheet.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.rdfsg.com-a.googlepages.com/RDFSG_BioIT_2007.pdf"&gt;Here’s a link to the PDF&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-8451009550810834164?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/8451009550810834164/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=8451009550810834164' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8451009550810834164'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/8451009550810834164'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/05/architecture-and-spreadsheets.html' title='Architecture and Spreadsheets'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' 
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-584170883525640993</id><published>2007-04-12T11:57:00.000-07:00</published><updated>2007-04-12T12:03:25.607-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='Ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='RubyOnRails'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>RDF and RubyOnRails</title><content type='html'>Haven’t posted in a while -- I’ve been working on a Ruby utility that generates RDF and RDFS files from a Ruby On Rails application.&lt;br /&gt;&lt;br /&gt;The goal is to enable the use of semantic tools to explore the underlying structure of your data, using the built-in capabilities of these tools to quickly discover contradictions between the postulated structure and the actual data, e.g., “I think that every event has only a single location.”&lt;br /&gt;&lt;br /&gt;In this scenario, one uses the utility to build a base ontology (including instances) from the production system. An application could then be deployed with relatively loose constraints, with the constraints being refined after the application has had the opportunity to accumulate real data. 
This is especially useful where the character of the data is not well understood or is controversial at the time of the initial deployment.&lt;br /&gt;&lt;br /&gt;The project has the useful side benefit of allowing me to gain a better understanding of both Ruby and the RoR framework (see the note on class loading below).&lt;br /&gt;&lt;br /&gt;The good news is that it was relatively straightforward to get .rdf and .rdfs files generated so that I could read them into Protege (which may not be the best tool for this, but it is one with which I’m familiar and which has a substantial amount of documentation).&lt;br /&gt;&lt;br /&gt;The bad news is that I couldn’t add any refinements to an rdf/rdfs project in Protege -- it requires a ‘real ontology’, e.g., an .owl file (which is obvious in retrospect, but...).&lt;br /&gt;&lt;br /&gt;I’m now in the process of generating the .owl file.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Class Loading: The classes in a Ruby On Rails application are not loaded when the console is started. 
ActiveRecord subclasses are loaded on demand from a hook added to the const_missing handler.&lt;br /&gt;&lt;br /&gt;The way it is handled is interesting, so I’m reproducing it here&lt;br /&gt;(from file: /usr/local/lib/ruby/gems/1.8/gems/activesupport-1.4.1/lib/active_support/dependencies.rb in my installation)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;class Module #:nodoc:&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  # Rename the original handler so we can&lt;br /&gt;# chain it to the new one&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  alias :rails_original_const_missing :const_missing&lt;/span&gt;&lt;br /&gt; &lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  # Use const_missing to autoload associations&lt;br /&gt;# so we don’t have to&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  # require_association when using single-table inheritance.
&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  def const_missing(class_id)                                                                                          &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    Dependencies.load_missing_constant self, class_id                                                                  &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  end                                                                                                                  &lt;/span&gt;&lt;br /&gt;                                                                                                                 &lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  def unloadable(const_desc = self)                                                                                    &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    super(const_desc)                                                                                                  &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  end                                                                                                                  &lt;/span&gt;&lt;br /&gt;                                                                                                                 &lt;br /&gt;&lt;span style="font-family:courier new;"&gt;end&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-584170883525640993?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/584170883525640993/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=584170883525640993' title='0 Comments'/><link 
rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/584170883525640993'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/584170883525640993'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/04/rdf-and-rubyonrails.html' title='RDF and RubyOnRails'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6015513353787304868</id><published>2007-02-05T09:58:00.000-08:00</published><updated>2007-02-05T09:59:53.605-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='microformats'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><category scheme='http://www.blogger.com/atom/ns#' term='mashupcamp3'/><category scheme='http://www.blogger.com/atom/ns#' term='mashup'/><title type='text'>Microformats,  RDF and Life Sciences</title><content type='html'>I attended the microformats session at mashupcamp3 the week of Jan 17th, which had an interesting sidebar about how microformats are not RDF.&lt;br /&gt;&lt;br /&gt;For those unfamiliar with microformats, the core reference appears to be http://microformats.org/. Microformats consist of ~10 specifications plus approximately the same number of draft specifications.&lt;br /&gt;&lt;br /&gt;The sidebar centered upon the fact that a lot of real work can get done with microformats, but some people, particularly in the biology/life sciences domain, are very heavily committed to RDF. The question arose as to why. 
There wasn’t enough time remaining (and probably not enough interest) to explore the why.&lt;br /&gt;&lt;br /&gt;Looking at the adopted and proposed formats, I am struck by a few things. The first is that they each capture a nice nugget of functionality: calendars, contact information, news feeds, etc. The second is that they are designed to capture the common cases while avoiding the complexity of handling the uncommon situations, which I think is a good thing.&lt;br /&gt;&lt;br /&gt;The nice side effect of this aesthetic is that you can be up and running quickly doing real work, exchanging information, etc. The bad side effect is that if you need to do something more complicated, the hooks don’t exist to allow you to describe what’s going on.&lt;br /&gt;&lt;br /&gt;In general, the more unstructured you are willing to be, the easier it is to capture all of the information. The difficulty arises when you try to curate it or use it in another context. As an example, think of the initial attempt that often appears in a database design: a single table consisting of N text fields. It can work and can hold pretty much anything. Detecting duplicates and understanding the structure come later, if at all. However, depending upon the scale and use of the information, this may be just fine.&lt;br /&gt;&lt;br /&gt;In my opinion, at the enterprise level we sometimes overemphasize the scale and structural integrity issues. Scale can be a big deal if you’re trying to achieve perfect reconciliation of information. If “good enough” is OK, large-scale integration can be achieved in practice with very unstructured data. A good example of this is Hype Machine (http://hype.non-standard.net/), which was demo’d at mashupcamp3 and which mines music blogs effectively -- something that I would have thought impossible, given all the issues around spelling, new band names, etc. 
It works partly because the problem is to find something rather than to find everything with complete accuracy.&lt;br /&gt;&lt;br /&gt;At a deeper level, what is more characteristic of the areas addressed by microformats is that we can develop a good understanding of what’s going on from our intuitions of how the world works, and these intuitions should be able to cover a good number of the situations that we will actually encounter.&lt;br /&gt;&lt;br /&gt;In life sciences this is simply not the case. Our intuitions are often wrong, and in the clinical area the number of potential confounding factors is immense (e.g., http://gforge.nci.nih.gov/docman/view.php/53/2278/MedDRA_Source_Information.html lists the MedDRA term count as 65872), so there is an understandable push to design formats that are extensible and can capture information in well-defined and reusable ways.&lt;br /&gt;&lt;br /&gt;However, microformats and mashups in general do raise the question “Is this a case of the best driving out the good?” Despite my bias towards ‘scalability’ and ‘enterprise solutions’, it is hard to argue with standing up an application in a few hours that provides some utility to end users (and some real data on use etc.). 
Even if it requires some real work to migrate to a more scalable application when it comes online.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6015513353787304868?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6015513353787304868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6015513353787304868' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6015513353787304868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6015513353787304868'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/02/microformats-rdf-and-life-sciences.html' title='Microformats,  RDF and Life Sciences'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1019824807166875741</id><published>2007-01-22T18:34:00.000-08:00</published><updated>2007-01-23T09:34:29.561-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><category scheme='http://www.blogger.com/atom/ns#' term='RDF'/><title type='text'>RDF vs Ontologies</title><content type='html'>The way I look at it, RDF talks 
about what you have; ontologies talk about what you can have. The combination of the ontology and the data can then be fed into various reasoning engines to tease out the implications of your data.&lt;br /&gt;&lt;br /&gt;This is pretty scary. Given my assumption that the science advances and the thinking about what you can have will change over time, incorporating inferred “facts” leaves one open to fundamental system instability.&lt;br /&gt;&lt;br /&gt;My classic example of this is “sorry, we don’t really mean one protein per gene anymore.” The ontology and the implications drawn from inferencing upon the data are wrecked, but the identifiers for gene, protein, transcription, etc. are unaffected.&lt;br /&gt;&lt;br /&gt;RDF, at its most basic, gives you stable identifiers for what you have and allows the declaration of “stable” relationships between these objects. This allows you to communicate clearly about what you have and (possibly) easily version it when the time comes to change (see below). 
These statements should remain valid even if the ontology in which they are thought to be embedded changes radically.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Versioning&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Thinking at the RDF triple level also allows a low-overhead means of versioning your information in a manner analogous to the ZFS file system (http://www.opensolaris.org/os/community/zfs/) and its built-in “copy-on-write” facility.&lt;br /&gt;&lt;br /&gt;If I remember correctly, this copy-on-write is performed at the disk block level rather than at the file level. This is thought to be the basis for Apple’s “time machine” capability (in the next release of OS X). The system can just look back and determine the valid blocks at a particular time and reconstitute the file to appear as it did at that time.&lt;br /&gt;&lt;br /&gt;Similar functionality could be made to work at the RDF triple level. Triples that changed would be “overwritten”, but the old information would still be available, timestamped with valid from/to dates. 
It is easy to overlay some provenance information on top (similar techniques are used in data warehouses to allow clear tracking of when information was updated/corrected).&lt;br /&gt;&lt;br /&gt;This turns the fine-grained structure of RDF into a feature that provides advantages similar to those seen in these file systems: the incremental disk space required for versioning can be very small -- keeping multiple versions of a document requires incremental disk space roughly equal to the size of the changes, independent of the document size.&lt;br /&gt;&lt;br /&gt;I have to admit a certain hesitance in saying “RDF is good” (aside from the rdf represented by my initials). 
I have not yet implemented an RDF-based system; I plan to build one in the next month or so and will post an update.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1019824807166875741?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1019824807166875741/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1019824807166875741' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1019824807166875741'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1019824807166875741'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/01/rdf-vs-ontologies.html' title='RDF vs Ontologies'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-1191422902755204872</id><published>2007-01-15T06:04:00.000-08:00</published><updated>2007-06-18T14:05:38.400-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tagging'/><category scheme='http://www.blogger.com/atom/ns#' term='semantic web'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><category scheme='http://www.blogger.com/atom/ns#' term='data integration'/><title type='text'>Data Integration and 
Ontologies</title><content type='html'>Ontologies&lt;br /&gt;&lt;br /&gt;It is useful to think about three types of data integration:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Type 1.&lt;/span&gt; &lt;span style="font-style: italic;"&gt;Document level &lt;/span&gt;-- the user can determine what documents might have information of interest.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;Type 2. &lt;/span&gt;Term level&lt;/span&gt; -- the user can build reports using items from multiple documents/systems, e.g., each cell in a spreadsheet can come from a different system.&lt;/li&gt;&lt;li&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;Type 3.&lt;/span&gt; Inference level&lt;/span&gt; -- terms from one or more documents/systems can be combined to derive information (new terms) in the system being examined.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Both the functionality and the ontological commitment increase from type 1 to type 3 systems.&lt;br /&gt;&lt;br /&gt;The increasing level of ontological commitment, as perceived by a user of the system, appears as follows:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Type 1:&lt;/span&gt; There is something here which may be meaningful.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Type 2: &lt;/span&gt;If something does exist, it is meaningful.&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Type 3:&lt;/span&gt; The implications of a thing’s existence are meaningful.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Rather than attempting to determine the costs of these systems a priori, let’s look at some examples.&lt;br /&gt;&lt;br /&gt;The simplest &lt;span style="font-weight: bold; font-style: italic;"&gt;Type 1&lt;/span&gt; system involves simply placing documents in a file system; with some attention
to naming and structure, this system allows the contents to be easily retrieved and accurately assessed. However, anecdotal and personal experience shows that the retrievability of the information degrades over time and that the approach does not scale beyond small collections of items. This degradation stems in part from the fact that these systems allow &lt;span style="font-style: italic;"&gt;only a single axis of retrieval&lt;/span&gt; based upon the heuristics embedded in the (path)name of the files.&lt;br /&gt;&lt;br /&gt;The next level up in complexity for &lt;span style="font-weight: bold; font-style: italic;"&gt;Type 1 &lt;/span&gt;systems is the web and file/URL tagging systems. Such systems continue to make few &lt;span style="font-style: italic;"&gt;a priori&lt;/span&gt; claims for the utility of the retrieved items, but the use of search engines and URL tagging allows retrieval along multiple axes, based upon either the algorithms embedded in the search engines or the tags and the sources of those tags. Local file systems that support tags allow users to (eventually) retrieve their tag definitions, either via introspection or by examining other documents containing suspected tags.&lt;br /&gt;&lt;br /&gt;Some of the terminology limitations of free text tags are alleviated by the fact that the items being tagged are URLs/files and therefore unique and retrievable. Retrieving and examining the tagged information allows one to assess the information content (retrieval power) of each tag in the context of the current search. Tagging has been getting a lot of traction on the web, with sites such as del.icio.us http://del.icio.us/, shadows http://shadows.com/ and flickr http://www.connotea.org/ appearing as popular web tools for gathering and sharing tags (social bookmarking).
Similarly, modern file systems allow tagging of files and directories, and applications for images and other media content allow sorting and management of media files via tags.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Type 2&lt;/span&gt; Systems: Term level integration is the province of what is commonly called enterprise integration, which allows reporting and integration of applications within the enterprise (Enterprise Application Integration -- EAI). In practice, achieving integration requires stability of the term referents and their use in communication between systems. The stability is what might be termed “stability of use.” Commonly, the more general the use, the more restricted the interface. This involves a conscious decision to “narrow” the functionality when moving from the internal data model to the published interface, often restricting the interface to data transfer objects with a limited number of attributes.&lt;br /&gt;&lt;br /&gt;Examples include Service Oriented Architectures (SOAs) with well defined semantics and ways of modifying them over time. Changes to public information require explicit revision control and/or verification with all stakeholders that any changes will operate as expected. In general, the “wider” the interface, the more frequently the verification is required, with the concomitant drag on system agility.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Type 3 &lt;/span&gt;Systems: Semantic/inference level integration allows inference of new data from existing/newly added information, e.g., IF &lt;span style="font-weight: bold;"&gt;A&lt;/span&gt; AND &lt;span style="font-weight: bold;"&gt;B&lt;/span&gt; THEN &lt;span style="font-weight: bold;"&gt;C&lt;/span&gt; can be inferred.
This can cascade into IF &lt;span style="font-weight: bold;"&gt;B&lt;/span&gt; AND &lt;span style="font-weight: bold;"&gt;C&lt;/span&gt; THEN &lt;span style="font-weight: bold;"&gt;D&lt;/span&gt;, etc. This is a very strong ontological commitment that requires understanding the implications of the complete set of constraints and inferential mechanisms in the system. The payoff is substantial in that it becomes possible to infer a great deal from just a few additional pieces of information. This does, however, represent a significant “widening” of the interface, with potentially severe implications for system verification and evolution.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Practical Implications&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt; Type 1 &lt;/span&gt;systems imply no ontological fanout from a local commitment, and so it is possible to spontaneously evolve the “definition in use” (“in use” signifies that there is no requirement for an analytic definition), since the definitions are mostly manual or derived from manual definitions.&lt;br /&gt;&lt;br /&gt;On the other hand, for change to occur in &lt;span style="font-weight: bold; font-style: italic;"&gt;Type 2 &lt;/span&gt;and&lt;span style="font-weight: bold; font-style: italic;"&gt; Type 3 &lt;/span&gt;systems, the implications of the change must be understood for downstream systems that rely upon this changed information as an integral part of an automated process.&lt;br /&gt;&lt;br /&gt;In&lt;span style="font-weight: bold; font-style: italic;"&gt; Type 2 &lt;/span&gt;systems there is a restricted ontological commitment, which requires that changes be verified with systems that couple with the system being changed.
Because the change occurs through a restricted interface, the analysis is similarly constrained. All else being equal, the greater the functionality and use of the data, the greater the analysis that must be performed.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Type 3&lt;/span&gt; systems, with their ability to build inferences upon new data, have the largest analysis burden of any of these systems. This implies that they will be the least amenable to revision. There is a possibility that the use of ontologies will assure that the implications of changes are accounted for a priori. Thus any changes consistent with the ontology will easily be integrated. The problem is that there is no historical precedent for developing stable systems of this type.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-1191422902755204872?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/1191422902755204872/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=1191422902755204872' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1191422902755204872'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/1191422902755204872'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/01/data-integration-and-ontologies.html' title='Data Integration and Ontologies'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32'
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-6809232709220376069</id><published>2007-01-10T06:42:00.000-08:00</published><updated>2008-12-09T01:37:47.838-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific'/><category scheme='http://www.blogger.com/atom/ns#' term='applications'/><title type='text'>Aspects of a Platform Architecture: Part 2 - Evolution of a Platform</title><content type='html'>Again, the goal is to allow a small number of applications that share some core processes/entities to interact in a loose way and to ship and evolve as independently as possible in the face of changing science, user needs, infrastructure and developer/application allocation.&lt;br /&gt;&lt;br /&gt;In the last section, I talked about what a platform architecture looks like as a static entity. However, the reason for having a platform architecture is the evolution of the application suite over time.&lt;br /&gt;&lt;br /&gt;With stable identifiers and the architectural components discussed previously, I have found that middleware can be used as a tool for coherent platform development. Again, I’m basing everything on stable identifiers. Without stable identifiers everything is very hard; with them some things are possible, sometimes even easy.
In the absence of stable identifiers it is hard for any common substrate to get a leverage point that allows a clear value add to all of the products under development.&lt;br /&gt;&lt;br /&gt;My definition of middleware is a bit broader than that in Wikipedia:&lt;br /&gt;&lt;br /&gt;http://en.wikipedia.org/wiki/Middleware : &lt;span style="font-style: italic;"&gt;Middleware is the enabling technology of Enterprise application integration. It describes a piece of software that connects two or more software applications so that they can exchange data.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For me, middleware is also a place to incorporate the cross-application business logic that allows users to interact with and see the data in a consistent manner. The middleware also shields custom applications from the hidden details of the database(s) or other persistent storage, etc.&lt;br /&gt;&lt;br /&gt;Some real life examples that I’ve seen in the life sciences include one situation in which the same substance may have two different identifiers, and another in which the data is to be combined using a non-trivial algorithm, e.g., geometric mean with outlier removal. In both cases the middleware served as the foundation for consistency, since it was critical that all of the applications present the same information to users and that there be only one implementation of the retrieval/calculation methods (for any non-trivial application, the results given by two different implementations can skew over time).&lt;br /&gt;&lt;br /&gt;Some of the considerations around method naming, signatures, etc. are shared with library design and development.
The best resource I know for addressing those considerations is &lt;span style="font-weight: bold;"&gt;Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;by Krzysztof Cwalina&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A conventional diagram of such a system appears below:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_uhpaSaKsmiM/RaT9DnkWcTI/AAAAAAAAAAM/bpw58gYeBlM/s1600-h/substrate_diagram_1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_uhpaSaKsmiM/RaT9DnkWcTI/AAAAAAAAAAM/bpw58gYeBlM/s320/substrate_diagram_1.jpg" alt="" id="BLOGGER_PHOTO_ID_5018414123400458546" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;A more accurate diagram, given the goal of supporting rapid system evolution, is:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_uhpaSaKsmiM/RaUAjnkWcUI/AAAAAAAAAAY/b224uTF7y4o/s1600-h/substrate_diagram_2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_uhpaSaKsmiM/RaUAjnkWcUI/AAAAAAAAAAY/b224uTF7y4o/s320/substrate_diagram_2.jpg" alt="" id="BLOGGER_PHOTO_ID_5018417971691155778" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Here the red links show ad-hoc connections made to support rapid development.
My preference is to have the middleware be the responsibility of a single person, as it is the key leverage point for the long term evolution of the architecture.&lt;br /&gt;&lt;br /&gt;This person is given the time not only to evaluate architectural ideas that come in from other members of the team, who may have implemented solutions that should be made available to others in a slightly generalized fashion, but also to examine what’s going on in the industry as far as standards, toolkits, etc. that will help long term product evolution. In addition, for the ad-hoc connections and implementations to be capable of being moved to the middleware as described below, it is important for this person to have influence upon the design of these interfaces. I’ve found it uniformly tempting for the application owners to embed too much information into their interfaces so as to simplify their short term development, thereby hindering long term development.&lt;br /&gt;&lt;br /&gt;Still, what I’ve described so far sounds a bit static -- how does it play out over time?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Evolution proceeds as follows:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Even if we start with the Platonic “conventional diagram” shown above, it will quickly evolve into something along the lines of the more realistic version, which shows some ad-hoc connections that have evolved over time to give the individual applications the flexibility to meet their requirements.&lt;br /&gt;&lt;br /&gt;The next release of the middleware (1.1) is picked up by the Ensemble Review application. In this case the release supports functionality that had required an “outside the box” access solution for the Ensemble Review, and so its access can now occur through the middleware.
Green arrows show functionality that has moved to the middleware with the release shown.&lt;br /&gt;&lt;br /&gt;The arrow from the Small Group Drill Down application to the Ensemble Review app is shown as now going through the middleware, since my practice was to ship the middleware as a labeled jar rather than a web service. Although this had the downside of increasing the footprint of each application, it did allow the interface to remain very transparent.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_uhpaSaKsmiM/RaUA4HkWcVI/AAAAAAAAAAg/9rzXIWlKCvc/s1600-h/substrate_diagram_3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_uhpaSaKsmiM/RaUA4HkWcVI/AAAAAAAAAAg/9rzXIWlKCvc/s320/substrate_diagram_3.jpg" alt="" id="BLOGGER_PHOTO_ID_5018418323878474066" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The next release of the middleware (1.2) is picked up by the New Data Requests application.
We now have three versions of the middleware in production. Each application has shipped independently, but they are all moving in the same direction and there has been no forking for feature support -- forking for bugs is of course possible.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_uhpaSaKsmiM/RaUA4HkWcWI/AAAAAAAAAAo/5lYb6gpQJgo/s1600-h/substrate_diagram_4.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_uhpaSaKsmiM/RaUA4HkWcWI/AAAAAAAAAAo/5lYb6gpQJgo/s320/substrate_diagram_4.jpg" alt="" id="BLOGGER_PHOTO_ID_5018418323878474082" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;And of course, as shown below, an application can pick up the latest version without requiring any improvements.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_uhpaSaKsmiM/RaUA4HkWcXI/AAAAAAAAAAw/EJ29V1FopsA/s1600-h/substrate_diagram_5.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_uhpaSaKsmiM/RaUA4HkWcXI/AAAAAAAAAAw/EJ29V1FopsA/s320/substrate_diagram_5.jpg" alt="" id="BLOGGER_PHOTO_ID_5018418323878474098" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;And then the cycle repeats. The only time that a “synchronized ship” (that is, when all applications ship to production simultaneously with the same version of the middleware) is required is when there is an incompatible structural change to the shared data structures or a core business process/algorithm. At this point everyone picks up the same version of the middleware, the shared data store is migrated, and extensive testing occurs.&lt;br /&gt;&lt;br /&gt;The advantages of this sort of approach include:&lt;br /&gt;The testing time for each application is reduced.
An application need not pick up a version of the middleware if the new version doesn’t provide any functionality/bug fixes that it requires (an underlying assumption is that there is adequate regression test coverage to assure that the functionality required by the application is not broken in the new release).&lt;br /&gt;&lt;br /&gt;When an application needs the new functionality it upgrades to the current revision.&lt;br /&gt;This also means that an application that does not depend upon the new middleware functionality is not governed by the middleware’s shipping timelines.&lt;br /&gt;&lt;br /&gt;This has the additional benefit of allowing middleware releases to be more focused on the needs of a particular product, or to engage the product architect to support early testing of a feature that will help them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-6809232709220376069?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/6809232709220376069/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=6809232709220376069' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6809232709220376069'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/6809232709220376069'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/01/evolution-of-platform.html' title='Aspects of a Platform Architecture: Part 2 - Evolution of a Platform'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32'
src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_uhpaSaKsmiM/RaT9DnkWcTI/AAAAAAAAAAM/bpw58gYeBlM/s72-c/substrate_diagram_1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4522919003955435321.post-433127090038715485</id><published>2007-01-10T04:54:00.000-08:00</published><updated>2008-02-16T10:14:45.474-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='software'/><category scheme='http://www.blogger.com/atom/ns#' term='architecture'/><category scheme='http://www.blogger.com/atom/ns#' term='scientific'/><category scheme='http://www.blogger.com/atom/ns#' term='applications'/><title type='text'>Aspects of a Platform Architecture: Part 1</title><content type='html'>How does one create a “reasonable” platform architecture? By which I mean, how does one put a system in place that allows a small number (~10) of related products to evolve and ship independently while building upon an infrastructure substrate that allows for common policies, consistent data access, and presentation?&lt;br /&gt;&lt;br /&gt;First, to be clear on the high level goals: what is an Enterprise platform architecture?&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Multiple products&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;different developers, different use cases (users may overlap), same general business area&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Multi-year timelines&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;changing developers, resourcing budgets, and technologies; even the rate of change will change.&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Enterprise level&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Multiple applications serving a particular group of areas within a
business, which raises issues different from those involved in building an industry platform (Windows, Spring, Linux, etc.).&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Common views upon the data&lt;/span&gt;&lt;ul&gt;&lt;li&gt;When the expectation around the data is the same, the result used/displayed should be the same, no matter how complex the processing is to derive it.&lt;/li&gt;&lt;li&gt;Ability to deviate when the expectation is unique or novel. Achieving the responsiveness required for iterative/agile techniques requires the ability to “special case” data access and processing methods. This allows these special cases to be well grounded before they are moved into the substrate (as appropriate).&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Products should normally be able to ship independently&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Products should be able to communicate&lt;/span&gt; with each other so that users can switch between applications (web based in this case) when desired, with some minimal context being maintained.&lt;br /&gt;&lt;br /&gt;Building this requires creation of a system consisting of:&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Data Architecture&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;How do you identify what you’re looking for, and where do you find it?&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Application Roadmap&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;What application should own the functionality, and when will it be deployed?&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;Technology Architecture&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;What are the technologies that we’ve settled upon, how are we exploring new ones, and how do we decide what to explore and who does it?&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;
font-style: italic;"&gt;Functional&lt;/span&gt; &lt;span style="font-weight: bold; font-style: italic;"&gt;Architecture&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;How are we structuring the functionality and identifying common components, how much of the implementation can we hide, and what are the hiding requirements (timeliness etc.)?&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-weight: bold; font-style: italic;"&gt;The first step, a Data Architecture foundation.&lt;/span&gt;&lt;br /&gt;Select a small set of entities and assure that they have stable, anonymous identifiers. This is surprisingly difficult when building systems out of existing products in scientific domains.&lt;br /&gt;&lt;br /&gt;I have been involved in frequent discussions with users who wish to embed domain information in the identifiers so that they can easily identify what they are holding in their hands without needing to go to a computer to find out what it is, or, worst case, wait for an application to be developed that will let them go to a computer to find out what it is -- a.k.a. every user’s worst nightmare. The best solution I’ve come up with is to print both the anonymous identifier and the domain information when necessary.&lt;br /&gt;&lt;br /&gt;This selection is also constrained by existing/off the shelf systems and what they can support. Selecting internal identifiers from the core of these systems and building synonym tables is a reasonable compromise.&lt;br /&gt;&lt;br /&gt;The remaining Data Architecture issues, plus the other aspects (Application Roadmap, Technology Architecture and Functional Architecture), are important. However, given our multi-product, multi-year, frequent revision goals, the issues are as much about building an infrastructure that supports a culture as about building an architecture. Such a culture involves setting minimal contracts and some processes around their evolution more than instantiating any particular set.
The core expectation is that the platform will outlive any applications, and it is at the platform level that practices around technology adoption, quality, and user interaction should be set. &lt;span style="font-weight: bold; font-style: italic;"&gt;Note:&lt;/span&gt; this is not meant to imply that the practices should be uniform across the applications, but the patterns of &lt;span style="font-style: italic;"&gt;use categories&lt;/span&gt; vs. &lt;span style="font-style: italic;"&gt;development strategy&lt;/span&gt; should be set at the platform level.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/4522919003955435321-433127090038715485?l=rdfsg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://rdfsg.blogspot.com/feeds/433127090038715485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=4522919003955435321&amp;postID=433127090038715485' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/433127090038715485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4522919003955435321/posts/default/433127090038715485'/><link rel='alternate' type='text/html' href='http://rdfsg.blogspot.com/2007/01/aspects-of-platform-architecture-part-1.html' title='Aspects of a Platform Architecture: Part 1'/><author><name>rdf</name><uri>http://www.blogger.com/profile/04410128869406318890</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='27' height='32' src='http://1.bp.blogspot.com/_uhpaSaKsmiM/S42AB_Uaz8I/AAAAAAAAAHI/YkvlVjFcmpY/S220/rdf+on+2010-03-02.jpg'/></author><thr:total>0</thr:total></entry></feed>
