Monday, November 9, 2009

Seam + {jboss 5.0 | jboss 5.1} = ?

About a month ago I completed the upgrade of all my Macs to Snow Leopard. This generally went smoothly -- not a surprise for a release that has been characterized as "more refinement than upgrade."

One exception was a jboss 4.2.2/seam 2.1 application that would hang when generating a list view of objects that included images. My initial reaction was that this provided an opportunity to upgrade to jboss 5.x and partake of whatever enhancements that offered.

This proved to be a task that ended in frustration. I spent ~40 hours on it and eventually gave up. I fell back to the earlier code and upgraded to jboss 4.2.3, which solved the problem (I think it was related to using Java 1.6).

I thought I'd share some of my experiences, just in case someone else finds them useful:

The first glitch was that the version of seam I was running didn't appear to work with jboss 5.x, so I upgraded to seam 2.2.

The attendant upgrades required changing some of the DB mappings, specifically the blob annotations (MySQL-specific):
From: @Column(name = "data", length = 8000000)
To: @Column(name = "data", length = 8000000, columnDefinition = "mediumblob")
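In context, the mapping ends up looking something like this (a sketch only -- the entity name and surrounding code are illustrative, not the application's actual classes):

import java.io.Serializable;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

// Sketch: an entity holding image bytes, with the column type pinned explicitly
// so the MySQL column stays a mediumblob instead of whatever type the dialect
// would otherwise derive from the length.
@Entity
public class ImageData implements Serializable {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "data", length = 8000000, columnDefinition = "mediumblob")
    private byte[] data;

    public byte[] getData() { return data; }
    public void setData(byte[] data) { this.data = data; }
}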

I also tried switching to jboss developer studio to see if that would help me uncover the problem -- this had no real impact.

The core symptom was that nothing was coming back from pages that generated a list of items in the DB, and no Hibernate queries showed up in the backend log stream.

I eventually tried going to one of the more "internal" pages directly (a real advantage of Seam's REST interface) and finally saw a Hibernate query in the backend stream, along with the warning:

WARN [Param] could not create converter for: competitionId
javax.el.PropertyNotFoundException: Target Unreachable, identifier 'competitionHome' resolved to NULL

This warning was similar to the error I was getting elsewhere, in which the authenticate method also resolved to NULL.

What appeared to be happening was that the seam annotations weren't being processed correctly (specifically @Name("competitionHome")).
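For reference, the component in question is just a Seam-annotated home object along these lines (a sketch assuming a Competition entity; the real class has more to it):

import org.jboss.seam.annotations.Name;
import org.jboss.seam.framework.EntityHome;

// Sketch: when @Name isn't processed, the "competitionHome" component never gets
// registered, so EL references like #{competitionHome} resolve to null -- which
// produces warnings like the one above.
@Name("competitionHome")
public class CompetitionHome extends EntityHome<Competition> {

    public void setCompetitionId(Long id) {
        setId(id);
    }

    public Long getCompetitionId() {
        return (Long) getId();
    }
}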

After searching on this error I found this link, which made me think that things are basically broken.

As I said, after rolling back to the original code and moving to jboss 4.2.3, the problems went away. However, I must admit that I'm surprised the issue exists in the newer versions of seam/jboss:

jboss 5.0.0.GA -- Stable, 104 MB, released 2008-12-05, LGPL
jboss 5.1.0.GA -- Stable, 130 MB, released 2009-05-23, LGPL
JBoss Seam 2.2.0.GA -- Production, 111 MB, released 2009-07-30, LGPL

I know that this is "unsupported" code, but I would still expect better testing. After all, 5.1 was out in May of this year and Seam 2.2 claims:

Seam 2.2 examples target JBoss Application Server 5.1.

Now, I do realize I could have been anywhere from 1 minute to 1 month away from a solution for this problem (if anyone has a solution, I'd be more than happy to try it), but I have two closing thoughts:
  • Allocate more time than you might expect for making the transition

  • It would have been appreciated if the various teams involved had paid more attention to migration tools (or even documentation) and/or backward compatibility. My various searches trying to solve this problem turned up a lot of people having obscure issues with the transition: this is not the way to encourage wide uptake of a tool set.

Tuesday, October 20, 2009

XCode 3.x

I've been coming up to speed on iPhone development and thought I'd share some of my experiences.

The first is that Beginning iPhone 3 Development is a very useful starting point. I tried a couple of other resources but finally settled on this. I do like books better than video when learning a new environment, but this book also has the advantage of being up-to-date and accurate. I hate trying to learn an environment/language when the examples are wrong! Beginning iPhone 3 Development employed a technical reviewer who worked through and verified all of the examples. Shouldn't every book like this have a technical reviewer? -- the world would be a better place.

A few observations on the development environment, which, although reasonable, is a bit more primitive than netbeans or eclipse, especially around refactoring: renaming a class/header file doesn't rename all of the imports throughout the project, and there isn't a "Refactoring" capability that I've found that does this.

The C aspects certainly harken back to an earlier era, e.g., one has to define a function in a .m file and declare it in a .h file for it to work correctly. At least Code Sense minimizes the chances for mistyping in this case.

XCode only allows you to view the interface specification (the .xib file) via Interface Builder -- however, it is useful to realize that the .xib file is really an XML file that can be viewed in a normal text editor, e.g., emacs.

Misspellings count and don't seem to generate errors:
  • Surprised that Code Sense doesn't prompt when overriding methods from the superclasses, which causes the classic "why wasn't this method called?" debugging session

  • Similarly, the compiler doesn't tell you if you're calling undefined methods

  • Misspelling an accessor, e.g.,
    childController.tltle = ...;
    instead of
    childController.title = ...;
    gets the error: request for member 'tltle' in something not a structure or a union

I also can't believe that there isn't enough introspection available, so you still have to write these by hand:

#pragma mark NSCoding
- (void)encodeWithCoder:(NSCoder *)encoder {
    [encoder encodeObject:field1 forKey:kField1Key];
    [encoder encodeObject:field2 forKey:kField2Key];
    [encoder encodeObject:field3 forKey:kField3Key];
    [encoder encodeObject:field4 forKey:kField4Key];
}

- (id)initWithCoder:(NSCoder *)decoder {
    if (self = [super init]) {
        self.field1 = [decoder decodeObjectForKey:kField1Key];
        self.field2 = [decoder decodeObjectForKey:kField2Key];
        self.field3 = [decoder decodeObjectForKey:kField3Key];
        self.field4 = [decoder decodeObjectForKey:kField4Key];
    }
    return self;
}

#pragma mark -
#pragma mark NSCopying
- (id)copyWithZone:(NSZone *)zone {
    FourLines *copy = [[[self class] allocWithZone:zone] init];
    copy.field1 = [[self.field1 copyWithZone:zone] autorelease];
    copy.field2 = [[self.field2 copyWithZone:zone] autorelease];
    copy.field3 = [[self.field3 copyWithZone:zone] autorelease];
    copy.field4 = [[self.field4 copyWithZone:zone] autorelease];
    return copy;
}

On The Bright Side

@synthesize obviates a lot of useless typing.

Categories seem cool and I plan to explore them further. Categories let you add methods to an existing class -- the source code of the existing class is not required.

From the XCode 3.1 documentation, categories:

  • Provide a simple way of grouping related methods. Similar methods defined in different classes can be kept together in the same source file.
  • Simplify the management of a large class when several developers contribute to the class definition.
  • Let you achieve some of the benefits of incremental compilation for a very large class.
  • Can help improve locality of reference for commonly used methods.
  • Enable you to configure a class differently for separate applications, without having to maintain different versions of the same source code.
  • Let you declare informal protocols. See "Informal Protocols," as discussed under "Declaring Interfaces for Others to Implement."

The doc also contains a suitable caveat:
Although the language currently allows you to use a category to override methods the class inherits, or even methods declared in the class interface, you are strongly discouraged from using this functionality. A category is not a substitute for a subclass.

That is, categories are powerful and, if you're not careful, can blow your foot off: the "power tool" version of shooting yourself in the foot.

You can schedule actions to happen in the future and then cancel them when superseded by a subsequent user action.
// a double tap cancels the pending singleTap action scheduled by the earlier tap...
[NSObject cancelPreviousPerformRequestsWithTarget:self selector:@selector(singleTap) object:nil];
// ...and schedules doubleTap to run 0.4 seconds from now
[self performSelector:@selector(doubleTap) withObject:nil afterDelay:.4];

It is very nice to have that capability just "built in."

Monday, September 14, 2009

Patterns in Network Architecture

I recently finished reading Patterns in Network Architecture by John Day. It's an attempt to rethink network architectures and polish up "the unfinished demo" that is the internet.

Now, I'm not a network guy, so I can't evaluate the quality of his proposed solutions in any detail, but I liked his thought process and found it a useful read for anyone interested in a good example of thinking through a hard problem and coming up with a disciplined "minimal covering" solution.

Day focuses upon discovering the appropriate layers and layer structures necessary for communication. He works up from interprocess communication on a single machine to processes communicating across multiple machines.

The implications of this analysis are interesting in and of themselves and closely resemble structures seen in other systems. His metaphors are primarily in terms of name lookup and binding, using compilers and operating systems as examples (I have to admit that this only feels partially correct to me: I think the full problem is more akin to providing the data/instructions to a processor and therefore needs to include the mapping from a "memory location" to an actual address accessible by the chip's execution unit e.g., it should take into account caching, TLBs et al.).

The first and foremost conclusion, in Day's opinion, is that there is one layer that provides interprocess communication, and that it replicates. That is, the structure of each network layer is the same, but the policies and optimizations differ depending upon the particulars of what the layer is connected to. Every layer has three parts: data transfer, IPC (Interprocess Communication) control, and IPC management -- where control is short-cycle management.

In his words

"Layers have two major properties that are of interest to us: abstraction and scaling (i.e., divide and conquer). Layers hide the operation of the internal mechanisms from the users of the mechanisms and segregate and aggregate traffic. But most important, they provide an abstraction of the layers below"

When moving from a shared (single-system) environment to a distributed one, the core new functionality is an error- and flow-control protocol (EFCP). This protocol replaces the shared-memory mechanisms of the single system to ensure reliability and to provide flow control when two systems communicate. An EFCP PM is a task of the IPC process. Although in theory such a process could be included even when communication is on a shared processor, in practice this communication is so reliable as to make it redundant.

I think that the core insight/technique was to frame communication from the network perspective as being from application to application and not as interacting with the network e.g., in the figure below (6-15 from the book) communication is conceptualized as being across, that is between applications at the same layer, rather than down through the network and back up to the other application.

The application concerns itself with developing a shared state with its partner application. The N-1 layer provides an abstract, aggregated API to support the application's view of the communication and performs whatever aggregation and abstraction are necessary to develop a shared state with the N-1 application on the other side. It then hands off the details to the N-2 layer, which gets them to the N-2 layer on the other side, etc.
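To make the "same structure, different policies" idea concrete, here is a very rough code sketch -- purely my own illustration, not anything from the book:

// Each layer offers the same IPC-style API to the layer above and delegates to an
// identical layer below; only the policies (flow control, aggregation, addressing)
// differ from layer to layer.
interface IpcLayer {
    void send(String destination, byte[] data);
    byte[] receive(String source);
}

interface Policy {
    byte[] encapsulate(String destination, byte[] data);
    byte[] decapsulate(byte[] pdu);
    String mapAddress(String destination);
}

class Layer implements IpcLayer {
    private final IpcLayer below;  // the (N-1) layer, or null at the bottom
    private final Policy policy;   // this layer's flow-control/aggregation choices

    Layer(IpcLayer below, Policy policy) {
        this.below = below;
        this.policy = policy;
    }

    public void send(String destination, byte[] data) {
        byte[] pdu = policy.encapsulate(destination, data);  // control + management
        if (below != null) {
            below.send(policy.mapAddress(destination), pdu);  // hand off to (N-1)
        }
    }

    public byte[] receive(String source) {
        return below == null ? null : policy.decapsulate(below.receive(source));
    }
}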


Yes, it is just encapsulation all over again, but as we all know, finding the right thing to encapsulate and doing it in a practical way takes a lot of work.

I'm eliding a number of the other key findings of the book such as
  • The observation that an address only needs to be unique within a (distributed) application layer
  • A connection is made only after authentication has been performed and the connection authorized, etc.

All are developed in a thoughtful way showing deep insight into the problem.

Although not the easiest read for someone without a strong networking background, it is an interesting and useful exercise to watch someone so well versed go through the process so thoughtfully.

Monday, August 24, 2009

Flavors of Architects and Analysts

I was recently involved in a discussion on the difference between architects and business analysts and decided to put together my thoughts on the subject.

Here they are: for each category, "Architect" and "Business Analyst," there are a number of sub-categories.

I normally think of three levels of Architecture:

  • Application – addresses evolution and delivery of a single application (small set of highly related functions), activities include:
    • Partitioning functionality within an application.

    • Developing best practices.

    • Assuring flexibility to meet current and immediate business needs.

  • Platform – addresses evolution and delivery of multiple applications for a particular business area (applications grouped by functionality/user community), activities include:

    • Assuring a commonality of results.
    • Providing for fine grained interoperability.

    • Developing frameworks that allow multiple applications to ship on a common substrate.

    • Building in flexibility to meet business developments on the planning horizon (this year/next year goals) for moderate-sized groups within the company (~100 people)
    • Assuring that a substrate achieving these goals is in place so the applications can pick it up at the appropriate time.

  • Enterprise:
    • Identify core data elements and services that will be important over the strategic timeframe.

    • Assure that there is an appropriate mix of flexibility/capabilities to meet strategic goals e.g.,
      • If acquisition of companies is a strategic goal, methods for rapidly merging personnel, purchasing, and operational information systems are important.
      • If acquiring products is a strategic goal, capturing data about supply and delivery chains etc. is important.

On the business analyst side I similarly think of three levels:
  • Department Level:

    • What are the processes involved in performing a function, including as-is and to-be states?

  • Division Level:
    • What is the external business goal that is being addressed?

    • Is this the right way to address it?
    • Should functions be merged/refactored?

  • Corporate/Strategic:
    • What are the strategic business differentiators going to be in the business that we want to be in 3-5 years?

    • What must the business look like to support them?

At all levels, some time should be spent looking at potential inflection points that might radically change the structure of delivery, and building in flexibility to address that potential, e.g.,
  • For architecture think outsourcing, software as a service, location aware computing, etc.

  • For business think increased competition in product acquisition, competition from generic products, regulatory/legal landscape.

How these various functions are actually assigned to people depends a lot upon the scale of the problem, the level of risk/uncertainty, the talents of the people involved, and the flexibility of the organization. At one extreme, a star performer building upon a solid platform architecture (in the sense used above) can be a combination business analyst/application-architect/developer for a system serving 50+ users in a non-validated environment.

I think it is important that business analysts be able to understand the business processes and vocabulary well enough that there is a good transmission of information between those expressing the needs and the analyst. This implies a greater stickiness between the analyst and the user community than is necessary for the architecture or project management functions.

Similarly, there are commonalities in the levels of abstraction used across the architecture levels (Application, Platform, and Enterprise) that imply the levels are stickier than business areas or technology.

In theory, project management is more transferable, but the stickiness here revolves around legal requirements, diversity of end use (geography, user types), and system novelty.

Sunday, July 26, 2009

Open Source as an Architectural Driver

Phillip Longman's post about open source products in healthcare (specifically, the VA's "health IT system") talks about how the Midland Memorial Hospital's installation of the VA system went well because it was easy to use, and it was easy to use because it was open source.

Well, the first claim wasn't a surprise. Well-designed, easy-to-use software is, well, easy to use. The fact that this leads to successful system uptake/user adoption should be no more surprising than the fact that people like their iPhones. The second claim -- being easy to use because it is open source -- is a bit of a mental speed bump: easy to use open source? Well, maybe if you are a developer. The article states that the ease of use stemmed from the ease of modification. Now I can't comment on that, since I am unfamiliar with the product and have never been involved in a hospital-centric system.

However, open source and ease of modification? Yes, that fits. A successful open source project, by definition, must be relatively easy to modify: An interested developer should be able to jump in, modify the code and stand up a running test build in short order. Otherwise the project won't attract enough attention to survive. More importantly, I think a system that is easier to modify will leap ahead in functionality even if it starts out behind.

This is one of the reasons we've seen such useful build/test tools come out of the open source community, e.g., ant/junit/maven. All open source projects need tools like these to succeed, since they are critical in situations where you cannot afford a dedicated buildmeister or QA organization, e.g., when you're a developer modifying the code to satisfy your own needs.

Similarly, a clean, modular, layered structure is going to be favored, and codependencies (A depends upon B, B depends upon A) are going to be rejected, since codependencies require understanding two pieces of code and their interrelationship to perform a successful modification.

Both of these issues can be more easily compensated for within a "closed source" shop, since revenue-generating projects employing full-time personnel can invest the time and discipline to keep things working, even if the software has a few points of poor structure. In addition, if the points of "bad architecture" are manageable there is little incentive to fix the problem, since it might easily cost more to fix than it's worth, given the costs of running a full-time development team.

One of the strongest examples of this that I think we've seen is the EJB vs. Hibernate controversy, with the resultant "conciliation" of EJB3. Hibernate, being open source, could give developers what they needed rather than what they "should want," and it won due to its simplicity and speed.

This argues for a bias towards an open source style, even in a closed source system. For example, can your consultants (internal or external) add/modify deep system functionality? Designing your system in a way that supports such modifications will make the architecture better and help keep the product fresh.

Again, why this works for hospital systems is beyond me, unless there has been an undocumented rash of coding by doctors and nurses.

Tuesday, June 30, 2009

iPhone: changing the way we think

I'm struck by how the iPhone has changed the way we think about what can be done with software based devices assisting us as beings-in-the-world. I'm doing a Heidegger reference here because the iPhone is more than just ubiquitous computing: a device always at my side that could answer those important questions like:
  • Is there good coffee close by?

  • What's the weather going to be like later?

  • How old was Kennedy when he was elected?

Although it certainly is that, it has become a lot more, changing both the economics of software delivery and what it means for software to be delivered.

It's not just that there are a billion apps (or so it seems) in the app store, but the economics of iPhone software is such that a small gaming company can do a novel game (e.g., tying ropes around wooden blocks), get traction with it, and make money. That didn't sound that amazing until I read an interview with the developers in Gamasutra that reminded me how hard it was to make money in computer games pre-iPhone. Not that it is easy now, but compared to the stories I heard when I attended a few Game Developer conferences earlier in the decade, it is trivial. Let's just say that the economics of doing a platform (XBox, Playstation, Wii) or PC-based game were daunting, to say the least, and the likelihood of getting paid for your game was minimal, even if the game was successful.

The core of the iPhone's difference is as a platform that is easy to use, location aware and ready-to-hand -- more like a hammer than a computer.

As a platform it is sufficiently distinct that it is also affecting the way we think about delivering healthcare. Looking at this list highlights core features that are "new" -- not "new" in the sense of being completely unheard of, but new in the sense of being practically available for use by the overwhelming bulk of the user community -- sort of like the difference between having a generator kit/knowing about electricity and having an electric grid that you can plug your device into.

As a user-assistant, the iPhone allows me to fully exploit the affordances of my current location. I can see where I am on a map, look at overhead imagery of my current neighborhood to see if there is something that I want to photograph, and, if there is, use a small application to grab the geo coordinates so I can later tag the photos I took with my (non-GPS-enabled) camera.

The end result is something that is always with you, knows who you are, knows where you are, has connectivity both up (3G/internet) and down (bluetooth to local devices) while providing a simple effective mechanism to easily add functionality in small increments.

I think this makes it the biggest game changer since the rollout of the internet to the general public. However, I also realize that this means that it is time to code up a small test application for the iPhone.

PS: I don't have any experience with the Google android platform or the Palm Pre; these observations may apply equally as well to them.

Monday, June 1, 2009

Linked Data

Finally, thanks to a discussion with Eric Neumann a few weeks ago, I'm beginning to understand what Linked Data is all about. First a caveat -- although I credit Eric for helping me see how linked data fits into what I'm doing, the following interpretation is strictly my own as are errors of omission, commission or orthogonality, although I think my view is supported by the Design Issues document.

The short story is that linked data provides stable identifiers for stuff (a more abstract form of things). These stable identifiers then allow you to say things about this (particular) stuff without necessarily making a strong ontological commitment.

I like this. It provides for interoperability and integration. It does not provide any inference guarantees, which is fine by me, and something that I have been advocating for a while. The Linked Data site also has links to a number of datasets which publish stable identifiers for useful stuff. The site also gives examples of how to publish your own data.

Hopefully will provide its data in this form in the near future.

Sunday, May 17, 2009

Wolfram Alpha

Wolfram Alpha is supposed to be launching in the next few days and has been getting a lot of publicity. For background, here's a link to a short YouTube demo of Wolfram Alpha and a NY Times article, and Doug Lenat has a nice post on his impressions.

From what I can see (and I don't have access) even though it doesn't live up to some of the early hype, it achieves a very interesting result: it allows retrieval of general computable information using a simple natural language processing (NLP) interface.

This allows for analysis similar to that permitted by a data warehouse, but within a different design space. The design goals of Wolfram Alpha, unlike those of a data warehouse, preclude prestructuring the data in marts to allow rapid querying in relatively well-defined ways. However, similar to the mart/warehouse situation, you must still provide a speedy response to the quantitative queries to prevent users from drifting away while waiting for an answer.

The question is how is this done? Rumors on the net indicate that the underlying data is an RDF triple store, which makes a lot of sense since RDF Triples constitute a vertical, model free storage approach. In operation, I imagine that the queries provide nice entry points for initiating a spreading-activation fan-out process on the graph. When the activations intersect you can proceed to roll back up to the initiation points suggested by the query, clustering in a bottom-up data-driven fashion along the way. The clustering also affords a natural way to structure the data for presentation to the user.
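To make the guess concrete, here is a toy sketch of what I have in mind -- entirely my own illustration, with no connection to Wolfram's actual implementation:

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Toy spreading activation over a triple-store-like graph: fan out breadth-first
// from each entry point suggested by the query and keep the nodes reachable from
// all of them -- the candidates for the bottom-up roll-up step.
public class SpreadingActivation {

    private final Map<String, Set<String>> neighbors = new HashMap<String, Set<String>>();

    public void addTriple(String subject, String predicate, String object) {
        link(subject, object); // the predicate is ignored in this toy version
        link(object, subject);
    }

    private void link(String a, String b) {
        Set<String> adjacent = neighbors.get(a);
        if (adjacent == null) {
            adjacent = new HashSet<String>();
            neighbors.put(a, adjacent);
        }
        adjacent.add(b);
    }

    public Set<String> activate(List<String> entryPoints, int maxHops) {
        Set<String> intersection = null;
        for (String entry : entryPoints) {
            Set<String> reached = fanOut(entry, maxHops);
            if (intersection == null) {
                intersection = reached;
            } else {
                intersection.retainAll(reached);
            }
        }
        return intersection == null ? Collections.<String>emptySet() : intersection;
    }

    private Set<String> fanOut(String seed, int maxHops) {
        Set<String> visited = new HashSet<String>();
        visited.add(seed);
        Queue<String> frontier = new LinkedList<String>(visited);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            Queue<String> next = new LinkedList<String>();
            for (String node : frontier) {
                Set<String> adjacent = neighbors.get(node);
                if (adjacent == null) {
                    continue;
                }
                for (String candidate : adjacent) {
                    if (visited.add(candidate)) {
                        next.add(candidate);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }
}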

Although I'll admit that this is just an educated guess as to the mechanism, it does suggest an interesting set of technologies involving fast linking and roll up of data for ad-hoc queries without requiring a lot of effort to tune the data to a specific query.

Generating a set of vetted and annotated data is a different problem, but hopefully would not require a significantly greater level of effort than the ETL portion of current warehousing efforts.

Wolfram Alpha therefore constitutes another factor leading me to be more vertical in my storage designs. In the coming months, I'm hoping to run some benchmarks on production hardware/datasets so as to ground the practicality of this approach and then get permission to publish the results.

Update 18 May 2009: I did try Wolfram Alpha today and it failed on my first try "age distribution of England vs UK," not so much from any idiosyncrasies in parsing my query, but because it appears to be encoded with the identity "England == UK." This just goes to show how important it is to be spot-on with your identity information aka "synonym tables are easy, antonym tables on the other hand......aren't."

Wednesday, May 6, 2009


I just upgraded to a new laptop (driven mostly by the need for more RAM -- hopefully 6G will be adequate for a couple of years). It got me thinking: even though it's great that the Mac will copy all of your old apps over effortlessly to your new machine, it also happily copies all your old unused cruft over to your new machine, and that's not so great.

So, in the spirit of good hygiene (and H1N1 preparedness), I decided to open up Console and see what I might find. I discovered that I had a couple of launchd jobs that referenced executables which didn't exist on my system any more, e.g., Carbon Copy Cloner.

I have been able to rid myself of all the launchd issues by cleaning up the LaunchDaemons/LaunchAgents under the Library folder, but I still haven't been able to rid myself of all of these:
/Applications/[54428]: Warning: accessing obsolete X509Anchors.

This is even after searching the web a couple of times. I think the problem starts up after I open an article from NewsFire, but I'm not completely sure. This is definitely a space in which I believe correlation is not causality.

If anyone has any ideas on how to fix this, I'd appreciate it.

BTW it is really nice to have a built in tool like Console: it is simple and effective with just that little bit of extra functionality (string filtering) that makes all the difference in usability.

Monday, April 20, 2009

Java Concurrency

A predictable side effect of having (way too) many years of experience in Java is that certain "new" features escape your notice. This is particularly true if the IDEs don't pressure you into changing your previously successful, and still functional, patterns (the way they do with generics).

I realized this when reading Java Concurrency in Practice. It's a very good book -- I can't say it really opened my eyes on concurrency, since I had done some work on multi-master, VME-based real-time systems years ago, but it is spot on, well written, and a nice refresher. In addition, it made me aware of the thread/concurrency capabilities available in newer versions of Java, such as ThreadPoolExecutor.

I recently built a file crawler/hash-calculator/storage system as part of my namedData work using an ArrayBlockingQueue and explicitly created threads. ThreadPoolExecutor appeared to allow an easier approach with cleaner shutdown/interrupt semantics.

Java tips has a clear example -- the primary change that I would make to this example is to size the thread pool based upon the number of processors available via Runtime.getRuntime().availableProcessors() (on my laptop this returns the number of cores).
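A rough sketch of the resulting shape of the code (the class and method names here are placeholders, not my actual crawler):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HashCrawler {

    public static void main(String[] args) throws InterruptedException {
        // Size the pool from the hardware; on my laptop this returns the number of cores.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        // Placeholder file list; the real crawler discovers these on disk.
        List<String> files = Arrays.asList("a.dat", "b.dat", "c.dat");

        // Submit one task per file instead of hand-managing an ArrayBlockingQueue.
        for (final String path : files) {
            pool.submit(new Runnable() {
                public void run() {
                    hashAndStore(path);
                }
            });
        }

        // Clean shutdown: stop accepting new work, then wait for in-flight tasks.
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private static void hashAndStore(String path) {
        // hash calculation and storage elided
    }
}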

It took me less than an hour to make this change, test the code, etc. The final product is a lot cleaner, has better shutdown behavior, and even feels like it runs faster. Definitely the right way to go.

Monday, April 6, 2009

owl:sameAs is a very strong assertion

There's been an interesting discussion on the public-semweb-lifesci mailing list with the subject "blog: semantic dissonance in uniprot" which, appropriately enough, was spurred by a blogpost entitled semantic dissonance in uniprot. This post talks about a uniprot entry which listed a Drosophila (fruit fly) protein sequence as having been isolated from "a young sporophyte contained within a seed."

The point being that although one doesn't find fruit fly genes in plants, following the owl:sameAs link leads directly to that conclusion. This generated a very long, fairly thoughtful and minimally flame based conversation on owl:sameAs and identity in general.

As the discussion progressed, the problem with associating identity across graphs (ontologies/systems of data developed by different organizations) was noted, e.g., (in pseudo-annotation) mySystem:itemA owl:sameAs yourSystem:itemX, the issue being that the use of the terms is usually subtly (and often not so subtly) different between the two systems. This problem is especially apparent when making assertions about real objects which exist independently out in the world. For example: "gold" may have a property, but does the property adhere to a single atom, or to a group of gold atoms, and if so, what characterizes a group of the appropriate size? For example, given:
  • A nanotechnology view of gold (still under development)

  • A semiconductor view of gold (probably reasonably well characterized)
  • A jewelry view of gold

what are the precise boundaries of their applicability? The issue doesn't arise in a system developed for nanotechnology, semiconductors, or jewelry. The problems surface only when these systems are linked together.

My thought is that the difficulty centers around the extreme power of owl:sameAs, which indicates that things are identical in all contexts. However, in the physical world, not only is context everything, but context is also inherently incompletely specified.

In practice, many of us heuristically treat identity in the physical world as meaning "indistinguishable in this context," with the context being implicitly dependent upon the issue under consideration. I would claim that this is the only reasonable way to proceed when reasoning in a practical manner about what is true of particular objects in the world (abstractions can obviously satisfy stronger conditions, since they are abstractions -- with the context factored out to any level desired).

In the physical world, we cannot assure that even the ability to track a particular item with unlimited precision would allow us to make statements about that item which would hold through time. For example, although we might make assertions about a particular atom (#0x177FFEAA) of gold and its behavior, some if not all of the assertions may fail under unexpected conditions, e.g., after an event that alters the structure of the nucleus (nuclear collisions, extremely high temperatures etc.). Exhaustively specifying all of these conditions is impractical at best -- which is one of the reasons the phrase ceteris paribus has remained with us for so long.

In my own work, since I never worry about tracking individual atoms, I gravitate toward weak rather than strong assertions of identity, trying to be very attentive to context. This is very much in the spirit of the middle distance as developed in Brian Cantwell Smith's On The Origin of Objects. Smith's point is that our intuitions are well tuned to objects about our size that we interact with frequently. In data integration and architecture work (I had to get there eventually) it implies that integrating across fields that interact to some degree in the "world" is going to be more feasible than integrating across those that don't interact. The give and take of the practical interaction has allowed us to identify the particular features of each item that are important in context.

Monday, March 23, 2009

OSX Performance Analysis: Instruments

I started working with OSX's Instruments performance analysis tool, partly out of curiosity and partly because I had just fixed a performance problem in an application using an ad hoc a priori analysis. It happened to solve the problem, but I have enough experience with performance issues to know that the a priori guess is often wrong.

Instruments is heavily related to dtrace and shares a lot of its core attributes. The key attributes are that it is low overhead and works with (almost) anything running on your system (OSX apparently has the capability for some applications to turn off monitoring for security/DRM reasons).

There's a lot to like here: you can easily get it up and going on your system, and the analysis section is very user friendly:


Especially nice features include
  • Low overhead: the peak CPU usage I saw for the tool was ~ 16%
  • The ability to display exactly what is going on under the read head (the upside down triangle above the graph)
  • Being able to display parameters that you didn't think of turning on during the run. All parameters are captured. The selection only impacts the display -- a godsend for anyone who has had to rerun a test because they forgot to capture a parameter

That said, I couldn't get any particular instrument to focus only on the process specified. As you can see, all of the instruments capture all of the activity, even though they were set to focus on different processes. Additionally, the "default action" kept resetting whenever I dragged a new instrument onto the display.

It is still a very worthwhile tool, but if anyone has any tips as to how to get around these issues, I'd appreciate it.

Wednesday, March 4, 2009


It's a bit off topic, but I thought I'd point out how useful a Kindle can be for consulting. You can carry at least 500 reference books on it (and who needs more than 490 anyhow?). It is also very light and easy to read.

I do have a couple of qualms: it has a page-oriented display (no scrolling), no touchscreen, and no spatial indexing (e.g., "the top of the right-hand page, halfway in"), but other than that it's a win.

An important note on utility: O'Reilly e-books can be read on the Kindle. The truly great thing about O'Reilly's e-books is that you get both the Kindle-compatible Mobipocket files and the more aesthetically pleasing PDF files (for me, aesthetics matter -- even in a SQL guide).

You can mix and match reading and reference between the formats depending upon your preference. Thankfully the files aren't copy protected. Thanks, O'Reilly, this is a very nice touch.

Tuesday, February 17, 2009

Seambay modifications to access Seam Annotations

This post extends my last one about accessing Seam from the command line. Here I describe the transition from using EntityManager to using EntityHome.

The first thing I did was to create a new folder for the webService sources, which meant that I had to add this directory into the build.xml file and add all of the libraries into the compile path in NetBeans (both of which are obvious, but both of which I always forget to do).

The next step was to make my action work similarly to an .xhtml page and interact with a home object rather than directly with the EntityManager, going from:

if (fileData == null) {
    fileData = new FileData();
    // various actions on fileData
}

to:

if (fileData == null) {
    fileDataHome.persist(); // side effect of creating the defined instance
    fileData = fileDataHome.getDefinedInstance();
    // various actions on fileData
}

which also required adding these lines to components.xml


None of which was particularly difficult, and I was up and running in an hour or so.

Monday, February 2, 2009

Seam From a Command Line

I recently wanted to access some seam derived functionality from a command line java program (something that I could run via cron). I ran into a few minor problems and thought I'd share their solutions.

The first problem was that seam annotations like @Logger won't work. I guess it isn't that surprising, but the jboss seam annotations are unavailable to a command line program (at least not easily), since the portions of the framework that enable these annotations are designed to operate within a server.

This was disappointing. The @Logger annotation is really useful, but I couldn't come up with a way to get it going.

This pushed me into wanting to use web services as much as possible to take advantage of other annotations that I had built into my system, e.g., the ability to automatically stamp an object with time modified and time created to support temporal data operations.
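As an aside, that kind of stamping is roughly what standard JPA lifecycle callbacks give you; the sketch below is an illustration of the idea, not my actual implementation:

import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.PrePersist;
import javax.persistence.PreUpdate;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

// Sketch only: automatic created/modified stamps via JPA callbacks.
@Entity
public class StampedRecord {

    @Id
    @GeneratedValue
    private Long id;

    @Temporal(TemporalType.TIMESTAMP)
    private Date timeCreated;

    @Temporal(TemporalType.TIMESTAMP)
    private Date timeModified;

    @PrePersist
    void onCreate() {
        timeCreated = new Date();
        timeModified = timeCreated;
    }

    @PreUpdate
    void onUpdate() {
        timeModified = new Date();
    }
}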

I did not find the seam documentation about accessing seam web services particularly clear (especially when using netbeans) so I turned to the netbeans tutorial and was quickly up and running with the seambay example.
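Once the client side is generated from the seambay WSDL (see the notes below), the command line program itself is only a few lines. The sketch here uses the generic JAX-WS Service API; the URL, namespace, and service/port names are placeholders for the values the WSDL actually declares:

import java.net.URL;
import javax.xml.namespace.QName;
import javax.xml.ws.Service;

// Sketch of a command line web service client. In practice the classes generated
// by wsimport / the NetBeans web service client wizard make this even simpler.
public class SeamBayClient {

    public static void main(String[] args) throws Exception {
        URL wsdl = new URL("http://localhost:8080/seam-bay/AuctionService?wsdl"); // placeholder
        QName serviceName = new QName("http://seambay.example.org/", "AuctionService"); // placeholder
        Service service = Service.create(wsdl, serviceName);

        // With a generated service-endpoint interface (placeholder name), getPort
        // returns a typed proxy whose methods map to the WSDL operations:
        // AuctionService port = service.getPort(AuctionService.class);
        // port.login("user", "password");
        System.out.println("Any ports found: " + service.getPorts().hasNext());
    }
}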

    The WSDL for the seambay example is found at (assuming your server is local and you have deployed the seambay example) http://localhost:8080/jboss-seam-bay-jboss-seam-bay/AuctionService?wsdl

    An overview of all services at the host (again, assuming your server is local) appears at

Monday, January 5, 2009

Data Deduplication

IEEE Computer recently published a short survey on data deduplication. Conceptually deduplication is isomorphic to the named data approach I posted about a few weeks ago.
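The shared core of the two ideas is content addressing: name a block of data by the hash of its bytes, and identical blocks collapse to a single copy. A minimal sketch of the idea (my own illustration, not any particular vendor's design):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy content-addressed block store: the key is the SHA-256 of the block's bytes,
// so storing the same content twice keeps only one copy.
public class DedupStore {

    private final Map<String, byte[]> blocks = new HashMap<String, byte[]>();

    public String put(byte[] block) throws NoSuchAlgorithmException {
        String key = hash(block);
        if (!blocks.containsKey(key)) {
            blocks.put(key, block.clone());
        }
        return key; // the "name" of the data is its hash
    }

    public byte[] get(String key) {
        return blocks.get(key);
    }

    private static String hash(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}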

The vendors discussed include

Since I'm not sure how long the pdf of the article will be available, I'm posting this as a follow up to my previous post.

My preference would be to see these capabilities built right into the internet/operating systems rather than separate utility servers.