Thursday, September 25, 2008

Using AspectJ to diagnose ServiceMix component performance

An inventive FUSE customer showed me an approach to diagnosing his ServiceMix integration flow by injecting some aspect-oriented code, and it's so cool that I thought it would be worth sharing. The problem is this: given an integration flow, how can you diagnose how long each endpoint in the flow is taking to do its business? This customer in particular was experiencing exceptionally high latency on his integration flow (on the order of 900 ms) that just seemed plain wrong: the question was, where in the flow was he incurring the hit? Was it in transformation, EIP, resequencing, or something else?

I used AspectJ to reproduce my customer's approach, weaving some diagnostic code into the ServiceMix call stack. You can do this by creating a simple aspect like the one below, which is invoked whenever a call is made to Flow.send(). The measurePerformance() method does some logging, and also wraps a simple timer around the message invocation. The result will only be accurate to the millisecond; however, if you're trying to isolate big-elephant-in-the-room bottlenecks, this should be sufficient.

import javax.jbi.messaging.MessageExchange;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class PerformanceAdvice {

    @Around("execution(* org.apache.servicemix.jbi.nmr.flow.Flow.send(..))")
    public Object measurePerformance(ProceedingJoinPoint thisJoinPoint) {
        Object ret = null;
        try {
            MessageExchange me = (MessageExchange) thisJoinPoint.getArgs()[0];
            System.out.println("Invoking on " + me.getEndpoint().getServiceName()
                    + "... (status: " + me.getStatus() + ", role: "
                    + (me.getRole() == MessageExchange.Role.PROVIDER ? "PROVIDER" : "CONSUMER")
                    + ")");
            long startTime = System.currentTimeMillis();
            ret = thisJoinPoint.proceed();
            long endTime = System.currentTimeMillis();
            System.out.println(me.getEndpoint().getServiceName()
                    + " done; elapsed time = " + (endTime - startTime) + " ms.");
        } catch (Throwable e) {
            e.printStackTrace();
        }
        return ret;
    }
}
Weaving this code into ServiceMix turned out to be surprisingly easy: I added the AspectJ JARs (aspectjlib.jar, aspectjtools.jar, aspectjrt.jar, aspectjweaver.jar) to ServiceMix's lib directory. I also jarred up my PerformanceAdvice class and dropped it into the lib directory too. Finally, I modified the ServiceMix configuration file - conf/servicemix.xml - to create my advice and turn on auto-proxying for AspectJ. First, I added the Spring AOP namespace to the root <beans> element:
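The namespace declaration itself appears to have been lost from this post; for reference, the standard Spring AOP namespace on the root element looks something like this (the schema version shown is an assumption - match it to your Spring version):

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xsi:schemaLocation="
           http://www.springframework.org/schema/aop
           http://www.springframework.org/schema/aop/spring-aop-2.0.xsd">
```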


Then, I added the following elements within the <beans> element:

<!-- Turn on AspectJ auto-proxying -->
<aop:aspectj-autoproxy/>

<!-- Create my performance advice aspect -->
<bean id="performanceAdvice" class="PerformanceAdvice"/>

And that's it! To test, I ran a simple integration flow that reads from a JMS queue and uses a pipeline to transform the message and send the result to another queue. Here's the output:

Invoking on {}Pipeline... (status: Active, role: PROVIDER)
{}Pipeline done; elapsed time = 0 ms.
Invoking on {}XsltTransformer... (status: Active, role: PROVIDER)
{}XsltTransformer done; elapsed time = 1 ms.
Invoking on {}XsltTransformer... (status: Active, role: CONSUMER)
{}XsltTransformer done; elapsed time = 0 ms.
Invoking on {}JmsOutput... (status: Active, role: PROVIDER)
{}JmsOutput done; elapsed time = 0 ms.
Invoking on {}JmsOutput... (status: Done, role: CONSUMER)
{}JmsOutput done; elapsed time = 1 ms.
Invoking on {}XsltTransformer... (status: Done, role: PROVIDER)
{}XsltTransformer done; elapsed time = 0 ms.
Invoking on {}Pipeline... (status: Done, role: CONSUMER)
{}Pipeline done; elapsed time = 1 ms.

You can see that there are lots of messages being sent - we're most interested in those where the status is Active and the role is PROVIDER (you can modify the PerformanceAdvice class to print only those if you like). From this, I can see that the call to the XsltTransformer endpoint is taking just 1 ms - a nice verification that my XSLT transform isn't doing anything crazy.

Using an approach similar to this, my customer was able to show that his bottleneck was actually in one of his own hand-written components, which invoked a back-end server over RMI - the hit was on the order of 850 ms, accounting for the lion's share of the latency. I'm not sure what he's done since then to minimize this hit, but at least we know where the problem is.

In summary: Aspect-Oriented Programming is great for weaving in cross-cutting concerns such as logging, security or transactionality into your application code - it's also a nice tool to have for performance diagnosis.

1 comment:

ceppo said...

Very interesting approach.

It's possible to use System.nanoTime() instead of System.currentTimeMillis() in order to obtain a higher time resolution.
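A minimal sketch of the commenter's suggestion, extracted from the aspect so it runs standalone (the NanoTimer class and its timeMillis() helper are my names, not from the post):

```java
public class NanoTimer {

    /**
     * Runs the given task and returns its elapsed wall-clock time in
     * fractional milliseconds, using System.nanoTime() for sub-millisecond
     * resolution (as suggested in the comment above).
     */
    public static double timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        long end = System.nanoTime();
        return (end - start) / 1_000_000.0; // nanoseconds -> milliseconds
    }

    public static void main(String[] args) {
        double elapsed = timeMillis(() -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException ignored) {
            }
        });
        System.out.println("elapsed = " + elapsed + " ms");
    }
}
```

Inside the advice, the same two-line change (nanoTime() at start and end, divided by 1,000,000.0 when printing) would give sub-millisecond readings without altering anything else.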