Monday, November 3, 2008

JMS/JCA Flows in ServiceMix: the wrong level of abstraction for distributed ESB

The ongoing argument over REST versus RPC has created useful debate around programming abstractions, particularly with respect to their use in distributed computing. Abstractions are useful in that they remove underlying complexities and allow us to focus on the task at hand. However, abstractions fail if they hide away important details that could influence design or implementation decisions. For example, the problem with RPC, some RESTful folks would argue, is that in making remote invocations look like local invocations, RPC facilitates poor design decisions that neglect caching, error handling, or optimization of network traffic. They may have a good point; however, in this blog entry I don't wish to go down the well-beaten REST/RPC track. Instead, I want to discuss the use of a particular abstraction in implementing integration solutions with FUSE ESB (Enterprise Apache ServiceMix). I've found that while this abstraction - the JMS/JCA flows - offers exciting possibilities, it is the wrong abstraction, and should be avoided.

I had been pondering for some time the use of abstractions within the ServiceMix implementation of the JBI Normalized Message Router (NMR). The NMR is used to send messages from one component to another; in ServiceMix, this is implemented using a number of different "flows". Depending on the kind of message exchange (synchronous or asynchronous), the quality of service (reliable, transacted), and whether the target service is clustered, the NMR will choose a "flow" to match.

Four flows are provided in ServiceMix - single-threaded (ST), SEDA, JMS & JCA. Of these, the latter two have interesting and exotic qualities. First, because they're both based on JMS, they are reliable: messages can be persisted as they travel between components. Second, because they are implemented using ActiveMQ JMS queues, if the ServiceMix container is deployed in a cluster then messages can transparently be delivered across JVM containers. This gives us a truly "distributed ESB".

Many are drawn to this visionary usage of the NMR; however, I strongly believe that clustering & persistence of the NMR is the wrong level of abstraction. First, you are unwittingly exposing your solution to a potentially unacceptable performance hit. With a little knowledge of the NMR's message passing semantics, you'll see that even a simple integration solution (like my File -> Pipeline EIP -> transformer -> JMS) takes more NMR messages than you think: in this case, seven messages are used. Wow, a simple little integration solution using the JMS flow takes a total of three persisted JMS queues and four temporary queues! And each of those persistent queues, in ActiveMQ, will need its own thread! In fact, if you've left the default flow settings enabled in ServiceMix, you'll see that we create two queues (and hence, two threads) for each service on the NMR (one for the JMS flow, and another for the JCA flow!). I was curious about the impact of these flows on ServiceMix threading, so I played around with the configuration. I found that by simply disabling the JMS & JCA flows in ServiceMix 3, I was able to reduce the number of threads on startup by about 50%. Yup, that's right. I halved the number of threads lurking in the JVM.
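For the curious, disabling the JMS & JCA flows comes down to the flow list on the container element in conf/servicemix.xml. The sketch below is from memory of the ServiceMix 3.x configuration schema - check the exact attribute name (flowNames, a comma-separated list) against your release; other attributes are elided:

```xml
<!-- ServiceMix 3.x container, restricted to the in-VM flows only.
     With "st,seda" the container never creates the per-service
     JMS/JCA queues (and their threads) at startup. -->
<sm:container id="jbi"
              embedded="true"
              flowNames="st,seda">
  <!-- components / activation specs go here as usual -->
</sm:container>
```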

So, a deeper look shows that this "flow" abstraction is going to consume resources and impact your latency & throughput. It gets even hairier though: with a flow like the one I detailed above, it's possible that in a clustered environment your message exchange could start out on the file component of machine A, get pipelined by machine B, get transformed back on machine A, and then get sent to the JMS provider on machine B. All that remote network traffic, when all I ever wanted was to do a little transformation on a file and drop it into a queue.

I had the good fortune to be in town with Guillaume Nodet, Godfather of ServiceMix, last week, and talked to him about this; Guillaume is an intelligent and fun guy, and if you get a chance to meet him then take it! It turns out that the JMS/JCA flow feature of ServiceMix predates the refactoring of the NMR to become JBI-compliant. So, back when it was invented, it wasn't burdened by the additional NMR traffic mandated by the JBI spec. These flows made more sense back then, but really don't make so much sense now. You can get everything you want in terms of reliability by using explicit queues as checkpoints in a transactional SEDA architecture: just use the SEDA flow, and, where transaction propagation is required, set the message exchange to be synchronous.
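To illustrate the "synchronous exchange" part: when you send an exchange programmatically, it's the difference between send() and sendSync() on the client. The sketch below uses the ServiceMix 3 client API from memory; the service QName and message payload are made-up, and you should verify the class and method names against your ServiceMix version:

```java
import javax.jbi.messaging.InOnly;
import javax.jbi.messaging.NormalizedMessage;
import javax.xml.namespace.QName;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

import org.apache.servicemix.client.DefaultServiceMixClient;
import org.apache.servicemix.client.ServiceMixClient;
import org.apache.servicemix.jbi.container.JBIContainer;

public class SyncSendSketch {
    public void sendWithinTransaction(JBIContainer container) throws Exception {
        ServiceMixClient client = new DefaultServiceMixClient(container);

        InOnly exchange = client.createInOnlyExchange();
        exchange.setService(new QName("urn:example", "transformer")); // hypothetical service
        NormalizedMessage in = exchange.getInMessage();
        in.setContent(new StreamSource(new StringReader("<order id='1'/>")));

        // sendSync blocks the calling thread until the exchange is complete,
        // so over the SEDA flow the target component runs in the caller's
        // thread context and the caller's transaction is propagated -
        // unlike send(), which hands the exchange off asynchronously.
        client.sendSync(exchange);
    }
}
```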

Rather than engage in feature debridement in ServiceMix 3, Guillaume has made the right move in choosing to leave these flows in ServiceMix 3 but omit them from ServiceMix 4. There is talk of reimplementing the JMS flow for ServiceMix 4 for "backwards compatibility", but I for one would rather see Guillaume invest his and the community's efforts in other, more pressing work. A really nice feature of ServiceMix 4 is that you will be able to get transactional propagation on the SEDA flow with asynchronous as well as synchronous messages, so those who invest in the SEDA approach now can avail of better thread performance at the flick of a switch when they move to ServiceMix 4.

Both Guillaume and I agreed that if you want to introduce clustering, reliability, or transactionality into your integration solution, then you should do so using explicitly named JMS queues. That way, you have full control over when and where your message exchanges are persisted, transacted, or clustered. It's easy, and more importantly, it's the Right Level of Abstraction for integration solutions.
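As a concrete sketch of the explicit-queue approach, here is roughly what a named JMS checkpoint looks like with the servicemix-jms component in an xbean.xml. The service/endpoint names, namespace, and the connectionFactory bean reference are all hypothetical - the point is simply that the queue name is yours, so persistence and clustering happen at a point you chose rather than invisibly inside the NMR:

```xml
<!-- A jms:provider writes exchanges to an explicitly named queue;
     a jms:consumer on the same queue picks them up. Attribute names
     follow the ServiceMix 3.x servicemix-jms endpoints - verify
     against your version. -->
<beans xmlns:jms="http://servicemix.apache.org/jms/1.0"
       xmlns:ex="urn:example">

  <jms:provider service="ex:checkpoint"
                endpoint="sender"
                destinationName="example.checkpoint"
                connectionFactory="#connectionFactory" />

  <jms:consumer service="ex:afterCheckpoint"
                endpoint="receiver"
                destinationName="example.checkpoint"
                connectionFactory="#connectionFactory" />
</beans>
```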

2 comments:

Konrad A. said...

Ade,

I beg to disagree with you.

1. As an ESB user I may wish to see the internal bus implementation as a black box with some features I can rely on. One of them is QoS (in this case - reliable delivery). So when I design my ESB flow using shipped, reusable components, I want the bus to guarantee my QoS without my bothering about how to design the queues in between the components so that the flow becomes reliable. In some cases I may choose not to worry too much about the performance overhead. Automatic guaranteed delivery may be more important to me.
2. The manual usage of in-between queues increases the design and implementation complexity, which may not be desired in many cases.
3. Using in-between queues does not solve the whole QoS problem. By using them, you just maintain the state between the components. You do not address the important problem of maintaining the state of the components themselves. For example, some EIPs like Resequencer or Aggregator may lose their state on failover when using just a custom set of in-between queues.

The problem of the state of some EIPs underlines the need for consistent support for a given (configurable) level of QoS by the bus. In my opinion, reliable delivery should not be the responsibility of the flow designer but should be perceived as a feature of the bus (and all its elements, including the EIP implementations).

vinod said...

Ade,
We are using Fuse 3.3.18. Is there any issue with the JMS flow working with the ServiceMix Camel router? We see errors like "Premature end of file" when the router is involved, and they go away with the SEDA flow.
Thanks
vinod