Ade On Middleware: 2010

Monday, September 13, 2010

FUSE Community Day Paris 2010

I'll be at the FUSE Community Day, Paris event on October 14th! Feel free to sign up and come along - if the London event in June is anything to go by, I know it's going to be a great day out :) We've got some great speakers lined up (including all of our FUSE rock stars like Claus Ibsen, Rob Davies, and James Strachan).

Rockin'!

Wednesday, September 8, 2010

Survival of the fittest: the evolution of the SOA registry / repository concept

I've decided to write a little about SOA registries and repositories, as a few customers have brought up these issues recently and I wanted to clear the air (and, in some sense, clear my own head too). I dragged in some sanity checks from my fellow consultants at FUSE just to get their feeling on the adoption of SOA registry tooling, so hopefully what follows will make some sense.

There was a flurry of activity some years ago (say, around 2007/2008) about SOA (Service-Oriented Architecture) Registries and Repositories. SOA evangelists, architects and 'thinkers' waxed lyrical about how real SOA experts (with capital letters) used registries and repositories, and that the 'little-SOA' folk just didn't get the concept, god bless them. Vendors rushed to provide (sometimes expensive) SOA registry / repository tools, a move probably motivated more by the realization that underlying SOA infrastructure and middleware was becoming a commodity than by the fact that customers really did need a registry.

Funny that.

Because many of the SOA customers I've worked with simply haven't needed a repository/registry. In fact, the mandate from on top to get a SOA registry or repository has been a distraction! Now, don't get me wrong: I'm not saying that SOA registries are a bad idea. Not at all. I'm just saying that, as a tool, they solve a particular problem. And, if you don't have that problem, then you don't need the tool.

Let's consider what a SOA registry / repository might do for you.

Store contracts, service level agreements, code, deployable artifacts, documentation (this is the 'repository' nature of the tool)
Act as a run-time lookup for the physical location of services - thus abstracting a service's physical network address from a 'logical' network address (this is the 'registry' nature of the tool)
Keep track of who is using what services, and how much (this is the 'governance' nature).
Enforce service level agreements (again, this is more governance: making sure that the services are available and doing what they should do in a timely fashion)

Interestingly, I've found that the 'repository' problem is often solved using readily available file-sharing tools, HTTP servers, or even source-code management tools like SVN, GIT or CVS. Not glittery, sexy, 'made-to-measure' SOA repositories, but pragmatic, workable and relatively simple. One customer I know took a pragmatic approach which I really admire: "We're going to use a Wiki for our first set of services. When our Wiki becomes a pain, then we'll look at getting something more appropriate. But not until then."

The 'registry' problem is sometimes solved in software: for example, the Artix Locator, integrated with Apache CXF, allows services to register themselves automatically with a well-known registry, which clients can then look up at runtime. Nice! I've seen other CXF users write their own registries to achieve something similar - it's really not rocket science (although you can make your registry as complicated as you like!). I would like to see an open-source implementation of such a runtime registry, perhaps making use of recent innovative projects like Apache ZooKeeper to keep track of who's up and who's down. As an additional note on registries: one customer noted that the OSGi registry, particularly in the context of distributed OSGi (dOSGi), adds another dimension to the idea of dynamic, transparent lookup of services. And, if you're acting smart in a JBI or ServiceMix 4 world, you can achieve location transparency through the NMR or ActiveMQ POQs (plain-old-queues).
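To show just how little is needed for the basic register-and-lookup idea, here is a minimal in-memory sketch using only the JDK. It is not the Artix Locator or any CXF API; the class name `ServiceRegistry`, its methods and the example addresses are all made up for illustration, and a real registry would also need remote access and liveness checks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory runtime registry: maps a logical service name
// to the physical addresses of the instances that have registered.
class ServiceRegistry {
    private final Map<String, List<String>> endpoints = new HashMap<>();

    // A service instance would call this when it starts up.
    synchronized void register(String logicalName, String physicalAddress) {
        endpoints.computeIfAbsent(logicalName, k -> new ArrayList<>()).add(physicalAddress);
    }

    // Called when an instance shuts down (or when a monitor decides it is down).
    synchronized void unregister(String logicalName, String physicalAddress) {
        List<String> addresses = endpoints.get(logicalName);
        if (addresses != null) {
            addresses.remove(physicalAddress);
        }
    }

    // Clients resolve a logical name at runtime.
    // Returns the first registered address, or null if nothing is registered.
    synchronized String lookup(String logicalName) {
        List<String> addresses = endpoints.get(logicalName);
        return (addresses == null || addresses.isEmpty()) ? null : addresses.get(0);
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register("QuoteService", "http://host-a:9000/quote");
        registry.register("QuoteService", "http://host-b:9000/quote");
        System.out.println(registry.lookup("QuoteService"));
        registry.unregister("QuoteService", "http://host-a:9000/quote");
        System.out.println(registry.lookup("QuoteService"));
    }
}
```

The client only ever knows the logical name 'QuoteService'; which host answers is decided at lookup time.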

Interestingly though, if you have your network folk in the room, they'll argue that they can solve this whole lookup and 'location transparency' problem in a heart-beat in hardware with a network switch. Think about it: I can go to www.google.com every day - in fact, so can my mam - and never worry about what machine my search gets directed to. So, often people can achieve the 'registry' functionality easily, sometimes with zero code.

So, while there is certainly some argument for using registries and repositories for storage and retrieval of information about what services you have on the network, there are often many options available to you that don't need a dedicated software tool. And, keep in mind, if you have only a small number of services - say up to 20 - you really don't have a management problem - so ad hoc approaches like Wikis may be perfectly adequate.

With this in mind, I think that 'SOA registry-repository' offerings begin to offer value when they tackle SOA Governance issues, and I think this is the evolutionary route these products are taking. IT owners want to know that security policies are being enforced - these policies should be configurable, not 'designed into' the fabric and code of a service where they're difficult to change. They want to know if response times are sluggish. They want a record of who accessed what, and when. This could be for a legal audit trail, but is probably more useful internally as a 'proof-point' to show that service re-use is happening, and, more importantly, a way for a core SOA Centre of Competence to demonstrate the value that its services are offering the rest of the organization. Now this, for me, is a place where these tools can offer real value. As a Progress employee I am biased, and will of course recommend Actional which does an awesome job in this regard - but, as a fair and open-minded person, I have seen users of CXF propose to use HP SOA Centre to achieve something architecturally similar. Actional acts as a 'gateway'. You register your services with it (hey, a touch of 'registry'!). You apply your security policies. And away you go: all SOA requests get passed through the gateway and SLAs are enforced and reported. Governance made easy. The only problem with this architecture, from my perspective, is that it creates a potential 'gateway bottleneck'; however, with appropriate load-balancing in place this hasn't yet been a problem in practice.

At the end of the day, the tooling is evolving to the point of value around the concepts of governance - I welcome this. However, I still believe that if you're embarking on creation or adoption of SOA concepts, you shouldn't get stalled in the early stages by the selection of a SOA registry / repository. First, understand your needs - your real needs - in this area, and then go look for the right tool for the job.

Versioning WSDL interfaces in an OSGi world

Some time ago (2007, I think) I wrote a short paper with Oliver Wulff on WSDL versioning. Recently, I shared this paper with some smart OSGi-savvy ServiceMix users, and it raised a discussion about how WSDL interface versioning mixes with the versioning concepts in OSGi. Below is an extract of an email response I sent back to them: for me, the bottom line is that WSDL interface versioning (in fact, any interface versioning) is a separate concern to the way we version implementations.

For anyone out there who is implementing Web Services using OSGi runtimes like FUSE ESB (built on Apache ServiceMix and Apache Karaf), you might find the thoughts below of interest.

From my side, I think the most important point is that my paper concerns WSDL versioning, that is, the versioning of an *interface*; the versioning of an OSGi bundle is, from a service consumer's perspective, an *implementation* detail and a totally separate concern. I believe you should consider the versioning of the interface and the implementation as two *separate* evolutionary paths.

So, I could have, say, version 1 of a 'foo' web service, in namespace http://www.myorg.com/services/foo/v1, defined in foo-v1.0.wsdl. We might generate the Java JAX-WS/JAXB classes, and then package them as a bundle foo-ws-api-1.0.jar. I could implement it today in bundle foo-v1.0-impl-1.0.0.jar, and then fix a bug later in foo-v1.0-impl-1.0.1.jar, add a cool new feature in foo-v1.0-impl-1.1.0.jar, and then do a completely new, high-performance implementation in foo-v1.0-impl-2.0.0.jar. All of these implementation bundles would implement *the same* version of the WSDL!!

Now, imagine we need to do a minor change to the web service interface: say add a new operation, or add a new parameter to a method. We can do this *without* breaking on-the-wire interoperability by creating the new PortType in foo-v1.1.wsdl, but still in the namespace http://www.myorg.com/services/foo/v1. We would deploy the JAX-WS content for this in foo-ws-api-1.1.jar. We could do a new implementation, and put the implementation of this in foo-v1.1-impl-1.0.0.jar. Now, we can deploy the new version of both the interface and the implementation, and, if we've been careful, the new deployment will handle requests from existing consumers of version 1.0 of the interface. KEEP IN MIND THAT WHILE THIS IS POSSIBLE TO DO ... (sorry for shouting!) it's actually quite tricky to understand and I think that the complexity outweighs the benefits for most users. So, it's probably better to try to keep WSDL interface versioning at major release numbers.

Now, with this in mind, we can consider what happens when we deploy a whole new version of the WSDL interface! We can deliver foo-v2.0.wsdl, with portTypes and all in namespace http://www.myorg.com/services/foo/v2. We can deploy this API as bundle foo-ws-api-2.0.jar, and deliver our first implementation bundle as foo-v2.0-impl-1.0.0.jar. And, later, when we fix some bugs or add some features, we can deploy the new implementation as foo-v2.0-impl-1.0.1.jar.

Get the picture? For me, the rate of change of an interface should be different to that of an implementation - ideally, it should be significantly slower, changing maybe on a yearly basis. We need to apply different versioning semantics to interfaces, and evolve them at a different rate to the implementation bundles.
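To make the two separate version tracks concrete, here is a small JDK-only sketch that pulls the interface version and the implementation version out of the bundle file names used in the examples above. The `BundleName` class is purely illustrative (it is not part of OSGi, CXF or ServiceMix), and it assumes the `<service>-v<interface version>-impl-<implementation version>.jar` naming pattern and the http://www.myorg.com/services/ namespace root from this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a bundle file name into its two independent versions.
class BundleName {
    // Assumed naming pattern: <service>-v<interface version>-impl-<implementation version>.jar
    private static final Pattern NAME =
            Pattern.compile("(.+)-v(\\d+\\.\\d+)-impl-(\\d+\\.\\d+\\.\\d+)\\.jar");

    final String service;
    final String interfaceVersion;      // version of the WSDL contract
    final String implementationVersion; // version of the OSGi bundle implementing it

    BundleName(String service, String interfaceVersion, String implementationVersion) {
        this.service = service;
        this.interfaceVersion = interfaceVersion;
        this.implementationVersion = implementationVersion;
    }

    static BundleName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an implementation bundle name: " + fileName);
        }
        return new BundleName(m.group(1), m.group(2), m.group(3));
    }

    // Only the major interface version appears in the namespace,
    // so interface versions 1.0 and 1.1 share the same namespace.
    String namespace() {
        String major = interfaceVersion.substring(0, interfaceVersion.indexOf('.'));
        return "http://www.myorg.com/services/" + service + "/v" + major;
    }

    public static void main(String[] args) {
        String[] bundles = {
            "foo-v1.0-impl-1.0.0.jar", "foo-v1.0-impl-1.0.1.jar",
            "foo-v1.0-impl-2.0.0.jar", "foo-v2.0-impl-1.0.0.jar"
        };
        for (String file : bundles) {
            BundleName b = parse(file);
            System.out.println(file + " -> interface " + b.interfaceVersion
                    + " (" + b.namespace() + "), implementation " + b.implementationVersion);
        }
    }
}
```

The first three bundles all report interface 1.0 and the v1 namespace, even though the implementation moves from 1.0.0 to 2.0.0; only the last one changes the contract.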

Hope that helps!

Best,
Ade.

Wednesday, August 25, 2010

Karaf's Fabulous Features, and, what you can do to make them even more fabulous

I've been working for some time with the Karaf OSGi Shell, through my exposure to the great ServiceMix 4. The features mechanism provided by Karaf/ServiceMix allows you to leverage the modularity and control of OSGi bundles by grouping sets of bundles - containing your Camel routes, web services, business logic, RESTful services - into easily manageable 'features'. In this post, I'm going to talk a little about what features are, why they're good, and what we could do in Karaf to make their usage even easier. If you like the proposals, please follow the JIRA links and vote for them so that we can get the community behind them. These enhancements are small, but, based on my experience, they could have a big impact on ease-of-use and adoption of a very powerful deployment mechanism.

Features: a group of bundles by any other name would smell as sweet.

A feature is just a set of bundles, described using a very simple XML file - this file is called a 'feature descriptor', and is also referred to as a 'feature repository'. Here's an example that describes a single feature, 'feature-b', that depends on another feature 'feature-a', which is itself described in another repository. I've thrown in some default configuration as well that will be synched into the OSGi Config Admin service; don't worry about this for now, I'm just showing off.



<features name="feature-b-0.0.1">

 <repository>mvn:com.fusesource/common-features/0.0.1/xml/features</repository>

 <feature name="feature-b" version="0.0.1">
  <feature version="0.0.1">feature-a</feature>
  <bundle>mvn:com.fusesource/bundle-b/0.0.1</bundle>
  <config name="feature-b">
   a=1
   b=2
  </config>
 </feature>

</features>

Features can have sensible names, like 'InsuranceQuoteService' or 'CustomerUpdatesFlow', and they can be versioned so that you can track their evolution and upgrade or rollback with ease. Features can 'depend on' other features, which means that when you install a feature, it and all of the features it depends on get installed too. This is very neat: how many times have you realized that, yet again, all your SOA, RESTful services and integration flows rely on the same common backend code? You can describe these dependencies easily and elegantly using Karaf features. And, there's a set of tools that allow you to suck down all of your feature dependencies from Maven servers onto a local drive in an elegant directory structure - automatically, as part of your build - so that you can tar.gz or .zip it all and deliver your feature into the production environment. This last point is so important: these techniques allow you to use all of your Maven-style bundle URIs on production machines that don't have Maven installed and don't have access to the external Internet.
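The 'install a feature and everything it depends on' behaviour is essentially a walk over a dependency graph. The sketch below shows the idea in plain Java; it is not Karaf's actual resolver, and the `FeatureResolver` class, its methods and the 'common-backend' feature name are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for parsed feature descriptors: feature name -> features it depends on.
class FeatureResolver {
    private final Map<String, List<String>> dependencies = new HashMap<>();

    void define(String feature, String... dependsOn) {
        dependencies.put(feature, Arrays.asList(dependsOn));
    }

    // Returns the features to install, dependencies first, the requested feature last.
    List<String> installOrder(String feature) {
        Set<String> ordered = new LinkedHashSet<>();
        visit(feature, ordered, new HashSet<>());
        return new ArrayList<>(ordered);
    }

    private void visit(String feature, Set<String> ordered, Set<String> inProgress) {
        if (ordered.contains(feature)) {
            return; // already scheduled via another dependency path
        }
        if (!dependencies.containsKey(feature)) {
            throw new IllegalArgumentException("Unknown feature: " + feature);
        }
        if (!inProgress.add(feature)) {
            throw new IllegalStateException("Circular dependency at: " + feature);
        }
        for (String dependency : dependencies.get(feature)) {
            visit(dependency, ordered, inProgress);
        }
        inProgress.remove(feature);
        ordered.add(feature);
    }

    public static void main(String[] args) {
        FeatureResolver resolver = new FeatureResolver();
        resolver.define("common-backend");
        resolver.define("feature-a", "common-backend");
        resolver.define("feature-b", "feature-a");
        System.out.println(resolver.installOrder("feature-b"));
    }
}
```

Asking for 'feature-b' yields common-backend, feature-a, feature-b, in that order: one request, and the shared backend code comes along for the ride.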

Making features really, really easy to use

While all of this goodness is there for the taking in Karaf, there are a number of small improvements that, I think, will go a long way to ease the adoption of the features mechanism. My own usage of 'features' is based on what I've learnt and observed from the Karaf source itself: I want to make it easier for other developers to create, package and deploy features. And so, I've created today a number of issues on the Karaf JIRA to get the ball rolling.

KARAF-165: Create an improved Maven feature-assembly plugin. Right now, to make a feature I've got to add almost a hundred lines of Maven verbiage to my pom.xml in order to assemble a feature. I've got to use the attach-artifact goal of the org.codehaus.mojo/build-helper-maven-plugin to deploy my features file into Maven. I've got to use the add-features-to-repo goal from the org.apache.karaf.tooling/features-maven-plugin to suck down all the dependent bundles. I've got to do a whole load of other stuff to perform the packaging to .tar.gz and .zip. The problem here is that I'm using a whole load of generic plugins to do a very specific job, and I'm having to tell the plugins what to do instead of telling them what I want done. I'd prefer to have a single, more declarative plugin to do this. It might look like this:
```
<plugin>
 <groupId>org.apache.karaf.tooling</groupId>
 <artifactId>feature-assembly-plugin</artifactId>
 <version>2.2.0</version>
 <executions>
  <execution>
   <id>create-repo</id>
   <phase>generate-resources</phase>
   <goals>
    <goal>create-repo</goal>
   </goals>
   <configuration>
    
    <featureFile>file:${basedir}/target/classes/features.xml</featureFile>

     
    <features>
     <feature>feature-a</feature>
    </features>
   </configuration>
  </execution>
 </executions>
</plugin>
```
The plugin should produce a .tar.gz and .zip file, containing the feature descriptor (and all dependent descriptors) and all bundles (and dependent bundles) in a Maven-style directory, similar to the system/ directory currently used in Karaf. Note that this plugin doesn't need you to list out all the feature repositories / descriptors that your feature file may transitively include - it will detect these dependencies at runtime and work out the details.

KARAF-151: We need to add a hot-deploy mechanism, that will detect feature assemblies dropped in the deploy/ directory, and unarchive the file into, say, a contrib/ directory. 'contrib/' would be the application-level equivalent to the current 'system/' directory; it would contain all bundles that are required to run your features. After exploding a feature assembly into the contrib/ directory, Karaf should scan the directory structure for feature repository files, and add these dynamically to the runtime.

The result? If we implement these two enhancements to Karaf, we'll end up with a double-whammy. Developers will find it incredibly easy to create feature assemblies. Administrators and operations folk will be delighted that all they have to do is copy a feature assembly to the deploy directory, and then ssh into the Karaf/ServiceMix runtime and list, install and upgrade or rollback features using the 'feature' commands.

If you like these proposals, please vote for the issues!

Monday, August 23, 2010

Seeing the wood for the trees: a <treeConnector> for ActiveMQ?

I have proposed a treeConnector for ActiveMQ - see the JIRA issue here and please vote if you like it. I'm including the text of the proposal below.

The ActiveMQ network connector is excellent at facilitating self-organizing networks of ActiveMQ brokers with dynamically discovered routing, and can be used in situations where network availability is sporadic and not guaranteed. Network connectors can be used along with master-slave pairs to create a distributed ‘messaging fabric’. The network connector can also accommodate the creation of hierarchical networks; however, I believe there is a good argument for treating tree topologies as a special case in their own right. In this note, I’m going to describe why trees should be treated as a special case, and describe what a treeConnector might look like.

Hierarchical (or ‘tree’) networks are a special case of the abstract notion of a network; they have numerous applications in the real world in cross-geography, wide-area deployments. Consider a retail organization with thousands of stores that needs to exchange information between headquarters (HQ) and the stores: you can envisage a hierarchy of ActiveMQ brokers: one for HQ, one for each of the regions (e.g. Ireland, UK, France, Germany), and then a broker in each store/outlet in each of the regions. When HQ wants to send a message to a store, it should be able to write a message to the store’s queue and have it dynamically routed to the store via the regional broker. Likewise, if a store wants to send a message to HQ, it should be able to write to a local queue in the store and have the message dynamically routed to HQ.

Right now in ActiveMQ, there are a number of issues with using a network connector to achieve a tree topology:

Spillage of consumer advisories. When a consumer connects to a store, an advisory message is generated and sent to all brokers within the networkTTL range. These advisories don’t just travel up the tree to the top (HQ): they can also spill back down the tree to peers who are simply not interested. We have seen this in a network with 1,000 brokers connected to a regional broker, and a networkTTL of two: the advisory gets sent to the regional broker, HQ, *and* the 999 peers of the broker.

Resource wastage (as a consequence of spillage). When a consumer advisory from a store broker (say, broker ‘A’) reaches another broker within the networkTTL (say, broker ‘B’), then ‘B’ creates a subscription for the destination (to do forwarding) *and* may allocate a thread for this destination. So, it’s possible that Broker B will allocate a thread for the destination ‘BrokerA.Incoming’, despite the fact that it will never receive or send to this queue. If there are only a small number of peers at this level then this is not a problem; however, if there are many peers (as in the case of 1000 stores per region) then this will be noticeable.

Sensitivity to networkTTL. In order to minimize spillage and resource wastage, you need to set the networkTTL on each broker to the distance between the broker and the root - in this example, 2. However, a later reconfiguration of the network could result in this setting being too low (which means messages won’t go the distance) or too high (which means you get inadvertent spillage).

These issues can be addressed if we create a new tree connector for ActiveMQ:

Brokers would identify themselves as ‘leaf’, ‘branch’ or ‘root’ nodes. There should only ever be one root. If a broker has not been identified as either ‘leaf’ or ‘branch’ then it should consider itself to be the root.

Tree connectors would be configured using a ‘many to one’ approach. Leaf brokers would configure a tree connector to their branch. Branch brokers would configure a tree connector to the root. The root shouldn’t need to configure anything, except of course a transport listener.


  <treeConnector nodetype="branch" uri="tcp://headquarters.myorg.com:61616"/>


 <treeConnector nodetype="leaf" uri="tcp://uk.myorg.com:61616"/>

If a consumer connects to a leaf node, then the consumer advisory travels up the tree, all the way to root: the concept of networkTTL is ignored. If this advisory travels through a ‘branch’ then the branch delegates the advisory upwards. Branches do not pass advisory messages from children to their other child nodes.

If a consumer connects to a branch node, then the consumer advisory travels up the tree. Additionally, the branch broker will send the advisory downwards to its children.

In this way, we reduce spillage of advisory messages, and ensure that trees can self organize without too much intervention or worrying about the network time-to-live.
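To put numbers on the difference, here is a small JDK-only simulation of who receives a consumer advisory. It is a deliberately simplified model, not ActiveMQ code: the `ttlRecipients` method approximates today's behaviour as 'every broker within networkTTL hops', the `treeRecipients` method applies the leaf and branch rules proposed above, and all class and method names are invented for the example. The proposal does not say what the root does with its own advisories, so the sketch gives the root no special handling.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TreeAdvisoryDemo {
    enum NodeType { ROOT, BRANCH, LEAF }

    static class Broker {
        final String name;
        final NodeType type;
        final Broker parent;
        final List<Broker> children = new ArrayList<>();

        Broker(String name, NodeType type, Broker parent) {
            this.name = name;
            this.type = type;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }
    }

    // Proposed rules: advisories always travel up to the root;
    // a branch additionally passes its own advisories down to its children.
    static Set<Broker> treeRecipients(Broker origin) {
        Set<Broker> recipients = new LinkedHashSet<>();
        for (Broker b = origin.parent; b != null; b = b.parent) {
            recipients.add(b);
        }
        if (origin.type == NodeType.BRANCH) {
            recipients.addAll(origin.children);
        }
        return recipients;
    }

    // Simplified model of the current behaviour: every broker within networkTTL hops.
    static Set<Broker> ttlRecipients(Broker origin, int networkTTL) {
        Set<Broker> seen = new LinkedHashSet<>();
        seen.add(origin);
        List<Broker> frontier = new ArrayList<>();
        frontier.add(origin);
        for (int hop = 0; hop < networkTTL; hop++) {
            List<Broker> next = new ArrayList<>();
            for (Broker b : frontier) {
                List<Broker> neighbours = new ArrayList<>(b.children);
                if (b.parent != null) {
                    neighbours.add(b.parent);
                }
                for (Broker n : neighbours) {
                    if (seen.add(n)) {
                        next.add(n);
                    }
                }
            }
            frontier = next;
        }
        seen.remove(origin);
        return seen;
    }

    // Creates the given number of leaf (store) brokers under a regional broker.
    static List<Broker> addStores(Broker region, int count) {
        List<Broker> stores = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            stores.add(new Broker("store-" + i, NodeType.LEAF, region));
        }
        return stores;
    }

    public static void main(String[] args) {
        Broker hq = new Broker("HQ", NodeType.ROOT, null);
        Broker uk = new Broker("UK", NodeType.BRANCH, hq);
        List<Broker> stores = addStores(uk, 1000);
        System.out.println("networkTTL=2: " + ttlRecipients(stores.get(0), 2).size());
        System.out.println("tree rules:   " + treeRecipients(stores.get(0)).size());
    }
}
```

For a store in a region of 1,000 stores, the TTL model reaches 1,001 brokers (the regional broker, HQ and 999 peers), while the tree rules reach just 2 (the regional broker and HQ).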

Thoughts?

Friday, August 13, 2010

An easy, useful, NMR: Monsieur Nodet, vous êtes une légende.

I was discussing the ServiceMix NMR component this week with a colleague, and my interest was piqued enough to take a fresh look at this little fella. I grew to dislike the JBI NMR some time ago, and have written about these feelings a number of times on this blog. But it's time to draw a line under those dark times, and look to the future. And, in the brave new world of ServiceMix 4, the new NMR becomes lightweight, liberating, and startlingly useful.

Rather than explain the technology first, let's talk about the problem that it might solve. In ServiceMix, you can deploy integration or business logic as OSGi bundles - this much we know. Now, say you want to send some information between two bundles: how can you do it? There are a number of options open to you - here are some of the most popular.

Use OSGi services. I'm a huge fan of the POJO-based approach that comes with OSGi services. It's easy to use. Drawbacks? There are two I can think of - whether they're relevant to you will depend on, well, your use case. First, the call is always going to be executed synchronously on the caller's thread. If you've got lots of throughput going through the system, this may not be desirable. Second, in order to invoke on the OSGi service, you will need to marshal your incoming payload into POJO objects: maybe this marshaling is something you simply don't want to do. Maybe you just want to pass data through the system as fast as you can.
Use JMS queues. We love ActiveMQ, and ServiceMix is tightly integrated with ActiveMQ. But it seems that forcing the use of a JMS queue just to send some information from A to B, in the same JVM, feels like overkill - despite all the underlying optimizations available in ActiveMQ for in-memory messaging.

Use the NMR. Ching! The penny drops. The ServiceMix NMR allows you to send anything to another bundle, and allows you to do this either synchronously or asynchronously. So: you get choice on whether you want your code to be handled on a separate thread. And (and this is the best bit) the NMR, unlike the old JBI NMR, does not demand that the payload be XML. You could send your granny through. If your Granny was a Java Object. If it wouldn't cause too much confusion, I'd vote for a name change, and call this the Denormalized Message Router (DMR), as it no longer tries to enforce a canonical format.

Playing with this, I put together a little demo in a few minutes where one bundle did a file pickup (of a non-XML file) via Camel and sent the file over the NMR to an NMR service in another bundle, where the file was processed. Worked a treat. No pain. I was shocked at how easily this worked. The other thing that shocked me was how simple the Camel-NMR component is to configure. I was expecting tricks like 'you must provide an XML QName to describe your endpoint', or any number of other intellectual land-mines. But the short page of documentation on the Camel website was all I needed.

So where do I go from here? I feel really, really positive about this NMR now, in a way that I didn't before. It's become a technology I can use without having to worry about the mechanics underneath, like how I can drive my car without having to know how my engine works.

I must compliment Guillaume Nodet's work in this area. Taking the hard stuff out, and leaving a very useful and very usable NMR core, is smart thinking.

Tuesday, August 10, 2010

OK OK OK I give up: I’m getting really weary of XML

At a water-cooler in a global Swiss financial institution I had a great chat with an enterprise architect. How’s it going? he asked. The correct answer to such questions is normally just to politely say ‘grand, thanks’. This time though I opened up and let out what has been a burning, smoldering, nagging realization: I’m getting really, really tired of XML. It’s everywhere. It’s on my queues, it’s in my SOAP messages, it’s in my integration flows, it’s in my configuration (thanks Spring!). It’s unwieldy, it’s unkind, it’s finicky, it’s bloated and most of the time it’s plain unreadable. In the last few weeks I’ve been debugging some difficult SOAP integration flows, and despite my ability to parse XML in my head, I’m finding it tiresome.

Did XML really deliver what it said it would? All those great features... Schema Validation: a great feature, but rarely powerful enough, and one most users end up disabling for performance reasons. XSLT? Cryptic. XPath: very cool, and very useful. XQuery: never got to use it in anger. Namespaces? Overly complicated and very messy when you start to use large numbers in the same message. Versioning? You have to do it with namespaces, and even then it’s very tricky to roll it out - do it wrong, and your versioning scheme just amounts to rolling out new schema releases that are incompatible with those of the past. Dynamic resolution of namespaces from the web? How many times have I had something work in development only to have it break in pre-production environments because those machines are fire-walled and can’t reach the outside world?! Partial message encryption and digital signatures? Very cool, but one of those advanced features that many people may never get to use.

Rant over.

But what happened at the water cooler? Our conversation became animated, exciting, and in the end cut far too short by pressing and more immediate work issues. Here’s the gedanken experiment: could we provide a set of enterprise services - providing things like access to data and core business tasks and transactions - *without* using XML at all? If so, then what would it look like? Here are some of the options we discussed in our brief encounter:

Revitalize IDL and the Common Data Representation of CORBA. Great stuff, CORBA (I salute you). However, CORBA got some things wrong: the representations are binary, and not human-readable without appropriate tools. There’s no technology that allows XPath-style querying of the data (very nice if you want to access just one part of the payload, instead of unmarshaling the whole kit and caboodle). Also, IDL mappings to languages like Java, C++ and others tended to be clunky and are certainly dated at this stage.

JSON. Who can argue with the simplicity of a data format that can be described in a single HTML page, is simple and fast to parse, and is supported by oodles of languages (see http://json.org)? Great stuff indeed, and I think this is an area very much worthy of further investigation. Bringing in schema definitions for JSON allows us to be more specific about the content that can be held in a JSON payload - another plus [need citation]. I think JSON would need more though to make it truly ‘enterprisey’: for example, it needs an XPath-like way to get into parts of the payload. And, how can you do things like partial message encryption?

CSV. Don’t dismiss this one straight away. Comma-separated value format, and its close cousins ‘fixed width fields’ and ‘name-value pairs’, are arguably the simplest formats around, providing minimal overhead.

Serialized Java Objects. I shudder at the thought of restricting my data to a single programming language. No thanks. Enough said.

The options above are not exhaustive, and I have not included other approaches that may be in the pipeline. I have a prescient CORBA-savvy friend who stood firm on all of this ten years ago and said ‘Screw this XML stuff. It’s rubbish and will all blow over when people realize that’ (I paraphrase for emphasis - I’m sure he used slightly different words; however, his passion on the subject was clear). He’s written his own payload format that quite possibly will destroy XML, if it catches on.

In our water cooler discussion we didn’t come to any conclusions; there is no ‘winner’ here. However, there is a strengthening of something I’ve always believed since I got into middleware: there is no single, perfect, complete solution. Heterogeneity is crucial in operating systems, hardware platforms, programming languages, and frameworks; there is no reason why supporting heterogeneity of middleware transports and payloads should be any less important. Understanding the conditions under which one approach is better than another is key, and far more valuable than adopting a ‘one-size-fits-all’ approach. And then adopting open standards and technologies that can support this heterogeneity (and here I can smugly wear my ServiceMix, CXF and Camel hat) is the next most important thing. When I think of how CXF does RESTful content negotiation my mouth waters.

And so reinvigorated, I returned to work, and tamed the outstanding XML issues on my plate. As I lid-down for the night, I fear that some day this blog entry will haunt me, but I shall publish and be damned.

Wednesday, June 30, 2010

The six thousand topic man: hosting many topics in the same ActiveMQ

An ActiveMQ user was enquiring about whether ActiveMQ (get it from fusesource.com!) could handle a publish-subscribe messaging architecture with six thousand topics. I've often seen production deployments of FUSE support tens to hundreds of JMS destinations; however, I wasn't quite sure how it would perform with a huge number of topics. Of course, you could reduce your number of topics by introducing message selectors on a smaller number of topics: but that avoids the question rather than answering it up front.

Throwing some questions at the FUSE engineering team got back a lot of confidence that it would indeed work just fine. Still though, I always like to try things out and see for myself. So, I slapped together a JMS client that wrote 1,000,000 non-persisted messages to 6,000 JMS topics. Then, I put together another JMS client with 6,000 consumers, with appropriate session and connection pooling in place. The result? Alarmingly straightforward: it worked just fine! While quietly content with this outcome, it's worth mentioning some background things I did on the Broker...

First, I needed to switch off the default 'thread-per-consumer' model in ActiveMQ. This is done by setting the JVM system property -Dorg.apache.activemq.UseDedicatedTaskRunner=false - by default, this is set to 'true' in the ./bin/activemq[.bat] startup script. I tested what happens if I leave this at 'true' and indeed I ended up with 6,000 threads in the broker. Ouch. When you disable this setting, the consumers are served from a pool of threads, and the total thread count never got above sixty threads.
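The general idea of serving many consumers from a shared pool, rather than one thread each, can be illustrated with nothing more than the JDK's executor framework. To be clear, this is not ActiveMQ's task runner: the `SharedPoolDemo` class is made up for illustration, and the pool size of 60 is just a number chosen to echo the thread count I observed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SharedPoolDemo {
    // Runs one dispatch task per consumer on a fixed-size pool and
    // returns how many distinct threads actually did the work.
    static int threadsUsed(int consumers, int poolSize) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(consumers);
        try {
            for (int i = 0; i < consumers; i++) {
                pool.execute(() -> {
                    // Stand-in for dispatching a message to one consumer.
                    threadNames.add(Thread.currentThread().getName());
                    done.countDown();
                });
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for dispatch tasks", e);
        } finally {
            pool.shutdown();
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        System.out.println("Threads used for 6000 consumers: " + threadsUsed(6000, 60));
    }
}
```

However many consumers you throw at it, the number of threads is capped by the pool, which is the property you want when you have thousands of destinations.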

Next, I configured the Broker's transport connector to use 'nio:' rather than 'tcp:': this means we get a cleaner, more scalable threading model within the broker.

And so, it all works just fine. Dejan Bosanac's article on Python messaging: ActiveMQ and RabbitMQ suggests that you can get as many as 32,000 JMS destinations on a single broker; that's good to know, but I can't think of a situation where I'd need that right now.

Tuesday, June 29, 2010

ActiveMQ pooling: a pool by any other name would smell as sweet

I was working with a FUSE customer today who voiced confusion at all the different ways you can do a pooled connection factory in ActiveMQ. With confusion comes fear, uncertainty and distrust. Should I be using Jencks? Or should I be using the Spring CachingConnectionFactory? Wasn't there something in the activemq-pool documentation that said it didn't actually pool consumers? Help!

I took this confusion directly to the source, and had an enlightening discussion with James Strachan and Gary Tully at FuseSource about this. As a nice side effect, we ended up updating the documentation on Camel ActiveMQ, ActiveMQ Spring Support, and the Javadoc for the org.apache.activemq.pool.PooledConnectionFactory.

The bottom line is this: while Jencks was recommended in the past, it's no longer necessary as you can just use the org.apache.activemq.pool.PooledConnectionFactory from the activemq-pool project. Alternatively, you can of course use the Spring CachingConnectionFactory, as outlined in this great article.

Here's the real sneaky thing though: the Javadoc documentation of org.apache.activemq.pool.PooledConnectionFactory suggested that this connection factory doesn't pool consumers. This is in fact not a drawback or a failing: it simply doesn't make sense to 'pool' JMS consumers. Maintain a collection of them for concurrent consumption in parallel? Sure! But keep a 'pool' of consumers, whereby you return consumers into the pool for reuse later on when you're finished? Don't do it! It simply doesn't make much sense and, at a technical level, could end up creating havoc, as the 'idle' consumers would still get messages delivered to their internal 'prefetch' queues, where they'd dwell until the consumer is activated again. We updated the documentation to better explain what the PooledConnectionFactory does.

We realized that the confusion comes from a number of outdated resources on the web that mention a myriad of ways to do pooling. That, and the fact that it's easy to confuse the different concepts of 'pooling' and simply maintaining a collection of resources: the former involves sharing and reuse, while the latter does not.

Bottom line: forget about Jencks. Use activemq-pool's PooledConnectionFactory or Spring's CachingConnectionFactory to manage your connection, session and producer pools. And don't go talking about 'consumer pools' - it really doesn't make sense - talk about 'collections of consumers'.
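
For what it's worth, here's a minimal sketch of wiring up the activemq-pool connection factory in a Spring context. The bean ids, broker URL and pool size are assumptions for illustration; tune them for your own setup.

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">

  <!-- Pools connections, sessions and producers; consumers are NOT pooled. -->
  <bean id="pooledConnectionFactory"
        class="org.apache.activemq.pool.PooledConnectionFactory"
        destroy-method="stop">
    <property name="connectionFactory">
      <bean class="org.apache.activemq.ActiveMQConnectionFactory">
        <!-- Broker URL is an assumption for illustration. -->
        <property name="brokerURL" value="tcp://localhost:61616" />
      </bean>
    </property>
    <!-- Illustrative pool size, not a recommendation. -->
    <property name="maxConnections" value="8" />
  </bean>

  <!-- Producers go through the pool, e.g. via Spring's JmsTemplate. -->
  <bean id="jmsTemplate" class="org.springframework.jms.core.JmsTemplate">
    <property name="connectionFactory" ref="pooledConnectionFactory" />
  </bean>

</beans>
```

The pooled factory is used on the sending side only here; message listeners would typically get their own collection of consumers rather than borrowing from a pool.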

Friday, April 23, 2010

Four things you need to know about the new JBI cluster engine in ServiceMix 4

My thanks to Gert Vanthienen, who helped shed some light today as I dug my way through some odd behaviour from the new JBI clustering mechanism in ServiceMix 4 - odd behaviour that was down to some configuration 'gotchas'. I've written about this clustering mechanism before in an earlier blog entry; there are, however, a number of tricks you need to know about when using this mechanism, and I think they're worth mentioning here.

Make sure you register all JBI endpoints when using the OSGi packaging.

Cluster the endpoint that sends to the NMR, not the endpoint that receives (this is counter-intuitive!).

If using a network of brokers, disable conduitSubscriptions on the ActiveMQ network connector.

Give each SMX instance a unique clusterName.

Note: I still haven't fully made my peace with JBI; however, I must say that I'm mellowing a little now that ServiceMix 4 allows you to do simpler packaging (as bundles rather than as service-assemblies) and provides some optimizations to get rid of excessive NMR traffic. Anyway: on to that list:

First, make sure you register your endpoints as JBI endpoints. If you're using the new OSGi packaging of JBI endpoints (which I recommend!) then make sure you add the appropriate EndpointExporter to your Spring context. You would of course be doing this for non-clustered JBI endpoints, but you need to do it for clustered endpoints as well.


<bean class="org.apache.servicemix.common.osgi.EndpointExporter" />

Second, cluster the producer, not the consumer. You would think that 'clustering' would involve some kind of configuration for each of the replicated endpoints 'listening' on the NMR. However, in the SMX 4 implementation, it goes the other way. You need to cluster the endpoint that's putting the information onto the NMR. For example, if you have a file poller endpoint, and you want to send the incoming file to a cluster of bean consumers, you need to 'cluster config' the file-poller using the OsgiSimpleClusterRegistration bean. This is entirely non-obvious; here's how to do it. On the file-poller, do:


<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:file="http://servicemix.apache.org/file/1.0"
 xmlns:clu="http://fusesource.com/clusterdemo"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://servicemix.apache.org/file/1.0 http://servicemix.apache.org/schema/servicemix-file-2009.01.xsd
       http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">

 <file:poller id="filePoller" service="clu:file-poller" endpoint="endpoint"
  targetService="clu:payload-receiver" file="/tmp/incomingXML" />

 <bean
  class="org.apache.servicemix.jbi.cluster.engine.OsgiSimpleClusterRegistration">
  <property name="endpoint" ref="filePoller" />
 </bean>

  <bean class="org.apache.servicemix.common.osgi.EndpointExporter" />
  
</beans>

and for the 'clustered' bean endpoint, you can omit the OsgiSimpleClusterRegistration bean:


<beans xmlns="http://www.springframework.org/schema/beans"
 xmlns:bean="http://servicemix.apache.org/bean/1.0" xmlns:clu="http://fusesource.com/clusterdemo"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://servicemix.apache.org/bean/1.0 http://servicemix.apache.org/schema/servicemix-bean-2010.01.0-fuse-01-00.xsd
       http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">

 <bean:endpoint id="beanConsumer" service="clu:payload-receiver" endpoint="endpoint"
  bean="#myBean" />

 <bean id="myBean" class="org.file.processor.su.MyBean" />

  <bean class="org.apache.servicemix.common.osgi.EndpointExporter" />
 
</beans>

Third, and this is the trickiest, disable conduit subscriptions when using networked brokers. If you are using a network of embedded brokers for your internal cluster queue, you must disable conduit subscriptions to ensure that the use of message selectors is respected across the different consumers listening on the cluster queue. How do you do this? Simply set conduitSubscriptions=false on your network connector; something like this:


        <networkConnectors>
            <networkConnector name="brokerA" uri="static://(tcp://localhost:61616)" duplex="true"
            conduitSubscriptions="false"/>
        </networkConnectors>

And, lastly, you do need to give each instance of SMX a unique name for clustering purposes. You can do this by dropping a file called 'org.apache.servicemix.jbi.cluster.engine.config.cfg' into the etc/ directory of your ServiceMix 4 instance. In this properties file, you can provide a property called 'clusterName' to identify this ServiceMix instance in the cluster.


clusterName = smx1

And that's it!

Wednesday, February 10, 2010

New book on CXF by Packt

Great to see that Packt have published a book on CXF. As a project, CXF has been consistently establishing itself over the last few years, with a growing community, and, importantly, an increased footprint in production systems. That a publishing house has finally gotten around to putting a book together on CXF is a great endorsement of the project’s success. A copy of the book is winging its way to me as I write, so I look forward to providing a fuller review when I get it.

*Sigh*. This is the book I would have liked to have written if I had had the time ;)
