Wednesday, August 25, 2010

Karaf's Fabulous Features, and, what you can do to make them even more fabulous

I've been working for some time with the Karaf OSGi Shell, through my exposure to the great ServiceMix 4. The features mechanism provided by Karaf/ServiceMix allows you to leverage the modularity and control of OSGi bundles by grouping sets of bundles - containing your Camel routes, web services, business logic, RESTful services - into easily manageable 'features'. In this post, I'm going to talk a little about what features are, why they're good, and what we could do in Karaf to make their usage even easier. If you like the proposals, please follow the JIRA links and vote for them so that we can get the community behind them. These enhancements are small, but, based on my experience, they could have a big impact on the ease-of-use and adoption of a very powerful deployment mechanism.

Features: a group of bundles by any other name would sell as sweet.


A feature is just a set of bundles, described using a very simple XML file - this file is called a 'feature descriptor', and is also referred to as a 'feature repository'. Here's an example that describes a single feature, 'feature-b', that depends on another feature 'feature-a', which is itself described in another repository. I've thrown in some default configuration as well that will be synched into the OSGi Config Admin service; don't worry about this for now, I'm just showing off.



<features name="feature-b-0.0.1">

  <repository>mvn:com.fusesource/common-features/0.0.1/xml/features</repository>

  <feature name="feature-b" version="0.0.1">
    <feature version="0.0.1">feature-a</feature>
    <bundle>mvn:com.fusesource/bundle-b/0.0.1</bundle>
    <config name="feature-b">
      a=1
      b=2
    </config>
  </feature>

</features>



Features can have sensible names, like 'InsuranceQuoteService' or 'CustomerUpdatesFlow', and they can be versioned so that you can track their evolution and upgrade or roll back with ease. Features can 'depend on' other features, which means that when you install a feature, it and all of its dependent features get installed too. This is very neat: how many times have you realized that, yet again, your SOA services, RESTful services and integration flows all rely on the same common backend code? You can describe these dependencies easily and elegantly using Karaf features. And, there's a set of tools that allows you to suck down all of your feature dependencies from Maven servers onto a local drive in an elegant directory structure - automatically, as part of your build - so that you can tar.gz or .zip it all and deliver your feature into the production environment. This last point is so important: these techniques allow you to use all of your Maven-style bundle URIs on production machines that don't have Maven installed and don't have access to the external Internet.
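
For example, assuming the feature-b descriptor above has been deployed to a Maven repository under the mvn: coordinates shown earlier, installing it from the Karaf shell takes just a couple of commands (a sketch - the exact command names vary a little between Karaf releases):

karaf@root> features:addurl mvn:com.fusesource/feature-b/0.0.1/xml/features
karaf@root> features:install feature-b
karaf@root> features:list

Because feature-b declares a dependency on feature-a, that single install pulls in both features - there's no need to install the pieces individually.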


Making features really, really easy to use

While all of this goodness is there for the taking in Karaf, there are a number of small improvements that, I think, will go a long way towards easing the adoption of the features mechanism. My own usage of 'features' is based on what I've learnt and observed from the Karaf source itself: I want to make it easier for other developers to create, package and deploy features. And so, today I've created a number of issues on the Karaf JIRA to get the ball rolling.


  • KARAF-165: Create an improved Maven feature-assembly plugin. Right now, I've got to add almost a hundred lines of Maven verbiage to my pom.xml in order to assemble a feature. I've got to use the attach-artifact goal of the org.codehaus.mojo/build-helper-maven-plugin to deploy my features file into Maven. I've got to use the add-features-to-repo goal from the org.apache.karaf.tooling/features-maven-plugin to suck down all the dependent bundles. I've got to do a whole load of other stuff to perform the packaging to .tar.gz and .zip. The problem here is that I'm using a whole load of generic plugins to do a very specific job, and I'm having to tell the plugins what to do instead of telling them what I want done. I'd prefer to have a single, more declarative plugin to do this. It might look like this:


    <plugin>
      <groupId>org.apache.karaf.tooling</groupId>
      <artifactId>feature-assembly-plugin</artifactId>
      <version>2.2.0</version>
      <executions>
        <execution>
          <id>create-repo</id>
          <phase>generate-resources</phase>
          <goals>
            <goal>create-repo</goal>
          </goals>
          <configuration>
            <!-- Specify the feature file to use. -->
            <featureFile>file:${basedir}/target/classes/features.xml</featureFile>

            <!-- Specify what features to include. This is actually optional: if no
                 features are specified, then include all features in the file by default. -->
            <features>
              <feature>feature-a</feature>
            </features>
          </configuration>
        </execution>
      </executions>
    </plugin>


    The plugin should produce a .tar.gz and .zip file, containing the feature descriptor (and all dependent descriptors) and all bundles (and dependent bundles) in a Maven-style directory, similar to the system/ directory currently used in Karaf. Note that this plugin doesn't need you to list out all the feature repositories / descriptors that your feature file may transitively include - it will detect these dependencies when it runs and work out the details.
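
    By way of illustration, the assembly for the feature-b example above might unpack to a tree like this (the exact layout is my assumption, modeled on the Maven-style system/ directory in Karaf):

    feature-b-0.0.1/
      com/fusesource/
        common-features/0.0.1/common-features-0.0.1-features.xml
        feature-b/0.0.1/feature-b-0.0.1-features.xml
        bundle-b/0.0.1/bundle-b-0.0.1.jar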


  • KARAF-151: We need to add a hot-deploy mechanism that will detect feature assemblies dropped into the deploy/ directory, and unarchive each one into, say, a contrib/ directory. contrib/ would be the application-level equivalent of the current system/ directory; it would contain all the bundles that are required to run your features. After exploding a feature assembly into the contrib/ directory, Karaf should scan the directory structure for feature repository files, and add these dynamically to the runtime.



The result? If we implement these two enhancements to Karaf, we'll end up with a double whammy. Developers will find it incredibly easy to create feature assemblies. Administrators and operations folk will be delighted that all they have to do is copy a feature assembly into the deploy directory, then ssh into the Karaf/ServiceMix runtime to list, install, upgrade or roll back features using the 'features' commands.

If you like these proposals, please vote for the issues!

Monday, August 23, 2010

Seeing the wood for the trees: a <treeConnector> for ActiveMQ?

I have proposed a treeConnector for ActiveMQ - see the JIRA issue here, and please vote if you like it. I'm including the text of the proposal below.

The ActiveMQ network connector is excellent at facilitating self-organizing networks of ActiveMQ brokers with dynamically discovered routing, and can be used in situations where network availability is sporadic and not guaranteed. Network connectors can be used along with master-slave pairs to create a distributed ‘messaging fabric’. The network connector can also accommodate the creation of hierarchical networks; however, I believe there is a good argument for treating tree topologies as a special case in their own right. In this note, I’m going to describe why trees should be treated as a special case, and what a treeConnector might look like.

Hierarchical (or ‘tree’) networks are a special case of the abstract notion of a network; they have numerous applications in real-world, cross-geography, wide-area deployments. Consider a retail organization with thousands of stores that wants to send and receive information between headquarters (HQ) and the stores: you can envisage a hierarchy of ActiveMQ brokers - one for HQ, one for each of the regions (e.g. Ireland, UK, France, Germany), and then a broker in each store/outlet in each of the regions. When HQ wants to send a message to a store, it should be able to write the message to the store’s queue and have it dynamically routed to the store via the regional broker. Likewise, if a store wants to send a message to HQ, it should be able to write to a local queue in the store and have the message dynamically routed to HQ.
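
To make the topology concrete, the broker hierarchy looks something like this (region and store names are illustrative):

HQ broker
├── Ireland (regional broker)
│   ├── store-0001 broker
│   └── ... one broker per store/outlet
├── UK (regional broker)
│   └── ...
└── France, Germany, ... (regional brokers)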

Right now in ActiveMQ, there are a number of issues with using a network connector to achieve a tree topology:

  • Spillage of consumer advisories. When a consumer connects to a store, an advisory message is generated and sent to all brokers within the networkTTL range. These advisories don’t just travel up the tree to the top (HQ): they can also spill back down the tree to peers who are simply not interested. We have seen this in a network with 1,000 brokers connected to a regional broker, and a networkTTL of two: the advisory gets sent to the regional broker, HQ, *and* the 999 peers of the broker.


  • Resource wastage (as a consequence of spillage). When a consumer advisory from a store broker (say, broker ‘A’) reaches another broker within the networkTTL (say, broker ‘B’), then ‘B’ creates a subscription for the destination (to do forwarding), which can also result in the allocation of a thread for this destination. So, it’s possible that broker ‘B’ will allocate a thread for the destination ‘BrokerA.Incoming’, despite the fact that it will never receive from or send to this queue. If there are only a small number of peers at this level then this is not a problem; however, if there are many peers (as in the case of 1,000 stores per region) then this will be noticeable.


  • Sensitivity to networkTTL. In order to minimize spillage and resource wastage, you need to set the networkTTL on each broker to the distance between the broker and the root - in this example, 2 (a typical configuration is sketched below). However, a later reconfiguration of the network could result in this setting being too low (which means messages won’t go the distance) or too high (which means you get inadvertent spillage).
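
For reference, here’s roughly what a store broker’s configuration looks like today with a plain network connector - a sketch, with the host name invented for the example and the networkTTL hand-set to reach the root two hops away:

<networkConnectors>
  <networkConnector name="store-to-region"
                    uri="static:(tcp://uk.myorg.com:61616)"
                    networkTTL="2"/>
</networkConnectors>

It’s exactly this hand-maintained networkTTL value that the proposal below does away with.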

These issues can be addressed if we create a new tree connector for ActiveMQ:



  1. Brokers would identify themselves as ‘leaf’, ‘branch’ or ‘root’ nodes. There should only ever be one root. If a broker has not been identified as either ‘leaf’ or ‘branch’ then it should consider itself to be the root.


  2. Tree connectors would be configured using a ‘many to one’ approach. Leaf brokers would configure a tree connector to their branch. Branch brokers would configure a tree connector to the root. The root shouldn’t need to configure anything, except of course a transport listener.


  3. So, a branch broker for the ‘UK’ region might have the following (to connect to headquarters):


    <treeConnector nodetype="branch" uri="tcp://headquarters.myorg.com:61616"/>


    And, a leaf broker at a store or outlet in the UK might have the following (to connect to the ‘UK’ region’):


    <treeConnector nodetype="leaf" uri="tcp://uk.myorg.com:61616"/>


  4. If a consumer connects to a leaf node, then the consumer advisory travels up the tree, all the way to the root: the concept of networkTTL is ignored. If this advisory travels through a ‘branch’, then the branch delegates the advisory upwards. Branches do not pass advisory messages from children to their other child nodes.


  5. If a consumer connects to a branch node, then the consumer advisory travels up the tree. Additionally, the branch broker will send the advisory downwards to its children.


  6. If a consumer connects to the root, then the consumer advisory is filtered downwards to all children.




In this way, we reduce spillage of advisory messages, and ensure that trees can self-organize without too much intervention or worry about the network time-to-live.

Thoughts?

Friday, August 13, 2010

An easy, useful NMR: Monsieur Nodet, you are a legend.

I was discussing the ServiceMix NMR component this week with a colleague, and my interest was piqued enough to take a fresh look at this little fella. I grew to dislike the JBI NMR some time ago, and have written about these feelings a number of times on this blog. But it's time to draw a line under those dark times, and look to the future. And, in the brave new world of ServiceMix 4, the new NMR becomes lightweight, liberating, and startlingly useful.

Rather than explain the technology first, let's talk about the problem that it might solve. In ServiceMix, you can deploy integration or business logic as OSGi bundles - this much we know. Now, say you want to send some information between two bundles: how can you do it? There are a number of options open to you - here are some of the most popular.

  • Use OSGi services. I'm a huge fan of the POJO-based approach that comes with OSGi services. It's easy to use. Drawbacks? There are two I can think of - whether they're relevant will depend on, well, your use case. First, the call is always going to be executed synchronously on the caller's thread. If you've got lots of throughput going through the system, this may not be desirable. Second, in order to invoke on the OSGi service, you will need to marshal your incoming payload into POJOs: maybe this marshaling is something you simply don't want to do. Maybe you just want to pass data through the system as fast as you can.

  • Use JMS queues. We love ActiveMQ, and ServiceMix is tightly integrated with ActiveMQ. But forcing the use of a JMS queue just to send some information from A to B, in the same JVM, feels like overkill - despite all the underlying optimizations available in ActiveMQ for in-memory messaging.


  • Use the NMR. Ching! The penny drops. The ServiceMix NMR allows you to send anything to another bundle, and allows you to do this either synchronously or asynchronously. So: you get a choice about whether you want the processing to happen on a separate thread. And (and this is the best bit) the NMR, unlike the old JBI NMR, does not demand that the payload be XML. You could send your granny through - if your granny were a Java object. If it wouldn't cause too much confusion, I'd vote for a name change and call this the Denormalized Message Router (DMR), as it no longer tries to enforce a canonical format.


Playing with this, I put together a little demo in a few minutes: one bundle did a file pickup (of a non-XML file) via Camel and sent the file over the NMR to an endpoint in another bundle, where the file was processed. Worked a treat. No pain. I was shocked at how easily this worked. The other thing that shocked me was how simple the Camel-NMR component is to configure. I was expecting tricks like 'you must provide an XML QName to describe your endpoint', or any number of other intellectual land-mines. But the short page of documentation on the Camel website was all I needed.
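
For the curious, the Camel routes boiled down to something like this - a sketch from memory, with the endpoint names and paths invented, but the nmr: endpoint URI really is all the wiring you need:

<!-- Bundle one: pick up (non-XML) files and send them over the NMR. -->
<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <from uri="file:/tmp/inbox"/>
    <to uri="nmr:FileProcessor"/>
  </route>
</camelContext>

<!-- Bundle two: receive from the NMR and process the file. -->
<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <from uri="nmr:FileProcessor"/>
    <to uri="bean:fileProcessingBean"/>
  </route>
</camelContext>

Swap the <to> in the second route for whatever processing you need; the two bundles share nothing but the endpoint name.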

So where do I go from here? I feel really, really positive about this NMR now, in a way that I didn't before. It's become a technology I can use without having to worry about the mechanics underneath, like how I can drive my car without having to know how my engine works.

I must compliment Guillaume Nodet's work in this area. Taking the hard stuff out, and leaving a very useful and very usable NMR core, is smart thinking.

Tuesday, August 10, 2010

OK OK OK I give up: I’m getting really weary of XML

At a water-cooler in a global Swiss financial institution I had a great chat with an enterprise architect. ‘How’s it going?’ he asked. The correct answer to such questions is normally just to politely say ‘grand, thanks’. This time, though, I opened up and let out what has been a burning, smoldering, nagging realization: I’m getting really, really tired of XML. It’s everywhere. It’s on my queues, it’s in my SOAP messages, it’s in my integration flows, it’s in my configuration (thanks, Spring!). It’s unwieldy, it’s unkind, it’s finicky, it’s bloated, and most of the time it’s plain unreadable. In the last few weeks I’ve been debugging some difficult SOAP integration flows, and despite my ability to parse XML in my head, I’m finding it tiresome.

Did XML really deliver what it said it would? All those great features... Schema validation: a great feature, but rarely powerful enough, and one most users end up disabling for performance reasons. XSLT? Cryptic. XPath: very cool, and very useful. XQuery: never got to use it in anger. Namespaces? Overly complicated, and very messy when you start to use large numbers of them in the same message. Versioning? You have to do it with namespaces, and even then it’s very tricky to roll out - do it wrong, and your versioning scheme just amounts to rolling out new schema releases that are incompatible with those of the past. Dynamic resolution of namespaces from the web? How many times have I had something work in development, only to have it break in pre-production environments because those machines are fire-walled and can’t reach the outside world?! Partial message encryption and digital signatures? Very cool, but one of those advanced features that many people may never get to use.

Rant over.

But what happened at the water cooler? Our conversation became animated, exciting, and in the end was cut far too short by pressing and more immediate work issues. Here’s the gedanken experiment: could we provide a set of enterprise services - providing things like access to data and core business tasks and transactions - *without* using XML at all? If so, then what would it look like? Here are some of the options we discussed in our brief encounter:

  • Revitalize IDL and the Common Data Representation of CORBA. Great stuff, CORBA (I salute you). However, CORBA got some things wrong: the representations are binary, and not human-readable without appropriate tools. There’s no technology that allows XPath-style querying of the data (very nice if you want to access just one part of the payload, instead of unmarshaling the whole kit and caboodle). Also, IDL mappings to languages like Java, C++ and others tended to be clunky, and are certainly dated at this stage.

  • JSON. Who can argue with the simplicity of a data format that can be described in a single HTML page, is simple and fast to parse, and is supported by oodles of languages (see http://json.org)? Great stuff indeed, and I think this is an area very much worthy of further investigation. Bringing in schema definitions for JSON would allow us to be more specific about the content that can be held in a JSON payload - another plus [need citation]. I think JSON would need more, though, to make it truly ‘enterprisey’: for example, you’d need an XPath-like way to get into parts of the payload (see the sketch after this list). And how can you do things like partial message encryption?

  • CSV. Don’t dismiss this one straight away. Comma-separated value format and its close cousins, ‘fixed-width fields’ and ‘name-value pairs’, are arguably the simplest formats around, providing minimal overhead.

  • Serialized Java objects. I shudder at the thought of restricting my data to a single programming language. No thanks. Enough said.
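
To give a flavour of the JSON option, here’s the same (invented) payload in XML and in JSON, along with a JSONPath expression - JSONPath being one existing proposal for XPath-style queries over JSON:

<!-- XML -->
<quote>
  <customer id="42"/>
  <premium currency="EUR" amount="250.00"/>
</quote>

// JSON
{ "quote": { "customer": { "id": 42 },
             "premium": { "currency": "EUR", "amount": 250.00 } } }

// JSONPath: pull out just the premium amount, without unmarshaling the rest
$.quote.premium.amount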

The options above are not exhaustive, and I have not included other approaches that may be in the pipeline. I have a prescient CORBA-savvy friend who stood firm on all of this ten years ago and said ‘Screw this XML stuff. It’s rubbish and will all blow over when people realize that’ (I paraphrase for emphasis - I’m sure he used slightly different words; however, his passion on the subject was clear). He’s written his own payload format that quite possibly will destroy XML, if it catches on.

In our water-cooler discussion we didn’t come to any conclusions; there is no ‘winner’ here. However, there is a strengthening of something I’ve believed ever since I got into middleware: there is no single, perfect, complete solution. Heterogeneity is crucial in operating systems, hardware platforms, programming languages, and frameworks; there is no reason why supporting heterogeneity of middleware transports and payloads should be any less important. Understanding the conditions whereby one approach is better than another is key, and far more valuable than adopting a ‘one-size-fits-all’ approach. And then adopting open standards and technologies that can support this heterogeneity (and here I can smugly wear my ServiceMix, CXF and Camel hat) is the next most important thing. When I think of how CXF does RESTful content negotiation, my mouth waters.

And so reinvigorated, I returned to work, and tamed the outstanding XML issues on my plate. As I lie down for the night, I fear that some day this blog entry will haunt me, but I shall publish and be damned.