Tuesday, August 10, 2010

OK OK OK I give up: I’m getting really weary of XML

At a water-cooler in a global Swiss financial institution I had a great chat with an enterprise architect. How’s it going? he asked. The correct answer to such questions is normally just to politely say ‘grand, thanks’. This time though I opened up and let out what has been a burning, smoldering, nagging realization: I’m getting really, really tired of XML. It’s everywhere. It’s on my queues, it’s in my SOAP messages, it’s in my integration flows, it’s in my configuration (thanks Spring!). It’s unwieldy, it’s unkind, it’s finicky, it’s bloated and most of the time it’s plain unreadable. In the last few weeks I’ve been debugging some difficult SOAP integration flows, and despite my ability to parse XML in my head, I’m finding it tiresome.

Did XML really deliver what it said it would? All those great features... Schema Validation: a great feature, but rarely powerful enough, and one most users end up disabling for performance reasons. XSLT? Cryptic. XPath: very cool, and very useful. XQuery: never got to use it in anger. Namespaces? Overly complicated and very messy when you start to use large numbers of them in the same message. Versioning? You have to do it with namespaces, and even then it’s very tricky to roll out - do it wrong, and your versioning scheme just amounts to rolling out new schema releases that are incompatible with those of the past. Dynamic resolution of namespaces from the web? How many times have I had something work in development only to have it break in pre-production environments because those machines are fire-walled and can’t reach the outside world?! Partial message encryption and digital signatures? Very cool, but one of those advanced features that many people may never get to use.

Rant over.

But what happened at the water cooler? Our conversation became animated, exciting, and in the end cut far too short by pressing and more immediate work issues. Here’s the gedanken experiment: could we provide a set of enterprise services - providing things like access to data and core business tasks and transactions - *without* using XML at all? If so, then what would it look like? Here are some of the options we discussed in our brief encounter:

  • Revitalize IDL and the Common Data Representation of CORBA. Great stuff, CORBA (I salute you). However, CORBA got some things wrong: the representations are binary, and not human-readable without appropriate tools. There’s no technology that allows XPath-style querying of data (very nice if you want to access just one part of the payload, instead of unmarshaling the whole kit and caboodle). Also, IDL mappings to languages like Java, C++ and others tended to be clunky and are certainly dated at this stage.

  • JSON. Who can argue with the simplicity of a data format that can be described in a single HTML page, is simple and fast to parse, and is supported by oodles of languages (see http://json.org)? Great stuff indeed, and I think this is an area very much worthy of further investigation. Bringing in schema definitions for JSON allows us to be more specific about the content that can be held in a JSON payload - another plus [need citation]. I think JSON would need more though to make it truly ‘enterprisey’: for example, it would need an XPath-like way to get into parts of the payload. And, how can you do things like partial message encryption?

  • CSV. Don’t dismiss this one straight away. The comma-separated value format and its close cousins ‘fixed width fields’ and ‘name-value pairs’ are arguably the simplest formats around, providing minimal overhead.

  • Serialized Java Objects. I shudder at the thought of restricting my data to a single programming language. No thanks. Enough said.
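
To make the ‘XPath-like way into a JSON payload’ wish a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `json_path` helper, its slash-separated path syntax and the sample payload are all made up, and none of it is a standard or an existing library API.

```python
import json


def json_path(document, path):
    """Walk a parsed JSON document along a slash-separated path.

    Hypothetical helper for illustration only: 'orders/1/currency' means
    key 'orders', then list index 1, then key 'currency'.
    """
    node = document
    for step in path.strip("/").split("/"):
        if isinstance(node, list):
            node = node[int(step)]  # a step into a list is a numeric index
        else:
            node = node[step]       # any other step is an object key
    return node


# Made-up payload, purely for demonstration.
payload = json.loads("""
{
  "customer": {"name": "Ada", "country": "CH"},
  "orders": [
    {"id": 17, "amount": 250.0, "currency": "CHF"},
    {"id": 18, "amount": 99.5, "currency": "EUR"}
  ]
}
""")

second_currency = json_path(payload, "orders/1/currency")
customer_name = json_path(payload, "customer/name")
print(second_currency, customer_name)
```

Note that this still parses the whole payload into generic dictionaries and lists before walking it; what it avoids is unmarshaling into typed objects just to read one field. It also has none of XPath’s predicates or wildcards, which is roughly the gap I’m grumbling about.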

The options above are not exhaustive, and I have not included other approaches that may be in the pipeline. I have a prescient CORBA-savvy friend who stood firm on all of this ten years ago and said ‘Screw this XML stuff. It’s rubbish and will all blow over when people realize that’ (I paraphrase for emphasis - I’m sure he used slightly different words; however, his passion on the subject was clear). He’s written his own payload format that quite possibly will destroy XML, if it catches on.

In our water cooler discussion we didn’t come to any conclusions; there is no ‘winner’ here. However, there is a strengthening of something I’ve always believed since I got into middleware: there is no single, perfect, complete solution. Heterogeneity is crucial in operating systems, hardware platforms, programming languages, and frameworks; there is no reason why supporting heterogeneity of middleware transports and payloads should be any less important. Understanding the conditions under which one approach is better than another is key, and far more valuable than adopting a ‘one-size-fits-all’ approach. And then adopting open standards and technologies that can support this heterogeneity (and here I can smugly wear my ServiceMix, CXF and Camel hat) is the next most important thing. When I think of how CXF does RESTful content negotiation my mouth waters.

And so reinvigorated, I returned to work, and tamed the outstanding XML issues on my plate. As I lid-down for the night, I fear that some day this blog entry will haunt me, but I shall publish and be damned.


RGUIG Saad said...

Very nice article... About XML, you're totally right: it is also spreading out very quickly through my configuration files (no more *.properties): log4j.xml, hibernate+spring XML, activeMQ in XML....

Ade said...

My old friend Aman Kohli has rightly suggested Google Protocol Buffers as a binary payload that's similar to IDL/CDR; nice one Aman. Slipped my mind.

James Strachan said...

Great rant :) XML does indeed suck in many ways. BTW XQuery is pretty good stuff! Great replacement for XSLT.

JSON is doing pretty good in web-ish fields as a nice simple lightweight syntax. (Incidentally you can use JavaScript instead of XPath/XQuery to filter/transform the JSON as JSON is a subset of JavaScript :).

As an aside I see HTML as a good replacement for lots of XML-ish documents as it's nicely readable with some CSS :).

Other than JSON for me Google Protocol Buffers is very interesting in that

(i) it's simple to parse

(ii) anyone can parse any message stream without a-priori knowledge of schemas, so it's version independent in a sense - the only thing that a schema is required for is to attach a field index (say field 3) to an identifier in your code ("surname" in your Person object or whatever)

(iii) it's compact; it doesn't send big strings for each field name like JSON/XML so it's very compact and fast, particularly when dealing with numeric data

dave hollander said...

Thanks for getting the details right -- so many rants are just wrong.

Anyway, there is a big difference between "interchange" and transport within the same application framework.

When I own both ends of the transport, then nearly any technique is OK. When someone else has to understand what I am trying to send, then the game changes. I consider that the interchange game.

Does this distinction make sense? Does it help describe where JSON/protocol buffer style solutions may be more effective?

Ade said...

Thanks Dave - and you're right. I'm focused on interchange. If you own the consumer of your service and the provider, then you can do whatever you like on the wire. In particular, the 'code-on-demand' principle of REST means that if you're using true REST, then when you change the interface, you can simply change the user-facing code at the same time and the next time the user uses the service they download the updates and never get hassled by the change. Neat.

Still, in a non-RESTful interaction, where you don't have control over either the data providers or the data consumers, then we still have this interchange problem. *Sigh*