Recently I’ve been thinking about how we process messages in networked software.

Consider this Java snippet, for example.

import com.fasterxml.jackson.databind.JsonNode;

boolean isPassNode(final JsonNode node) {
  if (node.isObject()) {
    final JsonNode child = node.get(Constants.TYPE_FIELD);
    if (child != null) {
      if (child.isTextual()) {
        return Constants.PASS_TYPE.equals(child.asText());
      }
    }
  }
  return false;
}

It asks: is this a JSON object with a top-level “Type” field whose value is the string “Pass”?
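For comparison, here is a sketch of the same question asked without Jackson. It assumes the message has already been parsed into the kind of generic tree many JSON libraries can produce (objects as Map<String, Object>, arrays as List<Object>, scalars as String, Number, Boolean or null). PassProbe is a hypothetical name, the two constants stand in for the Constants values the snippet above doesn’t show, and Java 16 or later is assumed for the instanceof pattern.

```java
import java.util.Map;

// Sketch: the "is this a Pass node?" probe over a generic parsed tree,
// assuming JSON objects arrive as Map<String, Object>.
final class PassProbe {
  // Hypothetical stand-ins for Constants.TYPE_FIELD and Constants.PASS_TYPE.
  static final String TYPE_FIELD = "Type";
  static final String PASS_TYPE = "Pass";

  static boolean isPassNode(Object node) {
    // Anything other than a JSON object can't have a top-level "Type" field.
    if (!(node instanceof Map<?, ?> obj)) {
      return false;
    }
    // A missing field yields null, and a non-string value (number, array,
    // nested object) isn't a String, so equals() rejects both cases.
    return PASS_TYPE.equals(obj.get(TYPE_FIELD));
  }
}
```

Jackson itself allows a similarly compact form: JsonNode.path() returns a MissingNode rather than null for an absent field, and textValue() returns null for anything that isn’t a text node, so Constants.PASS_TYPE.equals(node.path(Constants.TYPE_FIELD).textValue()) asks the same question.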

Is this a sane thing to want to do? And if so, what’s a good way to do it?

In an ideal world… · You’d never need to go fishing around through fields like this. In fact, you’d never have to think about “messages”, you’d just write methods and they’d get called, maybe across the network, with the expected argument types. Or, you’d call strongly-typed functions and if they’re somewhere else, your infrastructure would take care of the required message wrangling.

If that’s your world, you can stop reading now. Among other things, you probably shouldn’t be using JSON (see below).

It’s not mine. Maybe because I mostly work on infrastructure? I’m regularly dealing with byte blobs arriving over the Internet that are alleged to be JSON and are expected to contain certain name/value field patterns. Those patterns represent types in the minds of their creators and I need to feed part or all of the payload to strongly-typed functions at my end. But this is the Internet, so who knows what those blobs really hold?

JSON, you say? · For low-rent interop, sure. But if both ends of the wire know exactly what the bits represent, you have lots of probably-better choices: ProtoBufs, Amazon Ion, Cap’n Proto, FlatBuffers, SBE, probably more. (For the purposes of this note, I didn’t research which are actively maintained, nor any comparative benchmarks.) These things might serialize and deserialize measurably faster, and that might matter to your app’s performance. They have richer type systems than JSON (admittedly a low bar), which will quite likely help. And they’re almost certainly more compact on the wire, which is always a good thing.

Me, my day job is heterogeneous distributed infrastructure and, well, that means a lot of JSON. (But maybe soon some of the aforementioned Amazon Ion and/or CBOR.)

What are messages? · Sometimes they’re a number or a string; sometimes a little package of numbers and strings; sometimes deeply-nested document-flavored things.

Sometimes they represent business transactions, sometimes database records, sometimes ledger entries, sometimes cloud infrastructure events; and that’s obviously not all.

Internet messaging lessons · These are just some things experience has convinced me are usually true:

  1. When you send a message across the Internet, you can’t control how it’ll be interpreted or used.

  2. When you receive a message from the Internet, you can’t assume that the sender was sane or that it follows any implicit or explicit schema.

  3. Some message transports distinguish between “message body” and “message metadata”, but that distinction is bullshit, by which I mean rarely of practical use.

  4. Unfortunately, messaging compute time is intrinsically O(N) in the number of messages. Suck it up, deal with it, and go looking for grungy low-level optimizations, not elegant sublinear algorithms.
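Lesson 2 suggests treating every path into a message as optional. Here is a minimal sketch of that, assuming the message has already been parsed into a generic tree (objects as Map<String, Object>, everything else as ordinary Java values) and Java 16 or later. The Probe class and its methods are hypothetical, not any library’s API. A wrong shape anywhere along the path yields an empty Optional rather than an exception.

```java
import java.util.Map;
import java.util.Optional;

// Sketch: defensive probing of untrusted messages. Walk a path of field names
// through a generic parsed tree (objects assumed to be Map<String, Object>)
// and report "not found" instead of throwing when the shape is unexpected.
final class Probe {
  static Optional<Object> at(Object root, String... path) {
    Object current = root;
    for (String field : path) {
      // Wrong shape (array, string, number...) or missing field: give up quietly.
      if (!(current instanceof Map<?, ?> obj) || !obj.containsKey(field)) {
        return Optional.empty();
      }
      current = obj.get(field);
    }
    // A JSON null at the end of the path is treated as absent.
    return Optional.ofNullable(current);
  }

  // Typed convenience: present only if the value really is a string.
  static Optional<String> stringAt(Object root, String... path) {
    return at(root, path).filter(String.class::isInstance).map(String.class::cast);
  }
}
```

So Probe.stringAt(message, "detail", "state") answers “is there a string at detail.state?” without the caller ever writing a null check or a cast.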

OJM? · By which I mean “Object-JSON Mapping”, in the spirit of ORM. I’ve always disliked ORM; for some reason I always end up feeling like I’m struggling against the mapper, rather than working with it.

When you’re in Programming Language X and look up “JSON handling in X”, the docs and tutorials assume that OJM is what you want, and proudly show how easy it is.

The idea is, you have a class (or prototype, or struct, or whatever) declaration, and the JSON is just its serialization. So you can convert between messages and objects automagically, right? That’s what Jackson’s ObjectMapper in Java, just to give one example, is about.

And over in Go territory, there’s json.Unmarshal(b, &m), and I quote: “If b contains valid JSON that fits in m, after the call err will be nil and the data from b will have been stored in the struct m”. That is, as long as the struct fields you want filled are exported, which in Go means they have initial capitals; JSON keys are matched to them case-insensitively or through json struct tags. But hey, check out “JSON, interfaces, and go generate” by Francesc Campoy Flores, which turns json.Unmarshal’s object-mapping dials up to 11.
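What an object mapper does for you can be spelled out by hand for a single type, which makes the “fits in m” decision visible. This sketch assumes a hypothetical PassEvent record with assumed Type, Id and Timestamp fields, input already parsed into a generic tree with objects as Map<String, Object>, and Java 16 or later. It builds a typed object when every expected field has the expected JSON type, and refuses otherwise.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical target type: what the sender "meant" by a Pass message.
record PassEvent(String type, String id, long timestamp) {}

// Hand-rolled sketch of object mapping for one type. A real mapper derives
// these checks from the class declaration; here they are spelled out.
final class PassEventMapper {
  static Optional<PassEvent> fromTree(Object node) {
    if (!(node instanceof Map<?, ?> obj)) {
      return Optional.empty(); // not a JSON object at all
    }
    // Every expected field must be present with the expected JSON type.
    // Extra fields are ignored; fractional timestamps are truncated.
    if (obj.get("Type") instanceof String type
        && obj.get("Id") instanceof String id
        && obj.get("Timestamp") instanceof Number ts) {
      return Optional.of(new PassEvent(type, id, ts.longValue()));
    }
    return Optional.empty(); // doesn't "fit": refuse rather than half-fill
  }
}
```

Real mappers make these two choices differently. Ignoring extra fields matches json.Unmarshal’s default, whereas Jackson’s ObjectMapper fails on unknown properties unless DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES is turned off. Refusing on missing fields is stricter than json.Unmarshal, which simply leaves them at their zero values.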

Message probing · Maybe I’m weird, but I’ve basically never found OJM very useful.

Thus we return to the code at the top of this page. It’s the kind of thing you do in infrastructure software: peek into a message and figure out what it is so you can route it, log it, index it, apply security protocols to part of the content, or set off a loud urgent alarm.
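Peek-then-route can be sketched as a dispatch table keyed on one probed field. This assumes a generic parsed tree with objects as Map<String, Object>, a routing field named Type, and Java 16 or later; TypeRouter and its methods are hypothetical. Only the one field is examined, and the handler receives the whole, otherwise untouched, message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch: peek at one field of an untrusted message and hand the whole
// message to whichever handler is registered for that field's value.
final class TypeRouter {
  private final Map<String, Consumer<Object>> routes = new HashMap<>();
  private final Consumer<Object> fallback;

  TypeRouter(Consumer<Object> fallback) {
    this.fallback = fallback; // e.g. log it, or set off the alarm
  }

  TypeRouter on(String type, Consumer<Object> handler) {
    routes.put(type, handler);
    return this;
  }

  void route(Object message) {
    Object type = (message instanceof Map<?, ?> obj) ? obj.get("Type") : null;
    // Only a string-valued "Type" can select a route; anything else falls back.
    Consumer<Object> handler =
        (type instanceof String s) ? routes.getOrDefault(s, fallback) : fallback;
    handler.accept(message);
  }
}
```

A sender that omits Type, uses a non-string value, or invents a type nobody registered lands in the fallback, which is where the logging or the loud urgent alarm would go.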

Or, maybe, just maybe, load it into a native object and pass the whole thing to some method; but usually not.

Typically, you have to do these things a very large number of times per second.
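One grungy low-level optimization in that spirit is a cheap byte scan that rules out most messages before paying for a full parse. This is a sketch assuming UTF-8 input and the Type/Pass probe from the top of the page; PassPrefilter is a hypothetical name. The reasoning: if a message contains no backslash, none of its strings can be escaped, so an object whose Type is Pass must contain the bytes "Type" and "Pass", quotes included. If there is a backslash, a sender could legally have written the P as \u0050, so the filter has to answer “maybe”.

```java
import java.nio.charset.StandardCharsets;

// Sketch: a cheap necessary-condition check run on raw UTF-8 bytes before
// any JSON parsing. It may say "maybe" wrongly, but never "no" wrongly.
final class PassPrefilter {
  private static final byte[] TYPE_KEY = "\"Type\"".getBytes(StandardCharsets.UTF_8);
  private static final byte[] PASS_VALUE = "\"Pass\"".getBytes(StandardCharsets.UTF_8);

  static boolean mightBePass(byte[] raw) {
    // Any backslash means some string could be escaped, so we can't rule it out.
    if (indexOf(raw, (byte) '\\') >= 0) {
      return true;
    }
    // No escapes: a real match must contain both quoted byte sequences verbatim.
    return contains(raw, TYPE_KEY) && contains(raw, PASS_VALUE);
  }

  private static int indexOf(byte[] hay, byte b) {
    for (int i = 0; i < hay.length; i++) {
      if (hay[i] == b) {
        return i;
      }
    }
    return -1;
  }

  // Naive substring search; adequate for short needles like these.
  private static boolean contains(byte[] hay, byte[] needle) {
    outer:
    for (int i = 0; i + needle.length <= hay.length; i++) {
      for (int j = 0; j < needle.length; j++) {
        if (hay[i + j] != needle[j]) {
          continue outer;
        }
      }
      return true;
    }
    return false;
  }
}
```

The filter can answer “maybe” wrongly (say, when "Pass" is the value of some other field) but never “no” wrongly, so survivors still go through the real parse and probe.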

The problem here is that when you’re building infrastructure you’re usually using a statically-typed language, which has a severe impedance mismatch with JSON’s happy-go-lucky attitude: the structure of the data is whatever the characters in the JSON text say it is, which is definitely neither known nor deducible at compile time.

Fortunately, the tools often aren’t bad. In Java, Jackson will construct a JsonNode tree for you at acceptable speed, and you can poke around as illustrated above, or even deploy Jayway JsonPath to good effect.

Of course Java’s not the only infrastructure-building choice. It looks like both Swift and Rust provide straightforward ways to look at ad-hoc JSON through a statically-typed lens.
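Java can get a similar lens. Here is a minimal sketch, assuming Java 17 or later: a sealed interface with one record per kind of JSON value, loosely modeled on the serde_json::Value enum in Rust. The Json type and its methods are hypothetical, and the parser that would produce these values is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch: a closed ("sealed") set of JSON value kinds, so the compiler knows
// every shape a message fragment can take. The permitted subtypes are the
// records declared below. All names here are hypothetical.
sealed interface Json {
  record Obj(Map<String, Json> fields) implements Json {}
  record Arr(List<Json> items) implements Json {}
  record Str(String value) implements Json {}
  record Num(double value) implements Json {}
  record Bool(boolean value) implements Json {}
  record Null() implements Json {}

  // Total field lookup: non-objects and missing fields give Optional.empty().
  default Optional<Json> get(String field) {
    return this instanceof Obj o
        ? Optional.ofNullable(o.fields().get(field))
        : Optional.empty();
  }

  // Typed extraction: present only if this value really is a JSON string.
  default Optional<String> asString() {
    return this instanceof Str s ? Optional.of(s.value()) : Optional.empty();
  }

  // The "is this a Pass node?" question, asked through the typed lens.
  static boolean isPass(Json node) {
    return node.get("Type").flatMap(Json::asString).filter("Pass"::equals).isPresent();
  }
}
```

Because the set of kinds is closed, Java 21’s pattern matching for switch can also check that a switch over a Json value handles every case.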

Go, on the other hand, does not make this easy; you have to construct a twisty maze of little interface{} and type-assertion logic, producing code that in my experience is pretty opaque.

Dynamic languages? · Well, of course. In Ruby and Python, poking around inside weakly-understood data structures is the most natural thing in the world, and the impedance mismatch with JSON is barely visible. And of course we all know what the J and S in JSON stand for.

But we mostly don’t use those for infrastructure.

Conclusion · Marrying messages and objects is cool, I suppose. But next time you design a new language, make it easy to hook up with Internet messages and get to know them the way they really are, before you commit to marriage.



From: Jason Kemp (Oct 24 2016, at 06:55)

OJM is very useful when showing what's in the J to a human being, in a statically typed language. At its simplest, we take the JSON, turn it into a data object, and display the results.

But humans don't understand seconds since the Unix epoch, or item IDs or other sundry concepts that make sense to computers and those that program them. Likewise, merging the data across several JSON docs becomes harder and harder with just text.

All those things are easier to manipulate in objects.


From: eerie quark doll (Oct 24 2016, at 14:56)

Is JsonNode lazily constructed?

It seems, at first navel-gaze, that something could be saved by deserializing only the portions of the potential JSON that a path query touches (especially where the desired data does exist, is small compared to the larger glut of representation, and sits early in the document).



From: Hugh Fisher (Oct 26 2016, at 02:58)

Do you think we'll see any more attempts at sending code along with data?

PostScript could do some really awesome things with procedural content and dynamic adaptation to the printer because it was a full-fledged programming language. But it could also do horrible things, which I guess is why PDF is the interchange standard now.

Java lets you serialize the members of an object but not the class that defines the object behaviour (AFAIK), which always seemed to me to be Missing The Point.


From: palesz (Oct 29 2016, at 08:15)

That's exactly the reason why I like Clojure. It makes dealing with the JSON nonsense easier.


October 23, 2016

By .

The opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.