At AWS, I’m now in the Serverless organization, which in 2018 is big fun. Someone asked me to check out the work being done at the Cloud Native Computing Foundation (CNCF), particularly around CloudEvents. There’s been a particularly interesting argument going on around there that I think has useful lessons for anyone who cares about designing network protocols.
I’m naturally interested in Eventing because it’s central, not just to serverless computing, but to modern application construction in general. Events are a good way to think about a lot of different things: Actual events from the real world (“Garage door opened”), infrastructure happenings (“database failed over”), user activities (“Leila signed in”), or data movement (“Object 894t7 uploaded to bucket JXYT8-33”). Events are nice, particularly in well-designed modern apps, because among other things you can feed them to functions and drop them onto messaging queues.
My first project at AWS was CloudWatch Events, and one of the essential things about a CloudWatch Event is that it’s got a fixed JSON wrapper with a bunch of top-level fields that are guaranteed to be there. We never wrote down a formal spec but there’s a reasonably straightforward description here. CloudWatch Events are JSON objects, and that’s all they are; nothing fancy about them.
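To make the "fixed wrapper" idea concrete, here is a minimal sketch of checking that an event carries every top-level field. The field names are the ones I recall from the AWS documentation for the CloudWatch Events envelope; the sample values and the `missing_envelope_fields` helper are invented for illustration.

```python
import json

# Top-level envelope fields, as recalled from the AWS docs (treat as an assumption).
ENVELOPE_FIELDS = (
    "version", "id", "detail-type", "source", "account",
    "time", "region", "resources", "detail",
)

def missing_envelope_fields(raw: str) -> list[str]:
    """Return the envelope fields absent from a JSON-encoded event."""
    event = json.loads(raw)
    return [name for name in ENVELOPE_FIELDS if name not in event]

# A made-up event; none of these values come from a real account.
SAMPLE = json.dumps({
    "version": "0",
    "id": "00000000-0000-0000-0000-000000000000",
    "detail-type": "Garage Door Opened",
    "source": "com.example.garage",
    "account": "123456789012",
    "time": "2018-01-01T00:00:00Z",
    "region": "us-west-2",
    "resources": [],
    "detail": {"door": "left"},
})

print(missing_envelope_fields(SAMPLE))
```

Because the wrapper is guaranteed, a consumer can route on fields like `source` and `detail-type` without knowing anything about what is inside `detail`.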
Evidence suggests those choices were good; the service has been pretty successful; loads and loads of customers doing all sorts of basic meat-and-potatoes automation, and then some pretty imaginative apps combining built-in and custom events. So, I have a lot of sympathy with the CNCF work.
CNCF CloudEvents have an abstract definition not tied to any data format, with the idea that there could be multiple different representations, although most examples and conversations still revolve around JSON.
The Pull Request · The problem is summed up in Pull Request #277 and issue #294. It’s basically about whether it’s OK to put unknown fields not defined in the spec (“extensions”) into the top level of a CloudEvent, or instead, banish them to a container field. It’s not that crucial an issue and I can see both sides of it.
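The two layouts under debate can be sketched as a pair of conversions. Everything here is hypothetical: `KNOWN_FIELDS` is a stand-in for whatever the spec defines (these are not the actual CloudEvents attribute names), and `extensions` is just an assumed name for the container.

```python
# Hypothetical stand-in for the spec-defined top-level fields.
KNOWN_FIELDS = {"id", "type", "source", "data"}

def to_container_style(event: dict) -> dict:
    """Move every field the spec doesn't define into an 'extensions' bag.

    Assumes the input uses the top-level layout (no 'extensions' key yet).
    """
    out = {k: v for k, v in event.items() if k in KNOWN_FIELDS}
    extras = {k: v for k, v in event.items() if k not in KNOWN_FIELDS}
    if extras:
        out["extensions"] = extras
    return out

def to_top_level_style(event: dict) -> dict:
    """Hoist everything in the 'extensions' bag back up to the top level."""
    out = {k: v for k, v in event.items() if k != "extensions"}
    out.update(event.get("extensions", {}))
    return out

# "traceId" is an invented extension field.
top_level = {"id": "1", "type": "door.opened", "source": "garage", "traceId": "abc"}
contained = to_container_style(top_level)
print(contained)
print(to_top_level_style(contained) == top_level)
```

The information content is the same either way; the argument is about which layout a receiver has to be prepared for.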
The argument being advanced in issue #294 and by Thomas Bouldin in Codelab: Versioning is Hard (aka the “SEF theorem”) is that if you allow adding “extensions” at the top level, that might break some software. In particular, it’s going to break anything that relies on Protocol Buffers (everyone says “protobufs”). That’s because protobufs aren’t textual and self-describing; they’re binary, and they rely on an external schema to help software unpick the binary bits. That doesn’t leave room for any old random new bits to be dropped into the top-level record.
It turns out that some organizations have bought into protobufs heavily; for the purposes of this discussion it doesn’t matter what their reasons were, or whether those reasons were good. So dealing with CloudEvents is going to be easier for them if they can rely on mapping back and forth between protobufs and JSON. Which they can’t if extraneous “extensions” might show up at the top level.
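Here is a rough illustration of why a schema-bound receiver dislikes surprise top-level fields. This is not protobuf code: a plain Python dataclass stands in for a schema-generated message type, and the `Event` fields and `decode_strict` helper are invented for the example.

```python
import json
from dataclasses import dataclass

@dataclass
class Event:
    """Stand-in for a class generated from a fixed schema (hypothetical fields)."""
    id: str
    type: str
    source: str

def decode_strict(text: str) -> Event:
    """Map JSON onto the schema; any key the schema lacks is an error."""
    return Event(**json.loads(text))

plain = '{"id": "1", "type": "door.opened", "source": "garage"}'
extended = '{"id": "1", "type": "door.opened", "source": "garage", "traceId": "abc"}'

print(decode_strict(plain))
try:
    decode_strict(extended)
except TypeError as err:
    # The schema has no slot for "traceId", so the mapping fails.
    print("rejected:", err)
```

If extensions were confined to one declared container field, the schema could give that field a map type and the strict mapping would keep working.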
Lesson 1: The Internet isn’t abstract · I think the CloudEvents committee probably made a mistake when they went with the abstract-event formulation and the notion that it could have multiple representations. Because that’s not how the Internet works. The key RFCs that define the important protocols don’t talk about sending abstract messages back and forth, they describe actual real byte patterns that make up IP packets and HTTP headers and email addresses and DNS messages and message payloads in XML and JSON. And that’s as it should be.
Time after time, people have got the idea of sharing abstract objects across the Internet, and time after time it’s led to problems of one sort or another. There was a time when a lot of people thought that something like CORBA or DCOM or WCF would make objects-on-the-wire not only possible but straightforward, and free us from the tyranny of thinking about the bits and bytes in message formats. But as you may have noticed, those things are pretty well gone and the Web has outlived them; its klunky old ad-hoc tags and headers are how everything works, mostly.
To make this concrete: If CNCF had started out saying “A CloudEvent is a bag of bits which is a JSON Text” or “…which is a protobuf message”, well, issue #294 just wouldn’t ever have arisen. And neither choice would have been crazy.
Lesson 2: S, E, and F · Bouldin’s Versioning is Hard introduces the “SEF Theorem” where “S” is for Structured, by which he means “you need an external schema and you can’t just throw in extra fields”, “E” is for Extensible, i.e. you can go ahead and put in unannounced foreign fields without changing versions, and “F” is for Forward Compatible, which means you can add versions without breaking existing dependencies. The claim is that you can have any two of the three, but not all of them.
Given the choice, I’ll take “E” and “F” any day. When you’re pumping messages around the Internet between heterogeneous codebases built by people who don’t know each other, shit is gonna happen. That’s the whole basis of the Web: You can safely ignore an HTTP header or HTML tag you don’t understand, and nothing breaks. It’s great because it allows people to just try stuff out, and the useful stuff catches on while the bad ideas don’t break anything.
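The “ignore what you don’t understand” rule is easy to sketch for events. This is a minimal illustration under assumptions, not anyone’s official consumer: the `Event` fields and the `decode_must_ignore` helper are made up for the example.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Event:
    """The only fields this particular consumer cares about (hypothetical)."""
    type: str
    source: str

def decode_must_ignore(text: str) -> Event:
    """Keep the fields we understand and silently drop everything else."""
    raw = json.loads(text)
    known = {f.name for f in fields(Event)}
    return Event(**{k: v for k, v in raw.items() if k in known})

old_style = '{"type": "door.opened", "source": "garage"}'
# "traceId" and "mood" are invented fields some sender decided to try out.
new_style = '{"type": "door.opened", "source": "garage", "traceId": "abc", "mood": "fine"}'

print(decode_must_ignore(old_style) == decode_must_ignore(new_style))
```

A sender can start experimenting with new fields today, and consumers written this way neither notice nor break.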
So what happened? · The committee took the trade-off I like. Which means you can extend CloudEvents pretty freely (good), but you can’t use protobufs and JSON interchangeably and expect things to work (unfortunate). This way is less brittle but a little harder to deal with. Not gonna say that the right choice is a slam-dunk, but it is the right choice.