I’m pretty sure that event-driven software is already a big deal and is going to get bigger. Events, de facto, are JSON blobs, and in general we’d like to make them easier to consume in computer programs. I’ve written before about how it’s difficult to specify JSON dialects, and also about Schemaless message processing. It turns out there’s good news from the world of JSON Schema, but the problem is far from solved.
“Event-driven”? · It’s not exactly a new idea; I first heard it back when I used to program GUIs where of course everything is an event, and your code is all about handling them in callbacks. But that’s not what I’m talking about here. From an AWS-centric point of view, I’m talking about the events that trigger Lambda functions, or get matched and routed by CloudWatch Events, or go through the SNS pub/sub machinery.
As far as I know, there are really only two ways to connect software together: API calls (I send you a request and wait for your response) and events (I fire off a message and whoever gets it does whatever they’re going to do). A common variation on the latter is to include, along with the event, a callback address that you’d like event consumers to use to reach you.
APIs are straightforward and feel natural to programmers because we all grew up calling subroutines and functions. Sometimes that way of thinking works great on the network, as when I send you an HTTP request that includes everything you need to do something for me, and I wait for a response back saying what you did. But APIs have problems, the worst being that they constitute tight coupling; you and I have to stay in sync, and if sometimes I’d like to issue requests a little faster than you can handle them, well, too bad.
Eventing makes the coupling looser. Obviously, it leaves a natural place to insert buffering; if I get ahead of you, that’s OK, the messages can get buffered in transit, and eventually you’ll catch up when I slow down, and that’s just fine.
And that looser coupling leaves space to do lots of other useful things with the data in transit: Fan-out, logging/auditing, transformation, analytics, and filtering, to name a few. I think a high proportion of all integration tasks are a natural fit for event-driven code, as opposed to APIs. So, I care about making it easy.
Contracts and Schemas · APIs generally have them. In strongly-typed programming languages they are detailed and rigid, verified at compile-time to allow for fast, trusting execution at run-time. For RESTful APIs, we have things like Swagger/OpenAPI, and GraphQL offers another approach.
Schemas are nothing like a complete contract for an event-oriented system, but they’re better than nothing. I hear people who write this kind of software asking for “schemas”, and I think this is what they really want:
They’d like to have the messages auto-magically turned into objects or interfaces or structs or whatever the right idiom is for their programming language. And if that can’t be done, they’d like their attempt to fail deterministically with helpful diagnostic output.
For any given message type, they’d like to be able to generate samples, to support testing.
They’d like intelligent handling of versioning in event structures.
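The first of those wishes can be sketched in a few lines of Python. This is only an illustration; the event shape and field names (“order_id”, “amount_cents”) are invented, and real tooling would generate the class from a schema rather than writing it by hand:

```python
import json
from dataclasses import dataclass

@dataclass
class OrderPlaced:
    # Hypothetical event shape, invented for illustration.
    order_id: str
    amount_cents: int

def parse_order_placed(raw: str) -> OrderPlaced:
    """Turn a JSON blob into a typed object, failing deterministically
    with a helpful message when a required field is absent."""
    data = json.loads(raw)
    try:
        return OrderPlaced(order_id=data["order_id"],
                           amount_cents=data["amount_cents"])
    except KeyError as missing:
        raise ValueError(f"event is missing required field {missing}") from None

event = parse_order_placed('{"order_id": "A-17", "amount_cents": 2500}')
```

The point is the failure mode: a malformed event produces one clear diagnostic, not a `KeyError` three stack frames deep in business logic.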
Historically, this has been hard. One reason is an idiom that I’ve often seen in real-world events: the “EventType” field. Typically, a stream of events contains many different types of things, and they’re self-describing in that each contains a field saying what it is. So you can’t really parse it or make it useful to programmers without dispatching based on that type field. It’s worse than that: I know of several examples where you have an EventType enum at the top level, and then further type variations at deeper nesting levels, each with EventType equivalents.
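In code, the idiom looks roughly like this. A minimal Python sketch, with the event names and nesting invented for illustration:

```python
import json

def handle_event(raw: str) -> str:
    """Dispatch on a top-level EventType, then again on a nested one."""
    event = json.loads(raw)
    event_type = event.get("EventType")
    if event_type == "OrderEvent":
        # A second type field at a deeper nesting level -- the
        # pattern described above, where the enum recurs inside.
        detail_type = event["detail"].get("EventType")
        if detail_type == "Placed":
            return "order placed"
        elif detail_type == "Cancelled":
            return "order cancelled"
        return f"unknown order detail type: {detail_type}"
    elif event_type == "ShipmentEvent":
        return "shipment update"
    return f"unknown event type: {event_type}"

result = handle_event(
    '{"EventType": "OrderEvent", "detail": {"EventType": "Placed"}}')
```

Every consumer ends up hand-rolling a tree of these dispatches, which is exactly the work you’d hope a schema-aware toolchain would do for you.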
In particular, since events tend to be JSON blobs, this has been a problem, because historically, JSON Schema has had really weak support for this kind of construct. You can dispatch based on the presence of particular fields, and you can sort of fake type dispatching with the oneOf keyword, but the schema-ware gets baroquely complex and the error messages increasingly unhelpful.
Now, if you follow that link and read the description, you may find yourself a little puzzled. Instead, have a look at json-schema-spec issue #652, in which I raised the question about how to handle “EventType” fields and got an explanation of how their if-then-else idiom might do the job.
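To give a feel for that idiom, here’s a hand-written schema fragment (the field names are invented for illustration) that validates different required fields depending on an “EventType” value, roughly along the lines discussed in that issue:

```json
{
  "type": "object",
  "required": ["EventType"],
  "if": {
    "properties": { "EventType": { "const": "OrderEvent" } }
  },
  "then": {
    "required": ["order_id"],
    "properties": { "order_id": { "type": "string" } }
  },
  "else": {
    "required": ["shipment_id"],
    "properties": { "shipment_id": { "type": "string" } }
  }
}
```

With only two event types, one if/then/else does the job; with more, you end up chaining if/then blocks inside an allOf, which works but starts to feel like the baroque complexity mentioned above.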
On JSON Schema · So, I’m glad that that project shows signs of life and is moving forward. And my thanks to the folk who offered smart, responsive answers to my questions.
I still have issues with the effort. Its spec comes in three parts: Core, Validation, and Hyper-Schema.
I think that Core could be replaced with a paragraph saying “here’s the media type, here’s how fragments work, and here’s how to use $ref to link pieces of schema together.”
I think Validation has grown to be frighteningly large; just check the table of contents.
I have read the Hyper-Schema thing carefully, more than once, and I haven’t the faintest clue what it’s for or how you’d use it. The authors of JSON Schema do not generally favor using examples as an explanatory device, which makes things tough for bits-on-the-wire weak-on-abstractions people like me.
But hey, I’m profoundly grateful that people are wrestling with these hard problems, and I’m going to be digging into this whole space of how to make events easier for programmers.
It’s not an abstract problem · Consider CloudWatch Events Event Examples, which offers samples from twenty-seven different AWS services. The number of unique event types would take too long to count, but it’s big. This is a successful service, with a huge number of customers filtering an astonishing number of events per second. Developers use these for all sorts of things. I’m wondering how we might make it easier for them. Think you know? My mind is open, and we’re hiring.