JSON Event Scheming

I’m pretty sure that event-driven software is already a big deal and is going to get bigger. Events, de facto, are JSON blobs, and in general we’d like to make them easier to consume in computer programs. I’ve written before about how it’s difficult to specify JSON dialects, and also about Schemaless message processing. It turns out there’s good news from the world of JSON Schema, but the problem is far from solved.

“Event-driven”? · It’s not exactly a new idea; I first heard it back when I used to program GUIs where of course everything is an event, and your code is all about handling them in callbacks. But that’s not what I’m talking about here. From an AWS-centric point of view, I’m talking about the events that trigger Lambda functions, or get matched and routed by CloudWatch Events, or go through the SNS pub/sub machinery.

As far as I know, there are really only two ways to connect software together: API calls (I send you a request and wait for your response) and events (I fire off a message and whoever gets it does whatever they’re going to do). A common variation on the latter is that along with the event, you send along a callback address that you’d maybe like event consumers to call you back on.

APIs are straightforward and feel natural to programmers because we all grew up calling subroutines and functions. Sometimes that way of thinking works great on the network, as when I send you an HTTP request that includes everything you need to do something for me, and I wait for a response back saying what you did. But APIs have problems, the worst being that they constitute tight coupling; you and I have to stay in sync, and if sometimes I’d like to issue requests a little faster than you can handle them, well, too bad.

Eventing makes the coupling looser. Obviously, it leaves a natural place to insert buffering; if I get ahead of you, that’s OK, the messages can get buffered in transit, and eventually you’ll catch up when I slow down, and that’s just fine.
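To make the buffering point concrete, here is a minimal sketch. `BufferedChannel` is a hypothetical in-memory stand-in for whatever queue or stream sits between the two parties; a real system would use a durable service, and none of these names come from any actual API.

```python
from collections import deque


class BufferedChannel:
    """Hypothetical in-memory buffer between a producer and a consumer."""

    def __init__(self):
        self._pending = deque()

    def publish(self, event):
        # The producer never waits on the consumer; it appends and moves on.
        self._pending.append(event)

    def drain(self, max_events):
        # The consumer takes up to max_events at its own pace, oldest first.
        batch = []
        while self._pending and len(batch) < max_events:
            batch.append(self._pending.popleft())
        return batch

    def backlog(self):
        return len(self._pending)


channel = BufferedChannel()
for seq in range(5):          # the producer gets ahead...
    channel.publish({"seq": seq})

first = channel.drain(2)      # ...the consumer handles what it can...
rest = channel.drain(10)      # ...and catches up later.
```

The producer's loop finishes without ever blocking on the consumer, which is the loose coupling in question; the cost is that the backlog has to live somewhere while the consumer is behind.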

And that looser coupling leaves space to do lots of other useful things with the data in transit: Fan-out, logging/auditing, transformation, analytics, and filtering, to name a few. I think a high proportion of all integration tasks are a natural fit for event-driven code, as opposed to APIs. So, I care about making it easy.

Contracts and Schemas · APIs generally have them. In strongly-typed programming languages they are detailed and rigid, verified at compile-time to allow for fast, trusting execution at run-time. For RESTful APIs, we have things like Swagger/OpenAPI, and GraphQL offers another approach.

Schemas are nothing like a complete contract for an event-oriented system, but they’re better than nothing. I hear people who write this kind of software asking for “schemas”, and I think this is what they really want:

They’d like to have the messages auto-magically turned into objects or interfaces or structs or whatever the right idiom is for their programming language. And if that can’t be done, they’d like their attempt to fail deterministically with helpful diagnostic output.
For any given message type, they’d like to be able to generate samples, to support testing.
They’d like intelligent handling of versioning in event structures.
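The first of those wishes might look roughly like the sketch below: a message either becomes a typed object or fails with every problem listed, in a fixed order. The `InstanceStateChange` type, the `instance-id` and `state` field names, and `EventParseError` are all made up for illustration; in practice this code would be generated from a schema rather than written by hand.

```python
import json
from dataclasses import dataclass


class EventParseError(ValueError):
    """Raised when a message cannot be turned into a typed event."""


@dataclass(frozen=True)
class InstanceStateChange:
    instance_id: str
    state: str


# (JSON field name, dataclass attribute) pairs; hypothetical names.
_FIELDS = (("instance-id", "instance_id"), ("state", "state"))


def parse_state_change(raw: str) -> InstanceStateChange:
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise EventParseError(f"not valid JSON: {exc}") from exc
    if not isinstance(doc, dict):
        raise EventParseError("top level must be a JSON object")

    problems = []
    values = {}
    # Check fields in a fixed order so the same bad input always
    # produces the same diagnostic text.
    for json_name, attr in _FIELDS:
        if json_name not in doc:
            problems.append(f"missing field '{json_name}'")
        elif not isinstance(doc[json_name], str):
            kind = type(doc[json_name]).__name__
            problems.append(f"field '{json_name}' must be a string, got {kind}")
        else:
            values[attr] = doc[json_name]
    if problems:
        raise EventParseError("; ".join(problems))
    return InstanceStateChange(**values)
```

Collecting all the problems before raising, instead of stopping at the first one, is a design choice aimed at the "helpful diagnostic output" part of the wish.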

Historically, this has been hard. One reason is an idiom that I’ve often seen in real-world events: the “EventType” field. Typically, a stream of events contains many different types of thing, and they’re self-describing in that each contains a field saying what it is. So you can’t really parse it or make it useful to programmers without dispatching based on that type field. It’s worse than that: I know of several examples where you have an EventType enum at the top level, and then further type variations at deeper nesting levels, each with EventType equivalents.
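Here is a small sketch of what that two-level dispatch tends to look like when written by hand. The event shapes (`OrderPlaced`, `PaymentReceived`, a nested `Payment` object with its own `EventType`) are invented for illustration and are not taken from any real event stream.

```python
import json


def describe_card(payment):
    return f"card ending {payment['Last4']}"


def describe_transfer(payment):
    return f"transfer from {payment['Bank']}"


# Second-level dispatch table for the nested "Payment" object.
PAYMENT_KINDS = {"Card": describe_card, "BankTransfer": describe_transfer}


def handle_order_placed(event):
    return f"order {event['OrderId']} placed"


def handle_payment_received(event):
    payment = event["Payment"]
    kind = payment.get("EventType")
    if kind not in PAYMENT_KINDS:
        raise ValueError(f"unknown Payment.EventType: {kind!r}")
    return f"payment via {PAYMENT_KINDS[kind](payment)}"


# Top-level dispatch table keyed on the event's own "EventType".
HANDLERS = {
    "OrderPlaced": handle_order_placed,
    "PaymentReceived": handle_payment_received,
}


def dispatch(raw):
    event = json.loads(raw)
    event_type = event.get("EventType")
    if event_type not in HANDLERS:
        raise ValueError(f"unknown EventType: {event_type!r}")
    return HANDLERS[event_type](event)
```

Nothing here is hard, but every consumer ends up rewriting some version of it, and nothing in the code says which fields each variant is supposed to carry.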

In particular, since events tend to be JSON blobs, this has been a problem, because historically, JSON Schema has had really weak support for this kind of construct. You can dispatch based on the presence of particular fields, and you can sort of fake type dispatching with the oneOf keyword, but the schema-ware gets baroquely complex and the error messages increasingly unhelpful.

But, there’s good news. Apparently the JSON Schema project is very much alive, and in the current draft (-07 as I write this) there’s an if-then-else construct.

Now, if you follow that link and read the description, you may find yourself a little puzzled. Instead, have a look at json-schema-spec issue #652, in which I raised the question about how to handle “EventType” fields and got an explanation of how their if-then-else idiom might do the job.
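For bits-on-the-wire people, here is one way the construct can be applied to an “EventType” field. This is my own sketch, not a copy of what the issue thread proposes: the schema is written as a Python dict, the event shapes are invented, and `is_valid` is a toy evaluator that understands only the handful of draft-07 keywords used here. A real project would hand the same schema to a conforming validator.

```python
# One if/then pair per event type, combined with allOf.
EVENT_SCHEMA = {
    "type": "object",
    # Required at the top level: on its own, an "if" built from
    # "properties" also matches objects that lack the field entirely.
    "required": ["EventType"],
    "allOf": [
        {
            "if": {"properties": {"EventType": {"const": "OrderPlaced"}}},
            "then": {"required": ["OrderId"]},
        },
        {
            "if": {"properties": {"EventType": {"const": "PaymentReceived"}}},
            "then": {"required": ["Payment"]},
        },
    ],
}

# Only the two types this example needs; anything else raises KeyError.
_TYPES = {"object": dict, "string": str}


def is_valid(schema, instance):
    """Toy evaluator for a few draft-07 keywords. NOT a conforming validator."""
    if "const" in schema and instance != schema["const"]:
        return False
    if "type" in schema and not isinstance(instance, _TYPES[schema["type"]]):
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            # "properties" constrains a field only when it is present.
            if name in instance and not is_valid(sub, instance[name]):
                return False
    if not all(is_valid(sub, instance) for sub in schema.get("allOf", [])):
        return False
    if "if" in schema:
        # The outcome of "if" only selects a branch; it never fails by itself.
        branch = "then" if is_valid(schema["if"], instance) else "else"
        if branch in schema and not is_valid(schema[branch], instance):
            return False
    return True
```

With this schema an `OrderPlaced` event without an `OrderId` is rejected, while an event whose type matches neither `if` passes untouched, so unknown types would need their own rule (an `enum` on `EventType`, say) if they should be refused.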

On JSON Schema · So, I’m glad that that project shows signs of life and is moving forward. And my thanks to the folk who offered smart, responsive answers to my questions.

I still have issues with the effort. Its spec comes in three parts: Core, Validation, and Hyper-Schema. I think that Core could be replaced with a paragraph saying “here’s the media type, here’s how fragments work, and here’s how to use $ref to link pieces of schema together.” I think Validation has grown to be frighteningly large; just check the table of contents. I have read the Hyper-Schema thing carefully, more than once, and I haven’t the faintest clue what it’s for or how you’d use it. The authors of JSON Schema do not generally favor using examples as an explanatory device, which makes things tough for bits-on-the-wire weak-on-abstractions people like me.

But hey, I’m profoundly grateful that people are wrestling with these hard problems, and I’m going to be digging into this whole space of how to make events easier for programmers.

It’s not an abstract problem · Consider CloudWatch Events Event Examples, which offers samples from twenty-seven different AWS services. The number of unique event types would take too long to count, but it’s big. This is a successful service, with a huge number of customers filtering an astonishing number of events per second. Developers use these for all sorts of things. I’m wondering how we might make it easier for them. Think you know? My mind is open, and we’re hiring.

Contributions

From: len (Oct 03 2018, at 12:37)

Any thoughts on TimBL's new/old project?

From: Henry (Oct 08 2018, at 21:10)

Hi Tim - great article, and great feedback on JSON Schema. I am one of the current primary authors/editors (starting from the end of 2016) and can offer some thoughts on the 3-document form and forthcoming changes. Perhaps you have some feedback on that :-)

In the next draft, Core does what you say, but also lays the foundations of JSON Schema as a modular, extensible, multi-vocabulary system, with more formality around why keywords behave in a few different ways. This will make it easier to standardize Core and Validation, deferring many requests to extension vocabularies.

The keywords that take subschemas as values move into Core, as they are more foundational; Core then defines the media type and essentially bootstraps vocabularies.

This shrinks Validation down to the actual validation assertions, plus the random meta-data keyword section (still not sure where to put that).

Hyper-Schema remains a work in progress, although it's picked up much more interest with draft-07, and is the only spec that has lots of examples now. Our big problem with examples is that we need someone with time who is good at writing them; I am notoriously horrible at it.

Anyway, thanks for the thoughtful words, glad draft-07 is helping you!

September 22, 2018

By Tim Bray.

The opinions expressed here
are my own, and no other party
necessarily agrees with them.
