I’m pretty sure that event-driven software is already a big deal and is going to get bigger. Events, de facto, are JSON blobs, and in general we’d like to make them easier to consume in computer programs. I’ve written before about how it’s difficult to specify JSON dialects, and also about Schemaless message processing. It turns out there’s good news from the world of JSON Schema, but the problem is far from solved.

“Event-driven”? · It’s not exactly a new idea; I first heard it back when I used to program GUIs where of course everything is an event, and your code is all about handling them in callbacks. But that’s not what I’m talking about here. From an AWS-centric point of view, I’m talking about the events that trigger Lambda functions, or get matched and routed by CloudWatch Events, or go through the SNS pub/sub machinery.

As far as I know, there are really only two ways to connect software together: API calls (I send you a request and wait for your response) and events (I fire off a message and whoever gets it does whatever they’re going to do). A common variation on the latter is that along with the event, you include a callback address that event consumers can use to get back to you.

APIs are straightforward and feel natural to programmers because we all grew up calling subroutines and functions. Sometimes that way of thinking works great on the network, as when I send you an HTTP request that includes everything you need to do something for me, and I wait for a response back saying what you did. But APIs have problems, the worst being that they constitute tight coupling; you and I have to stay in sync, and if sometimes I’d like to issue requests a little faster than you can handle them, well, too bad.

Eventing makes the coupling looser. Obviously, it leaves a natural place to insert buffering; if I get ahead of you, that’s OK, the messages can get buffered in transit, and eventually you’ll catch up when I slow down, and that’s just fine.

And that looser coupling leaves space to do lots of other useful things with the data in transit: fan-out, logging/auditing, transformation, analytics, and filtering, to name a few. I think a high proportion of all integration tasks are a natural fit for event-driven code, as opposed to APIs. So, I care about making it easy.
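To make the buffering, fan-out, and filtering ideas concrete, here is a minimal in-memory sketch. `TinyBus` and its methods are hypothetical names invented for this illustration; they are not an AWS API, and a real event router would add durability, retries, and delivery across processes.

```python
from collections import deque
from typing import Callable

Event = dict
Handler = Callable[[Event], None]
Matcher = Callable[[Event], bool]


class TinyBus:
    """Hypothetical in-memory event router; not a real AWS or library API."""

    def __init__(self) -> None:
        self._buffer = deque()   # events published but not yet delivered
        self._subscribers = []   # (matcher, handler) pairs

    def subscribe(self, matches: Matcher, handler: Handler) -> None:
        self._subscribers.append((matches, handler))

    def publish(self, event: Event) -> None:
        # The producer only enqueues; it never waits for a consumer.
        self._buffer.append(event)

    def pending(self) -> int:
        return len(self._buffer)

    def drain(self, limit: int) -> int:
        """Deliver up to `limit` buffered events and return how many were delivered."""
        delivered = 0
        while self._buffer and delivered < limit:
            event = self._buffer.popleft()
            for matches, handler in self._subscribers:
                if matches(event):   # filtering
                    handler(event)   # fan-out: every matching subscriber sees it
            delivered += 1
        return delivered


bus = TinyBus()
audit_log, alarms = [], []
bus.subscribe(lambda e: True, audit_log.append)                      # logging/auditing
bus.subscribe(lambda e: e.get("severity") == "high", alarms.append)  # filtering

for i in range(5):  # the producer gets ahead of the consumer
    bus.publish({"id": i, "severity": "high" if i == 3 else "low"})

bus.drain(limit=2)       # the consumer handles two events; three stay buffered
backlog = bus.pending()
bus.drain(limit=10)      # later, the consumer catches up
```

The producer never learns who consumed its events or when, which is the loose coupling in question; the price is that it also gets no direct answer back.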

Contracts and Schemas · APIs generally have them. In strongly-typed programming languages they are detailed and rigid, verified at compile-time to allow for fast, trusting execution at run-time. For RESTful APIs, we have things like Swagger/OpenAPI, and GraphQL offers another approach.

Schemas are nothing like a complete contract for an event-oriented system, but they’re better than nothing. I hear people who write this kind of software asking for “schemas”, and I think this is what they really want:

  1. They’d like to have the messages auto-magically turned into objects or interfaces or structs or whatever the right idiom is for their programming language. And if that can’t be done, they’d like their attempt to fail deterministically with helpful diagnostic output.

  2. For any given message type, they’d like to be able to generate samples, to support testing.

  3. They’d like intelligent handling of versioning in event structures.
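As a rough sketch of the first two wishes, here is what "turn the message into an object or fail with a useful diagnostic" and "generate a sample" might look like using only the Python standard library. The `InstanceStateChange` shape, `EventParseError`, `parse_event`, and `sample_event` are all hypothetical names made up for this illustration, and the type check only handles flat `str` and `int` fields.

```python
import json
from dataclasses import dataclass, fields


class EventParseError(ValueError):
    """Raised with a message that lists every problem found in the event."""


@dataclass(frozen=True)
class InstanceStateChange:
    # Hypothetical event shape, for illustration only.
    instance_id: str
    state: str
    attempt: int


def parse_event(raw: str, cls=InstanceStateChange):
    """Turn a JSON string into `cls`, or raise EventParseError listing all problems."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise EventParseError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise EventParseError(f"expected a JSON object, got {type(data).__name__}")

    problems = []
    for f in fields(cls):
        if f.name not in data:
            problems.append(f"missing field '{f.name}'")
        elif type(data[f.name]) is not f.type:
            problems.append(
                f"field '{f.name}' should be {f.type.__name__}, "
                f"got {type(data[f.name]).__name__}"
            )
    if problems:
        raise EventParseError("; ".join(problems))
    # Unknown fields are ignored here, an assumption that lets producers
    # add fields without breaking older consumers.
    return cls(**{f.name: data[f.name] for f in fields(cls)})


_SAMPLE_VALUES = {str: "example", int: 0}


def sample_event(cls=InstanceStateChange) -> str:
    """Generate a minimal sample message for tests."""
    return json.dumps({f.name: _SAMPLE_VALUES[f.type] for f in fields(cls)})


event = parse_event('{"instance_id": "i-123", "state": "running", "attempt": 1}')
round_trip = parse_event(sample_event())
```

Collecting every problem before raising, rather than stopping at the first, is one way to keep the failure both deterministic and helpful. This sketch does nothing for the third wish; versioning is the hard part.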

Historically, this has been hard. One reason is an idiom that I’ve often seen in real-world events: the “EventType” field. Typically, a stream of events contains many different types of thing, and they’re self-describing in that each contains a field saying what it is. So you can’t really parse it or make it useful to programmers without dispatching based on that type field. It’s worse than that: I know of several examples where you have an EventType enum at the top level, and then further type variations at deeper nesting levels, each with EventType equivalents.
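Here is a small sketch of the hand-written dispatch this idiom forces on consumers, including a second type field one level down. The event names (`OrderPlaced`, `PaymentReceived`), the nested `PaymentType` field, and all handler functions are hypothetical, chosen only to show the shape of the problem.

```python
import json


def describe_card(payment):
    return f"card ending {payment['last4']}"


def describe_invoice(payment):
    return f"invoice {payment['invoiceNumber']}"


# Second-level dispatch table: the nested object carries its own type field.
PAYMENT_HANDLERS = {"Card": describe_card, "Invoice": describe_invoice}


def handle_order_placed(event):
    return f"order {event['orderId']} placed"


def handle_payment_received(event):
    payment = event["payment"]
    handler = PAYMENT_HANDLERS.get(payment.get("PaymentType"))
    if handler is None:
        raise ValueError(f"unknown PaymentType: {payment.get('PaymentType')!r}")
    return "paid by " + handler(payment)


# Top-level dispatch table keyed on the EventType field.
EVENT_HANDLERS = {
    "OrderPlaced": handle_order_placed,
    "PaymentReceived": handle_payment_received,
}


def dispatch(raw: str) -> str:
    event = json.loads(raw)
    event_type = event.get("EventType")
    handler = EVENT_HANDLERS.get(event_type)
    if handler is None:
        raise ValueError(f"unknown EventType: {event_type!r}")
    return handler(event)


result = dispatch(
    '{"EventType": "PaymentReceived",'
    ' "payment": {"PaymentType": "Card", "last4": "4242"}}'
)
```

Nothing in the payload's structure tells a generic parser which fields to expect; that knowledge lives entirely in these tables, which is why a schema language needs some way to express "if the type field says X, then the rest looks like Y".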

In particular, since events tend to be JSON blobs, this has been a problem, because historically, JSON Schema has had really weak support for this kind of construct. You can dispatch based on the presence of particular fields, and you can sort of fake type dispatching with the oneOf keyword, but the schema-ware gets baroquely complex and the error messages increasingly unhelpful.

But, there’s good news. Apparently the JSON Schema project is very much alive, and in the current draft (-07 as I write this) there’s an if-then-else construct.

Now, if you follow that link and read the description, you may find yourself a little puzzled. Instead, have a look at json-schema-spec issue #652, in which I raised the question about how to handle “EventType” fields and got an explanation of how their if-then-else idiom might do the job.
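For a feel of how the draft-07 `if`/`then`/`else` keywords might express EventType dispatch, here is a sketch. It is not taken from issue #652; the event names and fields are hypothetical. The schema is written as a Python dict, and `check` is a toy evaluator that understands only the handful of keywords used here (`type: object`, `required`, `properties`, `const`, `allOf`, `if`/`then`/`else`). A real project would use a full JSON Schema validator instead.

```python
# Hypothetical schema: each "if" tests the EventType field, and the matching
# "then" says what the rest of the event must contain.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["EventType"],
    "allOf": [
        {
            "if": {
                "required": ["EventType"],
                "properties": {"EventType": {"const": "OrderPlaced"}},
            },
            "then": {"required": ["orderId"]},
        },
        {
            "if": {
                "required": ["EventType"],
                "properties": {"EventType": {"const": "PaymentReceived"}},
            },
            "then": {"required": ["payment"]},
        },
    ],
}


def check(schema, value, path="$"):
    """Toy evaluator for the keyword subset above; returns a list of error strings."""
    if schema.get("type") == "object" and not isinstance(value, dict):
        return [f"{path}: expected an object"]
    errors = []
    if "const" in schema and value != schema["const"]:
        errors.append(f"{path}: expected {schema['const']!r}")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}: missing required property '{name}'")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors += check(sub, value[name], f"{path}.{name}")
    for sub in schema.get("allOf", []):
        errors += check(sub, value, path)
    if "if" in schema:
        # The "if" subschema only selects a branch; its own errors are discarded.
        branch = "then" if not check(schema["if"], value, path) else "else"
        if branch in schema:
            errors += check(schema[branch], value, path)
    return errors


ok = check(SCHEMA, {"EventType": "OrderPlaced", "orderId": "o-1"})
bad = check(SCHEMA, {"EventType": "OrderPlaced"})
```

Because only the branch selected by the type field is applied, a bad event produces one error about the field it is actually missing, rather than a pile of failures from every alternative that did not match.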

On JSON Schema · So, I’m glad that that project shows signs of life and is moving forward. And my thanks to the folk who offered smart, responsive answers to my questions.

I still have issues with the effort. Its spec comes in three parts: Core, Validation, and Hyper-Schema. I think that Core could be replaced with a paragraph saying “here’s the media type, here’s how fragments work, and here’s how to use $ref to link pieces of schema together.” I think Validation has grown to be frighteningly large; just check the table of contents. I have read the Hyper-Schema thing carefully, more than once, and I haven’t the faintest clue what it’s for or how you’d use it. The authors of JSON Schema do not generally favor using examples as an explanatory device, which makes things tough for bits-on-the-wire weak-on-abstractions people like me.

But hey, I’m profoundly grateful that people are wrestling with these hard problems, and I’m going to be digging into this whole space of how to make events easier for programmers.

It’s not an abstract problem · Consider CloudWatch Events Event Examples, which offers samples from twenty-seven different AWS services. The number of unique event types would take too long to count, but it’s big. This is a successful service, with a huge number of customers filtering an astonishing number of events per second. Developers use these for all sorts of things. I’m wondering how we might make it easier for them. Think you know? My mind is open, and we’re hiring.



From: len (Oct 03 2018, at 12:37)

Any thoughts on TimBL's new/old project?


From: Henry (Oct 08 2018, at 21:10)

Hi Tim - great article, and great feedback on JSON Schema. I am one of the current primary authors/editors (starting from the end of 2016) and can offer some thoughts on the 3-document form and forthcoming changes. Perhaps you have some feedback on that :-)

In the next draft, Core does what you say, but also lays the foundations of JSON Schema as a modular, extensible, multi-vocabulary system, with more formality around why keywords behave in a few different ways. This will make it easier to standardize Core and Validation deferring many requests to extension vocabularies.

The keywords that take subschemas as values move into Core, as they are more foundational- Core then defines the media type and essentially bootstraps vocabularies.

This shrinks Validation down to the actual validation assertions, plus the random meta-data keyword section (still not sure where to put that).

Hyper-Schema remains a work in progress, although it's picked up much more interest with draft-07, and is the only spec that has lots of examples now. Our big problem with examples is that we need someone with time who is good at writing them- I am notoriously horrible at it.

Anyway, thanks for the thoughtful words, glad draft-07 is helping you!


September 22, 2018


I am an employee
of Amazon.com, but
the opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.