So, I helped build Amazon CloudWatch Events (blog, AWS console), which just launched. Been a while since my last extended spell of being an actual software engineer. Shipping feels good.

What it does · The cloud’s asynchronous; changes happen when they happen. Maybe you called an API a minute ago, maybe a database failed over, maybe your app saw a traffic surge. Assuming you want to know when they happen, the traditional approach is POLL LIKE HELL. Oops, I believe the polite usage is “repeatedly call Describe APIs”.

The idea here is for our services to broadcast “Events” (OK, they’re really little JSON blobs) and for you to write “Rules” that match events using “Patterns” (OK, they’re really little JSON blobs) and route ’em to “Targets”, which are often Lambda functions but can also be various kinds of queues and streams and so on.
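To make those "little JSON blobs" concrete, here's a sketch of an event and a pattern, with matching logic along the lines the service documents: a pattern matches an event if, for every field the pattern names, the event has that field and its value is one of the listed candidates; fields the pattern doesn't mention are ignored. The `matches` function and the sample values are illustrative, not the service's actual code.

```python
# Sketch of Pattern-vs-Event matching: for each field in the pattern,
# the event must have that field with a value in the candidate list;
# nested objects recurse. This is a toy, not the real implementation.

def matches(pattern, event):
    """Return True if `event` matches `pattern` (hypothetical helper)."""
    for key, candidates in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(candidates, dict):
            # Nested pattern: recurse into the sub-object.
            if not isinstance(value, dict) or not matches(candidates, value):
                return False
        else:
            # Leaf: candidates is a list of acceptable values.
            if value not in candidates:
                return False
    return True

# A trimmed-down EC2 state-change Event.
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-abcd1111", "state": "running"},
}

# A Rule's Pattern: match EC2 instances entering the "running" state.
pattern = {
    "source": ["aws.ec2"],
    "detail": {"state": ["running"]},
}

print(matches(pattern, event))  # True
```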

So for example you catch an EC2 instance’s transition to “running” state and run a Lambda to tag it, or fix up its DNS. Or you catch a certain class of API calls and route them to a DevOps Slack channel. Or… well, a bunch of other things, both obvious and strange.
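A Lambda target for that first example might look roughly like this. It's a sketch under assumptions: the tag applied is made up, and the actual boto3 `create_tags` call is left commented out so the logic stays self-contained.

```python
# Sketch of a Lambda target for the "EC2 instance entered running" case.
# Illustrative only: the tag chosen here is arbitrary, and a real
# function would call boto3's ec2.create_tags().

def extract_tags(event):
    """Pull the instance id out of the event and decide which tags to apply."""
    detail = event["detail"]
    return detail["instance-id"], [
        {"Key": "State", "Value": detail["state"]},
    ]

def handler(event, context=None):
    instance_id, tags = extract_tags(event)
    # In a real Lambda:
    #   import boto3
    #   boto3.client("ec2").create_tags(Resources=[instance_id], Tags=tags)
    return {"instance": instance_id, "tags": tags}

result = handler({
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-abcd1111", "state": "running"},
})
```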

Which is to say: Less polling, less latency, more automation. We’ve been running a beta for a while and the people who’ve seen Events have mostly said “Yeah, we can use that” and then they get on the air more or less right away. So I’m optimistic.

Is it interesting? · Being event-driven isn’t a novelty, nor are message buses. But from the biz point of view, cloud automation is pretty hot these days. And building anything at AWS scale gets fun fast.

This service has a part that soaks up the events from all the other AWS services. Then there’s the part that figures out which events match the rules. And of course, there’s a rules database. Finally, matching events have to be sent off to the appropriate targets — Lambdas or Topics or whatever.

Of those, only the Rules database is sort of obvious. The rest would be too, except they have to operate at AWS scale, which means there are remarkable numbers of Events and Rules and Targets in flight. And of course the Cloud infrastructure we build this on is fallible, so all the designs have to assume that Shit Happens.
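"Shit Happens" engineering tends to look like the same handful of patterns everywhere; one of the commonest, for delivering anything to a fallible target, is capped exponential backoff with jitter. This is a generic sketch of that pattern, with hypothetical names, not a description of the service's internals.

```python
import random

# Sketch of fault-tolerant delivery: retry a fallible target call with
# capped exponential backoff plus full jitter. All names are hypothetical.

def deliver_with_retries(send, max_attempts=5, base_delay=0.1, sleep=None):
    """Call send() until it succeeds or we give up; return the delays used."""
    delays = []
    for attempt in range(max_attempts):
        try:
            send()
            return delays
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Backoff doubles each attempt, capped at 5s, jittered from zero.
            delay = random.uniform(0, min(5.0, base_delay * (2 ** attempt)))
            delays.append(delay)
            if sleep:
                sleep(delay)

# Demo: a target that fails twice, then succeeds.
calls = {"n": 0}
def flaky_target():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("target unavailable")

delays = deliver_with_retries(flaky_target)
```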

When there’s a change in your AWS deployment and you’ve posted a rule to route it to a Lambda or whatever, it gets there pretty damn fast. That makes me happy.

What I did · The thing wasn’t my idea; that came from Jesse Dougherty, the guy who hired me into AWS Vancouver. His pitching me on the project was one of the reasons joining up seemed like a good idea. I didn’t touch the database, nor the machinery that delivers matched events. But I did leave fingerprints on those JSON blobs, and on the event ingestion and matching software. I think this bit exhibits my simplest-thing-that-could-possibly-work engineering aesthetic.

But to be honest, I probably added more value working on six-pagers and convincing people the whole thing was worth doing.

Where’s the magic? · The databasing and streaming and syncing infrastructure we build on is pretty slick, but that’s not the secret. The management tools are nifty, too; but that’s not it either. It’s the tribal knowledge: How to build Cloud infrastructure that works in a fallible, messy, unstable world.


Comments

From: Marcel Lanz (Jan 14 2016, at 14:29)

this reminds me of a "business event monitoring" system where we collected, matched, and visually displayed system state using time series. It gave an operator a view of the events across a collection of business systems and let them correlate possibly dependent events. I could stare at those time-series monitors all day.


January 11, 2016