If you want to process events, you can fetch them from the infrastructure or you can have the infrastructure hand them to you. Neither idea is crazy.
[This is part of the Event Facets series.]
When you make a request to fetch data, that’s called “pull”. Which is an off-by-one error; the “u” should be an “o” as in “poll”, because that’s how most network stuff works. I’ve heard old farts of my own vintage claim that on the network, polling is really all there is if you dig in deep enough. But we do use the term “push”, in which — maybe it’s just an illusion — the infrastructure reaches out and hands the event to you.
For example · I’ll use two AWS services I work near.
First, SQS, which is pure pull. You make one call to receive a batch of messages, and then another to delete them once they’re processed. If they’re being pushed in at the front end of the service faster than you’re taking them out, they build up inside, which is usually OK.
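Here’s a minimal sketch of that two-call cycle. The `receive_message` and `delete_message` calls are real boto3 SQS client methods, but the function name, the queue URL, and the `handler` are illustrative; the client is passed in rather than constructed, which is an assumption about how you’d wire this up.

```python
def drain_once(sqs, queue_url, handler):
    """One SQS pull cycle: receive a batch, process, then delete.

    Messages that aren't deleted reappear after the visibility
    timeout, so a crash mid-batch just means redelivery."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # a batch is at most ten messages
        WaitTimeSeconds=20,      # long poll rather than spin
    )
    for msg in resp.get("Messages", []):
        handler(msg["Body"])
        # Second call: delete only after processing succeeded.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
```

Note that deleting is a separate, deliberate step; until you do it, the message is only invisible, not gone.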
In SNS, by contrast, events are dropped onto a “Topic” and then you subscribe things to the topic. You can subscribe SQS queues, Lambda functions, mobile notifications, and arbitrary HTTP endpoints. When someone drops messages onto the topic, SNS “pushes” them to all the subscribed parties.
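A sketch of that fan-out setup, assuming boto3: `create_topic`, `subscribe`, and `publish` are the real SNS client calls, but the function name and the (protocol, endpoint) pairs you’d pass in are illustrative, and the client is injected rather than built here.

```python
def fan_out(sns, topic_name, subscriptions, message):
    """Create a topic, attach subscribers, publish once.

    SNS then pushes a copy of the message to every subscription."""
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    for protocol, endpoint in subscriptions:
        # protocol is e.g. "sqs", "lambda", or "https";
        # endpoint is the matching ARN or URL.
        sns.subscribe(TopicArn=topic_arn, Protocol=protocol,
                      Endpoint=endpoint)
    sns.publish(TopicArn=topic_arn, Message=message)
    return topic_arn
```

The key point is the shape: one publish, N deliveries, and the publisher doesn’t know or care what’s subscribed.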
On webhooks · “Webhook” is just another word for push. I remember when I first heard the idea in the early days of the Web, I thought it was complete magic: “I’ll just send you something, and the something will include the address where you HTTP POST back to me.” Asynchronous, loosely-coupled, what’s not to like?
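The whole idea fits in a few lines. This is a sketch, not any particular service’s protocol: the `callback_url` field name is made up, and the HTTP POST function is injected (in real life you’d hand it something like `requests.post`).

```python
import json

def deliver(event, post):
    """Webhook push: the event itself carries the address to call back.

    `post` is whatever HTTP POST function you use; injected here so
    the sketch stays transport-agnostic."""
    callback = event["callback_url"]        # hypothetical field name
    body = json.dumps(event["payload"])
    return post(callback, body)
```

That’s the magic and also, as discussed below, the problem: the receiver has to keep that endpoint up, secured, and fast.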
There are a lot of webhooks out there. We talk to the EventBridge Partners who are injecting their customers’ events into AWS, and one thing we talk about is good ways to get events from customers back to them. The first answer is usually “we do webhooks”, which means push.
On being pushy · On the face of it, push delivery sounds seductive. Particularly when you can back your webhook with something like a Lambda function. And sometimes it works great. But there are two big problems.
The first is scaling. If you’re putting up an endpoint and inviting whoever to push events to it, you’re implicitly making a commitment to have it available, secured, and ready to accept the traffic. It’s really easy to get a nasty surprise. Particularly if you’re building a public-facing cloud app and it gets popular. I can guarantee that we have plenty of services here in AWS that can accidentally crush your webhook like a bug with a traffic surge.
Then there’s administration. You want data pushed to you, but you want it done securely. That means whoever’s doing the pushing has to have credentials, and you have to trust them to manage them, and you have to arrange for rotation and invalidation as appropriate, and you really don’t want to be handling support calls of the form “I didn’t change anything and now my pushes are failing with HTTP 403!”
On pulling · Polling, on the face of it, is a little more work. If you care about latency, you’re going to need a host posting a long-poll against the eventing infrastructure, which means you’re going to have to own and manage that host. (If you’re not latency-sensitive, just arrange to fire a Lambda every minute or so to pick up events, way easier.)
On the other hand, you need never have a scaling problem, because you control your own polling rate, so you can arrange to only pick up work that you have capacity to handle.
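That flow control falls out of one parameter. A sketch, again assuming boto3 against SQS: `receive_message`, `MaxNumberOfMessages`, and `WaitTimeSeconds` are real, while the function name and the `free_slots` bookkeeping are illustrative.

```python
def poll_within_capacity(sqs, queue_url, free_slots):
    """Pull-side flow control: ask for no more work than we can do.

    SQS caps one receive at 10 messages; with zero free capacity we
    don't call receive at all, and the backlog just waits in the
    queue instead of crushing us."""
    want = min(free_slots, 10)
    if want == 0:
        return []
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=want,
        WaitTimeSeconds=20,  # long poll keeps latency reasonable
    )
    return resp.get("Messages", [])
```

Compare that to a webhook, where the only flow-control options are absorbing the traffic or returning errors and hoping the sender retries politely.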
On the admin side, everybody already owns the problem of managing their cloud service provider credentials so presumably that’s not extra work.
I’m not saying push is never a good idea; just that grizzled old distributed-systems geeks like me have a sort of warm comfy feeling about polling, especially for the core bread-and-butter workloads.
A small case study · Ssshhh! I’m going to tell an AWS-internals secret.
I talk a lot with the SNS team. They have a front-end fleet, a metadata fleet, and then the fleet that actually does the deliveries to the subscriptions. It turns out that the software that delivers to HTTP-endpoint subscriptions is big, and maybe the most complex part of the whole service.
When I learned that I asked one of the engineers why and she told me “Because those are other people’s endpoints. They make a configuration mistake or forget to rotate credentials or send a flood of traffic that’s like eight times what they can handle. And we have to keep running in all those cases.”