Your event infrastructure might be a service in the cloud or might be an actual computer (or cluster) you connect to. Both choices are perfectly sensible. The trade-offs? It’s complicated.
[This is part of the Event Facets series.]
At AWS where I work, our mainstream home-grown services (Kinesis, SQS, SNS) are all serverless. I mean, there are servers, lots of ’em, but you can’t see ’em; there’s just an endpoint that you connect to, usually with HTTP, to produce and consume events. This should be unsurprising, since AWS is built on the proposition that everything should be a service. Two examples are SQS and SNS.
But there’s another approach, usually referred to as a “message broker”. Brokers are software packages that push around events or messages, and they typically need to be installed and run on servers that you own or rent. Your app then connects to them to send and receive. Two examples are ActiveMQ and RabbitMQ.
The trade-offs, well, they’re complicated.
Serverless? · These services have one huge advantage: you don’t have to think about scaling. You can pretty well fire as much traffic at them as you want, and they’ll find a way to soak it up. Load forecasting sucks and the penalties for being wrong are severe, so it’s best just to not do it. Also, with serverless you don’t have to own and monitor and patch and maintain the servers, which is nice.
Brokers? · First some background: When you work with a broker, you often don’t use HTTP. You nail up a TCP connection and just pump bytes back and forth. For this to work, you obviously need some sort of session and message framing protocol and boy, are there ever a lot of them, for example MQTT, STOMP, OpenWire, and AMQP (watch out for AMQP: it comes in different, mutually incompatible versions).
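To make “message framing” concrete, here is a minimal sketch in Python of about the simplest scheme that works over a TCP-style byte stream: each message goes out as a 4-byte length followed by the payload. This toy protocol is an assumption for illustration only; it is not how MQTT, STOMP, OpenWire, or AMQP actually encode frames, and `send_frame` and `recv_frame` are hypothetical names. What it shows is that a TCP connection delivers a stream of bytes, not messages, so the receiver has to keep reading until it holds a whole frame.

```python
import socket
import struct

# Hypothetical toy framing: a 4-byte big-endian length, then that many bytes
# of payload. Real broker protocols are far richer (sessions, acks, headers).
HEADER = struct.Struct("!I")


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Write one length-prefixed frame to the connection."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may hand them over in arbitrary chunks."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            # Peer closed the connection before the frame was complete.
            raise ConnectionError("connection closed mid-frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Read one frame: first the length header, then exactly that many bytes."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    return _recv_exactly(sock, length)


if __name__ == "__main__":
    # A connected socket pair stands in for a client and a broker.
    client, broker = socket.socketpair()
    for msg in (b"order-created", b"", b"order-shipped"):
        send_frame(client, msg)
    print([recv_frame(broker) for _ in range(3)])
    client.close()
    broker.close()
```

Note that message boundaries survive even for an empty payload, which is exactly what a raw byte stream can’t give you on its own.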
Because you’re using permanent connections, you should be able to expect lower latency, since you don’t pay the well-known HTTP overheads of setting up and tearing down connections. On the other hand, your software has to deal with the situation when your nailed-up connection breaks, which 100% of network connections eventually do. (Your client library may take care of this for you.)
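Dealing with a broken connection usually means reconnecting after a delay that grows with each failure, so that a crowd of clients doesn’t hammer a broker that is just coming back. Here is a minimal sketch of exponential backoff with random jitter, which is one common approach rather than anything a particular broker library mandates. `connect_with_backoff` and `connect` are hypothetical: `connect` stands in for whatever call your client library uses to open a session, and the default delays are illustrative, not recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, waiting longer after each failure.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), so clients that
    lost the same broker at the same moment don't reconnect in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises


if __name__ == "__main__":
    # Hypothetical broker that refuses the first two connection attempts.
    outcomes = iter([False, False, True])

    def flaky_connect() -> str:
        if not next(outcomes):
            raise ConnectionError("broker unreachable")
        return "connected"

    print(connect_with_backoff(flaky_connect))  # prints "connected"
```

This only gets the connection back; real code would also have to resubscribe and decide what to do about messages that were in flight when the connection dropped.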
Except for… · Let’s start with that scaling advantage that services like SQS and SNS offer. It’s real. But… maybe you don’t care. If you load up one of the MQs on a big honkin’ EC2 instance with lots of CPU and memory and threads, you can push an astonishing number of messages/second through it. Like, maybe ten times the number you’ll ever need to send for the foreseeable future.
On the other hand, how about that latency advantage that brokers give you, because of the nailed-up connection? It’s real. But… on the HTTP side, we have HTTP long polling, which can reduce receive latency a lot. And of course an increasing share of HTTP is now HTTP/2, which multiplexes multiple requests on a single long-lived connection.
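Long polling works by having the server hold a receive request open until a message shows up or a timeout expires, instead of answering “nothing yet” right away. The sketch below simulates that with an in-memory queue; `LongPollQueue` and its methods are hypothetical stand-ins for a real service. (SQS, for example, exposes the same idea through the `WaitTimeSeconds` parameter on `ReceiveMessage`.) Calling `receive` with `wait_seconds=0` behaves like ordinary short polling.

```python
import queue
import threading
import time
from typing import List


class LongPollQueue:
    """Hypothetical in-memory stand-in for a queue service with long polling.

    receive() holds the "request" open for up to wait_seconds and returns as
    soon as a message arrives, instead of returning empty immediately.
    """

    def __init__(self) -> None:
        self._q: "queue.Queue[str]" = queue.Queue()

    def send(self, body: str) -> None:
        self._q.put(body)

    def receive(self, wait_seconds: float, max_messages: int = 10) -> List[str]:
        try:
            # Block until the first message arrives or the wait expires.
            first = self._q.get(timeout=wait_seconds)
        except queue.Empty:
            return []  # empty response, but only after the full wait
        batch = [first]
        # Grab whatever else is already waiting, without blocking further.
        while len(batch) < max_messages:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


if __name__ == "__main__":
    q = LongPollQueue()
    # A producer publishes 0.2 s after the consumer starts waiting.
    threading.Timer(0.2, q.send, args=("hello",)).start()
    start = time.monotonic()
    messages = q.receive(wait_seconds=5)
    elapsed = time.monotonic() - start
    # The call returns once the message arrives, well before the 5 s limit.
    print(messages, f"after {elapsed:.2f}s")
```

The latency win is that the consumer gets a message as soon as it’s enqueued, with no gap between repeated short polls; the cost is a request that sits open on the server for up to the wait time.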
And that’s not all. The next step after HTTP/2 is HTTP/3, which runs over QUIC instead of TCP. QUIC is built on UDP and sets up connections in fewer round trips (sometimes zero, when resuming), and a lost packet doesn’t stall unrelated streams the way it can on a shared TCP connection, so in principle amazingly low latencies should be possible.
Reliability? · On the serverless side, the story is excellent. SNS and SQS use all the usual AWS availability-zone tricks to ensure that even if hosts crash (which by the way they do all the time), data keeps flowing.
Brokers have a decent story too, if not quite as rock-solid. ActiveMQ makes it easy to set up broker pairs backed by a single filesystem-based message store, where the backup takes over reasonably quickly on host failure. (Unless of course you’ve built up a huge number of un-received messages, in which case restarts can get very sketchy.) RabbitMQ runs in clusters, often of size three, and when you lose one you can add a new one back in and it’ll pick up state from the others. Once again, if you’ve got a huge backlog, you may experience pain.
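Why clusters of three? One common reason, assuming the cluster replicates with a majority quorum (as RabbitMQ’s Raft-based quorum queues do; other replication modes behave differently), is simple arithmetic. A cluster of $n$ nodes needs $q$ of them to agree before it can make progress, and can therefore lose $f$ of them:

$$
q = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad f = n - q = \left\lfloor \frac{n-1}{2} \right\rfloor
$$

So $n = 3$ gives $q = 2$ and survives $f = 1$ failure, while $n = 4$ raises $q$ to 3 and still survives only one. Odd sizes are the efficient ones, and three is the smallest that can lose a node and keep going.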
In conclusion… · It’s complicated. But you already knew that.