Lock-in and Multi-Cloud


Alone in a room with the customer’s CTO, I said “So, I spent this morning talking to your people, and it looks like you’re using low-level generic tech for everything and managing it yourself. You could save a shit-ton of money by using high-level managed services and get load-scaling for free.”
“Yeah,” he said, “but then we’d be locked in and the next time our enterprise contract comes up for renewal, you can screw us.”

It doesn’t matter who the customer was, except that they were big, and they thought they were paying too much, and I thought so too. It’s about the starkest example I’ve run across of lock-in fear, which inevitably raises the subject of multi-cloud, hot stuff these days.

The Trade-off · It’s really pretty simple. All the Cloud vendors optimize for the case where you’re running your IT end-to-end on their cloud; that way, everything fits together better. Also, it’s in harmony with the natural software-ecosystem trend towards platforms; vendors love having them and developers are comfortable building on them. So buying into a single cloud vendor just makes life easier.

But… lock-in. It’s not just a ghost story. Consider that the vast majority of the world’s large enterprises have lost control of their per-employee desktop-software spend. They pay whatever Microsoft tells them they’re gonna pay. A smaller but substantial number have also lost control of their database licensing and support spend. They pay whatever Oracle tells them they’re going to.

Microsoft and Oracle calibrate their pricing in a way that bears no relationship to their cost of production, but is carefully calculated to extract the maximum revenue without provoking the customer to consider alternatives. Also, they design their platforms so that migration to any alternative is complex and painful.

So I’d be willing to bet that many CIOs have heard, not just from their CEOs but from their Board, “We don’t want you coming back to us in three years and telling us you’ve lost control of your cloud spend.”

But the Public Clouds aren’t like that! · They really aren’t. In each case, the founders are still somewhat in the picture, bringing a vision to the table that extends beyond short-term profits. I have met these people and they genuinely obsess all the time about what can be done to make their customers successful. Anyhow, the business combines high margins with high growth, so you don’t have to squeeze customers very hard to make investors happy.

Here’s an example: Last year, AWS dramatically reduced data egress charges, under pressure from Cloudflare, Oracle, and of course their customers.

But that’s today. Let’s think ahead to 2030 and zero in on a hypothetical future version of AWS. First, there was that unfortunate accident involving one of Elon’s satellites and the Blue Origin rocket Jeff was on. And Andy, a multibillionaire via that Amazon-CEO package, bought out the rest of the Kraken owners, then put in a surprise grab and owns the Seahawks too. He’s discovered that being a sports mogul beats the hell out of testifying on live TV to Pramila Jayapal.

In this scenario, activist investors and big-time PE players have won a majority on the Amazon board and they don’t want any of their business units missing any chances to turn screws that maximize revenues. Data egress charges ratchet up every quarter, as do the prices of high-level proprietary services like Kinesis and Lambda.

A whole lot of customers have lost control of their cloud-computing spend; they pay whatever their AWS account manager tells them they’re gonna pay.

Benefits of going all in · So the fear is real, and reasonably founded. Which is sad, because if you go all-in, there are pretty big pay-offs. If you want to build a modern high-performance global-scale application, a combination of Lambda, Fargate, DynamoDB, EventBridge, and S3 is a really attractive foundation. It’ll probably scale up smoothly for almost any imaginable load, and when the traffic falls, so do your bills. And it’ll have geographical redundancy via AWS Availability Zones and be more robust than anything you could build yourself.
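To make “when the traffic falls, so do your bills” concrete, here’s a toy cost model comparing pay-per-request pricing against a fixed fleet sized for peak load. Every price and capacity in it is an invented placeholder, not any vendor’s actual rate; it’s a sketch of the shape of the two bills, nothing more.

```python
from typing import Sequence

# All prices and capacities below are hypothetical placeholders,
# not any cloud vendor's actual rates.
PRICE_PER_MILLION_REQUESTS = 2.00   # pay-per-use price, $ per 1M requests
PRICE_PER_SERVER_HOUR = 0.10        # price of one always-on server, $ per hour
REQUESTS_PER_SERVER_HOUR = 360_000  # what one server can handle (100 req/s)


def pay_per_use_cost(hourly_requests: Sequence[int]) -> float:
    """Bill when you pay only for the requests actually served."""
    return sum(hourly_requests) / 1_000_000 * PRICE_PER_MILLION_REQUESTS


def fixed_fleet_cost(hours: int, peak_requests_per_hour: int) -> float:
    """Bill for a fleet sized for the peak hour, running around the clock."""
    servers = -(-peak_requests_per_hour // REQUESTS_PER_SERVER_HOUR)  # ceiling division
    return servers * PRICE_PER_SERVER_HOUR * hours


# Hypothetical traffic: a busy day, then a day with a tenth of the load.
busy_day = [50_000] * 8 + [1_000_000] * 8 + [200_000] * 8
quiet_day = [n // 10 for n in busy_day]

# The fleet is sized for the busiest hour we expect, so it doesn't shrink.
fleet_peak = max(busy_day)

for name, day in [("busy", busy_day), ("quiet", quiet_day)]:
    print(f"{name:>5} day: pay-per-use ${pay_per_use_cost(day):6.2f}, "
          f"fixed fleet ${fixed_fleet_cost(len(day), fleet_peak):6.2f}")
```

With these made-up numbers the fixed fleet is actually cheaper on the busy day ($7.20 vs. $20.00) and more expensive on the quiet one ($7.20 vs. $2.00). The crossover point depends entirely on real prices; what doesn’t change is that only the pay-per-use bill follows the traffic down.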

You can decide you’re just not up for all these proprietary APIs. But then you’re going to need to hire more people, you won’t be able to deliver solutions as fast, and you’ll (probably) end up spending more money.

The solution spaces · So, practically speaking, what can organizations do? There are three options, basically.

Plan A: All-in · It’s a perfectly reasonable business strategy to say “lock-in is a future hypothetical, right now I need richer apps and I need them faster, so let’s just go all-in with GCP or Azure or AWS.” Lots of big, rich, presumably smart organizations have done this.

It works great! You’re in the sweet spot for all the platform tooling, and in particular you can approach nearer the ideal, as articulated by Werner Vogels in one of his re:Invent keynotes: “All the code you write should be value-adding business logic.” You will genuinely deliver more software with high business value than with any other option.

(Your cloud account manager will love you, you’ll get great discounts off list price, and you’ll probably get invited on stage at a cool annual conference, in front of thousands.)

Plan B: Bare metal · This is the posture adopted by that company whose CTO gave me a hard time up in the first paragraph. They rented computers and storage and did everything else themselves. They ran their own MySQL, their own Kafka, their own Cassandra, their own everything, using a number of virts that I frankly couldn’t believe when they first told me.

They had big engineering and SRE teams to accomplish this, and had invented all sorts of cool CI/CD stuff to make it bearable.

In principle, they could pick up their whole deployment lock, stock, and barrel, move it over to GCP, and everything ought to work about the same. Which meant a lot to them.

They had a multi-year enterprise agreement which I can only assume included fabulous discounts.

Plan C: Managed OSS · Here’s what I think is an interesting middle-of-the-road path. Decide that you’re going to go ahead and use managed services from your Cloud provider. But only ones that are based on popular open-source projects. So, use Google Cloud SQL (for MySQL) or Amazon Managed Streaming for Apache Kafka or Azure Kubernetes Service. Don’t use Google Cloud Bigtable or Amazon Kinesis or Azure Durable Functions.

This creates a landscape where switching cloud providers would be nontrivial but thinkable. At least you could stay with the same data-plane APIs. And if you use Terraform or Pulumi or some such, you might be able to make a whole lot of your CI/CD somewhat portable as well.
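Here’s a minimal sketch of what “the same data-plane APIs” buys you under Plan C, assuming the application talks the standard MySQL and Kafka protocols. The provider keys, hostnames, and the `DataPlane` type are all hypothetical; the point is only that moving the data plane becomes a configuration change. Provisioning the services themselves (the Terraform/Pulumi side) is a separate problem this doesn’t touch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataPlane:
    """Where the open-source services live; the protocols themselves don't change."""
    mysql_host: str
    mysql_port: int
    kafka_bootstrap: str  # comma-separated host:port pairs, Kafka's usual format


# Hypothetical endpoints for illustration only; real managed-service
# hostnames are assigned by the provider and look different.
PROVIDERS: Dict[str, DataPlane] = {
    "aws": DataPlane("orders-db.aws.example.internal", 3306,
                     "b-1.kafka.aws.example.internal:9092,"
                     "b-2.kafka.aws.example.internal:9092"),
    "gcp": DataPlane("orders-db.gcp.example.internal", 3306,
                     "kafka-0.gcp.example.internal:9092"),
}


def mysql_url(dp: DataPlane, user: str, database: str) -> str:
    """Generic MySQL connection URL; same shape whoever runs the server."""
    return f"mysql://{user}@{dp.mysql_host}:{dp.mysql_port}/{database}"


def kafka_servers(dp: DataPlane) -> List[str]:
    """Bootstrap list handed to whatever standard Kafka client the app uses."""
    return [s.strip() for s in dp.kafka_bootstrap.split(",")]


# Switching clouds means changing this one key, not rewriting application code.
current = PROVIDERS["gcp"]
print(mysql_url(current, "app", "orders"))
print(kafka_servers(current))
```

Contrast this with a proprietary service like Kinesis or Bigtable, where the client library and the API calls themselves would have to change, not just a hostname.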

And in the meantime, you get all the managing and monitoring and security and fault-tolerance that’s built into the Public-Cloud infrastructure, and which is getting pretty damn slick these days.

But, data gravity! · I heard this more than once, usually from someone who looked old and wise: “The real lock-in ain’t the technology, it’s the data. Once you get enough petabytes in place at one cloud provider, you’re not gonna be moving,” they’d say.

I’m not so sure. I worked a bit on the Snowmobile project, so I’ve had exposure to the issues involving movement of big, big data.

First of all, the Internet is faster than it used to be and it’s getting faster every year. The amount of data that it’s practical to move is increasing all the time.

But more important, I wonder how much data you can actually use at a time. A high proportion of those petabyte-scale loads is just logfiles or other historical data, and its business use is for analytics and BI and so on. How much of it, I wonder, is locked away in Glacier or equivalent, for compliance or because you might need it someday? People don’t realize how freaking huge a petabyte is. [Here’s a fairly hilarious piece, Half a Billion Bibles, from this blog eighteen years ago, when I first ran across the notion of petabytes of data.]
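For a sense of scale, here’s the raw arithmetic for pushing data through a network link. It assumes the link runs flat out the whole time with zero protocol overhead, which no real transfer achieves, and the 2% “live data” figure is a hypothetical stand-in, not a measured number.

```python
# Back-of-the-envelope transfer times. Assumes the link is saturated the whole
# time with zero protocol overhead, which is optimistic for any real transfer.

PETABYTE_BYTES = 10**15  # decimal petabyte, as storage vendors count it


def transfer_days(num_bytes: int, link_gbps: float) -> float:
    """Days needed to push num_bytes through a link of link_gbps gigabits/second."""
    bits = num_bytes * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / 86_400


for gbps in (1, 10, 100):
    print(f"1 PB over {gbps:>3} Gbit/s: {transfer_days(PETABYTE_BYTES, gbps):6.1f} days")

# If only a slice of the data is "live" (hypothetical 2%), that slice is what
# gates getting production apps on the air; the archive can follow at leisure.
live_bytes = PETABYTE_BYTES * 2 // 100
print(f"2% of 1 PB over 10 Gbit/s: {transfer_days(live_bytes, 10) * 24:.1f} hours")
```

A full petabyte takes roughly 93 days at 1 Gbit/s but under a day at 100 Gbit/s, and the hypothetical live slice moves in a few hours at 10 Gbit/s. Egress charges, of course, come on top.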

So if you were going to move clouds, I really wonder how much actual live data you’d need to bring across for your apps to get on the air. Yes, I acknowledge that there are scientific and military-intelligence and suchlike apps that really do need to pound the petabytes all day every day, but my guess is the proportion is small.

So, data gravity might keep you from moving your analytics. But would it really keep you from moving your production work?

Of course, if you did that, you’d be doing…

Multi-cloud! · Boy, is that a fashionable buzzword. Type it into your nearest search engine and look at the frantic pitching on the first page.

First, let’s establish a fact of life: More or less every large enterprise, public or private sector, is already multi-cloud or soon will be. Why? If for no other reason, M&A.

If you survey the application inventory of almost any big household-name enterprise, you’re gonna find a devil’s-brew of mainframes, Windows, Linux, COBOL, C, Java, Python, relational, key/value, and probably eleven different messaging systems. Did they plan it that way? What a dumb question, of course not. It’s the way businesses grow. Enterprises will no more be able to arrange that all their divisions are on the same cloud provider than they can force everybody onto Java 11 or Oracle 19, or get rid of COBOL.

So whatever you think of the lock-in issue, don’t kid yourself you can avoid multi-cloud.

War story · There’s this company I’ve been working with that’s on GCP but at one point was on AWS. They have integrations with a bunch of third-party partners, and those are still running on AWS Lambda. So they have these latency-sensitive retail-customer-facing services that routinely call out from a Kubernetes pod on GCP over to a Lambda function.

When I found out about this I was kind of surprised and asked “But does that work OK?” Yep, they said, solid as a rock and really low latency. I did a little poking around and a public-cloud networking-group insider laughed at me and said “Yeah, that happens all the time, we talk to those guys and make sure the transfer points are super optimized.”

Sounds like an example from the future world of…

Multi-cloud applications · Which, I admit, even given the war story above, I’m not really crazy about. Yeah, multi-cloud is probably in your future. But, in a fairly deep way, the public clouds are remarkably different from each other; the same word can mean entirely different things from one platform to the next.

Which means that people who genuinely have strong skills on more than one public-cloud platform are thin on the ground. I’m not sure I know any. So here’s maybe the most important issue that nobody talks about.

People costs · Everybody I know in the tech business is screaming for more talent, and every manager is working the hell out of their personal networks, because their success — everyone’s success — is gated on the ability to hire.

For a variety of reasons, I don’t think this is going to change in the near or medium term. After that I’m dead so who cares.

So if I’m in technology leadership, one of my key priorities, maybe the most important, is something along the lines of “How do I succeed in the face of the talent crisis?”

I’ll tell you one way: Go all in on a public-cloud platform and use the highest-level serverless tools as much as possible. You’ll never get rid of all the operational workload. But every time you reduce the labor around instance counts and pod sizes and table space and file descriptors and patch levels, you’ve just increased the proportion of your hard-won recruiting wins that go into delivery of business-critical customer-visible features.

Complicating this is the issue I just mentioned: Cross-cloud technical expertise is rare. In my experience, a lot of strategic choices, particularly in start-ups, are made on the basis of what technologies your current staff already know. And I think there’s nothing wrong with that. But it’s a problem if you really want to do multi-cloud.

What would I do? · If I were a startup: I wouldn’t think twice. I’d go all-in on whatever public cloud my CTO wanted. I’m short of time and short of money and short of people and all I really care about is delivering customer-visible value starting yesterday. I just totally don’t have the bandwidth to think about lock-in; that’s a battle I’ll be happy to fight five years from now once I’ve got a million customers.

If I were a mainstream non-technical enterprise: My prejudice would be to go with Plan C, Managed OSS. Because that open-source software is good (sometimes the public cloud proprietary stuff is better, but usually not). Because in my experience AWS does a really good job of managing open-source services so I assume their competitors do too. Because these people care about velocity and hiring too, but remember those Board members saying not to get locked in.

So, who does that leave that should pick Plan B? I’m not sure, to be honest. Maybe academic researchers? Intelligence agencies? Game studios? I suppose that for an organization that routinely has to code down to the metal anyhow, the higher-level services have fewer benefits, and if you’ve got grad students to keep your systems running the “managed service” aspect is less critical.

I’d hate to be in that situation, though.

Then there’s politics · We’re heading into an era where Big Tech is politically unpopular (as is Big Business in general) and there’s a whole lot of emerging antitrust energy in the USA and Europe. Meanwhile, the Chinese autocrats are enjoying beating up on tech companies who see themselves as anything other than vehicles for building Party support.

What’s going to happen? I don’t know. But if you’re a really big important organization and you feel like lock-in is hurting you, I suggest you call your nearest legislator. I suspect they’ll be happy to take your call. And make sure to tell your public-cloud vendor you’re doing it.

Contributions


From: Adrian Brennan (Feb 01 2022, at 10:24)

More than once I've been prevented from moving fast with managed services by managers terrified of lock-in. This article by Wisen Tanasa (https://martinfowler.com/bliki/LockInCost.html) was useful on those occasions.

If you're inclined toward 'Plan A' and encounter resistance, chat with your CTO about 'opportunity gain'.


From: anon (Feb 01 2022, at 14:57)

thanks for your commentary. i find the three categories meaningful and your advice is helpful.

this statement:

> All the code you write should be value-adding business logic.” You will genuinely deliver more software with high business value than with any other option

is true but doesn't account for the reality that any service or abstraction layer will come with its own costs. managed dbs that charge per row scanned are one example among many where devs will need to understand what indexes are and why they are important.

scaling, distribution and multi-service packages are definitely real advantages to the cloud, but i don't think the cloud will or can ever deliver really meaningful abstraction improvements. at best the public cloud enables certain difficult and almost always non-business-related things, like distribution or authentication or any number of managed cloud services.

these things are often almost necessary and almost impossible for small/medium businesses to do themselves. everything else (the actual business logic) is typically not that challenging and in my experience, actually more difficult in abstracted cloud services.

cheers and thanks again for your thoughts.


From: Frank Wilhoit (Feb 01 2022, at 16:38)

There is another source of pressure for lock-in, and that is the auditors. They say, "We only know how to audit Oracle and DB2. We only know how to audit Microsoft on the desktop. We are not going to learn anything else, so you'd better not *have* anything else, or we won't sign."

I know of one large firm in an extremely tightly regulated industry that had a suite of homebrew software for critical functions, which had been running perfectly for decades. The auditors came in and ordered them to scrap all that and adopt a vendor package that was not fit for purpose.


From: Patrick Gibson (Feb 02 2022, at 12:59)

I think there’s a take-away here for developers in particular, and that is you can really increase your value by becoming knowledgeable and aware of these cloud technologies. I still come across many who focus only at the application level without really any consideration given to how this is going to run, how it will scale, and how much it will cost. Even if you’re not dealing with and managing this stuff directly (i.e. “ops”) giving consideration to all of this is very valuable to an organization.


From: Toby White (Feb 09 2022, at 23:18)

But what if I am neither a "startup", nor a "mainstream non-technical enterprise".

So an enterprise-scale, technical organization - not focussed on hypergrowth / moving-fast-and-breaking-things, and where operational investment and unit-cost optimization is a real concern?

I don't have any silver bullet, but it's not at all clear to me that the "right" answer is lots of managed services. With any moderately-complex set of applications and services you don't get to "outsource operational concerns", at least as far as I can see - you just get to exchange one set of complexity (option B - managing services on bare metal but with strong visibility on your stack) for another (option A - orchestrating between lots of managed microservices with less-predictable and mostly-opaque performance characteristics).

Particularly since option A gives the illusion of operational simplicity and encourages teams to adopt a much larger set of technical dependencies.

Maybe the biggest factor is talent availability - over the last 10 years it's become really hard to hire people who are able to deal with the complexities of option B, even though (in my opinion at least) it's not inherently any more complex.


From: Nathan Murray (Feb 10 2022, at 04:42)

Great post and resonates with many thoughts I’ve had recently. I’d like to add that one potential future that discourages lock-in would be second tier cloud vendors copying APIs to make similar services. We are already seeing this with Cloudflare’s S3 clone, and I’m sure Oracle will want to recoup their “investment” proving the legality of cloning APIs.

In your example, what’s to stop GCP from creating a Lambda clone so that they get the spend?

I admit it is harder to see Microsoft engaging this way, and same with Google. But for cloud startups wanting to gain market share and legacy vendors fighting for market share they’ve given up, it seems like a preferable strategy.


From: David Gonzalez (Feb 10 2022, at 07:43)

OP forgot the magic bullet we are currently cooking up in TECH. It's called WASM. WASM will break all lock-in chains!!! Cloud Platforms will be begging us to stay with them because the switch will be one click away. Ok.. ok.. I admit it is ultra simplified, but if we don't mess up the tech behind WASM... IT CAN HAPPEN!!!!


From: David Schryer (Feb 11 2022, at 00:17)

You have not considered a hybrid Plan A and Plan B where the well defined baseline loads of the main services run in parallel on both bare metal and in "Your Favorite Cloud Provider". This hybrid deployment model allows for the rapid scalability of the cloud and the transparent costing of bare metal. Admittedly, this would only make sense for a large corporation to pursue, but I have witnessed some good results in practice.


From: Martin (Feb 16 2022, at 10:48)

> People don’t realize how freaking huge a petabyte is.

I mean yes, but on the other hand it's 55x 18TB hard drives (without redundancy.)

That's 38kg or 82lb of hard drives. Many people could lift it in a single cardboard box. You could transport it in a bicycle trailer or a small car.

On a GCP transfer from S3 or vice versa, it's probably less than an hour to copy... (It would take longer to fill the hard drives if you want a physical copy: maybe a day or more?)

It seems like it's not really the dominant factor?



January 30, 2022 · Business · Technology · Cloud

By Tim Bray.

The opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.
