Sup­pose we like the idea of go­ing server­less (we do). What should we wor­ry about when we make that bet? What I hear when I talk to peo­ple think­ing about it, most­ly, is la­ten­cy. Can a run-on-demand func­tion re­spond as quick­ly as a warmed-up Web serv­er sit­ting there in mem­o­ry wait­ing for in­com­ing work? The an­swer, un­sur­pris­ing­ly, is “it depends”.

[This is part of the Server­less­ness se­ries.]

What we talk about when we talk about la­ten­cy · First of al­l, in this con­tex­t, la­ten­cy con­ver­sa­tions are al­most all about com­pute la­ten­cy; in the AWS con­tex­t, that means Lamb­da func­tions and Far­gate con­tain­er­s. For server­less mes­sag­ing ser­vices like SQS and databas­es like Dy­namoDB, the an­swer is gen­er­al­ly “fast enough to not wor­ry about”.

There’s this anti-pattern that still hap­pens some­times: I’m talk­ing to some­one about this sub­jec­t, and they say “I have a hard la­ten­cy re­quire­ment of 120ms”. (For those who aren’t in this cul­ture, “ms” stands for mil­lisec­onds and is the com­mon cur­ren­cy of la­ten­cy dis­cus­sion­s. So in this case, a lit­tle over a tenth of a sec­ond.)

In­side AWS, a claim like that would be met with a blank stare, be­cause la­ten­cy is way, way more than just a num­ber. To steal from an old joke: La­ten­cy is more com­pli­cat­ed than you think, even when you think it’s more com­pli­cat­ed than you think. Let’s start with a graph:

[Figure: latency graph]

To start with, nobody should ever talk about latency without a P-number. P50 is a time such that latency is below it 50% of the time, P90 a time such that latency is below it 90% of the time, and so on into P99 and P99.9; P100 is the longest latency observed in some measurement interval.
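To make the P-number idea concrete, here is a minimal sketch in Python. It uses the nearest-rank convention, which is only one of several ways to define a percentile, and the sample latencies are made up for illustration; the function name `percentile` is an assumption of this sketch, not something from any particular monitoring tool.

```python
import math

def percentile(samples, p):
    """Nearest-rank P-number: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

# Made-up latencies in milliseconds, for illustration only.
latencies_ms = [120, 95, 240, 180, 110, 4800, 130, 150, 900, 21000]

for p in (50, 90, 99, 100):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With only ten samples, P99 and P100 land on the same value; the high P-numbers only become meaningful once the measurement interval contains a lot of requests.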

Look­ing at that graph, you can see that half the queries com­plet­ed in about a quar­ter of a sec­ond, 90% in un­der a sec­ond, 99% in un­der five sec­ond­s, and there were a few trail­ers out there in twenty-something sec­ond ter­ri­to­ry. (If you’re won­der­ing, this is a re­al graph of one of the mi­croser­vices in­side EC2 Au­to Scal­ing, some control-plane cal­l. The vari­a­tion is be­cause most Au­to Scal­ing Groups have a single-digit hand­ful of in­stances in them, but some have hun­dreds and a very few have tens of thou­sand­s.)

Now, let’s make it more com­pli­cat­ed.

Running hot and cold · The time it takes to launch a function depends on how recently you’ve launched it. If you’ve run it reasonably recently, we’ve probably got it loaded on a host and ready to go, and it’s just a matter of routing the event to the right place. If not, we have to go find an empty host, find your function in storage, pull it out, and install it before we fire it up. The latter scenario is referred to as a “Cold Start”, and with any luck will only show up at some high P-number, like P90 or higher. The latency difference can be surprising.

It turns out that there are a va­ri­ety of tricks you can use to re­me­di­ate cold-start ef­fect­s; ask your fa­vorite search en­gine. And that’s all I’m go­ing to say on the sub­jec­t, be­cause while the tech­niques work, they’re an­noy­ing and it’s al­so an­noy­ing that peo­ple have to use them; this is a prob­lem that we need to just do away with.

[Photo: warming up a lambda. Credit: Ryan Mahle from Sherman Oaks, CA, USA, via Flickr.com]

Poly­glot la­ten­cy · Once the trig­ger­ing event is rout­ed to your func­tion, your func­tion gets to start com­put­ing. Un­for­tu­nate­ly, that doesn’t mean it al­ways starts do­ing use­ful work right away; and that de­pends on the lan­guage it’s writ­ten in. If it’s a NodeJS or Python pro­gram, it might have to load and com­pile some source code. If it’s Ja­va or .NET, it may have to get a VM run­ning. If it’s Go or C++ or Rust, you drop straight in­to bi­na­ry code.

And because this is latency, it’s even more complicated than that: some of the language runtime initialization happens only on cold starts, and some happens even on warm starts.
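A rough way to picture the split between once-per-cold-start work and every-invocation work is the sketch below. It only simulates the idea locally: the `handler(event, context)` shape follows the usual Python function-as-a-service convention, while `LOOKUP_TABLE`, `INIT_SECONDS`, and the `invocations` counter are hypothetical names invented for this illustration, and nothing here talks to a real service.

```python
import time

# Module-level code runs once, when the runtime first loads this module.
# On a function-as-a-service platform that usually lines up with a cold start.
_started = time.perf_counter()
LOOKUP_TABLE = {n: n * n for n in range(200_000)}  # stand-in for expensive setup
INIT_SECONDS = time.perf_counter() - _started

invocations = 0

def handler(event, context=None):
    """Per-event entry point; this part runs on every invocation, warm or cold."""
    global invocations
    invocations += 1
    return {
        "square": LOOKUP_TABLE[event["n"]],
        "cold_start": invocations == 1,  # only the first call paid for the setup
    }

first = handler({"n": 12})
second = handler({"n": 7})
print(first, second, f"one-time init took {INIT_SECONDS:.3f}s")
```

The design point is simply that anything placed at module level is paid for once per loaded instance, while anything inside the handler is paid for on every event.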

It’s worth say­ing a few words about Ja­va here. There is no com­put­er lan­guage that, for prac­ti­cal pur­pos­es, runs use­ful­ly faster on general-purpose server-side code than Java. That is to say, Ja­va af­ter your pro­gram is all ini­tial­ized and the VM warmed up. There has been a more-or-less con­scious cul­ture, stretch­ing back over the decades of Java’s life, of buy­ing run­time per­for­mance and be­ing will­ing to sac­ri­fice start­up per­for­mance.

And of course it’s not all Java’s fault; a lot of app code starts up slow be­cause of mas­sive dependency-injection frame­works like Spring and Guice; these tend to pri­or­i­tize flur­ries of calls through the Ja­va re­flec­tion APIs over han­dling that first re­quest. Now, Ja­va needn’t have slug­gish star­tup; if you must have de­pen­den­cy in­jec­tion, check out Dag­ger, which tries to do it at com­pile time.

[Image: the Go language gopher mascot]

The take-away, though, is that main­stream Ja­va is slow to start and you need to do ex­tra work to get around that. My re­ac­tion is “Maybe don’t use Ja­va then.” There are mul­ti­ple oth­er run­times whose cold-start be­hav­ior doesn’t fea­ture those ug­ly P90 num­ber­s. One ex­am­ple would be NodeJS, and you could use that but I wouldn’t, be­cause I have no pa­tience for the NPM de­pen­den­cy labyrinth and al­so don’t like JavaScrip­t. Another would be Python, which is not on­ly a de­cent choice but al­most com­pul­so­ry if you’re in Sci­en­tif­ic Com­put­ing or Ma­chine Learn­ing.

But my personal favorite choice for serverless compute is the Go programming language. It’s got great, clean, fast tooling, it produces static binaries, it’s got superb concurrency primitives that make it easy to avoid the kind of race conditions that plague anyone who goes near java.lang.Thread, and finally, it is exceedingly readable, a criterion that weighs more heavily with me as each year passes. Plus the Go Lambda runtime is freaking excellent.

State hy­dra­tion · It’s easy to think about start­up la­ten­cy prob­lems as part of the in­fras­truc­ture, whether it’s the ser­vice or the run­time, but lots of times, la­ten­cy prob­lems are right there in your own code. It’s not hard to see why; ser­vices like Lamb­da are built around state­less func­tion­s, but some­times, when an event ar­rives at the front door, you need some state to deal with it. I call this pro­cess “state hydration”.

Here’s an extreme example of that: a startup I was talking to had a growing customer base and also growing AWS bills. Their load was super peaky and they were (reasonably) grumpy about paying for computers to not do anything. I said “Serverless?” and they said “Yeah, no, not going to happen” and I said “Why not?” and they said “Drupal”. Drupal is a PHP-based Web framework that probably still drives a substantial portion of the Internet, but it’s very database-centric, and this particular app needed to run like eight PostgreSQL queries to recover enough context to do any useful work. So a Lambda function wasn’t really an option.

Here’s an ex­treme ex­am­ple of the op­po­site, that I pre­sent­ed in a ses­sion at re:In­vent 2017. Thom­son Reuters is a well-known news or­ga­ni­za­tion that has to deal with loads of in­com­ing videos; the pro­cess in­cludes transcod­ing and re­for­mat­ting. This tends to be lin­ear in the size of the video with a mul­ti­pli­er not far off 1, so a half-hour video clip could take a half-hour to pro­cess.

They came up with this ultra-clever scheme where they used FFm­peg to chop the video up in­to half-second-ish seg­ments, then threw them in­to an S3 buck­et which they’d set up to fire a Lamb­da for each new ob­jec­t. Those Lamb­das pro­cessed the seg­ments in par­al­lel, FFm­peg glued them back to­geth­er, and all of a sud­den they were pro­cess­ing a half-hour video in a hand­ful of sec­ond­s. State hy­dra­tion? No such thing, the on­ly thing the Lamb­da need­ed to know was the S3 ob­ject name.
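The shape of that scheme (split, process every piece independently, glue the results back in order) can be sketched locally in a few lines of Python. This is only an illustration of the fan-out/fan-in pattern under loud assumptions: a string stands in for the video, `process_segment` is a hypothetical placeholder for the real transcoding step, and a thread pool stands in for one function invocation per stored object.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, segment_size):
    """Chop the input into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]

def process_segment(segment):
    # Hypothetical stand-in for transcoding. The key property is that
    # each call needs nothing except its own segment: no shared state.
    return segment.upper()

def process_in_parallel(data, segment_size=4, workers=8):
    segments = split(data, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so reassembly
        # is a plain join even if segments finish out of order.
        processed = list(pool.map(process_segment, segments))
    return "".join(processed)

result = process_in_parallel("a half-hour video, pretend")
print(result)
```

Because no segment depends on any other, the wall-clock time is roughly the time for one segment plus the split and join overhead, which is where the dramatic speedup in the story comes from.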

Another nice thing about the server­less ap­proach here is that do­ing this in the tra­di­tion­al style would have re­quired stag­ing a big enough fleet, which (s­ince this is a news pub­lish­er) would have meant pre­dict­ing when news would hap­pen, and how tele­genic it would be. Which would ob­vi­ous­ly be im­pos­si­ble. So this app has SERVERLESS writ­ten on it in let­ters of fire 500 me­ters high.

Database or not · The con­ven­tion­al ap­proach to state hy­dra­tion is to load your con­text out of a database. And that’s not nec­es­sar­i­ly ter­ri­ble, it doesn’t mean you have to get stuck in a cor­ner like those Drupal-dependent peo­ple. For ex­am­ple:

  1. You could use something like Redis or Memcached (maybe via ElastiCache); those things are fast.

  2. You could use a key/­val­ue op­ti­mized NoSQL database like Dy­namoDB or Cas­san­dra or Mon­go.

  3. You could use some­thing that sup­ports GraphQL (like Ap­pSync), a pro­to­col specif­i­cal­ly de­signed to turn a flur­ry of REST­ful fetch­es in­to a sin­gle op­ti­mized HTTP round trip.

  4. You could pack­age up your events with a whole lot more con­text so that the code pro­cess­ing them doesn’t have to do much work to get its bear­ings. The SQS-to-Lambda ca­pa­bil­i­ty we an­nounced ear­li­er this year is get­ting a whole lot of use, and I bet most of those read­er func­tions start up pret­ty damn quick.
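The last option in that list is the easiest to show in code. Below is a small sketch, with everything in it invented for illustration: `FAKE_DB` and `fetch` stand in for a remote database, and the event fields (`action`, `customer_id`, `customer`) are hypothetical. It contrasts a thin event, which forces a lookup before any useful work can start, with a fat event that carries its context along.

```python
import json

# Stand-in for a remote database; every read counts as one network round trip.
FAKE_DB = {"customer:42": {"tier": "gold", "region": "us-west-2"}}
round_trips = 0

def fetch(key):
    global round_trips
    round_trips += 1
    return FAKE_DB[key]

def handle_thin_event(body):
    """The event carries only an ID, so the handler must hydrate state first."""
    event = json.loads(body)
    customer = fetch(f"customer:{event['customer_id']}")
    return f"{event['action']} for {customer['tier']} customer in {customer['region']}"

def handle_fat_event(body):
    """The event already carries the context, so no lookup is needed."""
    event = json.loads(body)
    customer = event["customer"]
    return f"{event['action']} for {customer['tier']} customer in {customer['region']}"

thin = json.dumps({"action": "refund", "customer_id": 42})
fat = json.dumps({"action": "refund",
                  "customer": {"tier": "gold", "region": "us-west-2"}})

thin_result = handle_thin_event(thin)
trips_after_thin = round_trips
fat_result = handle_fat_event(fat)
trips_after_fat = round_trips
print(thin_result, trips_after_thin, trips_after_fat)
```

The trade-off is that fat events are bigger and can carry stale context, so this works best when the producer already has the state in hand at the moment it emits the event.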

La­ten­cy and affin­i­ty · There’s been this widely-held be­lief for years that the on­ly way to get good la­ten­cy in han­dling events or re­quests is to have state in mem­o­ry. Thus we have things like ses­sion affin­i­ty and “sticky sessions” in con­ven­tion­al Web-facing app­s, where you try to route strongly-related queries to the same serv­er in a load-balanced fleet.

This can help with la­ten­cy (we’ve used it in AWS ser­vices), but comes with its own set of prob­lem­s. Most ob­vi­ous­ly, what hap­pens when you lose that host, ei­ther be­cause it fails or be­cause you need to bounce it to patch the OS? First, you have to no­tice that it’s gone (hard­er than you’d think to do re­li­ably), then you have to ad­just the affin­i­ty rout­ing, then you have to re­fresh the con­text in the re­place­ment server. And what hap­pens when you lose a whole Avail­abil­i­ty Zone, say a third of your fleet?

If you can pos­si­bly fig­ure out a way to do state hy­dra­tion fast, then you don’t have to have those ses­sion affin­i­ty strug­gles; just spray your re­quests or events across your fleet, try­ing to stress all the hosts even­ly (still non­triv­ial, but tractable) and have a much sim­pler man­age­ment task.

And once you’ve done that, you can probably just go serverless, let Lambda handle smoothing out the load, and not write any code that isn’t pure value-adding application logic.

How to talk about it · To start with, don’t just say “I need 120ms.” Try some­thing more like “This has to be in Python, the data’s in Cas­san­dra, and I need the P50 down un­der a fifth of a sec­ond, ex­cept I can tol­er­ate 5-second la­ten­cy if it doesn’t hap­pen more than once an hour.” And in most main­stream ap­pli­ca­tion­s, you should be able to get there with server­less. If you plan for it.
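A requirement stated that way is also something you can check mechanically. The sketch below is one possible reading, and the reading itself is an assumption: "P50 under a fifth of a second" is taken as a nearest-rank median below 0.2 seconds, and the once-an-hour allowance is taken as "at most one request of 5 seconds or more per clock hour". The function names and sample data are made up.

```python
import math
from collections import Counter

def p50(latencies):
    """Nearest-rank median of a non-empty list of latencies."""
    ordered = sorted(latencies)
    return ordered[math.ceil(len(ordered) / 2) - 1]

def meets_requirement(requests, p50_limit=0.2, slow_threshold=5.0,
                      max_slow_per_hour=1):
    """requests: list of (unix_timestamp_seconds, latency_seconds) pairs."""
    if not requests:
        raise ValueError("need at least one request")
    if p50([lat for _, lat in requests]) >= p50_limit:
        return False
    # Count slow requests per clock hour (assumed interpretation of the allowance).
    slow_by_hour = Counter(int(ts // 3600) for ts, lat in requests
                           if lat >= slow_threshold)
    return all(count <= max_slow_per_hour for count in slow_by_hour.values())

# Made-up measurements: (timestamp in seconds, latency in seconds).
good = [(0, 0.12), (600, 0.15), (1200, 5.3), (4000, 0.18), (5000, 0.11)]
two_slow_in_one_hour = good + [(1800, 6.0)]
slow_median = [(0, 0.3), (10, 0.25), (20, 0.1)]

print(meets_requirement(good),
      meets_requirement(two_slow_in_one_hour),
      meets_requirement(slow_median))
```

Writing the requirement down as code tends to surface the ambiguities (which percentile convention, which time window) before they turn into arguments about whether the target was met.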



Contributions

From: eerie quark doll (Dec 15 2018, at 11:40)

Nice article.

From: Hugh (Dec 15 2018, at 16:36)

Thank you for this series - a great short introduction to why it's worth considering.

From: Ken Kennedy (Dec 16 2018, at 09:52)

Thanks for the article (and the whole series), Tim! This stuff is really interesting; I'm a DBA by trade, and do a hunk of dev work on the side...so I'm definitely by default in the "keep your state in the database" camp. These sorts of articles really help me wrap my head around the process changes needed for successful serverless.

From: Efi (Dec 16 2018, at 13:35)

Hey Tim,

Thank you for writing about this topic. Actually in our testing, we've found out that code initialization has more effect on your cold start than the other points.

Here is a nice summary of our results: https://medium.com/lumigo/web-frameworks-implication-on-serverless-cold-start-18ee5eb6c62a

From: Gavin B (Dec 19 2018, at 23:57)

Here's a TheRegister perspective - subheaded:

"If 2019 is the year you try AWS Lambda et al, then here are pitfalls to look out for"

https://www.theregister.co.uk/2018/12/19/serverless_computing_study/

December 14, 2018

I am an employee
of Amazon.com, but
the opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.