I did some recreational programming over Christmas and the blog post I wrote about it is now guesting in Jeff Barr’s space for your amusement; try the software at IsItOnAWS.com. What I didn’t do there was relay the lessons I picked up along the way; one or two are around AWS, but most follow from this being my first nontrivial expedition into the land of NodeJS. So (acknowledging that only 0.8% of my profession aren’t already Nodesters), here they are. Spoiler: I don’t like Node very much.
Lesson: Lambda has historically been used for behind-the-scenes work. But with the recent arrival of new API Gateway and Certificate Manager goodies, it’s become pretty easy to convince a function to serve HTTP requests pointed at your own web-space. Will this be a popular idiom? Beats me.
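For the curious, here's roughly what such a function looks like. This is a minimal sketch, assuming API Gateway's Lambda proxy integration, where the function receives the HTTP request as an event object and returns an object with statusCode, headers, and body. The `ip` query parameter and the echo-only response are hypothetical placeholders, not what IsItOnAWS actually does.

```javascript
// Minimal sketch of a Lambda handler behind an API Gateway proxy integration.
// The "ip" query parameter and the response shape are hypothetical.
async function handler(event) {
  // API Gateway passes null, not {}, when the request has no query string.
  const params = event.queryStringParameters || {};
  if (!params.ip) {
    return {
      statusCode: 400,
      headers: { "Content-Type": "text/plain" },
      body: "missing ?ip= parameter",
    };
  }
  // A real function would do the lookup here; this one just echoes the input.
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: params.ip }),
  };
}

exports.handler = handler;
```

An async handler like this assumes a Node runtime with async/await; on older runtimes the handler instead takes `(event, context, callback)` and calls `callback(null, response)`. The custom domain and certificate are API Gateway and Certificate Manager configuration, so they don't show up in the code at all.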
Lesson: I can now work with Node’s everything-is-a-callback worldview, but still, at the end of the day I think it’s wrong. What I want to do is fetch data, then process data, then write data, and if a damn computer language can’t give me a sequential abstraction when I want to do sequential things, well screw it.
Yeah, I acknowledge the kozmick performance gains Node achieves, even when living in a single-threaded environment, by pushing developers into callback-or-die territory, but you know, there are things like pre-emptive multitasking and thread pools that should let the system interleave IO and compute for performance without making me worry my pretty little head over it.
Having said that, async/waterfall is a straightforward way to remediate the damage.
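For anyone who hasn't met it: `async.waterfall(tasks, callback)` runs a list of callback-taking functions in order, feeding each one's results to the next and bailing out to the final callback on the first error. The sketch below is a hand-rolled stand-in written to show the idea, not the async library's own code; the three steps are hypothetical placeholders for fetch, process, and write.

```javascript
// Hand-rolled stand-in for async.waterfall (NOT the async library itself).
// Each task receives the previous task's results plus a node-style callback.
function waterfall(tasks, done) {
  let i = 0;
  function next(err, ...results) {
    // Stop on the first error, or after the last task has reported back.
    if (err || i === tasks.length) {
      done(err, ...results);
      return;
    }
    const task = tasks[i++];
    task(...results, next);
  }
  next(null);
}

// Usage: hypothetical fetch, process, write steps.
waterfall(
  [
    (cb) => cb(null, " 21 "),                      // "fetch": canned raw input
    (raw, cb) => cb(null, Number(raw.trim()) * 2), // "process"
    (n, cb) => cb(null, `wrote ${n}`),             // "write"
  ],
  (err, result) => {
    if (err) return console.error(err);
    console.log(result); // wrote 42
  }
);
```

The point is that the steps read top to bottom in the order they run, which is about as close to a sequential abstraction as callback-style Node gets.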
Lesson: Constructing a zip was pretty easy with jszip. Except that, although a zip is just a bunch of bytes, jszip insisted on emitting a Node Stream. But it seems that NPM generally contains correctives for its own misfeatures; in this case, raw-body.
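What raw-body does for this case boils down to collecting a stream's chunks and concatenating them. Here's a minimal standard-library sketch, assuming the stream emits Buffers or strings; `streamToBuffer` is a hypothetical name, not raw-body's API.

```javascript
const { Readable } = require("stream");

// Collect every chunk a Readable emits into a single Buffer.
function streamToBuffer(stream) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on("data", (chunk) =>
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
    );
    stream.on("end", () => resolve(Buffer.concat(chunks)));
    stream.on("error", reject);
  });
}

// Usage with a stand-in stream; with jszip this would be zip.generateNodeStream().
streamToBuffer(Readable.from([Buffer.from("PK"), "zip bytes"]))
  .then((buf) => console.log(buf.length)); // 11
```

jszip 3 also documents `zip.generateAsync({ type: "nodebuffer" })`, which may sidestep the stream altogether if your version has it.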
Lesson: Node’s HTTP-fetch function is kind of dumb and clumsy. Every language should have a one-liner that says “Here’s a URL, gimme back an object with the content-type and the response body’s bytes, or let me know if you can’t.” Of the languages I’ve used in recent years, only Go and Ruby do.
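Here's a sketch of what that one-liner has to wrap, using only Node's built-in http and https modules: resolve with the content type and the body bytes, or reject with an error. The name `fetchBytes` is made up, and this version follows no redirects and sets no timeout, both of which a real one would need.

```javascript
const http = require("http");
const https = require("https");

// Resolve with { contentType, body } (body is a Buffer), or reject with an Error.
function fetchBytes(url) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith("https:") ? https : http;
    client
      .get(url, (res) => {
        if (res.statusCode < 200 || res.statusCode >= 300) {
          res.resume(); // drain the response so the socket is released
          reject(new Error(`HTTP ${res.statusCode} for ${url}`));
          return;
        }
        const chunks = [];
        res.on("data", (c) => chunks.push(c));
        res.on("end", () =>
          resolve({ contentType: res.headers["content-type"], body: Buffer.concat(chunks) })
        );
        res.on("error", reject);
      })
      .on("error", reject); // DNS failures, refused connections, etc.
  });
}
```

Node 18 and later ship a global `fetch()`, which gets much closer to the wished-for one-liner: `res.headers.get("content-type")` plus `await res.arrayBuffer()`.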
Lesson: Upon publishing this, I will receive much pitying feedback along the lines of “Well of course you could have done it in a one-liner using TheNewHotness.js.” And also pointing out many other better ways to have done this using things my Internet search skills were insufficiently advanced to discover. Draw your own conclusion.
Lesson: The IPv6 address-literal syntax is stupidly human-hostile.
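To make the complaint concrete: one IPv6 address can be spelled many ways, because leading zeros in each 16-bit group may be dropped and one run of all-zero groups may be squeezed into "::". The sketch below expands such a literal back to its full eight-group form. `expandIPv6` is a hypothetical helper, and it deliberately ignores the embedded-IPv4 (`::ffff:1.2.3.4`) and zone-ID (`%eth0`) forms.

```javascript
// Expand an IPv6 literal to eight lowercase, zero-padded 16-bit groups.
// Assumes plain hex groups only: no embedded IPv4 suffix, no %zone, no [brackets].
function expandIPv6(addr) {
  const halves = addr.split("::");
  if (halves.length > 2) throw new Error(`more than one "::" in ${addr}`);
  const head = halves[0] ? halves[0].split(":") : [];
  const tail = halves.length === 2 && halves[1] ? halves[1].split(":") : [];
  const missing = 8 - head.length - tail.length;
  // Without "::" there must be exactly 8 groups; with it, "::" stands for at least one.
  if (halves.length === 1 ? missing !== 0 : missing < 1) {
    throw new Error(`wrong number of groups in ${addr}`);
  }
  const groups = [...head, ...Array(missing).fill("0"), ...tail];
  for (const g of groups) {
    if (!/^[0-9a-fA-F]{1,4}$/.test(g)) throw new Error(`bad group "${g}" in ${addr}`);
  }
  return groups.map((g) => g.toLowerCase().padStart(4, "0")).join(":");
}

console.log(expandIPv6("2001:db8::ff00:42:8329"));
// 2001:0db8:0000:0000:0000:ff00:0042:8329
```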
Lesson: NPM has at least one of everything you can possibly imagine.
Lesson: NPM dependencies are a fulminating cancerous mess. This little Lambda that runs when the JSON updates needs fifteen freaking megabytes in its node_modules directory, and the zip is like 2.5M. For the little function that actually handles the IsItOnAWS requests, I consciously tried to keep the dependencies down, but I still ended up needing async, ipaddr.js, lodash, and sprintf-js for another 2½ meg. Feaugh. What’s a “lodash”, anyhow?
Lesson: The Lambda and S3 APIs are minimal, sensible, and well-integrated into Node’s resistance-is-futile you-will-learn-to-love-callbacks paradigm.
Lesson: The best Node code is Non Fancy Node.
Lesson: The tape unit-test harness Just Worked for me out of the box, had a nearly-zero learning curve, and was minimally intrusive. I’m a fan.
From: Brent Rockwood (Mar 30 2017, at 03:20)
If I were to make one change, it would be to link to some of the blog posts/GitHub repo from https://isitonaws.com so people can discover the neat technology behind it.
From: John Cowan (Mar 30 2017, at 05:20)
I left a comment at Jeff Barr's blog, but it doesn't seem to have been approved, so I'll leave it here. There's a nice method of using binary search on ranges that I learned about when messing with Unicode predicates like isBurmese and isUpperCase before every language had them built in. I can't tell from the description if you are already doing this.
Use an ordinary vector of numbers of size 2n, and fill it in thus: start1, end1 + 1, start2, end2 + 1, ... When you need to know if a number is in some range, do your binary search to find the first vector element strictly greater than the input. If that element is at an odd index (counting from 0), the number is in a range; otherwise, including when no element is greater, the number is not in any range.
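Here's the lookup half of that scheme as a JavaScript sketch. Counting indexes from 0, x falls inside a range exactly when the first element strictly greater than x sits at an odd index. The helper names are hypothetical, and for IsItOnAWS-style checks the assumption is that IPv4 addresses would first be converted to 32-bit integers.

```javascript
// Build the flat list [start1, end1 + 1, start2, end2 + 1, ...]
// from sorted, non-overlapping, inclusive [start, end] ranges.
function toInversionList(ranges) {
  const list = [];
  for (const [start, end] of ranges) list.push(start, end + 1);
  return list;
}

// Binary search for the index of the first element strictly greater than x;
// an odd index means x is inside one of the ranges.
function inRanges(list, x) {
  let lo = 0;
  let hi = list.length; // search window is [lo, hi)
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (list[mid] <= x) lo = mid + 1;
    else hi = mid;
  }
  return lo % 2 === 1;
}

const list = toInversionList([[10, 20], [30, 40]]); // [10, 21, 30, 41]
console.log(inRanges(list, 20), inRanges(list, 21)); // true false
```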
You can do set operations on these vectors as well, though you don't need them for this problem. To negate, prefix the vector with 0, unless it already begins with 0, in which case drop that 0; likewise, append the largest value + 1 (0x110000 in the Unicode case), unless the vector already ends with it, in which case drop it. Union is just merge, and intersection is another kind of merge that drops duplicates.
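A sketch of negation over the universe [0, limit), where `limit` is the largest value + 1 (0x110000 for Unicode code points). It toggles both ends of the list: a leading 0 is dropped or added, and a trailing `limit` is dropped or added. The function name and the example numbers are made up for illustration.

```javascript
// Complement a flat range list [start1, end1 + 1, ...] over [0, limit).
function complement(list, limit) {
  const out = list.slice();
  // The complement starts at 0 unless the set already does.
  if (out.length > 0 && out[0] === 0) out.shift();
  else out.unshift(0);
  // The complement runs to the end of the universe unless the set already does.
  if (out.length > 0 && out[out.length - 1] === limit) out.pop();
  else out.push(limit);
  return out;
}

// Ranges [10, 20] and [30, 40] out of [0, 100):
console.log(complement([10, 21, 30, 41], 100)); // [ 0, 10, 21, 30, 41, 100 ]
```

Negating twice gives back the original list, which makes a handy sanity check.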
From: Patrick Quinn-Graham (Mar 31 2017, at 05:29)
Node encourages the use of streams because they let you handle data more efficiently: you can pass it from the zip compressor directly to a writable stream (a file on disk or, in your case, a remote server) as the bits become available, rather than buffering it all in RAM first.
s3.putObject should allow you to pass the stream object instead of buffering it yourself first with raw-body. If not, s3.upload definitely does.
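A small illustration of the pattern this comment describes, using Node's built-in stream and zlib modules rather than jszip or S3: data is gzipped chunk by chunk and each compressed chunk goes straight to a sink, so the whole file never has to sit in memory. `gzipInto` and its `onChunk` callback are hypothetical; in the S3 case, per the comment, you would hand the compressed stream to the upload call instead.

```javascript
const { pipeline, Readable, Writable } = require("stream");
const zlib = require("zlib");

// Gzip `source` incrementally, passing each compressed chunk to onChunk as soon
// as it is produced; done(err) is called once the pipeline finishes or fails.
function gzipInto(source, onChunk, done) {
  const sink = new Writable({
    write(chunk, _encoding, cb) {
      onChunk(chunk);
      cb();
    },
  });
  pipeline(source, zlib.createGzip(), sink, done);
}

// Usage: collect the chunks only to check the round trip.
const parts = [];
gzipInto(Readable.from(["hello, ", "streams"]), (chunk) => parts.push(chunk), (err) => {
  if (err) throw err;
  console.log(zlib.gunzipSync(Buffer.concat(parts)).toString()); // hello, streams
});
```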
From: Gavin B. (Mar 31 2017, at 09:53)
The nice thing about Elixir on top of Erlang is that it simulates the sequential programming that you crave in Node.js, while under the hood it's all tail callbacks. They also have a half-decent web server now: Phoenix. Next vacation, maybe give it a try. elixir-lang.org
From: Peter J. (Mar 31 2017, at 11:45)
"What I want to do is fetch data, then process data, then write data". Something like
looks an awful lot like promises. Any reason (other than "Non Fancy Node") you didn't go that way, since Node v6 is supported as a Lambda execution environment?
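For comparison, here's the fetch-then-process-then-write sequence as a promise chain and again with async/await. `fetchData`, `processData`, and `writeData` are hypothetical stand-ins with canned data, not code from IsItOnAWS.

```javascript
// Hypothetical stand-ins: each step returns a value or a Promise.
const fetchData = () => Promise.resolve('{"prefixes":[{"ip_prefix":"203.0.113.0/24"}]}');
const processData = (raw) => JSON.parse(raw).prefixes.map((p) => p.ip_prefix);
const writeData = (prefixes) => Promise.resolve(`wrote ${prefixes.length} prefix(es)`);

// Promise chain: each .then runs only after the previous step settles;
// a rejection anywhere propagates to the caller's .catch.
function runWithPromises() {
  return fetchData().then(processData).then(writeData);
}

// The same sequence with async/await (Node 7.6 and later).
async function runWithAwait() {
  const raw = await fetchData();
  const prefixes = processData(raw);
  return writeData(prefixes);
}

runWithPromises()
  .then((msg) => console.log(msg)) // wrote 1 prefix(es)
  .catch((err) => console.error(err));
```

The async/await version is the closest Node comes to the plain sequential code the post asks for.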