This morning we released AWS Step Functions, a Serverless Distributed Cloud Microservices Polyglot Workflow Orchestration Coordinator thingie. It’s cool; if you want to read about it, visit Jeff Barr’s joint. Step Functions uses state machines specified in a JSON DSL called the States language. This piece is about the two validators I wrote for States documents, one of which is interesting.
Not that many people care about validators and parsers; so if you think you probably won’t be interested in the rest of this piece, you’re probably right.
The validator in the service · Like most AWS Services, Step Functions has a front-end that handles the API calls and dispatches the work. One of the things it obviously has to do is validate the States-language definitions you upload in the CreateStateMachine call. It made sense for me to write the validator since I was editing the States language specification, and have spent more time in the validation trenches than most.
Earlier this year I’d poked around the JSON ecosystem and ended up writing an anguished rant about the awfulness of all the schema alternatives, in particular JSON Schema. But later, when I signed up to do the service’s validator (it’s Java code) I couldn’t find a good alternative, so I spent an unreasonably long time writing a schema, and a validator based on it.
We’ll release that JSON Schema, or the validator source, over my dead body. It’s not a good example of anything.
The result works, but the validator’s not as fast as I’d like and its error messages aren’t as good as I’d like. And when I’m building a States document, I don’t want to have to submit it to an Amazon Web Service to find my fat-fingered typos.
All this didn’t make me happy, so just a few weeks ago I decided we needed more and ended up writing a hurried Ruby Gem called statelint that you can run from your command line; it’s acceptably fast and produces reasonably human-readable descriptions of the problems it finds. The rest of this piece is about Statelint’s construction.
The spec and the validator · The States Language spec, of which I’m an author, should not surprise anyone; it’s written somewhat in the style of an IETF RFC, full of MUST and MUST NOT and MAY as specified in RFC 2119.
When I sat down to write Statelint, it occurred to me that it’d be cool to produce error messages that lined up nicely with those MUST and MAY clauses. So I extracted them all into a file called StateMachine.j2119, where the “j” in the extension is for JSON.
The rest was conceptually simple:
1. Load the 2119-style assertions into a data structure.
2. Turn the States-language JSON into a tree and check its nodes against the constraints in that data structure.
So it turns out that Statelint is actually two Ruby gems. One is called j2119 and does #1 and #2 from that list above.
There are three formalisms: “roles”, “types” and “constraints”. Here’s an example assertion:
A Message MUST have an object-array field named "Paragraphs"; each member is a "Paragraph".
In the above assertion, “Message” and “Paragraph” are roles and “object-array” is a type. The constraints, which apply to any node with the role “Message”, are: There must be a field named “Paragraphs”, it must be an array, and the members must all be objects. Also, when validating the object members of the array, they are considered to have the role “Paragraph”.
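To make roles, types and constraints concrete, here is a minimal Ruby sketch of checking that one assertion by hand. It is not the j2119 gem’s code or API; the `CONSTRAINTS` table and the `check` method are hypothetical names invented for illustration, and only the rules stated in the assertion above (field present, array of objects, members take the “Paragraph” role) are modeled.

```ruby
require 'json'

# Hypothetical hand-built table for the single assertion above.
# Each role maps to the fields it must have; each entry names the field,
# its type, and the role assigned to the array's members.
CONSTRAINTS = {
  'Message' => [
    { field: 'Paragraphs', type: :object_array, child_role: 'Paragraph' }
  ]
}.freeze

# Check one object node against the constraints for its role and return
# a list of problem descriptions (empty when nothing is wrong).
def check(node, role, path)
  problems = []
  CONSTRAINTS.fetch(role, []).each do |c|
    name = c[:field]
    unless node.key?(name)
      problems << "#{path}: a #{role} MUST have a field named \"#{name}\""
      next
    end
    value = node[name]
    unless value.is_a?(Array) && value.all? { |m| m.is_a?(Hash) }
      problems << "#{path}.#{name}: MUST be an array of objects"
      next
    end
    # Members of the array take on the child role and are checked in turn.
    value.each_with_index do |member, i|
      problems.concat(check(member, c[:child_role], "#{path}.#{name}[#{i}]"))
    end
  end
  problems
end

good = JSON.parse('{"Paragraphs": [{"Text": "hi"}]}')
bad  = JSON.parse('{"Paragraphs": [{"Text": "hi"}, 42]}')

puts check(good, 'Message', 'Message').inspect
puts check(bad, 'Message', 'Message').inspect
```

The point of the table-driven shape is that adding another assertion means adding data, not code; no constraints are defined for “Paragraph” here, so its members pass trivially.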
The main Statelint gem bootstraps a j2119 parser with the assertions from the States Language spec, calls it, then adds semantic validations that don’t fit into MUST/MAY assertions.
What’s good · It’s fast enough. The error messages are reasonably sensible and correspond well to the MUSTs and MAYs in the specification. Anyone can type “gem install statelint” and be off to the races.
What’s wrong · It doesn’t do line numbers, because the default Ruby JSON reader doesn’t. I understand there are other readers that do, but I didn’t get around to wiring one in. What it does is give you a path location, for example “State Machine.States.selector.Choices.Variable”. Better than nothing.
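Path locations like that fall out naturally from a recursive walk over the parsed JSON. Here is a minimal sketch of the idea, not Statelint’s actual code: the `each_path` helper, the sample document and the “is it a string” check are all made up for illustration, and showing array positions as `[0]` is my choice for the sketch rather than a claim about Statelint’s output format.

```ruby
require 'json'

# Hypothetical helper: walk a parsed JSON value, yielding each node
# together with a dotted path describing where it sits in the document.
def each_path(node, path, &block)
  yield path, node
  case node
  when Hash
    node.each { |key, child| each_path(child, "#{path}.#{key}", &block) }
  when Array
    node.each_with_index { |child, i| each_path(child, "#{path}[#{i}]", &block) }
  end
end

doc = JSON.parse(<<~DOC)
  {
    "States": {
      "selector": {
        "Type": "Choice",
        "Choices": [ { "Variable": 17 } ]
      }
    }
  }
DOC

# Report any "Variable" field whose value is not a string, by path.
# (This check is only an illustrative stand-in for a real constraint.)
problems = []
each_path(doc, 'State Machine') do |path, node|
  next unless node.is_a?(Hash) && node.key?('Variable')
  problems << "#{path}.Variable is not a string" unless node['Variable'].is_a?(String)
end

puts problems
```

Because the path is built up as the walk descends, no support from the JSON reader is needed, which is exactly why it is easier to get than line numbers.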
The implementation is kind of gross. The .j2119 assertions are parsed with brute-force regular expressions. Not only is this a maintainability nightmare, it’s just stupid — it would have been less work to use a decent parser generator. I think deploying a natural-language parser might even be sane.
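For a flavor of what brute-force regex parsing of assertions looks like, here is a minimal sketch that handles only the sentence shape of the “Message” example shown earlier. The `ASSERTION` pattern and the `parse_assertion` name are hypothetical, and far simpler than whatever the j2119 gem really does.

```ruby
# Hypothetical pattern for lines shaped like:
#   A Message MUST have an object-array field named "Paragraphs"; each member is a "Paragraph".
ASSERTION = /
  \AAn?\s+(?<role>[A-Z]\w*)\s+
  (?<modal>MUST\sNOT|MUST|MAY)\s+have\s+
  an?\s+(?<type>[a-z-]+)\s+field\s+
  named\s+"(?<field>[^"]+)"
  (?:;\s+each\s+member\s+is\s+an?\s+"(?<child_role>[^"]+)")?
  \.\z
/x

# Turn one assertion line into a hash of its parts, or nil if the line
# does not fit the one shape this sketch understands.
def parse_assertion(line)
  m = ASSERTION.match(line.strip)
  return nil unless m

  {
    role: m[:role], modal: m[:modal], type: m[:type],
    field: m[:field], child_role: m[:child_role]
  }
end

line = 'A Message MUST have an object-array field named "Paragraphs"; each member is a "Paragraph".'
p parse_assertion(line)
p parse_assertion('Messages are usually short.')
```

Even this one pattern hints at the maintainability problem: every new sentence shape in the spec needs another expression like it, and the second call shows that anything off-pattern is silently not recognized.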
I had to do a little hand-editing on the MUST/MAY assertions from the spec to make them regular enough. What I wanted to do, but lacked the time and courage for, was to go and wire all those assertions back into the spec and mark them with <div class="assertion"> or some such so the spec could be used to drive the validator directly.
What’s next · Am I claiming I’ve invented a new JSON schema language? Empirically maybe yes, but not really; all I’ve established is that one hasty implementation produced one plausible validator for one JSON dialect.
There’s a problem in that I don’t much like schema languages; they get used as a crutch to avoid doing the vital work, when you’re defining a new language, of writing clear expository human-readable prose explaining how it works, and including lots of instructive examples. I suspect that schema languages should only be invented by people who like them.
For now, I’ll look at bug reports and pull requests, and if someone’s fork started getting traction, I’d shed no tears.
Maybe this will turn out to have been worthwhile research into the JSON-validation solution space. It seems obvious that we need better tools than we currently have.
Well, and what really matters: If Step Functions attracts AWS customers, Statelint should make their lives a little easier.