This morning we released AWS Step Functions, a Serverless Distributed Cloud Microservices Polyglot Workflow Orchestration Coordinator thingie. It’s cool; if you want to read about it, visit Jeff Barr’s joint. Step Functions uses state machines specified in a JSON DSL called the States language. This piece is about the two validators I wrote for States documents, one of which is interesting.

Not that many people care about validators and parsers; so if you think you probably won’t be interested in the rest of this piece, you’re probably right.

The validator in the service · Like most AWS Services, Step Functions has a front-end that handles the API calls and dispatches the work. One of the things it obviously has to do is validate the States-language definitions you upload in the CreateStateMachine call. It made sense for me to write the validator since I was editing the States language specification, and have spent more time in the validation trenches than most.

Earlier this year I’d poked around the JSON ecosystem and ended up writing an anguished rant about the awfulness of all the schema alternatives, in particular JSON Schema. But later, when I signed up to do the service’s validator (it’s Java code) I couldn’t find a good alternative, so I spent an unreasonably long time writing a schema, and a validator based on it.

We’ll release that JSON Schema, or the validator source, over my dead body. It’s not a good example of anything.

The result works, but the validator’s not as fast as I’d like and its error messages aren’t as good as I’d like, and when I’m building a States document, I don’t want to have to submit it to an Amazon Web Service to find my fat-fingered typos.

All this didn’t make me happy, so just a few weeks ago I decided we needed more and ended up writing a hurried Ruby Gem called statelint that you can run from your command line; it’s acceptably fast and produces reasonably human-readable descriptions of the problems it finds. The rest of this piece is about Statelint’s construction.

The spec and the validator · The States Language spec, of which I’m an author, should not surprise anyone; it’s written somewhat in the style of an IETF RFC, full of MUST and MUST NOT and MAY as specified in RFC 2119.

When I sat down to write Statelint, it occurred to me that it’d be cool to produce error messages that lined up nicely with those MUST and MAY clauses. So I extracted them all into a file called StateMachine.j2119, where the “j” in the extension is for JSON.

The rest was conceptually simple:

  1. Load the 2119-style assertions into a data structure.

  2. Turn the States-language JSON into a tree and check its nodes against the constraints in that data structure.

So it turns out that Statelint is actually two Ruby gems. One is called j2119 and does #1 and #2 from that list above.

There are three formalisms: “roles”, “types” and “constraints”. Here’s an example assertion:

A Message MUST have an object-array field named "Paragraphs"; each member is a "Paragraph".

In the above assertion, “Message” and “Paragraph” are roles and “object-array” is a type. The constraints, which apply to any node with the role “Message”, are: There must be a field named “Paragraphs”, it must be an array, and the members must all be objects. Also, when validating the object members of the array, they are considered to have the role “Paragraph”.
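To make the role/type/constraint idea concrete, here is a minimal sketch in Ruby. It is not the j2119 gem's code: the regular expression handles only the one sentence shape shown above, and the names `ASSERTION`, `load_assertion`, `type_ok?` and `check`, along with the second "Paragraph" assertion and its "Text" field, are made up for illustration.

```ruby
require 'json'

# Hypothetical pattern for a single assertion shape:
#   A <Role> MUST have a(n) <type> field named "<Name>"[; each member is a "<ChildRole>"].
ASSERTION = /\AAn? (?<role>\w+) MUST have an? (?<type>[\w\-]+) field named "(?<field>[^"]+)"(?:; each member is an? "(?<child>[^"]+)")?\.\z/

# Step 1: turn an assertion sentence into a constraint record, filed under its role.
def load_assertion(line, constraints)
  m = ASSERTION.match(line) or raise ArgumentError, "unrecognized assertion: #{line}"
  (constraints[m[:role]] ||= []) << {
    field: m[:field], type: m[:type], child_role: m[:child], text: line
  }
end

# Does a parsed JSON value satisfy a type name such as "object-array"?
def type_ok?(value, type)
  case type
  when 'object-array' then value.is_a?(Array) && value.all? { |v| v.is_a?(Hash) }
  when 'string'       then value.is_a?(String)
  else false # type names this sketch does not know about
  end
end

# Step 2: check a node against the constraints for its role, then recurse
# into array members using the role the assertion assigns to them.
def check(node, role, constraints, problems = [])
  (constraints[role] || []).each do |c|
    value = node[c[:field]]
    if value.nil?
      problems << "#{role} is missing field \"#{c[:field]}\" (#{c[:text]})"
    elsif !type_ok?(value, c[:type])
      problems << "#{role}.#{c[:field]} is not of type #{c[:type]} (#{c[:text]})"
    elsif c[:child_role]
      value.each { |member| check(member, c[:child_role], constraints, problems) }
    end
  end
  problems
end

constraints = {}
load_assertion('A Message MUST have an object-array field named "Paragraphs"; each member is a "Paragraph".', constraints)
load_assertion('A Paragraph MUST have a string field named "Text".', constraints) # hypothetical assertion

doc = JSON.parse('{"Paragraphs": [{"Text": "hi"}, {"Body": "oops"}]}')
puts check(doc, 'Message', constraints)
```

Run against that sample document, the sketch complains about the second array member, which has the role "Paragraph" but no "Text" field. Carrying the original sentence along in each constraint record is what lets the complaint quote the MUST clause it came from.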

The main Statelint gem bootstraps a j2119 parser with the assertions from the States Language spec, calls it, then adds semantic validations that don’t fit into MUST/MAY assertions.
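This piece doesn't list those semantic validations, so the following is only a guess at the flavor of such a check: making sure every "Next" names a state that actually exists. That is a cross-reference between nodes rather than a statement about one node's fields. The function name `dangling_transitions` is hypothetical, and the sketch looks only at top-level states, assuming each one is a JSON object.

```ruby
require 'json'

# Assumed example of a semantic check: report every "Next" whose target
# is not one of the names defined under "States".
def dangling_transitions(machine)
  states = machine.fetch('States', {})
  states.each_with_object([]) do |(name, state), problems|
    target = state['Next']
    next if target.nil? || states.key?(target)
    problems << "State \"#{name}\" has Next \"#{target}\", which is not a defined state"
  end
end

machine = JSON.parse(<<~STATES)
  {
    "StartAt": "First",
    "States": {
      "First":  { "Type": "Pass", "Next": "Second" },
      "Second": { "Type": "Pass", "Next": "Thrid" }
    }
  }
STATES

puts dangling_transitions(machine)
```

The fat-fingered "Thrid" is the kind of typo that a per-field type constraint can't catch, since "Next" is a perfectly good string.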

What’s good · It’s fast enough. The error messages are reasonably sensible and correspond well to the MUSTs and MAYs in the specification. Anyone can type “gem install statelint” and be off to the races.

What’s wrong · It doesn’t do line numbers, because the default Ruby JSON reader doesn’t. I understand there are other readers that do, but I didn’t get around to wiring one in. What it does is give you a path location, for example “State Machine.States.selector.Choices[2].Variable”. Better than nothing.
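Path locations like that one fall out of the tree walk almost for free. Here is a small sketch, assuming a recursive descent over the parsed JSON; `each_path` is a made-up name, not something from Statelint or j2119.

```ruby
require 'json'

# Walk a parsed JSON tree and yield each leaf value with its path.
# Object fields add a ".name" segment; array members add an "[index]" suffix.
def each_path(node, path, &block)
  case node
  when Hash
    node.each { |key, child| each_path(child, "#{path}.#{key}", &block) }
  when Array
    node.each_with_index { |child, i| each_path(child, "#{path}[#{i}]", &block) }
  else
    yield path, node
  end
end

doc = JSON.parse('{"States": {"selector": {"Choices": [{}, {}, {"Variable": 42}]}}}')
each_path(doc, 'State Machine') { |path, value| puts "#{path}: #{value.inspect}" }
# prints: State Machine.States.selector.Choices[2].Variable: 42
```

A validator doing its checks during a walk like this already has the path in hand when it finds a problem, so the path can simply be prepended to the message.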

The implementation is kind of gross. The .j2119 assertions are parsed with brute-force regular expressions. Not only is this a maintainability nightmare, it’s just stupid — it would have been less work to use a decent parser generator. I think deploying a natural-language parser might even be sane.

I had to do a little hand-editing on the MUST/MAY assertions from the spec to make them regular enough. What I wanted to do, but lacked the time and courage for, was to go and wire all those assertions back into the spec and mark them with <div class="assertion"> or some such so the spec could be used to drive the validator directly.

What’s next · Am I claiming I’ve invented a new JSON schema language? Empirically maybe yes, but not really; all I’ve established is that one hasty implementation produced one plausible validator for one JSON dialect.

There’s a problem in that I don’t much like schema languages; they get used as a crutch to avoid doing the vital work, when you’re defining a new language, of writing clear expository human-readable prose explaining how it works, and including lots of instructive examples. I suspect that schema languages should only be invented by people who like them.

For now, I’ll look at bug reports and pull requests, and if someone’s fork started getting traction, I’d shed no tears.

Maybe this will turn out to have been worthwhile research into the JSON-validation solution space. It seems obvious that we need better tools than we currently have.

Well, and what really matters: If Step Functions attracts AWS customers, Statelint should make their lives a little easier.



Contributions


From: Rob (Dec 03 2016, at 06:47)

For a second there I somehow read the headline as "Validating State VOTING Machines" and I got all excited.


December 01, 2016

I am an employee
of Amazon.com, but
the opinions expressed here
are my own, and no other party
necessarily agrees with them.

A full disclosure of my
professional interests is
on the author page.