I find myself tasked with polishing and publishing a little custom JSON-encoded language. It’s harder than it ought to be.

This didn’t start with the language, it started with prototype software this guy wrote that did something old and familiar in a new and dramatically better way. He replaced a bunch of gnarly old code with a few JSON templates to save time. Now, in the rearview, the JSON looks like an important part of an important product.

And there’s a lesson in that: All the good markup vocabularies are discovered by coders trying to get shit done, not cooked up in committee rooms in advance of software. For example, TimBL just needed to tag his hypertexts. Not that this is that big.

Q: Why JSON? · If it looks like a document, use XML. If it looks like an object, use JSON. It’s that simple. The essential difference isn’t simplicity/complexity or compact/verbose or typed/text, it’s ordered-by-default or not.

This particular thing I’m working on is a lot like objects and not at all like documents. Case closed.
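That ordered-by-default distinction is concrete enough to demonstrate. A little Python sketch (the names and values here are mine, purely for illustration):

```python
import json

# Two serializations of the "same" object with members in a
# different order; for object-shaped data, that should not matter.
a = json.loads('{"name": "example", "count": 3}')
b = json.loads('{"count": 3, "name": "example"}')
print(a == b)  # True: JSON object members are unordered

# For document-shaped data, order IS the content: swapping two
# paragraphs yields a different document, so an ordered model
# (XML's, or a JSON array) is the right fit.
doc1 = ["first paragraph", "second paragraph"]
doc2 = ["second paragraph", "first paragraph"]
print(doc1 == doc2)  # False: sequences are ordered
```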

By the way, it’s amusing that this century hasn’t yet offered a plausible new markup alternative to the last one’s mouldy leftovers. Also pleasing to one who has left fingerprints on both leftovers.

Q: How to author? · With pain. By default, I write things in Emacs, which unaccountably doesn’t have a JSON mode that knows where to put the fucking commas.

I’ve also tried editing JSON in IntelliJ and it’s not terrible but not remotely good enough. I’m writing what you’re now reading in XML in an Emacs mode that’s a fantastically-productive finely-tuned machine, so the thing should be possible.

There’s another solution: make JSON easier. Hjson (“the Human JSON”) addresses this problem and looks quite cleanly thought out. There’s also JSON5 (“JSON for the ES5 era”), but I stopped reading at “Unicode characters and escape sequences aren’t yet supported”.
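For flavor, here’s roughly what “easier” buys you. This sample is my own transcription based on Hjson’s published syntax, so treat the details as approximate: comments, optional quotes, and no comma bookkeeping:

```hjson
{
  # comments are allowed
  name: example      # quotes optional for simple strings
  count: 3
  tags: [
    alpha
    beta             # one value per line, no commas
  ]
}
```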

Meh. Fewer markup languages are better; fix the editors, I say. Maybe Amazon would be OK with me dropping all this cloud stuff and working on JSON authoring tools for a few months? Not holding my breath.

Q: How to specify? · If you want to ship your own language, you need to specify it. The most important part of any language spec is the human-readable prose that describes the constructs, says what they mean, and offers advice on how to use them.

Next most important is examples; lots of ’em, well-chosen, and downloadable from somewhere.

(Actually, if you ever find yourself tasked with specifying something, go read Mark Pilgrim’s Why specs matter, which clarifies your task: Turning morons into experts.)

Next most important: an open-source validator, so people can check their efforts; it should have helpful error messages that include line and column numbers.

Now, to write the validator, a schema will help, and you should write one anyhow; if you’re like most people, writing down the formalisms will probably shake out some bugs and sloppy thinking in your mental model of your language.
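Even a tiny schema forces those decisions into the open. Here’s a draft-04-style sketch for a made-up two-field object (the field names are invented for illustration); just writing it makes you decide whether “count” is required and whether unknown members are allowed:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "required": ["name"],
  "properties": {
    "name":  { "type": "string" },
    "count": { "type": "integer", "minimum": 0 }
  },
  "additionalProperties": false
}
```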

Having a schema is a lousy way to explain a language, and it only solves part of the validation problem, because any nontrivial language will have semantic constraints that you can’t capture declaratively.

But like I said, you should write one anyhow. Which means that if your language is JSON-based, it sucks to be you, because JSON Schema is a big fat pool of pain.

JSON Schema, sigh · If you find yourself needing to use it, run don’t walk over to Understanding JSON Schema, which has lots of examples and reasonably human-readable narrative; a nice piece of work.

In its first paragraph, it says “learning to use [JSON Schema] by reading its specification is like learning to drive a car by looking at its blueprints.” They’re being tactful; the JSON Schema spec is really not very good at all. I don’t think this is a controversial thing to say, but let me offer some evidence anyhow.

Most obvious: There are multiple pieces of software out there that claim to implement JSON Schema, and their behavior is really inconsistent, in my experience.

One area where I observe inconsistencies is in the handling of the “$ref” construct. Irritated, I decided to go check the official spec. “$ref” is not defined (but is used) in JSON Schema Core. Same for JSON Schema Validation. Same for JSON Hyper-Schema. Same for the Core/Validation Meta-Schema and the Hyper Meta-Schema.

Actually, I could be wrong; the spec is really hard to read, and I say that as one with much more experience in spec-reading and schemas than most.
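For the record, the construct in question looks like this in its common same-document form (field names invented); each “$ref” is meant to behave as if the schema at the pointer target were inlined in its place, and it’s exactly that behavior the tools disagree on:

```json
{
  "type": "object",
  "properties": {
    "billing":  { "$ref": "#/definitions/address" },
    "shipping": { "$ref": "#/definitions/address" }
  },
  "definitions": {
    "address": {
      "type": "object",
      "required": ["city"],
      "properties": { "city": { "type": "string" } }
    }
  }
}
```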

When I’m defining a language, I need to work on the schema and the validator and the examples all together, iteratively, validating things multiple times per minute, so I need a tool that I can run from the command line that is:

  1. Fast,

  2. consistent,

  3. complete, and

  4. reports errors with line and column numbers.
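Requirement 4 is at least cheap to meet at the raw-JSON level: Python’s standard json module reports line and column on parse errors, so a front end for a hypothetical command-line validator (the function name here is mine) could start like this:

```python
import json

def check_json(text):
    """Parse text as JSON; return (ok, line, col, message)."""
    try:
        json.loads(text)
        return (True, None, None, "ok")
    except json.JSONDecodeError as e:
        # JSONDecodeError carries 1-based line/column positions.
        return (False, e.lineno, e.colno, e.msg)

# A trailing comma, the classic authoring slip:
ok, line, col, msg = check_json('{\n  "name": "example",\n}')
print(ok, line, col)  # False 3 1, plus the parser's message
```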

So far, I haven’t found one. JSONLint (npm) isn’t up to date with the latest schema spec draft. json-schema and json_schema (ruby; see, hyphen vs underscore, isn’t that quaint?) are inconsistent, particularly in their handling of “$ref”; neither of them reports line/column, and neither of them responds sanely to JSON-level rather than schema-level errors.

Then there’s json-schema-validator, but that’s in Java, which means it’s probably too slow for command-line use, and also I’m not smart enough to run Java from the command line without hand-holding from an IDE or a really good build system that knows dependencies.


What I’m actually doing · First, I’m keeping my schema in one great big file; cross-file “$ref” handling is irremediably broken, near as I can tell.

Second, I’m using “json-schema” (hyphen not underbar) in a little ruby script, which first of all runs “jsonlint” to check for basic JSON sanity. “jsonlint” at least does line numbers, yay. ♫ Ruby and Node, sittin’ in a tree…

Because I distrust the tools, I’m building a Java testbed in IntelliJ that will let me double-check with “json-schema-validator” from time to time.

JSON specification future? · I notice that the most recent JSON-Schema IETF draft expired in 2013, and that a couple of the tools have “looking for maintainer” signs posted on them.

Now that I’ve mostly figured out JSON Schema, I neither love it nor hate it. So far, I’ve been able to make it do what I wanted. But it sure feels lame compared to RELAX NG, in terms of polish, documentation, and tooling.

This is a space that could use some work.



From: John Cowan (Apr 30 2016, at 18:53)

If you want a sane markup language, see FtanML and its schema language FtanGram and transformation language FtanSkrit. It provides everything that sane XML does, while doing just enough more than JSON to make it able to replace both of them.

IMO JSON Schema is a horror following in the treads of XML Schema down the selfsame road to hell. If you want to validate JSON, write a program in jq that returns true if its input is valid and false if it isn't. That doesn't solve the problem that XML Schema actually copes with of making sure an instance fits into bespoke data structures in a strongly typed programming language (or actually it could, but it's overkill), which is something I'm thinking about ways and means to specify right now.


From: Mark (Apr 30 2016, at 19:21)

I don't miss this, like, at all.


From: Wolfgang Schnerring (May 01 2016, at 00:13)

I'm using Emacs too, https://github.com/purcell/flymake-json helps me quite a bit.


From: Pete Forman (May 01 2016, at 02:04)

In emacs I pipe to python -m json.tool which is good enough to tell me the line number where the comma is missing.


From: Jim (May 01 2016, at 07:30)

> Maybe Amazon would be OK with me dropping all this cloud stuff and working on JSON authoring tools for a few months? Not holding my breath.

Intern season is coming up!


From: Nelson (May 01 2016, at 07:43)

I think it must mean something that JSON Schema is such a mess and underutilized. My immediate thought is that the programming community just really doesn't want schema for its APIs. Which seems crazy to me, but maybe true?

Do you have any opinions on Swagger? It keeps popping up when I look at nicely designed REST APIs with good docs. I'm not sure it solves your JSON Schema problem though. http://swagger.io/


From: Mike Sokolov (May 02 2016, at 10:24)

Most of the JSON I see gets produced as exhaust from Java libraries. So the people producing it write whatever specification they write as part of a boilerplate Java class, and use Jackson or something for de/serialization. When we're lucky they include javadocs, and the classes may have some semantics for enforcing constraints when the objects are constructed. I think this working model leads to a de-emphasis of concern for message formats and formal validation *as JSON*.

It's shortsighted, since Java objects don't lend themselves well to versioning, and network protocols and APIs invariably get versioned if they don't die. This approach also provides no support for working in any other language.

But I think this explains why the formal tools are much worse than in a document-centric world. Developers believe they are already handling these issues in their deserializer and object constructors.

BTW if you get Amazon behind your program and you want help, you could send me a note @ sokolovm@


From: Pete Cordell (May 03 2016, at 06:09)

As mentioned on the IETF JSON WG list, we're working on something called JSON Content Rules (JCR) [https://datatracker.ietf.org/doc/draft-newton-json-content-rules/]. It has similar functionality to Relax NG with compact syntax. Having my own set of scars from working with XSD, I'm pleased to say that it's a refreshingly compact and 'to-the-point' way of specifying message structure.

We also have put together some thought on JCR Co-Constraints (JCRCC) [https://datatracker.ietf.org/doc/draft-cordell-jcr-co-constraints/] which can express dependencies between message parts.

We're working on implementations. There's details at [http://codalogic.github.io/jcr/]. I'm hoping at some point to have command-line validators that can work on Windows, Linux and online.


From: Walter Underwood (May 03 2016, at 14:35)

The Python jsonschema library has been working pretty well for us. We validate every datafeed before loading it into the search engine.

It has a nifty feature to try and report the best error it finds. That has been pretty accurate for us.


But yes, learning JSON Schema is a pain. And the fucking commas in JSON. Geez.


From: rns (May 05 2016, at 22:25)

A command-line validating parser can be written in Perl using Marpa::R2, which generates general BNF parsers -- start with JSON BNF (grammar [1] or [2], they do semantics differently) and add specific rules for your language, such as lexical rules for all kinds of tokens.

Parsers generated with Marpa::R2 are very good at diagnostics (line, column and the entire parser state before error).

A BNF can also help specify, exemplify, and test a language -- Jeffrey Kegler, the author of Marpa, calls it "parser-driven development".

[1] https://metacpan.org/source/JKEGL/Marpa-R2-3.000000/t/sl_json.t

[2] https://metacpan.org/source/JKEGL/Marpa-R2-3.000000/t/sl_json_ast.t


From: Peter Amstutz (May 19 2016, at 05:07)

The JSON alternative with the most traction seems to be YAML. It embeds JSON syntax so any JSON document is also a valid YAML document, but also provides a block indentation syntax similar to Python.

An alternative to JSON Schema is schema salad, http://github.com/common-workflow-language/schema_salad which is based on linked data principles for cross referencing rather than json-ref.


From: Kevin Suttle (May 19 2016, at 06:47)

I'm hopeful that GraphQL, Amazon IoN, and EDN will gain traction.





From: John Klehm (May 19 2016, at 07:00)

Have you looked at ajv? It should be simple to write a cli wrapper for it.


It gives you the JSON data path rather than the line number of the invalid piece, but that should be sufficient for debugging.


From: Michael Prasuhn (May 19 2016, at 08:41)

Of course the great thing about standards is that there are so many of them. This is the first I've heard about Hjson – I've seen more adoption of HOCON* so far :(



From: Micah Dubinko (May 19 2016, at 12:47)

In the XML world, Examplotron is a really well-designed specification. Alas, I've looked and not run into anything in the same wheelhouse for JSON.


From: Sean Gilligan (May 19 2016, at 13:26)

You might want to try using RoboVM (last OSS version before it went proprietary and was shut down by Microsoft) to compile json-schema-validator. I've used it to compile some Java command line tools to native and it worked well.

(Note: I did this over a year ago with a much earlier version, I have not built OS X or Linux command-line tools with any recent version)

Here is a link to the page for the most-active maintenance fork of RoboVM: http://robovm.mobidevelop.com


From: Brian Gregg (May 19 2016, at 17:45)

Have you looked at EDN? Or Avro Schema?


From: Jonas Galvez (May 19 2016, at 20:12)

I've developed a little DSL for writing JSON Schema definitions (that get parsed into marshmallow objects, a Python library for validating schemas).

success:bool data:{name:str link:url vitamins,minerals:str[]}

Will validate a JSON like:


{
  "success": true,
  "data": [
    {"name": "...", "link": "http://...", "vitamins": ["..."], ...}
  ]
}

Summarised for brevity. See the whole bit at:


"Running tests against responses"


April 30, 2016


I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.