I find myself tasked with polishing and publishing a little custom JSON-encoded language. It’s harder than it ought to be.

This didn’t start with the language; it started with prototype software this guy wrote that did something old and familiar in a new and dramatically better way. He replaced a bunch of gnarly old code with a few JSON templates to save time. Now, in the rearview, the JSON looks like an important part of an important product.

And there’s a lesson in that: All the good markup vocabularies are discovered by coders trying to get shit done, not cooked up in committee rooms in advance of software. For example, TimBL just needed to tag his hypertexts. Not that this is that big.

Q: Why JSON? · If it looks like a document, use XML. If it looks like an object, use JSON. It’s that simple. The essential difference isn’t simplicity/complexity or compact/verbose or typed/text, it’s ordered-by-default or not.
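
A tiny Python sketch of what “ordered-by-default or not” means on the JSON side. The sample values are made up for illustration. Members of an object carry no meaningful order, so two objects that differ only in member order compare equal once parsed, while arrays keep theirs.

```python
import json

# Two JSON objects with the same members in a different order.
a = json.loads('{"name": "x", "size": 3}')
b = json.loads('{"size": 3, "name": "x"}')

# Two JSON arrays with the same elements in a different order.
c = json.loads('["x", 3]')
d = json.loads('[3, "x"]')

objects_equal = (a == b)  # member order is not significant
arrays_equal = (c == d)   # element order is significant
print(objects_equal, arrays_equal)
```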

This particular thing I’m working on is a lot like objects and not at all like documents. Case closed.

By the way, it’s amusing that this century hasn’t yet offered a plausible new markup alternative to the last one’s mouldy leftovers. Also pleasing to one who has left fingerprints on both leftovers.

Q: How to author? · With pain. By default, I write things in Emacs, which unaccountably doesn’t have a JSON mode that knows where to put the fucking commas.

I’ve also tried editing JSON in IntelliJ and it’s not terrible but not remotely good enough. I’m writing what you’re now reading in XML in an Emacs mode that’s a fantastically-productive finely-tuned machine, so the thing should be possible.

There’s another solution: make JSON easier. Hjson (“the Human JSON”) addresses this problem and looks quite cleanly thought out. There’s also JSON5 (“JSON for the ES5 era”), but I stopped reading at “Unicode characters and escape sequences aren’t yet supported”.

Meh. Fewer markup languages are better; fix the editors, I say. Maybe Amazon would be OK with me dropping all this cloud stuff and working on JSON authoring tools for a few months? Not holding my breath.

Q: How to specify? · If you want to ship your own language, you need to specify it. The most important part of any language spec is the human-readable prose that describes the constructs, says what they mean, and offers advice on how to use them.

Next most important is examples; lots of ’em, well-chosen, and downloadable from somewhere.

(Actually, if you ever find yourself tasked with specifying something, go read Mark Pilgrim’s Why specs matter, which clarifies your task: Turning morons into experts.)

Next most important, an open-source validator, so people can check their efforts; it should have helpful error messages which include line and column numbers.
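
As a rough illustration of the line-and-column part, here is a minimal Python sketch using only the standard library. The function name `check_json` and the sample input are hypothetical; the point is just that the parser's error object already carries a position, so a validator has no excuse for hiding it.

```python
import json

def check_json(text):
    """Return None if text is well-formed JSON, else a message with line and column."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions supplied by the standard parser.
        return "line %d, column %d: %s" % (err.lineno, err.colno, err.msg)
    return None

# Hypothetical input with a missing comma after the first member.
broken = '{\n  "a": 1\n  "b": 2\n}'
message = check_json(broken)
print(message)
```

This only covers JSON-level errors; pointing at a line and column for a schema-level problem needs a parser that remembers where each value came from.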

Now, to write the validator, a schema will help, and you should write one anyhow; if you’re like most people, writing down the formalisms will probably shake out some bugs and sloppy thinking in your mental model of your language.

Having a schema is a lousy way to explain a language, and it only solves part of the validation problem, because any nontrivial language will have semantic constraints that you can’t capture declaratively.
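
To make “semantic constraints” concrete, here is a hedged Python sketch for an invented language (not the one discussed here) in which each step may name a following step by id. A schema can say that `next` is a string; checking that it names a step that actually exists takes a few lines of code. The field names `steps`, `id`, and `next` are assumptions made up for the example.

```python
def undefined_targets(doc):
    """Return the 'next' values that do not name any step in the document."""
    known = {step["id"] for step in doc["steps"]}
    return [step["next"] for step in doc["steps"]
            if "next" in step and step["next"] not in known]

# Hypothetical instance: the last step points at an id nobody defines.
doc = {"steps": [{"id": "fetch", "next": "parse"},
                 {"id": "parse", "next": "store"}]}
problems = undefined_targets(doc)
print(problems)
```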

But like I said, you should write one anyhow. Which means that if your language is JSON-based, it sucks to be you, because JSON Schema is a big fat pool of pain.

JSON Schema, sigh · If you find yourself needing to use it, run don’t walk over to Understanding JSON Schema, which has lots of examples and reasonably human-readable narrative; a nice piece of work.

In its first paragraph, it says “learning to use [JSON Schema] by reading its specification is like learning to drive a car by looking at its blueprints.” They’re being tactful; the JSON Schema spec is really not very good at all. I don’t think this is a controversial thing to say, but let me offer some evidence anyhow.

Most obvious: There are multiple pieces of software out there that claim to implement JSON Schema, and their behavior is really inconsistent, in my experience.

One area where I observe inconsistencies is in the handling of the “$ref” construct. Irritated, I decided to go check the official spec. “$ref” is not defined (but is used) in JSON Schema Core. Same for JSON Schema Validation. Same for JSON Hyper-Schema. Same for the Core/Validation Meta-Schema and the Hyper Meta-Schema.
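
For readers who haven’t met it: in common usage, “$ref” holds a URI reference, and a value beginning with “#” is read as a JSON Pointer into the same document. The Python sketch below shows only that same-document case, under that assumption. The function name and the sample schema are hypothetical, and percent-decoding of the fragment is deliberately left out.

```python
def resolve_local_ref(schema, ref):
    """Follow a same-document reference such as '#/definitions/name'."""
    if not ref.startswith("#"):
        raise ValueError("only same-document references are handled: " + ref)
    node = schema
    # Skip the empty piece before the leading '/'; a bare '#' means the whole document.
    for token in ref[1:].split("/")[1:]:
        # JSON Pointer escapes: '~1' stands for '/', '~0' stands for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node

# Hypothetical schema: one shared definition, referenced from a property.
schema = {
    "definitions": {"name": {"type": "string"}},
    "properties": {"first": {"$ref": "#/definitions/name"}},
}
target = resolve_local_ref(schema, schema["properties"]["first"]["$ref"])
print(target)
```

Anything that reaches across files adds URI resolution and fetching on top of this, which is where implementations have the most room to disagree.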

Actually, I could be wrong; the spec is really hard to read; and I say that as one with much more experience in spec-reading and schemas than most.

When I’m defining a language, I need to work on the schema and the validator and the examples all together, iteratively, validating things multiple times per minute, so I need a tool that I can run from the command line that is:

  1. Fast,

  2. consistent,

  3. complete, and

  4. reports errors with line and column numbers.

So far, I haven’t found one. JSONLint (npm) isn’t up to date with the latest schema spec draft. json-schema and json_schema (Ruby; see, hyphen vs underscore, isn’t that quaint?) are inconsistent, particularly in their handling of “$ref”; neither of them reports line/column, and neither of them responds sanely to JSON-level rather than schema-level errors.

Then there’s json-schema-validator, but that’s in Java, which means it’s probably too slow for command-line use, and also I’m not smart enough to run Java from the command line without hand-holding from an IDE or a really good build system that knows dependencies.

Feaugh.

What I’m actually doing · First, I’m keeping my schema in one great big file; cross-file “$ref” handling is irremediably broken, near as I can tell.

Second, I’m using “json-schema” (hyphen not underbar) in a little Ruby script, which first of all runs “jsonlint” to check for basic JSON sanity. “jsonlint” at least does line numbers, yay. ♫ Ruby and Node, sittin’ in a tree…
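
The shape of that two-stage check, sketched in Python rather than Ruby and Node. This is not the actual script: `toy_schema_check` is a hypothetical stand-in for a real schema validator, and the function names are invented. The idea is only that JSON-level errors get reported first, with a position, and the schema stage never sees text that failed to parse.

```python
import json

def validate(text, schema_check):
    """Stage 1: JSON well-formedness. Stage 2: schema_check(parsed) yields messages."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return ["not JSON (line %d, column %d): %s" % (err.lineno, err.colno, err.msg)]
    return list(schema_check(parsed))

def toy_schema_check(parsed):
    # Hypothetical stand-in for a real schema validator.
    if not isinstance(parsed, dict):
        yield "top level must be an object"

json_level = validate('{"a": 1,}', toy_schema_check)   # trailing comma: stage 1 fails
schema_level = validate('[1, 2]', toy_schema_check)    # parses, but fails stage 2
clean = validate('{"a": 1}', toy_schema_check)         # passes both stages
print(json_level, schema_level, clean)
```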

Because I distrust the tools, I’m building a Java testbed in IntelliJ that will let me double-check with “json-schema-validator” from time to time.

JSON specification future? · I notice that the most recent JSON-Schema IETF draft expired in 2013, and that a couple of the tools have “looking for maintainer” signs posted on them.

Now that I’ve mostly figured out JSON Schema, I neither love it nor hate it. So far, I’ve been able to make it do what I wanted. But it sure feels lame compared to RELAX NG, in terms of polish, documentation, and tooling.

This is a space that could use some work.



Contributions

From: John Cowan (Apr 30 2016, at 18:53)

If you want a sane markup language, see FtanML and its schema language FtanGram and transformation language FtanSkrit. It provides everything that sane XML does, while doing just enough more than JSON to make it able to replace both of them.

IMO JSON Schema is a horror following in the treads of XML Schema down the selfsame road to hell. If you want to validate JSON, write a program in jq that returns true if its input is valid and false if it isn't. That doesn't solve the problem that XML Schema actually copes with of making sure an instance fits into bespoke data structures in a strongly typed programming language (or actually it could, but it's overkill), which is something I'm thinking about ways and means to specify right now.

From: Mark (Apr 30 2016, at 19:21)

I don't miss this, like, at all.

From: Wolfgang Schnerring (May 01 2016, at 00:13)

I'm using Emacs too, https://github.com/purcell/flymake-json helps me quite a bit.

From: Pete Forman (May 01 2016, at 02:04)

In emacs I pipe to python -m json.tool which is good enough to tell me the line number where the comma is missing.

From: Jim (May 01 2016, at 07:30)

> Maybe Amazon would be OK with me dropping all this cloud stuff and working on JSON authoring tools for a few months? Not holding my breath.

Intern season is coming up!

From: Nelson (May 01 2016, at 07:43)

I think it must mean something that JSON Schema is such a mess and underutilized. My immediate thought is that the programming community just really doesn't want schema for its APIs. Which seems crazy to me, but maybe true?

Do you have any opinions on Swagger? It keeps popping up when I look at nicely designed REST APIs with good docs. I'm not sure it solves your JSON Schema problem though. http://swagger.io/

From: Mike Sokolov (May 02 2016, at 10:24)

Most of the JSON I see gets produced as exhaust from Java libraries. So the people producing it write whatever specification they write as part of a boilerplate Java class, and use Jackson or something for de/serialization. When we're lucky they include javadocs, and the classes may have some semantics for enforcing constraints when the objects are constructed. I think this working model leads to a de-emphasis of concern for message formats and formal validation *as JSON*.

It's shortsighted, since Java objects don't lend themselves well to versioning, and network protocols and APIs invariably get versioned if they don't die. This approach also provides no support for working in any other language.

But I think this explains why the formal tools are much worse than in a document-centric world. Developers believe they are already handling these issues in their deserializer and object constructors.

BTW if you get Amazon behind your program and you want help, you could send me a note @ sokolovm@

From: Pete Cordell (May 03 2016, at 06:09)

As mentioned on the IETF JSON WG list, we're working on something called JSON Content Rules (JCR) [https://datatracker.ietf.org/doc/draft-newton-json-content-rules/]. It has similar functionality to Relax NG with compact syntax. Having my own set of scars from working with XSD, I'm pleased to say that it's a refreshingly compact and 'to-the-point' way of specifying message structure.

We have also put together some thoughts on JCR Co-Constraints (JCRCC) [https://datatracker.ietf.org/doc/draft-cordell-jcr-co-constraints/] which can express dependencies between message parts.

We're working on implementations. There are details at [http://codalogic.github.io/jcr/]. I'm hoping at some point to have command-line validators that can work on Windows, Linux and online.

From: Walter Underwood (May 03 2016, at 14:35)

The Python jsonschema library has been working pretty well for us. We validate every datafeed before loading it into the search engine.

It has a nifty feature to try and report the best error it finds. That has been pretty accurate for us.

https://pypi.python.org/pypi/jsonschema

But yes, learning JSON Schema is a pain. And the fucking commas in JSON. Geez.

From: rns (May 05 2016, at 22:25)

A command-line validating parser can be written in Perl using Marpa::R2, which generates general BNF parsers -- start with JSON BNF (grammar [1] or [2], they do semantics differently) and add specific rules for your language, such as lexical rules for all kinds of tokens.

Parsers generated with Marpa::R2 are very good at diagnostics (line, column and the entire parser state before error).

A BNF can also help specify, exemplify, and test a language -- Jeffrey Kegler, the author of Marpa, calls it "parser-driven development".

[1] https://metacpan.org/source/JKEGL/Marpa-R2-3.000000/t/sl_json.t

[2] https://metacpan.org/source/JKEGL/Marpa-R2-3.000000/t/sl_json_ast.t

From: Peter Amstutz (May 19 2016, at 05:07)

The JSON alternative with the most traction seems to be YAML. It embeds JSON syntax so any JSON document is also a valid YAML document, but also provides a block indentation syntax similar to Python.

An alternative to JSON Schema is schema salad, http://github.com/common-workflow-language/schema_salad which is based on linked data principles for cross referencing rather than json-ref.

From: Kevin Suttle (May 19 2016, at 06:47)

I'm hopeful that GraphQL, Amazon IoN, and EDN will gain traction.

http://graphql.org

http://amznlabs.github.io/ion-docs/

https://github.com/edn-format/edn

From: John Klehm (May 19 2016, at 07:00)

Have you looked at ajv? It should be simple to write a cli wrapper for it.

http://epoberezkin.github.io/ajv/#validation-errors

It gives you the json data path rather than line number of the invalid piece but that should be sufficient for debugging.

From: Michael Prasuhn (May 19 2016, at 08:41)

Of course the great thing about standards is that there are so many of them. This is the first I've heard about Hjson – I've seen more adoption of HOCON* so far :(

*https://github.com/typesafehub/config/blob/master/HOCON.md

From: Micah Dubinko (May 19 2016, at 12:47)

In the XML world, Examplotron is a really well-designed specification. Alas, I’ve looked and not run into anything in the same wheelhouse for JSON.

From: Sean Gilligan (May 19 2016, at 13:26)

You might want to try using RoboVM (last OSS version before it went proprietary and was shut down by Microsoft) to compile json-schema-validator. I've used it to compile some Java command line tools to native and it worked well.

(Note: I did this over a year ago with a much earlier version, I have not built OS X or Linux command-line tools with any recent version)

Here is a link to the page for the most-active maintenance fork of RoboVM: http://robovm.mobidevelop.com

From: Brian Gregg (May 19 2016, at 17:45)

Have you looked at EDN? Or Avro Schema?

From: Jonas Galvez (May 19 2016, at 20:12)

I've developed a little DSL for writing JSON Schema definitions (that get parsed into marshmallow objects, a Python library for validating schemas).

success:bool data:{name:str link:url vitamins,minerals:str[]}

Will validate a JSON like:

{
  "success": true,
  "data": [
    {"name": "...", "link": "http://...", "vitamins": ["..."], ...}
  ]
}

Summarised for brevity. See the whole bit at:

http://hire.jonasgalvez.com.br/2016/Apr/23/Scalable-API-Testing

"Running tests against responses"

April 30, 2016