I find myself tasked with polishing and publishing a little custom JSON-encoded language. It’s harder than it ought to be.

This didn’t start with the language, it started with prototype software this guy wrote that did something old and familiar in a new and dramatically better way. He replaced a bunch of gnarly old code with a few JSON templates to save time. Now, in the rearview, the JSON looks like an important part of an important product.

And there’s a lesson in that: All the good markup vocabularies are discovered by coders trying to get shit done, not cooked up in committee rooms in advance of software. For example, TimBL just needed to tag his hypertexts. Not that this is that big.

Q: Why JSON? · If it looks like a document, use XML. If it looks like an object, use JSON. It’s that simple. The essential difference isn’t simplicity/complexity or compact/verbose or typed/text, it’s ordered-by-default or not.

This particular thing I’m working on is a lot like objects and not at all like documents. Case closed.
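That ordered-by-default distinction is concrete enough to demonstrate. A little Python sketch (the names and values here are mine, purely for illustration):

```python
import json

# Two serializations of the "same" object with members in a
# different order; for object-shaped data, that should not matter.
a = json.loads('{"name": "example", "count": 3}')
b = json.loads('{"count": 3, "name": "example"}')
print(a == b)  # True: JSON object members are unordered

# For document-shaped data, order IS the content: swapping two
# paragraphs yields a different document, so an ordered model
# (XML's, or a JSON array) is the right fit.
doc1 = ["first paragraph", "second paragraph"]
doc2 = ["second paragraph", "first paragraph"]
print(doc1 == doc2)  # False: sequences are ordered
```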

By the way, it’s amusing that this century hasn’t yet offered a plausible new markup alternative to the last one’s mouldy leftovers. Also pleasing to one who has left fingerprints on both leftovers.

Q: How to author? · With pain. By default, I write things in Emacs, which unaccountably doesn’t have a JSON mode that knows where to put the fucking commas.

I’ve also tried editing JSON in IntelliJ and it’s not terrible but not remotely good enough. I’m writing what you’re now reading in XML in an Emacs mode that’s a fantastically-productive finely-tuned machine, so the thing should be possible.

There’s another solution: make JSON easier. Hjson (“the Human JSON”) addresses this problem and looks quite cleanly thought out. There’s also JSON5 (“JSON for the ES5 era”), but I stopped reading at “Unicode characters and escape sequences aren’t yet supported”.
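For flavor, here’s roughly what “easier” buys you. This sample is my own transcription based on Hjson’s published syntax, so treat the details as approximate: comments, optional quotes, and no comma bookkeeping:

```hjson
{
  # comments are allowed
  name: example      # quotes optional for simple strings
  count: 3
  tags: [
    alpha
    beta             # one value per line, no commas
  ]
}
```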

Meh. Fewer markup languages are better; fix the editors, I say. Maybe Amazon would be OK with me dropping all this cloud stuff and working on JSON authoring tools for a few months? Not holding my breath.

Q: How to specify? · If you want to ship your own language, you need to specify it. The most important part of any language spec is the human-readable prose that describes the constructs, says what they mean, and offers advice on how to use them.

Next most important is examples; lots of ’em, well-chosen, and downloadable from somewhere.

(Actually, if you ever find yourself tasked with specifying something, go read Mark Pilgrim’s Why specs matter, which clarifies your task: Turning morons into experts.)

Next most important: an open-source validator, so people can check their efforts; it should have helpful error messages that include line and column numbers.

Now, to write the validator, a schema will help, and you should write one anyhow; if you’re like most people, writing down the formalisms will probably shake out some bugs and sloppy thinking in your mental model of your language.
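Even a tiny schema forces those decisions into the open. Here’s a draft-04-style sketch for a made-up two-field object (the field names are invented for illustration); just writing it makes you decide whether “count” is required and whether unknown members are allowed:

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "required": ["name"],
  "properties": {
    "name":  { "type": "string" },
    "count": { "type": "integer", "minimum": 0 }
  },
  "additionalProperties": false
}
```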

Having a schema is a lousy way to explain a language, and it only solves part of the validation problem, because any nontrivial language will have semantic constraints that you can’t capture declaratively.

But like I said, you should write one anyhow. Which means that if your language is JSON-based, it sucks to be you, because JSON Schema is a big fat pool of pain.

JSON Schema, sigh · If you find yourself needing to use it, run don’t walk over to Understanding JSON Schema, which has lots of examples and reasonably human-readable narrative; a nice piece of work.

In its first paragraph, it says “learning to use [JSON Schema] by reading its specification is like learning to drive a car by looking at its blueprints.” They’re being tactful; the JSON Schema spec is really not very good at all. I don’t think this is a controversial thing to say, but let me offer some evidence anyhow.

Most obvious: There are multiple pieces of software out there that claim to implement JSON Schema, and their behavior is really inconsistent, in my experience.

One area where I observe inconsistencies is in the handling of the “$ref” construct. Irritated, I decided to go check the official spec. “$ref” is not defined (but is used) in JSON Schema Core. Same for JSON Schema Validation. Same for JSON Hyper-Schema. Same for the Core/Validation Meta-Schema and the Hyper Meta-Schema.

Actually, I could be wrong; the spec is really hard to read, and I say that as one with much more experience in spec-reading and schemas than most.
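For the record, the construct in question looks like this in its common same-document form (field names invented); each “$ref” is meant to behave as if the schema at the pointer target were inlined in its place, and it’s exactly that behavior the tools disagree on:

```json
{
  "type": "object",
  "properties": {
    "billing":  { "$ref": "#/definitions/address" },
    "shipping": { "$ref": "#/definitions/address" }
  },
  "definitions": {
    "address": {
      "type": "object",
      "required": ["city"],
      "properties": { "city": { "type": "string" } }
    }
  }
}
```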

When I’m defining a language, I need to work on the schema and the validator and the examples all together, iteratively, validating things multiple times per minute, so I need a tool that I can run from the command line that is:

  1. Fast,

  2. consistent,

  3. complete, and

  4. reports errors with line and column numbers.
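Requirement 4 is at least cheap to meet at the raw-JSON level: Python’s standard json module reports line and column on parse errors, so a front end for a hypothetical command-line validator (the function name here is mine) could start like this:

```python
import json

def check_json(text):
    """Parse text as JSON; return (ok, line, col, message)."""
    try:
        json.loads(text)
        return (True, None, None, "ok")
    except json.JSONDecodeError as e:
        # JSONDecodeError carries 1-based line/column positions.
        return (False, e.lineno, e.colno, e.msg)

# A trailing comma, the classic authoring slip:
ok, line, col, msg = check_json('{\n  "name": "example",\n}')
print(ok, line, col)  # False 3 1, plus the parser's message
```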

So far, I haven’t found one. JSONLint (npm) isn’t up to date with the latest schema spec draft. json-schema and json_schema (ruby; see, hyphen vs underscore, isn’t that quaint?) are inconsistent, particularly in their handling of “$ref”; neither of them reports line/column, and neither of them responds sanely to JSON-level rather than schema-level errors.

Then there’s json-schema-validator, but that’s in Java, which means it’s probably too slow for command-line use, and also I’m not smart enough to run Java from the command line without hand-holding from an IDE or a really good build system that knows dependencies.


What I’m actually doing · First, I’m keeping my schema in one great big file; cross-file “$ref” handling is irremediably broken, near as I can tell.

Second, I’m using “json-schema” (hyphen not underbar) in a little ruby script, which first of all runs “jsonlint” to check for basic JSON sanity. “jsonlint” at least does line numbers, yay. ♫ Ruby and Node, sittin’ in a tree…

Because I distrust the tools, I’m building a Java testbed in IntelliJ that will let me double-check with “json-schema-validator” from time to time.

JSON specification future? · I notice that the most recent JSON-Schema IETF draft expired in 2013, and that a couple of the tools have “looking for maintainer” signs posted on them.

Now that I’ve mostly figured out JSON Schema, I neither love it nor hate it. So far, I’ve been able to make it do what I wanted. But it sure feels lame compared to RELAX NG, in terms of polish, documentation, and tooling.

This is a space that could use some work.



From: John Cowan (Apr 30 2016, at 18:53)

If you want a sane markup language, see FtanML and its schema language FtanGram and transformation language FtanSkrit. It provides everything that sane XML does, while doing just enough more than JSON to make it able to replace both of them.

IMO JSON Schema is a horror following in the treads of XML Schema down the selfsame road to hell. If you want to validate JSON, write a program in jq that returns true if its input is valid and false if it isn't. That doesn't solve the problem that XML Schema actually copes with of making sure an instance fits into bespoke data structures in a strongly typed programming language (or actually it could, but it's overkill), which is something I'm thinking about ways and means to specify right now.


From: Mark (Apr 30 2016, at 19:21)

I don't miss this, like, at all.


From: Wolfgang Schnerring (May 01 2016, at 00:13)

I'm using Emacs too, https://github.com/purcell/flymake-json helps me quite a bit.


From: Pete Forman (May 01 2016, at 02:04)

In emacs I pipe to python -m json.tool which is good enough to tell me the line number where the comma is missing.


From: Jim (May 01 2016, at 07:30)

> Maybe Amazon would be OK with me dropping all this cloud stuff and working on JSON authoring tools for a few months? Not holding my breath.

Intern season is coming up!


From: Nelson (May 01 2016, at 07:43)

I think it must mean something that JSON Schema is such a mess and underutilized. My immediate thought is that the programming community just really doesn't want schema for its APIs. Which seems crazy to me, but maybe true?

Do you have any opinions on Swagger? It keeps popping up when I look at nicely designed REST APIs with good docs. I'm not sure it solves your JSON Schema problem though. http://swagger.io/


From: Mike Sokolov (May 02 2016, at 10:24)

Most of the JSON I see gets produced as exhaust from Java libraries. So the people producing it write whatever specification they write as part of a boilerplate Java class, and use Jackson or something for de/serialization. When we're lucky they include javadocs, and the classes may have some semantics for enforcing constraints when the objects are constructed. I think this working model leads to a de-emphasis of concern for message formats and formal validation *as JSON*.

It's shortsighted, since Java objects don't lend themselves well to versioning, and network protocols and APIs invariably get versioned if they don't die. This approach also provides no support for working in any other language.

But I think this explains why the formal tools are much worse than in a document-centric world. Developers believe they are already handling these issues in their deserializer and object constructors.

BTW if you get Amazon behind your program and you want help, you could send me a note @ sokolovm@


From: Pete Cordell (May 03 2016, at 06:09)

As mentioned on the IETF JSON WG list, we're working on something called JSON Content Rules (JCR) [https://datatracker.ietf.org/doc/draft-newton-json-content-rules/]. It has similar functionality to Relax NG with compact syntax. Having my own set of scars from working with XSD, I'm pleased to say that it's a refreshingly compact and 'to-the-point' way of specifying message structure.

We also have put together some thought on JCR Co-Constraints (JCRCC) [https://datatracker.ietf.org/doc/draft-cordell-jcr-co-constraints/] which can express dependencies between message parts.

We're working on implementations. There's details at [http://codalogic.github.io/jcr/]. I'm hoping at some point to have command-line validators that can work on Windows, Linux and online.


From: Walter Underwood (May 03 2016, at 14:35)

The Python jsonschema library has been working pretty well for us. We validate every datafeed before loading it into the search engine.

It has a nifty feature to try and report the best error it finds. That has been pretty accurate for us.


But yes, learning JSON Schema is a pain. And the fucking commas in JSON. Geez.


From: rns (May 05 2016, at 22:25)

A command-line validating parser can be written in Perl using Marpa::R2, which generates general BNF parsers -- start with JSON BNF (grammar [1] or [2], they do semantics differently) and add specific rules for your language, such as lexical rules for all kinds of tokens.

Parsers generated with Marpa::R2 are very good at diagnostics (line, column and the entire parser state before error).

A BNF can also help specify, exemplify, and test a language -- Jeffrey Kegler, the author of Marpa, calls it "parser-driven development".

[1] https://metacpan.org/source/JKEGL/Marpa-R2-3.000000/t/sl_json.t

[2] https://metacpan.org/source/JKEGL/Marpa-R2-3.000000/t/sl_json_ast.t


From: Peter Amstutz (May 19 2016, at 05:07)

The JSON alternative with the most traction seems to be YAML. It embeds JSON syntax so any JSON document is also a valid YAML document, but also provides a block indentation syntax similar to Python.

An alternative to JSON Schema is schema salad, http://github.com/common-workflow-language/schema_salad which is based on linked data principles for cross referencing rather than json-ref.


From: Kevin Suttle (May 19 2016, at 06:47)

I'm hopeful that GraphQL, Amazon IoN, and EDN will gain traction.





From: John Klehm (May 19 2016, at 07:00)

Have you looked at ajv? It should be simple to write a cli wrapper for it.


It gives you the JSON data path rather than the line number of the invalid piece, but that should be sufficient for debugging.


From: Michael Prasuhn (May 19 2016, at 08:41)

Of course the great thing about standards is that there are so many of them. This is the first I've heard about Hjson – I've seen more adoption of HOCON* so far :(



From: Micah Dubinko (May 19 2016, at 12:47)

In the XML world, Examplotron is a really well-designed specification. Alas, I've looked and not run into anything in the same wheelhouse for JSON.


From: Sean Gilligan (May 19 2016, at 13:26)

You might want to try using RoboVM (last OSS version before it went proprietary and was shut down by Microsoft) to compile json-schema-validator. I've used it to compile some Java command line tools to native and it worked well.

(Note: I did this over a year ago with a much earlier version, I have not built OS X or Linux command-line tools with any recent version)

Here is a link to the page for the most-active maintenance fork of RoboVM: http://robovm.mobidevelop.com


From: Brian Gregg (May 19 2016, at 17:45)

Have you looked at EDN? Or Avro Schema?


From: Jonas Galvez (May 19 2016, at 20:12)

I've developed a little DSL for writing JSON Schema definitions (that get parsed into marshmallow objects, a Python library for validating schemas).

success:bool data:{name:str link:url vitamins,minerals:str[]}

Will validate a JSON like:


{
  "success": true,
  "data": [
    {"name": "...", "link": "http://...", "vitamins": ["..."], ...}
  ]
}

Summarised for brevity. See the whole bit at:


"Running tests against responses"


April 30, 2016


I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.