Most server-side APIs these days are JSON-over-HTTP. Developers are generally comfy with this, but I notice when I look at the JSON that it’s often, uh, what’s the tactful term these days? Let’s say “generously proportioned”. And I see clumsy code being written to walk through it. The options for dealing with this are interesting.

For example · I’ve been working with keybase.io recently; when you talk to their directory through their API, an entry is represented by a User Object, which is not exactly lightweight; here’s part of one which may be retrieved here.

{
  "status": {
    "code": 0,
    "name": "OK"
  },
  "guest_id": "05a8fdd28c23a5d5dc2c2f588c3e7b08",
  "them": {
    "id": "922d9f5ffd96b34b9133483091738a00",
    "basics": {
      "username": "timbray",
      "ctime": 1395088335,
      "mtime": 1395088335,
      "id_version": 9,
      "track_version": 11,
      "last_id_change": 1396452398
    },
    "profile": {
      "mtime": 1395088563,
      "full_name": "Tim Bray",
      "location": "Vancouver, Canada",
      "bio": "Long-time Web guy, usually observed wearing ...
    },
    "public_keys": {
      "primary": {
        "kid": "0101389af6856a7ef3392860...
        "key_type": 1,
        "bundle": "-----BEGIN PGP PUBLIC KEY BLOCK-----...

And it goes on from there. For quite a bit, actually. But for what I’m actually going to do, I only need a couple of those fields (ctime and the ASCII-armored public key). Compose in your head, if you will, the sequence of JSONObject.getString() calls you’d need to retrieve the key bits in the bundle field. Feaugh.
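For anyone who would rather not compose it mentally, here is roughly what that walk looks like. This is only a sketch: nested `Map`s stand in for the parsed response so that it runs with no JSON library, and the class name, method name, and placeholder key value are invented for illustration. With org.json the shape is the same, a chain of `getJSONObject()` calls ending in one `getString()`.

```java
import java.util.Map;

// Hypothetical illustration: nested Maps stand in for a parsed JSON response.
class ManualWalkSketch {

    // Walks them -> public_keys -> primary -> bundle, one level at a time.
    @SuppressWarnings("unchecked")
    static String bundleOf(Map<String, Object> user) {
        Map<String, Object> them = (Map<String, Object>) user.get("them");
        Map<String, Object> publicKeys = (Map<String, Object>) them.get("public_keys");
        Map<String, Object> primary = (Map<String, Object>) publicKeys.get("primary");
        return (String) primary.get("bundle");
    }

    public static void main(String[] args) {
        // Placeholder data, not a real Keybase response.
        Map<String, Object> user = Map.of(
            "them", Map.of(
                "public_keys", Map.of(
                    "primary", Map.of("bundle", "KEY-PLACEHOLDER"))));
        System.out.println(bundleOf(user));
    }
}
```

Four statements and three casts to reach one string, and any missing level along the way blows up with a NullPointerException.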

Also, consider all the extra JSON bytes occupying network space that I fully intend never to look at; so there are two problems here.

Plan A: JPath · Back in the days when APIs were all XML, I got into the habit, in clients, of fishing the bits I needed out of overinflated server blobs using XPath. A reasonable person might think “JPath?” And indeed, there’s JPath at stsvilik/jPath and at artjock/jpath and JPath on npm and JSONPath and json-path and so on, and they all define idiosyncratic selector-string syntaxes that do way more than just walk down an object chain.

Plausible; maybe there’s a there in there somewhere.

Plan B: Send less · Wouldn’t it be better if the server just sent what I need? It turns out that Google+ goes part of the way there: clients can ask for Partial Responses, using a fields argument.

Which, by the way, uses an idiosyncratic selector-string syntax.

It clearly works at slimming down your JSON on the wire, but you still have to do the dorky walk-down-the-object-chain logic in the client.
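To make the server side of that idea concrete, here is a minimal sketch of pruning a response down to a requested set of key paths. It is not Google's implementation or its selector syntax; the class and method names are hypothetical, nested `Map`s stand in for the JSON, and each requested field is given as a plain list of keys.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical server-side helper; not any real API's partial-response code.
class PruneSketch {

    // Returns a new map holding only the values reachable by the given key paths.
    // Paths that do not resolve are ignored. Assumes no path is a prefix of another.
    @SuppressWarnings("unchecked")
    static Map<String, Object> prune(Map<String, Object> source, List<List<String>> paths) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (List<String> path : paths) {
            Object value = source;
            boolean found = !path.isEmpty();
            for (String key : path) {
                if (!(value instanceof Map) || !((Map<?, ?>) value).containsKey(key)) {
                    found = false;
                    break;
                }
                value = ((Map<?, ?>) value).get(key);
            }
            if (!found) {
                continue;
            }
            // Rebuild the chain of parent objects in the result, then attach the value.
            Map<String, Object> target = result;
            for (String key : path.subList(0, path.size() - 1)) {
                Object child = target.get(key);
                if (!(child instanceof Map)) {
                    child = new LinkedHashMap<String, Object>();
                    target.put(key, child);
                }
                target = (Map<String, Object>) child;
            }
            target.put(path.get(path.size() - 1), value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder data shaped like the example above.
        Map<String, Object> response = Map.of(
            "status", Map.of("name", "OK"),
            "them", Map.of(
                "basics", Map.of("ctime", 1395088335, "mtime", 1395088335),
                "public_keys", Map.of(
                    "primary", Map.of("bundle", "KEY-PLACEHOLDER"))));
        System.out.println(prune(response, List.of(
            List.of("them", "basics", "ctime"),
            List.of("them", "public_keys", "primary", "bundle"))));
    }
}
```

The pruned result keeps the original nesting, which is exactly why the client still has to walk down the chain to reach the values.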

Plan C: JWalk · I cooked this up in about fifteen minutes to help me pick out pieces of Keybase.io responses, without really thinking it over much. To get the primary public-key bytes out of an HTTP response body, you say:

  JSONObject user = new JSONObject(response_body);
  String key = JWalk.getString(user, "them", "public_keys", "primary", "bundle");

Source on GitHub, not that it’s very sophisticated or anything. It doesn’t allow you to select a particular array element, or jump steps with a “descendent” relationship or, well, anything; except walk through an object chain. So, no query string, just a JSON object and enough strings to get you where you want in the tree.
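For a sense of how little is involved, here is a sketch of the same idea. It is not the code on GitHub: nested `Map`s stand in for `JSONObject` so that it runs without a JSON library, the class name is made up, and returning null when a step is missing is an assumption of this sketch rather than a statement about what JWalk does.

```java
import java.util.Map;

// Hypothetical stand-in for JWalk; nested Maps play the role of JSONObject.
class JWalkSketch {

    // Follows the keys in order. Returns null if any step is missing
    // or lands on something that is not an object.
    static Object walk(Map<String, Object> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Returns the string at the end of the chain, or null if it is absent or not a string.
    static String getString(Map<String, Object> root, String... keys) {
        Object value = walk(root, keys);
        return (value instanceof String) ? (String) value : null;
    }

    public static void main(String[] args) {
        // Placeholder data, not a real Keybase response.
        Map<String, Object> user = Map.of(
            "them", Map.of(
                "public_keys", Map.of(
                    "primary", Map.of("bundle", "KEY-PLACEHOLDER"))));
        System.out.println(getString(user, "them", "public_keys", "primary", "bundle"));
    }
}
```

The varargs parameter is the whole trick: the selector is just the list of keys, so there is no selector string to parse.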

Hmm, it occurs to me that JWalk.getArray() should return an Iterator<JSONObject>, since the only thing I’ve ever done with a JSON array is traverse it.
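An iterator-returning version might look something like the following. Again a sketch under assumptions: `Map`s and `List`s stand in for JSON objects and arrays, the `proofs` key in the demo is invented, and treating a missing path as an empty iteration (and skipping non-object elements) is one possible choice, not established behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a getArray() that hands back an Iterator of objects.
class JWalkArraySketch {

    // Walks the key chain, then iterates over the object elements of the array found there.
    // A missing path or a non-array value yields an empty iterator.
    @SuppressWarnings("unchecked")
    static Iterator<Map<String, Object>> getArray(Map<String, Object> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return Collections.emptyIterator();
            }
            current = ((Map<?, ?>) current).get(key);
        }
        if (!(current instanceof List)) {
            return Collections.emptyIterator();
        }
        List<Map<String, Object>> objects = new ArrayList<>();
        for (Object element : (List<?>) current) {
            if (element instanceof Map) {
                objects.add((Map<String, Object>) element); // non-object elements are skipped
            }
        }
        return objects.iterator();
    }

    public static void main(String[] args) {
        // "proofs" is an invented key used only for this demo.
        Map<String, Object> user = Map.of(
            "them", Map.of(
                "proofs", List.of(Map.of("name", "first"), Map.of("name", "second"))));
        Iterator<Map<String, Object>> it = getArray(user, "them", "proofs");
        while (it.hasNext()) {
            System.out.println(it.next().get("name"));
        }
    }
}
```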

I suppose JWalk could be broken out as a separate package, but why bother?

Questions · Why isn’t this built into the JSON-API infrastructure?

What are the use-cases for anything more complicated than a list-of-string-keys selector?

If we could agree on a way to pick out a few pieces you need, wouldn’t it be better to send it to the server as part of the query, rather than using it in the client?

If you do get a big blob of JSON and know you only want a couple of pieces, why go through all the parse-time memory-management overhead of building data structures to hold the parts you don’t want?



Contributions

From: Dirkjan Ochtman (May 06 2014, at 00:42)

There happens to be a not-idiosyncratic selector-string thingy, as well:

https://tools.ietf.org/html/rfc6901

From: Janne (May 06 2014, at 01:47)

We probably see rather different uses of JSON, but to me having multiple identical field names (in different subsections) in a file is the rule rather than the exception. Say, an array of records, all with the same fields. In any such case, your solution would not work.

Also, regardless of the merits of the idea in itself, I think part of the beauty of JSON is that it's such a small, simple format to support. I don't think a query language should become part of the standard; we already have XML for such things after all.

From: Evan Stoner (May 06 2014, at 01:49)

The JSON API project (http://jsonapi.org/format/) is trying to encourage conventions, some of which get at what you're talking about.

They've used both "fields" and "include" keywords to let you thin or fatten the response as you need, and have defined a standard, flattened structure for the response object that eliminates the duplication you can get in heavily nested objects.

I'm hopeful that they'll get some traction; it'd be wonderful to have standard client libraries for every major language that know how to talk to conforming APIs for you.

From: Grahame Grieve (May 06 2014, at 03:28)

"If you do get a big blob of JSON and know you only want a couple of pieces, why go through all the parse-time memory-management overhead of building data structures to hold the parts you don’t want?"

You've got to parse it somehow. And since order isn't significant, you've got to parse more of it till you know what you have... easiest just to parse to a set of objects (library code)

A standardised JSON query language would be extremely useful. Right now, for instance, I am writing a JSON based exchange standard. We naturally suffer from the problem of rich and comprehensive JSON because we can't narrow the use cases (it's a standard). We can't make a JPath variant up, but the lack of a JPath equivalent keeps many people stuck on XML.

OTOH, if everything people want gets added to JSON, then we'll have to go off and invent a new lightweight exchange protocol again...

From: Alex Bilbie (May 06 2014, at 03:54)

APIs that I've built lately return the bare minimum fields and then the developer can include additional fields (rather than requesting partial responses).

So in the case of your Keybase example it would always include the "basics" object, and then using an `include` query string parameter (with comma-delimited values) you could also request the `profile` and `public_keys` objects to be included in the response.

e.g. `?include=profile,public_keys`

From: Christian Harms (May 06 2014, at 04:20)

Because it's the JavaScript Object notation and you get your value by:

b = user.them.public_keys.primary.bundle;

And this is simple?

From: Manuel Strehl (May 06 2014, at 04:45)

There’s RFC 6901 defining a JSON Pointer language, and its application RFC 6902, JSON Patch. I used both in a medium-sized project, and they work conceptually really well. (Also the Python and JS implementations that I used.)

Your example reads like this:

/them/public_keys/primary/bundle

which is straight-forward.

The advantage over other idiosyncratic scheme languages is the IETF URL to point people to, when someone complains :-) .
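For illustration, a minimal sketch of resolving such a pointer, following the rules in RFC 6901: tokens separated by "/", with "~1" and "~0" as escapes for "/" and "~". The class name is hypothetical, nested `Map`s and `List`s stand in for parsed JSON, a pointer that does not resolve gives null (so a JSON null and a missing value are not distinguished), and stray "~" characters are not validated.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of RFC 6901 pointer evaluation over Maps (objects) and Lists (arrays).
class JsonPointerSketch {

    // Returns the referenced value, or null when the pointer does not resolve.
    // Throws IllegalArgumentException if the pointer is non-empty and lacks a leading '/'.
    static Object resolve(Object document, String pointer) {
        if (pointer.isEmpty()) {
            return document; // the empty pointer refers to the whole document
        }
        if (pointer.charAt(0) != '/') {
            throw new IllegalArgumentException("pointer must be empty or start with '/'");
        }
        Object current = document;
        // Limit -1 keeps empty tokens, which are legal: they name the key "".
        for (String raw : pointer.substring(1).split("/", -1)) {
            // Unescape "~1" before "~0", in the order the RFC specifies.
            String token = raw.replace("~1", "/").replace("~0", "~");
            if (current instanceof Map) {
                current = ((Map<?, ?>) current).get(token);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                // Array indexes are "0" or digits with no leading zero.
                if (!token.matches("0|[1-9][0-9]*")) {
                    return null;
                }
                int index;
                try {
                    index = Integer.parseInt(token);
                } catch (NumberFormatException tooLarge) {
                    return null;
                }
                if (index >= list.size()) {
                    return null;
                }
                current = list.get(index);
            } else {
                return null; // cannot descend into a scalar or a missing value
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Placeholder data, not a real Keybase response.
        Map<String, Object> user = Map.of(
            "them", Map.of(
                "public_keys", Map.of(
                    "primary", Map.of("bundle", "KEY-PLACEHOLDER"))));
        System.out.println(resolve(user, "/them/public_keys/primary/bundle"));
    }
}
```

Unlike a plain list of keys, the pointer syntax can also address array elements by position.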

From: Asbjørn Ulsberg (May 06 2014, at 04:58)

In OData, you have the "$select" operator that allows you to dig into an object graph and extract only the bits and pieces you want, which are then returned as a new (generated on the fly) JSON object. We've implemented support for this in Pomona, which makes custom querying through a standard interface (like Microsoft's Linq) easy, intuitive and most of all: a quite enjoyable experience.

From: GregBildson (May 06 2014, at 05:17)

Generally, the answer these days is to not worry about it unless it becomes a performance bottleneck. APIs like Facebook's allow you to select fields. MongoDB queries allow you to select fields. My Python JSON code has three or more versions of user objects, from "lightweight" to full, depending on the situation. But unless you're nesting 1000s of these, no biggie. Your connection probably gets compressed by default, which helps with bandwidth.

From: Ruben (May 06 2014, at 05:28)

your point about thinning down the response is very much valid.

but from the API designer's point of view (from experience) it's really nice if you can reverse proxy cache APIs, even if it's just for a few seconds/minutes to reduce the load on 'hot' URLs.

adding the option to specify what you want as return makes this less efficient (and less easy to implement) and it's easier to just create a /full and /simple endpoint in an attempt to have the best of both worlds

From: David Rachel (May 06 2014, at 05:34)

Fetching specific sub-paths out of JSON data sounds extremely like JSON Pointer to me.

RFC: http://tools.ietf.org/html/rfc6901

I believe there are libraries for many different environments already. Plus, it standardises a way to use URL fragments to reference specific nodes within a document!

Adding JSON Pointer support to your API seems like it would be pretty trivial:

JWalk.getStringPtr(user, "/them/public_keys/primary/bundle");

From: Julian Gruber (May 06 2014, at 06:42)

I wrote a node.js module called "binary-extract", which you can find here: https://github.com/segmentio/binary-extract

It takes a piece of binary json and parses only the key you care about, e.g. `parse(blob, "timestamp")`, being a lot faster than serializing the json blob and parsing the whole thing.

The idea should easily be recreated in Java.

From: Pies (May 06 2014, at 06:53)

How about something like:

try {

userBundle = user.them.public_keys.primary.bundle;

} catch (NodeNotFoundException $exception) {

// handle the exception

}

There isn't much difference for the service provider (because bandwidth is cheap) and it's easier for them to cache.

From: Jed Parsons (May 06 2014, at 09:00)

JSONSelect is a query language for JSON using CSS-style selectors. There's an interactive demo at http://jsonselect.org. GitHub: https://github.com/lloyd/JSONSelect

From: Andrew Ovens (May 06 2014, at 09:07)

If you were to use the OData syntax it allows for querying and selecting your result set

From: Matěj Cepl (May 06 2014, at 09:21)

Was it you or somebody else who was saying that if you ask for validation, XPath, schema, transformation, etc. for JSON, then you're doing it wrong, because what you're looking for is XML?

From: FCatalan (May 06 2014, at 09:49)

For some reason I have found Java particularly painful about dealing with JSON.

I picture the most humorless deep gray bureaucrat ever as he watches this hippie entering his office: "Woe unto me if this kid isn't filling every form to the last t".

As awkward as things like mochijson in Erlang can be, the JSON.org stuff in Java is clunkier, and don't get me started about JAXB aka "Hey, there are enterprise thingies floating in my API".

For my current stuff I have settled with flexjson, which at least has clicked somewhat for me.

In fact the custom marshalling through factories is cool: You are still parsing everything, but you can easily deserialize thinner POJOs from fat JSON if you so desire: http://stackoverflow.com/questions/9589644/flexjson-exclude-properties-upon-deserialization

From: Anant Narayanan (May 06 2014, at 09:57)

The Jackson JSON parser for Java seems to be the most popular these days, and has better tools for handling JSON than most other packages. This includes a "tree traversal" mode (http://wiki.fasterxml.com/JacksonTreeModel) which makes it a little easier to get at the few fields you need. Doesn't help with all the bloat on the wire, though.

From: William Casarin (May 06 2014, at 10:00)

I had a similar thought the other day, so I threw together this JavaScript library for generating object lenses:

https://github.com/jb55/dot-lens

It even allows magnifying into arrays. I get a list of all the lenses from another library called:

https://github.com/jb55/tableize-array

which flattens a JSON object into a table compatible with dot-lens.

I ended up using these for a json2csv/csv2json library:

https://github.com/jb55/json2csv

lenses ftw!

From: dthorpe (May 06 2014, at 10:42)

Some REST services actually support (gasp!) URL addressing of JSON complex documents. (not using URI fragments) foo.com/users/12345/name/firstname returns just the firstname field of the JSON user object rooted at foo.com/users/12345

See firebase.com as a living example. Specific doc: https://www.firebase.com/docs/data-structure.html

From: mario (May 06 2014, at 12:57)

A friend of mine, not me that is, recently just went ahead with a `json_rx_count()` even. For an unwieldy hodgepodge of JSON from a dozen APIs queried in parallel, still consistent enough for grepping, but only in need of one field each; this worked out more maintainable and resilient than traversing each.

From: Aaron (May 06 2014, at 13:05)

Where does a web api end and a standard begin?

From: Cowtowncoder (May 06 2014, at 17:19)

I second suggestion for using JSON Pointer.

Jackson (https://github.com/FasterXML/jackson) for Java, for example, supports it out of the box.

So if you want to use Tree model (code example assumed use of Trees, not data-binding), you'd do:

long id = mapper.readTree(json).at("/some/path/value/id").longValue();

From: Kevin H (May 06 2014, at 18:23)

I saw several mentions already for https://tools.ietf.org/html/rfc6901 but they all left off the best part.

Have a close look at Section 9: https://tools.ietf.org/html/rfc6901#section-9

From: Daniil (May 07 2014, at 12:27)

Why not parse it into an object using a library? In Java you can do it using Gson from Google. Are there such libraries in JS?

From: Eamonn Power (May 09 2014, at 20:19)

I used to see this in XML data layers of applications years ago. As the scope of a feature using a given API call extended, there was a tendency to throw the kitchen sink in there rather than have a separate API call.

The proposed approach of using 'includes' parameters seems reasonable. It sticks to only returning what's needed while still allowing the API dev to provide context when it's required.

From: Jason Moiron (May 15 2014, at 21:24)

I implemented basically this exact thing independently in Golang about a year ago, which can be had at github.com/jmoiron/jsonq. The only difference is that I treated all numeric strings (eg. "0") unambiguously as array indexes; I'd imagine that they are used rarely enough as string keys in an object that those cases can be dealt with manually.

From: Mark (May 26 2014, at 18:59)

Most frameworks these days encourage the use of a single layer of key/value pairs without nested objects in the JSON.

Doing things this way avoids these complexities completely.

May 05, 2014