Here’s a nice little RFC describing a nice little trick that might even be useful. Short form: People like to write JSON into logfiles. Text sequences make reading them easier and more robust.
The trick · You precede each JSON log entry with the little-known Unicode character U+001E INFORMATION SEPARATOR TWO, and you stick a newline (U+000A, byte 0x0A) after the entry.
This makes it easy for a log reader to pick the byte stream apart into chunks it can hand its friendly local JSON parser, and cleanly survive the not-terribly-uncommon scenario where something blew chunks while logging and left behind a truncated/malformed entry.
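To make the framing concrete, here is a minimal sketch in Python, not anything taken from the RFC itself. The function names `encode_entry` and `decode_entries` are made up for illustration, and it assumes the whole log fits in memory as one byte string.

```python
import json

RS = b"\x1e"  # U+001E, the ASCII Record Separator
LF = b"\n"    # U+000A


def encode_entry(obj):
    """Frame one JSON value as RS + JSON text + LF (hypothetical helper)."""
    return RS + json.dumps(obj).encode("utf-8") + LF


def decode_entries(data):
    """Yield every parseable JSON value in data, skipping broken chunks."""
    for chunk in data.split(RS):
        if not chunk.strip():
            continue  # nothing before the first RS, or two RS bytes in a row
        try:
            yield json.loads(chunk.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            continue  # truncated or malformed entry; pick up at the next RS


# A log in which the writer died partway through the second entry.
log = encode_entry({"a": 1}) + RS + b'{"b": 2, "c"' + encode_entry({"d": 3})
entries = list(decode_entries(log))
```

Splitting on RS is safe because a raw control character can't appear inside valid JSON text; it has to be escaped within strings. One edge case this sketch ignores: a top-level number cut short (say `12` left over from `123`) still parses, so a stricter reader would also insist on the trailing newline before trusting such a value.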
I’m not going to reproduce the RFC’s narrative; it’s perfectly transparent. I think this might actually find fairly widespread use. I’ll be showing it to some folks here at AWS; maybe someone will be interested. Boy, do we ever have a lot of logs.
Software archaeology · What makes this mildly amusing is that U+001E has a secret identity; it’s also an ASCII “control character” called RS, for Record Separator. I and every other text-encoding geek have long thought the control characters an irritating waste of space; XML 1.0 flatly forbids nearly all of them because they don’t mean anything and have no use.
Except that this one is being put to something like its original use, all these years later. Are there any ASCIInauts still living who’d know if there’s a story behind RS?
Kudos to Nico Williams for getting this done.