The publish/subscribe pattern is central to data in motion — event-driven and messaging-based apps, I mean. I’m increasingly convinced that pub/sub software just isn’t complete without some sort of declarative filtering technology, so that you can subscribe to a huge shared torrent of data and only see the parts of it that you need to process. You could look at everything and write code to reject the data you don’t care about, but it’s nice to write a declarative rule and have the system take care of the filtering for you.

This piece is about data-filtering technology we’ve been cooking up at AWS, and that I’ve personally put a whole lot of work into. The proximate cause for publishing now is that while this feature has been around for a while in the old CloudWatch Events and in SNS, we’re just rolling out all the latest bells and whistles in EventBridge. I want to write about it because it’s different enough from other filtering technologies to be interesting.

EventBridge’s events are delivered in JSON, but this tech ought to apply to any nested JSON-like structured data. The syntax is called “Event Patterns”, and the idea is that the filters don’t look like SQL or really any other popular query language, they look like the events they’re filtering.

To make this concrete, let’s look at a typical event you might encounter on EventBridge:

{
  "version": "0",
  "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "111122223333",
  "time": "2017-12-22T18:43:48Z",
  "region": "us-west-1",
  "resources": [
    "arn:aws:ec2:us-west-1:123456789012:instance/i-1234567890abcdef0"
  ],
  "detail": {
    "instance-id": "i-1234567890abcdef0",
    "state": "terminated"
  }
}

In the following sections, all the examples will match this event.

Event Patterns have the same structure as the Events they match · Suppose you wanted to subscribe only to events from EC2. Here’s the Event Pattern:

{
  "source": [ "aws.ec2" ]
}

The pattern simply quotes the fields you want to match and provides the values you are looking for.

The sample event above, like most events, has a nested structure. Suppose you want to process all instance-termination events:

{
  "source": [ "aws.ec2" ],
  "detail-type": [ "EC2 Instance State-change Notification" ],
  "detail": {
    "state": [ "terminated" ]
  }
}

Only specify the fields you care about · In the example above, you only provide values for three fields: The top-level fields “source” and “detail-type”, and the “state” field inside the “detail” object field. EventBridge ignores all the other fields in the event while applying the filter.
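To make the matching rule concrete, here is a minimal Python sketch of the behavior described so far. It is not EventBridge's implementation; the function name `matches` is made up for illustration, and the event is a trimmed copy of the sample above. One caveat: Python's `==` treats 80 and 80.0 as equal, so this sketch is looser about numbers than literal matching on the JSON text.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Hypothetical matcher: every field named in the pattern must match.

    Leaf values in the pattern are lists of allowed literals, nested
    objects are matched recursively, and event fields that the pattern
    does not mention are ignored.
    """
    for field, wanted in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(wanted, dict):
            # Nested object in the pattern: descend into the same field.
            if not isinstance(actual, dict) or not matches(wanted, actual):
                return False
        elif actual not in wanted:
            # Leaf: the event value must be one of the listed literals.
            return False
    return True


# Trimmed copy of the sample EC2 event.
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "region": "us-west-1",
    "detail": {"instance-id": "i-1234567890abcdef0", "state": "terminated"},
}

termination_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}

print(matches(termination_pattern, event))                 # True
print(matches({"detail": {"state": ["running"]}}, event))  # False
```

The "region" and "instance-id" fields never appear in the pattern, so the sketch never looks at them.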

Match values are always in arrays · Note that the value to match is in a JSON array, that is to say surrounded by “[” and “]”. This is so that you can provide multiple values. Suppose you were interested in events from EC2 or Fargate:

{
  "source": [ "aws.ec2", "aws.fargate" ]
}

If the value of the “source” field were an array, that would work too: the pattern matches if the intersection between the pattern array and the event array (both treated as sets) is non-empty.
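The set-intersection rule for a single field can be sketched in a few lines of Python. The helper name `leaf_matches` is hypothetical, and this only illustrates the semantics, not how EventBridge does it.

```python
def leaf_matches(allowed: list, value) -> bool:
    """Hypothetical leaf check: `allowed` is the pattern array, `value` the event field."""
    # Treat a scalar event value as a one-element array.
    candidates = value if isinstance(value, list) else [value]
    # Non-empty intersection; a linear scan avoids requiring hashable values.
    return any(candidate in allowed for candidate in candidates)


pattern_values = ["aws.ec2", "aws.fargate"]
print(leaf_matches(pattern_values, "aws.ec2"))                  # True
print(leaf_matches(pattern_values, ["aws.s3", "aws.fargate"]))  # True
print(leaf_matches(pattern_values, ["aws.s3"]))                 # False
```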

Ands and Ors · The filter language is a bit surprising in that when you provide multiple possible matches, as immediately above, that’s an OR operation — you match if any one of them does. But when your pattern has multiple fields, that’s an AND because all of the fields have to match. Oddly, this seems to meet people’s needs quite well. And if you really need OR’ed fields, you can post two filters, one with each of the options.
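Here is a rough Python sketch of how the AND and OR rules combine, including the two-filter trick for OR'ed fields. The names `matches` and `matches_any` are invented for this illustration, and the events are cut down to the fields that matter.

```python
def matches(pattern, event) -> bool:
    """Hypothetical matcher: AND across fields, OR within each value array."""
    if not isinstance(event, dict):
        return False
    return all(
        field in event
        and (
            matches(wanted, event[field])
            if isinstance(wanted, dict)
            else event[field] in wanted
        )
        for field, wanted in pattern.items()
    )


def matches_any(patterns: list, event: dict) -> bool:
    """OR across fields, modeled as several independent patterns."""
    return any(matches(pattern, event) for pattern in patterns)


# "source is EC2" OR "state is terminated", written as two filters.
either = [
    {"source": ["aws.ec2"]},
    {"detail": {"state": ["terminated"]}},
]

print(matches_any(either, {"source": "aws.ec2", "detail": {"state": "running"}}))             # True
print(matches_any(either, {"source": "aws.autoscaling", "detail": {"state": "terminated"}}))  # True
print(matches_any(either, {"source": "aws.s3", "detail": {"state": "running"}}))              # False
```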

You can match all the JSON data types · Consider the following Auto Scaling event:

{
  "version": "0",
  "id": "3e3c153a-8339-4e30-8c35-687ebef853fe",
  "detail-type": "EC2 Instance Launch Successful",
  "source": "aws.autoscaling",
  "account": "123456789012",
  "time": "2015-11-11T21:31:47Z",
  "region": "us-east-1",
  "resources": [],
  "detail": {
    "eventVersion": "",
    "responseElements": null
  }
}

You can match the “responseElements” field as follows:

{
  "detail": {
    "responseElements": [ null ]
  }
} 

This works for numbers too. Consider the following Macie event (truncated for brevity):

{
  "version": "0",
  "id": "3e355723-fca9-4de3-9fd7-154c289d6b59",
  "detail-type": "Macie Alert",
  "source": "aws.macie",
  "account": "123456789012",
  "time": "2017-04-24T22:28:49Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:macie:us-east-1:123456789012:trigger/trigger_id/alert/alert_id",
    "arn:aws:macie:us-east-1:123456789012:trigger/trigger_id"
  ],
  "detail": {
    "notification-type": "ALERT_CREATED",
    "name": "Scanning bucket policies",
    "tags": [
      "Custom_Alert",
      "Insider"
    ],
    "url": "https://lb00.us-east-1.macie.aws.amazon.com/111122223333/posts/alert_id",
    "alert-arn": "arn:aws:macie:us-east-1:123456789012:trigger/trigger_id/alert/alert_id",
    "risk-score": 80,
    "trigger": {
      "rule-arn": "arn:aws:macie:us-east-1:123456789012:trigger/trigger_id",
      "alert-type": "basic",
      "created-at": "2017-01-02 19:54:00.644000",
      "description": "Alerting on failed enumeration of large number of bucket policies",
      "risk": 8
    },
    "created-at": "2017-04-18T00:21:12.059000",
    // truncated for brevity
    . . . 

Suppose your security policies require you to react when Macie reports anything with a risk score of 80 and a trigger risk of 8:

{
  "source": [ "aws.macie" ],
  "detail": {
    "risk-score": [ 80 ],
    "trigger": {
      "risk": [ 8 ]
    }
  }
}

Numbers work properly · While the pattern above works, it doesn’t work that well, because it only matches against the JSON exactly as stated. So for example, if the programmer generating that Macie event changed their code so that it emitted "risk-score": 80.0, the rule wouldn’t match.

Fortunately, EventBridge has numeric matching built in. This lets you implement security policies much more flexibly and reliably. For example, here’s a pattern that matches a trigger risk value of 8 (even if it’s expressed as “8.000” or “8.0e0”) and any risk-score value over 50 but less than or equal to 100.

{
  "source": [ "aws.macie" ],
  "detail-type": [ "Macie Alert" ],
  "detail": {
    "risk-score": [ { "numeric": [ ">", 50, "<=", 100 ] } ],
    "trigger": {
      "risk": [ { "numeric": [ "=", 8 ] } ]
    }
  }
}

This kind of numeric matching is useful, but is limited to values between -1.0e9 and +1.0e9 inclusive, with 15 digits of precision, that is to say six digits to the right of the decimal point.

Note that match expressions go into arrays just like literal values. Match expressions and literals can be mixed up as much as you want.
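The way a "numeric" expression reads is easy to sketch in Python: the array alternates comparison operators and bounds, and every comparison has to hold. This is only an illustration under assumptions. The function name `numeric_matches` is hypothetical, the operators "<" and ">=" are assumed by analogy with the three shown above, and the sketch ignores the range and precision limits just mentioned.

```python
import operator

# Comparison operators; ">", "<=" and "=" appear in the text above,
# while "<" and ">=" are assumed by analogy.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}


def numeric_matches(spec: list, value) -> bool:
    """Hypothetical check of one {"numeric": spec} expression against a value."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    # spec alternates operator, bound, operator, bound, ...
    pairs = zip(spec[0::2], spec[1::2])
    return all(OPS[op](value, bound) for op, bound in pairs)


print(numeric_matches([">", 50, "<=", 100], 80))    # True
print(numeric_matches([">", 50, "<=", 100], 50))    # False
print(numeric_matches(["=", 8], float("8.0e0")))    # True
```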

Prefix matching · Suppose you want to process all the auto-scaling events from AWS’s European regions. There’s a match expression for that.

{
  "source": [ "aws.autoscaling" ],
  "region": [ { "prefix": "eu-" } ]
}
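As a sketch of what that means for a single field, here is a small Python leaf check that accepts both literals and "prefix" expressions in the same array. The name `leaf_matches` is hypothetical and the code only models the semantics described here.

```python
def leaf_matches(allowed: list, value) -> bool:
    """Hypothetical leaf check mixing literals and {"prefix": ...} expressions."""
    for item in allowed:
        if isinstance(item, dict) and "prefix" in item:
            # Match expression: only strings can have a prefix.
            if isinstance(value, str) and value.startswith(item["prefix"]):
                return True
        elif item == value:
            # Plain literal.
            return True
    return False


european = [{"prefix": "eu-"}]
print(leaf_matches(european, "eu-west-1"))                  # True
print(leaf_matches(european, "us-east-1"))                  # False
print(leaf_matches(european + ["us-east-1"], "us-east-1"))  # True
```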

IP address matching · For one reason or another, it’s not that uncommon to encounter IP addresses in event fields. Since the CIDR notation was explicitly designed to match IP ranges, it works well as a filter syntax:

{
  "caller-ip": [ { "cidr": "192.168.100.0/22" } ] 
}

It works with IPv6 too!

{
  "caller-ip": [ { "cidr": "2001:db8::/120" } ] 
}

(Confession: I’ve never seen one of these in the wild in our event ecosystem.)

(Confession: I don’t think the CIDR capability has quite finished deploying as of the publishing of this blog.)
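If you want to check which addresses a CIDR block would select, Python's standard `ipaddress` module can do the arithmetic. This is a sketch of the semantics, not of EventBridge's code; the helper name `cidr_matches` is invented, and treating a non-IP string as a non-match is my assumption.

```python
import ipaddress


def cidr_matches(cidr: str, value) -> bool:
    """Hypothetical check of one {"cidr": ...} expression against a field value."""
    if not isinstance(value, str):
        return False
    try:
        address = ipaddress.ip_address(value)
    except ValueError:
        # Not an IP address at all: assumed to be a non-match.
        return False
    # Membership is False when the IP versions differ.
    return address in ipaddress.ip_network(cidr)


print(cidr_matches("192.168.100.0/22", "192.168.101.7"))  # True
print(cidr_matches("192.168.100.0/22", "192.168.104.1"))  # False
print(cidr_matches("2001:db8::/120", "2001:db8::2a"))     # True
```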

Existence Matching · Suppose you wanted to make an ElasticSearch full-text index of a bunch of events. To do this, you might want to select all the events that have a description field:

{
  "detail": {
    "description": [ { "exists": true } ]
  }
}

You could also use { "exists": false } to select events that don’t contain some particular field.
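In Python terms, an "exists" test on one object boils down to a key lookup. The helper `exists_matches` below is hypothetical, works on a single object level only, and counts a field holding null as present, which is an assumption on my part.

```python
def exists_matches(rule: dict, container: dict, field: str) -> bool:
    """Hypothetical check of one {"exists": true/false} expression.

    `container` is the object that should (or should not) hold `field`.
    A field whose value is None (JSON null) counts as present here.
    """
    return (field in container) == rule["exists"]


detail = {"description": "Alerting on failed enumeration of large number of bucket policies"}
print(exists_matches({"exists": True}, detail, "description"))   # True
print(exists_matches({"exists": False}, detail, "description"))  # False
print(exists_matches({"exists": False}, {}, "description"))      # True
```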

Anything-but matching · Sometimes you want to exclude rather than include a particular field value. Suppose you want to process all the events except those that are CloudTrail reports of API calls:

{
  "detail-type": [ { "anything-but": "AWS API Call via CloudTrail" } ]
}   

The anything-but match expression can exclude a single literal string or a list of values, but the list has to contain either all strings or all numbers. Suppose you wanted to see all the events except those that came from EC2 or S3:

{
  "source": [ { "anything-but": [ "aws.ec2", "aws.s3" ] } ]
}

The anything-but match expression can also use a nested match expression to exclude prefixes. For example, EventBridge’s main event bus has a huge number of events coming from all the AWS services, but you can also inject your own events using the PutEvents API. You can distinguish AWS’s events and process only your own because the “source” field in all the AWS events begins with the string “aws.”.

{
  "source": [ { "anything-but": { "prefix": "aws." } } ]
}
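The three shapes of anything-but (single value, list, nested prefix) can be sketched in Python as follows. The function name `anything_but_matches` and the source "com.example.orders" are made up for illustration, and letting non-string values pass the prefix form is an assumption.

```python
def anything_but_matches(excluded, value) -> bool:
    """Hypothetical check of one {"anything-but": excluded} expression."""
    if isinstance(excluded, dict):
        # Nested match expression: exclude values starting with the prefix.
        # Non-string values cannot have the prefix, so they pass (assumption).
        return not (isinstance(value, str) and value.startswith(excluded["prefix"]))
    if not isinstance(excluded, list):
        # A single literal is treated as a one-element list.
        excluded = [excluded]
    return value not in excluded


print(anything_but_matches("AWS API Call via CloudTrail",
                           "EC2 Instance State-change Notification"))    # True
print(anything_but_matches(["aws.ec2", "aws.s3"], "aws.s3"))            # False
print(anything_but_matches({"prefix": "aws."}, "com.example.orders"))   # True
print(anything_but_matches({"prefix": "aws."}, "aws.ec2"))              # False
```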

The future? · People seem to like this idiom a lot — EventBridge has a huge number of customers. Also, we’ve got a super-efficient implementation currently processing an immense number of events per second, that I’d like to open-source. We keep getting requests for more filtering features (wildcards or regexes are an obvious direction) and have managed to keep new stuff rolling out steadily.

There’s one problem: It’s not SQL-flavored, and a high proportion of software people sort of think in SQL when they want to select data. There have been attempts to extend SQL to be a good citizen of the world of non-relational data; the one I’m most familiar with, because it’s recent and from AWS, is PartiQL.

I’m biased in that I’ve never actually liked SQL, but I recognize that this is not exactly a majority opinion. Anyhow, it’s on my mind.


December 18, 2019

I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.