In building Quamina, I needed to compute the union of two finite automata (FAs). I remembered from some university course 100 years ago that this was possible in theory, so I went looking for the algorithm, but was left unhappy. The descriptions I found tended to be hyper-academic, loaded with mathematical notation that I found unhelpful, and didn’t describe an approach that I thought a reasonable programmer would reasonably take. The purpose of this ongoing entry is to present a programmer-friendly description of the problem and of the algorithm I adopted, with the hope that some future developer, facing the same problem, will have a more satisfying search experience.

There is very little math in this discussion (a few subscripts), and no circles-and-arrows pictures. But it does have working Go code.

Finite automata? · I’m not going to rehash the theory of FAs (often called state machines). In practice the purpose of an FA is to match (or fail to match) some input against some pattern. What the software does when the input matches the pattern (or doesn’t) isn’t relevant to our discussion today. Usually the inputs are strings and the patterns are regular expressions or equivalent. In practice, you compile a pattern into an FA, and then you go through the input, character by character, trying to traverse the FA to find out whether it matches the input.

An FA has a bunch of states, and for each state there can be a list of input symbols that lead to transitions to other states. What exactly I mean by “input symbol” turns out to be interesting and affects your choice of algorithm, but let’s ignore that for now.

The following statements apply:

One state is designated as the “start state” because, well, that’s where you start.

Some states are called “final”, and reaching them means you’ve matched one or more patterns. In Quamina’s FAs, each state has an extra field (usually empty) saying “if you got here you matched P*, yay!”, where P* is a list of labels for the (possibly more than one) patterns you matched.

It is possible that you’re in a state and for some particular input, you transition to more than one other state. If this is true, your FA is

*nondeterministic*, abbreviated NFA.It is possible that a state can have one or more “epsilon transitions”, ones that you can just take any time, not requiring any particular input. (I wrote about this in Epsilon Love.) Once again, if this is true, you’ve got an NFA. If neither this statement nor the previous are true, it’s a

*deterministic*finite automaton, DFA.

The discussion here works for NFAs, but lots of interesting problems can be solved with DFAs, which are simpler and faster, and this algorithm works there too.

Union? ·
If I have `FA1`

that matches “foo” and `FA2`

that matches “bar”, then their union,
`FA1 ∪ FA2`

, matches both “foo”
and “bar”. In practice Quamina often computes the union of a large number of FAs, but it does so a pair at a time, so we’re
only going to worry about the union of two FAs.

The academic approach · There are plenty of Web pages and YouTubes covering this. Most of them are full of Greek characters and math symbols. They go like this:

You have two FAs, call them

`A`

and`B`

.`A`

has states`A`

, …_{1}`A`

,_{maxA}`B`

has`B`

, …_{1}`B`

_{maxB}The union contains all the states in

`A`

, all the states in`B`

, and the “product” of`A`

and`B`

, which is to say states you could call`A`

,_{1}B_{1}`A`

,_{1}B_{2}`A`

,_{2}B_{1}`A`

, …_{2}B_{2}`A`

._{maxA}B_{maxB}For each state

`A`

, you work out its transitions by looking at the transitions of the two states being combined. For some input symbol, if_{X}B_{Y}`A`

has a transition to_{X}`A`

but_{XX}`B`

has no transition, then the combined state just has the A transition. The reverse for an input where_{Y}`B`

has a transition but_{Y}`A`

doesn’t. And if_{X}`A`

transitions to_{X}`A`

and_{XX}`B`

transitions to_{Y}`B`

, then the transition is to_{YY}`A`

._{XX}B_{YY}Now you’ll have a lot of states, and it usually turns out that many of them aren’t reachable. But there are plenty of algorithms to filter those out. You’re done, you’ve computed the union and

`A`

is its start state!_{1}B_{1}

Programmer-think · If you’re like me, the idea of computing all the states, then throwing out the unreachable ones, feels wrong. So here’s what I suggest, and has worked well in practice for Quamina:

First, merge

`A`

and_{1}`B`

to make your new start state_{1}`A`

. Here’s how:_{1}B_{1}If an input symbol causes no transitions in either

`A`

or_{1}`B`

, it also doesn’t cause any in_{1}`A`

._{1}B_{1}If an input symbol causes a transition in

`A`

to_{1}`A`

but no transition in_{X}`B`

, then you adopt_{1}`A`

into the union, and any other_{X}`A`

states it points to, and any they point to, and so on.And of course if

`B`

has a transition to_{1}`B`

but_{Y}`A`

doesn’t transition, you flip it the other way, adopting_{1}`B`

and its descendents._{Y}And if

`A`

transitions to_{1}`A`

and_{X}`B`

transitions to_{1}`B`

, then you adopt a new state_{Y}`A`

, which you compute recursively the way you just did for_{X}B_{Y}`A`

. So you’ll never compute anything that’s not reachable._{1}B_{1}

I could stop there. I think that’s enough for a competent developers to get the idea? But it turns out there are a few details, some of them interesting. So, let’s dig in.

“Input symbol”? · The academic discussion of FAs is very abstract on this subject, which is fair enough, because when you’re talking about how to build, or traverse, or compute the union of FAs, the algorithm doesn’t depend very much on what the symbols actually are. But when you’re writing code, it turns out to matter a lot.

In practice, I’ve done a lot of work with FAs over the years, and I’ve only ever seen four things used as input symbols to drive them. They are:

Unicode “characters” represented by code points, integers in the range 0…1,114,111 inclusive.

UTF-8 bytes, which have values in the range 0…244 inclusive.

UTF-16 values, unsigned 16-bit integers. I’ve only ever seen this used in Java programs because that’s what its native

`char`

type is. You probably don’t want to do this.Enum values, small integers with names, which tend to come in small collections.

As I said, this is all I’ve seen, but 100% of the FAs that I’ve seen automatically generated and subject to set-arithmetic operations like Union are based on UTF-8. And that’s what Quamina uses, so that’s what I’m going to use in the rest of this discussion.

Code starts here ·
This comes from Quamina’s
nfa.go. We’re going to look at the function
`mergeFAStates`

, which implements the merge-two-states logic described above.

Lesson: This process can lead to a lot of wasteful work. Particularly if either or both of the states transition on ranges of
values like `0…9`

or `a…z`

. So we only want to do the work merging any pair of states once, and we want
there only to be one merged value. Thus we start with a straightforward memo-ization.

func mergeFAStates(state1, state2 *faState, keyMemo map[faStepKey]*faState) *faState { // try to memo-ize mKey := faStepKey{state1, state2} combined, ok := keyMemo[mKey] if ok { return combined }

Now some housekeeping. Remember, I noted above that any state might contain a signal saying that arriving here
means you’ve matched pattern(s). This is called `fieldTransitions`

, and the merged state obviously has to
match all the things that either of the merged states match. Of course, in the vast majority of cases neither merged state
matched anything and so this is a no-op.

fieldTransitions := append(state1.fieldTransitions, state2.fieldTransitions...)

Since our memo-ization attempt came up empty, we have to allocate an empty structure for the new merged state, and add it to the memo-izer.

combined = &faState{table: newSmallTable(), fieldTransitions: fieldTransitions} keyMemo[mKey] = combined

Here’s where it gets interesting. The algorithm talks about looking at the inputs that cause transitions in the states we’re merging. How do you find them? Well, in the case where you’re transitioning on UTF-8 bytes, since there are only 244 values, why not do the simplest thing that could possibly work and just check each byte value?

Every Quamina state contains a table that encodes the byte transitions, which operates like the Go construct
`map[byte]state`

. Those tables are implemented in
a compact data structure optimized for fast traversal. But for doing this
kind of work, it’s easy to “unpack” them into a fixed-sized table; in Go, `[244]state`

. Let’s do that for the states
we’re merging and for the new table we’re building.

u1 := unpackTable(state1.table) u2 := unpackTable(state2.table) var uComb unpackedTable

`uComb`

is where we’ll fill in the merged transitions.

Now we’ll run through all the possible input values; `i`

is the byte value, `next1`

and `next2`

are the transitions on that value. In practice, `next1`

and `next2`

are going to be null most of the time.

for i, next1 := range u1 { next2 := u2[i]

Here’s where we start building up the new transitions in the unpacked array `uComb`

.

For many values of `i`

, you can avoid actually merging the states to create a new one. If the transition is the
same in both input FAs, or if either of them are null, or if the transitions for this value of `i`

are the same as
for the last value. This is all about avoiding unnecessary work and the `switch`

/`case`

structure is the
result of a bunch of profiling and optimization.

switch { case next1 == next2: // no need to merge uComb[i] = next1 case next2 == nil: // u1 must be non-nil uComb[i] = next1 case next1 == nil: // u2 must be non-nil uComb[i] = next2 case i > 0 && next1 == u1[i-1] && next2 == u2[i-1]: // dupe of previous step - happens a lot uComb[i] = uComb[i-1]

If none of these work, we haven’t been able to avoid merging the two states. We do that by a recursive call to invoke all the logic we just discussed.

There is a complication. The automaton might be nondeterministic, which means that there might be more than one transition
for some byte value. So the data structure actually behaves like `map[byte]*faNext`

, where `faNext`

is a
wrapper for a list of states you can transition to.

So here we’ve got a nested loop to recurse for each possible combination of transitioned-to states that can occur on this byte value. In a high proportion of cases the FA is deterministic, so there’s only one state from each FA being merged and this nested loop collapses to a single recursive call.

default: // have to recurse & merge var comboNext []*faState for _, nextStep1 := range next1.states { for _, nextStep2 := range next2.states { comboNext = append(comboNext, mergeFAStates(nextStep1, nextStep2, keyMemo)) } } uComb[i] = &faNext{states: comboNext} } }

We’ve filled up the unpacked state-transition table, so we’re almost done. First, we have to compress it into its optimized-for-traversal form.

combined.table.pack(&uComb)

Remember, if the FA is nondeterministic, each state can have “epsilon” transitions which you can follow any time without requiring any particular input. The merged state needs to contain all the epsilon transitions from each input state.

combined.table.epsilon = append(state1.table.epsilon, state2.table.epsilon...) return combined }

And, we’re done. I mean, we are once all those recursive calls have finished crawling through the states being merged.

Is that efficient? · As I said above, this is an example of a “simplest thing that could possibly work” design. Both the recursion and the unpack/pack sequence are kind of code smells, suggesting that this could be a pool of performance quicksand.

But apparently not. I ran a benchmark where I added 4,000 patterns synthesized from the Wordle word-list; each of them looked like this:

`{"allis": { "biggy": [ "ceils", "daisy", "elpee", "fumet", "junta", … `

(195 more).

This produced a *huge* deterministic FA with about 4.4 million states, with the addition of these hideous worst-case
patterns running at 500/second. Good enough for rock ’n’ roll.

How about nondeterministic FAs? I went back to that Wordle source and, for each of its 12,959 words, added a pattern with a random wildcard; here are three of them:

`{"x": [ {"shellstyle": "f*ouls" } ] }`

{"x": [ {"shellstyle": "pa*sta" } ] }

{"x": [ {"shellstyle": "utter*" } ] }

This produced an NFA with 46K states, the addition process ran at 70K patterns/second.

Sometimes the simplest thing that could possibly work, works.