When
· Twenties
· · 2024
· · · December
· · · · 18 (1 entry)
QRS: Matching “.” in UTF-8 ·
Back on December 13th, I posted a challenge on Mastodon: In a simple UTF-8 byte-driven finite automaton, how many states does it take to match the regular-expression construct “.
”, i.e. “any character”? Commenter Anthony Williams responded, getting it almost right I think, but I found his description a little hard to understand. In this piece I’m going to dig into what .
actually means, and then how many states you need to match it.
[Update: Lots more on this subject and some of the material below is arguably wrong, but just “arguably”; see Dot-matching Redux.] ... [10 comments]
By Tim Bray.
The opinions expressed here
are my own, and no other party
necessarily agrees with them.
A full disclosure of my
professional interests is
on the author page.
I’m on Mastodon!