Pattern-matching-based AST Evaluation as Prisms
Posted on 2021-10-10 by ubikium

Introduction
When performing evaluation on an Abstract Syntax Tree (AST) for an Embedded Domain-Specific Language (EDSL), it would be nice to have a “pattern matching” feature to define semantic rules.
For example, “for an expression of the form Add M N, if M and N are integers, it evaluates to M+N”.
This blog post will introduce a way to use prism-like functions to express such complex pattern-matching-based evaluation rules. For most of it, the reader need not know anything about optics; basic concepts of algebraic data types, functors, and typeclasses will suffice.
In Haskell, if you write the syntax of the EDSL with algebraic data types, basic pattern matching is already built in. For a simple imperative language defined by the data types:
data AExp = ALit Int | AVar String
          | AAdd AExp AExp

data BExp = BLit Bool
          -- less than or equal to
          | BLe AExp AExp
          | BAnd BExp BExp

data Stmt = SAssign String AExp
          -- If-then-else
          | SIte BExp Stmt Stmt
We can pattern match on a Stmt value with

case e of
  SAssign v expr -> ...
  SIte b stmt1 stmt2 -> ...
This is all good for top-level definitions; however, it can be quite cumbersome to pattern match on something that's deep in the structure.
Say we want to write a renaming function.
It accepts a function from variable names to variable names, like \s -> "new_" ++ s.
Then it will change a Stmt tree to a new tree with variable names replaced.
This is usually defined recursively:
subs_s :: (String -> String) -> Stmt -> Stmt
subs_s rho s = case s of
  SAssign v a -> SAssign (rho v) (subs_a rho a)
  SIte b s1 s2 -> SIte (subs_b rho b) (subs_s rho s1) (subs_s rho s2)

subs_a :: (String -> String) -> AExp -> AExp
subs_a rho a = case a of
  ALit _ -> a
  AVar v -> AVar (rho v)
  AAdd a1 a2 -> AAdd (subs_a rho a1) (subs_a rho a2)

subs_b :: (String -> String) -> BExp -> BExp
subs_b rho b = case b of
  BLit _ -> b
  BLe a1 a2 -> BLe (subs_a rho a1) (subs_a rho a2)
  BAnd b1 b2 -> BAnd (subs_b rho b1) (subs_b rho b2)
Now suppose we proceed to write another function that performs application. This function takes a mapping from variables to integer values, takes a Stmt tree, and returns a new tree with the variables replaced by those numbers.
At first glance, we seem to be able to write a more general function subs_s_var that takes a substitution function of the type String -> AExp to unify them.
But remember we are in an imperative language.
The substitution from variables to values should not happen to the target variable for an assignment statement (the so-called “left value”).
Therefore, the implementation is slightly different:
apply_s :: (String -> Int) -> Stmt -> Stmt
apply_s rho s = case s of
  SAssign v a -> SAssign v (apply_a rho a)
  SIte b s1 s2 -> SIte (apply_b rho b) (apply_s rho s1) (apply_s rho s2)

apply_a :: (String -> Int) -> AExp -> AExp
apply_a rho a = case a of
  ALit _ -> a
  AVar v -> ALit (rho v)
  AAdd a1 a2 -> AAdd (apply_a rho a1) (apply_a rho a2)

apply_b :: (String -> Int) -> BExp -> BExp
apply_b rho b = case b of
  BLit _ -> b
  BLe a1 a2 -> BLe (apply_a rho a1) (apply_a rho a2)
  BAnd b1 b2 -> BAnd (apply_b rho b1) (apply_b rho b2)
Notice that in the function apply_s, we leave the v in SAssign v a unchanged. In contrast, the function subs_s changes v to rho v.
Actually, in the application case, the only place where we replace the variable with a value is in apply_a: if we find an AVar v, we change it to ALit (rho v).
From these two sets of functions, we can see that the pattern we want to match is not “every appearance of a variable”, but “every appearance of a variable within an expression (either an AExp or a BExp)”.
This is an example of the need for nuanced patterns.
The pattern “every variable in an expression” can be abstracted from the common parts of subs_s and apply_s:
aexp_avar :: (String -> AExp) -> AExp -> AExp
aexp_avar rho a = case a of
  ALit _ -> a
  AVar v -> rho v
  AAdd a1 a2 -> AAdd (aexp_avar rho a1) (aexp_avar rho a2)

bexp_avar :: (String -> AExp) -> BExp -> BExp
bexp_avar rho b = case b of
  BLit _ -> b
  BLe a1 a2 -> BLe (aexp_avar rho a1) (aexp_avar rho a2)
  BAnd b1 b2 -> BAnd (bexp_avar rho b1) (bexp_avar rho b2)
These two functions together express the idea of pattern matching a variable inside an AExp or a BExp: when such a variable is found, it is changed into an AExp by applying rho, and a new tree is constructed accordingly.
The functions subs_s and apply_s can share the same pattern matching path when they go from AExp to AVar. So they only need to specify how to go from a Stmt to an AExp.
For subs_s, we may find AVar in the target and value of SAssign, as well as in the condition and clauses of SIte. On the other hand, for apply_s, we will go from a Stmt to an AExp by leaving out the assignment target part.
subs_s :: (String -> String) -> Stmt -> Stmt
subs_s rho s = case s of
  SAssign v a -> SAssign (rho v) (subs_a rho a)
  SIte b s1 s2 -> SIte (subs_b rho b) (subs_s rho s1) (subs_s rho s2)

apply_s :: (String -> Int) -> Stmt -> Stmt
apply_s rho s = case s of
  SAssign v a -> SAssign v (apply_a rho a)
  SIte b s1 s2 -> SIte (apply_b rho b) (apply_s rho s1) (apply_s rho s2)
The above functions specify how to go from a Stmt to an AExp or a BExp. Then we only need to specify how to go from an AExp or a BExp to an AVar, and what to do when such a pattern matching succeeds.
This can be easily done by:
subs_a :: (String -> String) -> AExp -> AExp
subs_a rho = aexp_avar (AVar . rho)
subs_b :: (String -> String) -> BExp -> BExp
subs_b rho = bexp_avar (AVar . rho)
apply_a :: (String -> Int) -> AExp -> AExp
apply_a rho = aexp_avar (ALit . rho)
apply_b :: (String -> Int) -> BExp -> BExp
apply_b rho = bexp_avar (ALit . rho)
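As a quick sanity check (GHCi-style, assuming the data types derive Show):

>>> subs_a (\s -> "new_" ++ s) (AAdd (AVar "x") (ALit 1))
AAdd (AVar "new_x") (ALit 1)
>>> apply_a (\_ -> 7) (AAdd (AVar "x") (ALit 1))
AAdd (ALit 7) (ALit 1)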
Let’s review what we’ve done here.
We break up the process of going from a Stmt to AVar into three steps:
- We specify how to go from a Stmt to AExp and BExp for each function, because we are taking different pattern matching paths to expressions.
- Then we abstract a common pattern to go from AExp and BExp to AVar. This pattern matching path is the same for both functions. So we come up with aexp_avar and bexp_avar to do just that; then subs_a, apply_a, and the other functions are implemented using these patterns.
- Finally, we plug the pieces back into subs_s and apply_s, to get the whole pattern matching path.
In this way, we can write less code when implementing subs_a and so on.
Furthermore, the actions are decoupled from the pattern matching process. aexp_avar and bexp_avar can be viewed as individual stored pattern matching paths. They leave out what to do if the pattern matching succeeds, as long as it gets an AExp in the end to replace the matched variable.
As with all decoupling, we also get benefits when the structure changes.
Suppose we want to add a new constructor to the definition of BExp:
data BExp = ...
          | BNeg BExp
We only need to change how we go from a BExp to an AExp:
bexp_avar :: (String -> AExp) -> BExp -> BExp
bexp_avar rho b = case b of
  ... -> ...
  BNeg b' -> BNeg (bexp_avar rho b')
All other functions remain the same.
After all, we are only adding a new way to go from a BExp to an AExp. This shouldn't change how to go from a Stmt to an AExp, or any of the other paths.
These “stored pattern matching paths” also combine with each other if the types match.
Say we want to do some evaluation, but we want to restrict the process to only the condition of each “if-then-else” block, maybe for some optimization reasons.
We already know how to get from an AExp to a variable; now we only need to figure out how to get from a Stmt to the AExp in if-then-else conditions.
stmt_ite_aexp :: (String -> AExp) -> Stmt -> Stmt
stmt_ite_aexp rho s = case s of
  SAssign _ _ -> s
  SIte b s1 s2 -> SIte (bexp_avar rho b) s1 s2
Notice how we “strung” the two paths together: from a Stmt to the condition BExp of the if-then-else block, then from that BExp down to AVar.
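For example, only the variable in the condition gets renamed; the occurrences in the branches are untouched (GHCi-style, assuming derived Show instances):

>>> stmt_ite_aexp (AVar . ("new_" ++)) (SIte (BLe (AVar "x") (ALit 0)) (SAssign "x" (ALit 1)) (SAssign "y" (ALit 2)))
SIte (BLe (AVar "new_x") (ALit 0)) (SAssign "x" (ALit 1)) (SAssign "y" (ALit 2))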
In conclusion, we have obtained a good way of doing pattern matching for our EDSL, with the following advantages:
- It is expressive enough for complex patterns.
- Many patterns can be reused, due to the decoupling of actions from the pattern matching process.
- When the syntax changes, the patterns can be changed in a small, predictable, and local scope.
- Patterns are composable.
However, this representation still has two shortcomings:
- We don’t have the concept of “failure” in the process of pattern matching. A new tree is returned, but we don’t know whether any pattern matching took place. We may need such information, for example, when trying to answer whether there are no more patterns to match, i.e. whether there are no applicable evaluation rules.
- The composition is not clear. In stmt_ite_aexp, it is not obvious that we are going from a Stmt to the condition BExp of an “if-then-else” block, and then from that BExp to an AExp. This is because the sub-pattern-matching is inserted in the middle of the function.
We now take the same idea, but use a new way to represent “pattern matching functions”, which aims to address these two problems.
Targeted language
To demonstrate the advantages of the representation, we expand the above toy language to a more complete yet still small EDSL.
We target a simple imperative language called IMP from the K framework's tutorial. The language, embedded in Haskell, looks like this:
data AExp v = ALit Int | AVar v
            | ANeg Int
            | ADiv (AExp v) (AExp v)
            | AAdd (AExp v) (AExp v)

data BExp v = BLit Bool
            -- less than or equal to
            | BLe (AExp v) (AExp v)
            | BNeg (BExp v)
            | BAnd (BExp v) (BExp v)

-- empty block is Nothing
type Block v = Maybe (Stmt v)

data Stmt v = SBlock (Block v)
            | SAssign v (AExp v)
            -- If-then-else
            | SIte (BExp v) (Block v) (Block v)
            | SWhile (BExp v) (Block v)
            | SSeq (Stmt v) (Stmt v)
            -- identity of SSeq
            | SUnit
Compared with the example from the introduction, IMP has a richer set of structures.
You can write quite a lot of programs with IMP.
For some example programs written with this embedded IMP, see this page.
The corresponding actual IMP programs, which can be interpreted by K-framework tools like kompile, can be found in the test files.
Method
Representation of pattern matchings
We will now proceed to define pattern matchings, and use a type to express our definition.
Definition. A pattern matching for an algebraic data type is a function that, when given a (possibly failing) substitution of a substructure, gives a substitution of the whole structure.
For example, when pattern matching on AVar in AExp, what we are saying is that, given a substitution for AVar, we know how to construct an AExp as a result for this substitution.
This obviously doesn’t cover every meaning of the term “pattern matching”, but here we consider it to be a good fit for our purpose of AST evaluation. We can use the following type to express our definition:
type MatchInto s t a b = (a -> Maybe b) -> s -> Maybe t
This type means that, given a possibly failing substitution of the substructure a to b (i.e. a function a -> Maybe b), we can construct a possibly failing substitution for the overall structure s, changing it into t.
For our evaluation process, an evaluation rule always preserves the type of the value (or more precisely, the reducible expression), i.e. no rule can change an AExp to a BExp.
Therefore, we can use a less general type:
type MatchInto' s a = MatchInto s s a a
type MatchSelf a = MatchInto' a a
It can be illuminating to expand the type definitions:
type MatchInto' s a = (a -> Maybe a) -> s -> Maybe s
type MatchSelf a = (a -> Maybe a) -> a -> Maybe a
The difference between the two types is that MatchInto' may move into a different type to continue the pattern matching, while MatchSelf says that the substructure has the same type as the matched pattern.
For example, an AExp in BLe AExp AExp will change the type of the pattern matching from a BExp to an AExp, while an AExp in AAdd AExp AExp will not.
This is quite handy when we discuss recursively pattern matching into substructures: since a successful pattern matching returns a value of the same type, we can continue to match into the substructures recursively.
Simple pattern matching functions
Let’s see some examples to get a hang of it.
aLit :: MatchSelf (AExp v)
aLit f e = case e of
  ALit _ -> f e
  _ -> Nothing
This means aLit is a pattern matching from an AExp v into an AExp v.
When the constructor is ALit, the matching succeeds and the substitution function f is applied, producing a result of type Maybe (AExp v). We use this substitution result to replace the matched AExp v in the whole tree.
Otherwise, the pattern matching fails and returns Nothing.
Note: it is not guaranteed that if the pattern matching succeeds, the substitution will also succeed and return a Just value.
The substitution may be partial, for example:
keep5 :: AExp v -> Maybe (AExp v)
keep5 (ALit 5) = Just (ALit 5)
keep5 _ = Nothing
keep5 will fail on ALit 6 when performing the pattern matching, i.e. aLit keep5 (ALit 6) will return a Nothing, although the pattern matching on the constructor succeeds.
One might be tempted to conclude that for each constructor of the sum type, we will produce exactly one pattern matching function. However, we are free to choose onto which substructure to apply the substitution function, for example, the whole structure, one of the arguments, or some subset of the arguments. An example of this distinction is as follows:
bNeg :: MatchSelf (BExp v)
bNeg f e = case e of
  BNeg _ -> f e
  _ -> Nothing

bNegArg :: MatchSelf (BExp v)
bNegArg f e = case e of
  BNeg e' -> BNeg <$> f e'
  _ -> Nothing
Although both functions match on the same constructor BNeg, the pattern bNeg applies the substitution to the whole structure, while bNegArg only applies it to the subexpression and wraps the result up with the same BNeg constructor.
The former function is more general because it can choose to change the constructor to any BExp v constructor, e.g.
negLit :: BExp v -> Maybe (BExp v)
negLit = bNeg $ \b -> case b of
  BNeg (BLit b') -> Just $ BLit (not b')
  _ -> Nothing
Then bNeg negLit (BNeg (BLit True)) returns Just (BLit False), while bNegArg negLit (BNeg (BLit True)) will return Nothing, because it expects to see a BNeg in the argument of the top structure, but gets a BLit instead.
On the other hand, bNegArg negLit (BNeg (BNeg (BLit True))) will succeed by returning Just (BNeg (BLit False)).
Composing pattern matching functions
Recall the type of MatchInto':
type MatchInto s t a b = (a -> Maybe b) -> s -> Maybe t
type MatchInto' s a = MatchInto s s a a
Observe the expanded type, type MatchInto' s a = (a -> Maybe a) -> (s -> Maybe s). We can easily see that if we have aToB :: MatchInto' a b and bToC :: MatchInto' b c, then the composed pattern aToC = aToB . bToC has the type MatchInto' a c, which will pattern match into the overall big structure a and get matches of the smallest substructure c.
For example, even if we don’t know the implementation, just by reading the types, we will be able to tell whether and how the pattern matchings will compose.
aAddArg :: MatchInto' (AExp v) (AExp v)
aLit :: MatchInto' (AExp v) (AExp v)
aAddArg . aLit :: MatchInto' (AExp v) (AExp v)
Supposedly, the composed pattern will match arithmetic literals in the arguments of AAdd. For example, (aAddArg . aLit) (\_ -> Just (ALit 0)) (AAdd (ALit 1) (AVar "x")) will return Just (AAdd (ALit 0) (AVar "x")).
The following composed pattern matching function will match an ADiv expression that appears in an argument of a BLe expression.
bLeArg :: MatchInto' (BExp v) (AExp v)
aDiv :: MatchInto' (AExp v) (AExp v)
bLeArg . aDiv :: MatchInto' (BExp v) (AExp v)
We can also compose more than two pattern matching functions to get a more complex pattern:
sIteCond :: MatchInto' (Stmt v) (BExp v)
bNeg :: MatchInto' (BExp v) (BExp v)
bLit :: MatchInto' (BExp v) (BExp v)
sIteCond . bNeg . bLit :: MatchInto' (Stmt v) (BExp v)
For those who are familiar with the concept, this should immediately remind you of the power of optics.
In fact, MatchInto is indeed similar to a kind of optics called prisms.
We will return to that in the end.
But right now, neither the definition nor the implementation requires any knowledge about optics.
And we have a quite good intuition of the meaning of each part.
Pattern matching combinators
It is straightforward to manually write out all the pattern matching functions for the entire syntax tree. However, there are some patterns that can be extracted into combinators to facilitate the process.
A basic example is a binary constructor, by which we mean a constructor with two substructures of the same type. This is only a contingent feature of the IMP language; it may not be useful for other EDSLs. However, it's easy to come up with new pattern matching paths and combine them together.
For binary constructors, what we want to do is to match into both of the substructures. If either of them succeeds, we replace the substructure that succeeded with the result of applying the substitution. The other substructure remains the same, and we construct a new tree using the same constructor.
binaryConstr :: (a -> Maybe a) -> (a -> a -> b) -> a -> a -> Maybe b
binaryConstr f constr a1 a2 =
  let aL = flip constr a2 <$> f a1
      aR = constr a1 <$> f a2
  in aL <|> aR
Here we are using the Alternative instance of Maybe.
The definition is as follows:
instance Alternative Maybe where
  empty = Nothing
  Nothing <|> r = r
  l <|> _ = l
In short, this means we go through some Maybe values; if we meet a Nothing, the search continues; otherwise, we have found a Just value, so we stop the search and return it. If all we have met along the way are Nothing values, we return a Nothing value in the end.
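For example, in GHCi (after importing Control.Applicative):

>>> Nothing <|> Just 1 <|> Just 2
Just 1
>>> Nothing <|> Nothing <|> Just 2
Just 2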
An example of binaryConstr is:
bLeArg :: MatchInto' (BExp v) (AExp v)
bLeArg f = \e -> case e of
  BLe e1 e2 -> binaryConstr f BLe e1 e2
  _ -> Nothing
bLeArg matches into the two arguments of a BLe constructor.
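For instance (GHCi-style, assuming derived Show instances):

>>> bLeArg (\_ -> Just (ALit 0)) (BLe (ALit 1) (AVar "x"))
Just (BLe (ALit 0) (AVar "x"))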
Remember the discussion about matching a thing itself and matching its substructures? If the matched substructure has the same type as the overall structure, then we can recursively match into it. For binary constructors, this provides a different combinator:
binaryRec :: (a -> Maybe a) -> (a -> a -> a) -> a -> a -> a -> Maybe a
binaryRec f constr a1 a2 a =
  let aL = flip constr a2 <$> f a1
      aR = constr a1 <$> f a2
      aS = f a
  in aL <|> aR <|> aS
We add the result of matching the term itself, aS = f a, to the possible results, by connecting aS to the other results with <|>.
If we really want, we can write such a pattern:
aAddWithArgs :: MatchSelf (AExp v)
aAddWithArgs f e = case e of
  AAdd e1 e2 -> binaryRec f AAdd e1 e2 e
  _ -> Nothing
However, as we will explain later, we choose to use a more general way for recursively pattern matching substructures.
For now, we only provide two patterns for the constructor AAdd:
aAdd :: MatchSelf (AExp v)
aAdd f e = case e of
  AAdd _ _ -> f e
  _ -> Nothing

aAddArg :: MatchSelf (AExp v)
aAddArg f e = case e of
  AAdd e1 e2 -> binaryConstr f AAdd e1 e2
  _ -> Nothing
aAdd matches the structure itself, while aAddArg matches exactly the two arguments, without touching the overall structure.
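The other simple patterns used below, such as aVar and aDivArg, follow exactly the same shapes. Their full definitions live in the accompanying repository; a sketch of these two would look like:

aVar :: MatchSelf (AExp v)
aVar f e = case e of
  AVar _ -> f e
  _ -> Nothing

aDivArg :: MatchSelf (AExp v)
aDivArg f e = case e of
  ADiv e1 e2 -> binaryConstr f ADiv e1 e2
  _ -> Nothing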
Recursive pattern matching
Recursively pattern matching into substructures can be done through the following steps:
1. Try to match the structure itself.
2. If 1 fails, find the substructures with the same type.
3. Recursively match into those substructures.
This combinator is provided as a typeclass.
class RecursiveMatch a where
  recursiveMatch :: MatchSelf a
  recursiveMatch f e = f e <|> subMatch (recursiveMatch f) e

  subMatch :: MatchSelf a
recursiveMatch does exactly steps 1-3.
For any RecursiveMatch instance, we have to specify how to find the substructures. For example, recall the definition of AExp:
data AExp v = ALit Int | AVar v
            | ANeg Int
            | ADiv (AExp v) (AExp v)
            | AAdd (AExp v) (AExp v)
For a substructure to also have the type AExp, it must belong to either the arguments of ADiv or the arguments of AAdd.
Therefore, the instance for AExp can be defined as:
instance RecursiveMatch (AExp v) where
  subMatch :: MatchSelf (AExp v)
  subMatch f s = aDivArg f s <|> aAddArg f s
And now we can use recursiveMatch from the typeclass to get all AExp in an AExp, at an arbitrary depth of unfolding (and also the expression itself).
This is a technique called anamorphism, which can be intuitively understood as a mechanism to unfold over a structure.
This is why we choose to provide only the patterns that match substructures, instead of using binaryRec.
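As a quick illustration, take a hypothetical substitution incrLit (not from the original post) that only succeeds on literals:

incrLit :: AExp v -> Maybe (AExp v)
incrLit (ALit n) = Just (ALit (n + 1))  -- succeed on literals
incrLit _ = Nothing                     -- fail everywhere else

With the instance above, recursiveMatch incrLit (AAdd (AVar "x") (ALit 1)) returns Just (AAdd (AVar "x") (ALit 2)): the match on the whole AAdd fails, so the search descends into the arguments.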
One last combinator in our toolbox: the possibly combinator tries pattern matching functions from a list; if any of them succeeds (i.e. returns a Just value), then the whole pattern matching succeeds. This is like concat from Data.Foldable, except we are using the Alternative instance, so what we want is asum:
possibly :: [MatchInto' s a] -> MatchInto' s a
possibly patterns f s = asum . map (\p -> p f s) $ patterns
Then the recursive matching turns into listing the substructure patterns:
instance RecursiveMatch (AExp v) where
  subMatch :: MatchSelf (AExp v)
  subMatch = possibly
    [ aDivArg
    , aAddArg
    ]

instance RecursiveMatch (BExp v) where
  subMatch :: MatchSelf (BExp v)
  subMatch = possibly
    [ bNegArg
    , bAndArg
    ]

instance RecursiveMatch (Stmt v) where
  subMatch :: MatchSelf (Stmt v)
  subMatch = possibly
    [ sBlock . block
    , sSeqFirst
    ]
Now we can combine simple patterns to create complex patterns. For example:
aExpAVar :: MatchInto' (AExp v) (AExp v)
aExpAVar = recursiveMatch . aVar
This pattern aExpAVar will match all variables inside an AExp, no matter how deep they are in the structure.
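For example, replacing the first variable the pattern reaches with a literal (GHCi-style, assuming derived Show instances):

>>> aExpAVar (\_ -> Just (ALit 0)) (AAdd (AVar "x") (ALit 1))
Just (AAdd (ALit 0) (ALit 1))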
Similarly, we can find all AExp inside a BExp by bExpAExp:
bExpAExp :: MatchInto' (BExp v) (AExp v)
bExpAExp = recursiveMatch . bLeArg
For more complex patterns, we may need to mix combinators and paths:
stmtBExp :: MatchInto' (Stmt v) (BExp v)
stmtBExp = recursiveMatch . possibly
  [ sIteCond
  ] . recursiveMatch
In the implementation of stmtBExp, since both Stmt and BExp can recursively match substructures of the same type, we add recursiveMatch to both ends to thoroughly explore the tree.
The recursiveMatch on the first line of the definition matches first. It uses the Stmt instance to recursively find all Stmt in a Stmt. And the middle pattern will lead us from a Stmt to a BExp.
Although it's unnecessary to use possibly here, we use a list to remind us to add more cases when the definition is expanded, because this pattern matching path is more likely to be subject to change.
Finally, the recursiveMatch on the last line of the definition will find all BExp inside a BExp.
In conclusion, stmtBExp will return all BExp inside a Stmt.
Similarly, we have stmtAExp to pattern match from a Stmt to all AExp inside it.
stmtAExp :: MatchInto' (Stmt v) (AExp v)
stmtAExp = recursiveMatch . possibly
  [ sAssignArg
  , stmtBExp . bExpAExp
  ] . recursiveMatch
And recall the example in the introduction, where we want to get from a Stmt to all the variables inside it.
Well, we can implement that now with:
stmtAVar :: MatchInto' (Stmt v) (AExp v)
stmtAVar = stmtAExp . aVar
That is, we match a Stmt to find all AExp, and then match them to find all AVar.
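For example, assuming sAssignArg focuses on the right-hand side of an assignment (as its use in stmtAExp suggests), the assignment target is left untouched, just as we wanted in the introduction (GHCi-style, assuming derived Show instances):

>>> stmtAVar (\_ -> Just (ALit 0)) (SAssign "x" (AAdd (AVar "y") (ALit 1)))
Just (SAssign "x" (AAdd (ALit 0) (ALit 1)))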
Interestingly, this is not the only way to match from AExp to AVar.
We can also write:
stmtAVar' :: MatchInto' (Stmt v) (AExp v)
stmtAVar' = stmtAExp . aExpAVar
That is, we match from a Stmt to AExp, and then go from AExp to all AVar.
The difference is that, inside an AExp, stmtAVar will give up if it sees that the top-level constructor is not AVar, because it is using the aVar pattern, while stmtAVar' will keep searching into substructures.
We are not leaving out any possible matches if we use the first approach, since stmtAExp will look into the substructures later. This can be proven by the following reasoning: for any AVar, stmtAExp is guaranteed to match that AVar as an AExp, so the last match must be on the constructor AVar.
In the case of a mismatch, stmtAExp will once again search into the substructures of the AExp, without knowing that aExpAVar has already failed.
So the first implementation is correct and more efficient.
This gives us an example of thinking about the search order and optimization when dealing with these patterns.
It takes a bit of effort to reason, so you should design your patterns carefully.
Evaluation
With these patterns ready at hand, the evaluation process is straightforward.
Suppose by evaluation, we only need to transform the AST (meaning no states are mutated), then an evaluation rule can be defined as:
type Rule v = Stmt v -> Maybe (Stmt v)
This is just half of the type of MatchInto.
The reason is that we defined a pattern matching to be a thing that asks what to do when the pattern matching succeeds, and tells you what will happen as a result. Well, an evaluation rule specifies what to do when a pattern matching succeeds, so the only thing left is what will happen.
The type basically means that if the pattern matching succeeds and the substitution function returns a Just value, we will get a new tree wrapped in Just.
Otherwise, we will get a Nothing.
For example, to evaluate integer addition:
ruleIntAdd :: Rule v
ruleIntAdd = stmtAExp $ \e -> case e of
  (AAdd (ALit i1) (ALit i2)) -> Just $ ALit (i1 + i2)
  _ -> Nothing
Of course, you can change the pattern to reflect different evaluation orders.
For example, BAnd can be made strict in only its first argument, to exhibit a “short circuit” behaviour:
ruleBoolAnd :: Rule v
ruleBoolAnd = stmtBExp . bAnd $ \e -> case e of
  BAnd (BLit True) e' -> Just e'
  BAnd (BLit False) _ -> Just $ BLit False
  _ -> Nothing
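The failure information is exactly what can drive an interpreter: keep trying rules until none applies. A minimal driver could look like the following sketch (the names step and evalSteps are illustrative assumptions, not from the original post):

-- Apply the first rule that matches; Nothing means no rule applies any more.
step :: [Rule v] -> Stmt v -> Maybe (Stmt v)
step rules s = asum (map ($ s) rules)

-- Keep rewriting until no rule applies, then return the final tree.
evalSteps :: [Rule v] -> Stmt v -> Stmt v
evalSteps rules s = maybe s (evalSteps rules) (step rules s)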
Of course, for an imperative language, it’s not enough to evaluate expressions. We need to maintain a state to be passed between statements. We can provide a pattern into the state-decorated expressions and combine it seamlessly with normal patterns. (For those familiar with optics, we obtain a prism by composing a lens with a prism.)
For a complete example of evaluation rules for IMP, see this page.
Relation to Prisms
A Prism is a kind of optic.
To some extent, it can be understood as a way to extract a value from a sum type.
This is very close to what we have defined to be a pattern matching process.
The definition of Prism, in the encoding used by the lens package, is as follows:
type Prism s t a b = forall p f. (Choice p, Applicative f)
                  => p a (f b) -> p s (f t)
As a matter of fact, (->) is an instance of Choice, and Maybe is an instance of Applicative.
If we instantiate p to (->) and f to Maybe, we get the exact type of MatchInto.
Therefore, although the type MatchInto is not a prism, it is composable with other prisms.
The reason is that a prism must work with every p and f that are instances of Choice and Applicative respectively. Then, a prism can always be instantiated to (->) and Maybe, so that it can compose with pattern matching functions.
For example,
_Just :: Prism (Maybe a) (Maybe b) a b
_Just . aLit :: (AExp v -> Maybe (AExp v))
             -> Maybe (AExp v) -> Maybe (Maybe (AExp v))
This allows us to apply a pattern matching function within a Maybe value; the pattern matching is only performed in the Just case.
I will not go into the details of the implementation of prisms. However, I hope the above introduction of pattern matching functions should at least help to build some intuition for prisms and their composition.
From this perspective, it should be possible to write prisms for the constructors of everyday sum types. In fact, for a class of sum types, some optics packages can use Template Haskell to automatically generate prisms that cover all the above pattern matching functions. To do this with the lens package, after bringing the relevant module into scope, you only need to add the following lines after the type definitions.
makePrisms ''AExp
makePrisms ''BExp
makePrisms ''Stmt
For a constructor like ALit, a prism with the name _ALit will be generated.
Not only can it perform everything we do with the pattern matching function aLit, it has the full power of a complete prism.
The implication is that we can use a lot of the functions from the lens package, to support a wider range of actions with these pattern matching paths.
_ALit :: Prism (AExp v) (AExp v) Int Int
We can query whether there is a matching value inside a structure with the ^? operator.
>>> (ALit 5) ^? _ALit
Just 5
>>> (ALit 5) ^? _AVar
Nothing
We can apply a function to the value pointed to by the prism with the function over:
>>> over (_AAdd . _1 . _ALit) (+1) (AAdd (ALit 5) (AVar "x"))
(AAdd (ALit 6) (AVar "x"))
Here _1 will select the first component of the arguments of _AAdd.
These may seem confusing if you are not familiar with optics. If that's the case, you can use what we introduced before as a reference. Because of their additional power, optics packages provide many more functions and operators to interact with them. To use those tools, one needs to get used to the specific symbols, but at a user level, it's not that magical. Basically, you need to specify three things:
- How to get to the place? Usually by composition of prisms and other optics.
- What to do after we get there? We can view the value, apply a function to change the value, and do many other things.
- In addition, prisms also allow you to choose how to pack the structure back. In our application, we only allow replacement to the same place, but for different optics, you may be able to keep the structure, fold it with a special typeclass instance, or make a list of all matched values.
Conclusions
In this blog post, we introduced a way to perform pattern matchings for AST evaluation.
The core of this representation lies in the type definition type MatchInto s t a b = (a -> Maybe b) -> (s -> Maybe t).
We applied this method to the IMP language and introduced some combinators that allow us to write complex patterns.
We then explained the relation between our method and prisms: prisms are more general structures that come with many tools, our method can be replaced by prisms, and the original definition composes with prisms to some extent.
A complete IMP interpreter that utilises the aforementioned method of pattern-matching-based tree evaluation can be found at this repository, along with some descriptions and tests.