Pattern-matching-based AST Evaluation as Prisms
Posted on 2021-10-10 by ubikium

Introduction
When performing evaluation on an Abstract Syntax Tree (AST) for an Embedded Domain-Specific Language (EDSL), it would be nice to have a “pattern matching” feature to define semantic rules.
For example, “for an expression of the form Add M N, if M and N are integers, it evaluates to M+N”.
This blog post will introduce a way to use prism-like functions to express such complex pattern-matching-based evaluation rules. For most of it, the reader need not know anything about optics; basic concepts of algebraic data types, functors, and typeclasses will suffice.
In Haskell, if you write the syntax of the EDSL with algebraic data types, basic pattern matching is already built in. For a simple imperative language defined by the data types:
data AExp = ALit Int | AVar String
          | AAdd AExp AExp

data BExp = BLit Bool
          -- less than or equal to
          | BLe AExp AExp
          | BAnd BExp BExp

data Stmt = SAssign String AExp
          -- If-then-else
          | SIte BExp Stmt Stmt
We can pattern match on a Stmt value with

case e of
  SAssign v expr -> ...
  SIte b stmt1 stmt2 -> ...
This is all good for top-level definitions; however, it can be quite cumbersome to pattern match on something that's deep in the structure.
Say we want to write a renaming function.
It accepts a function from variable names to variable names, like \s -> "new_" ++ s.
Then it will change a Stmt tree to a new tree with variable names replaced.
This is usually defined recursively:
subs_s :: (String -> String) -> Stmt -> Stmt
subs_s rho s = case s of
  SAssign v a -> SAssign (rho v) (subs_a rho a)
  SIte b s1 s2 -> SIte (subs_b rho b) (subs_s rho s1) (subs_s rho s2)

subs_a :: (String -> String) -> AExp -> AExp
subs_a rho a = case a of
  ALit _ -> a
  AVar v -> AVar (rho v)
  AAdd a1 a2 -> AAdd (subs_a rho a1) (subs_a rho a2)

subs_b :: (String -> String) -> BExp -> BExp
subs_b rho b = case b of
  BLit _ -> b
  BLe a1 a2 -> BLe (subs_a rho a1) (subs_a rho a2)
  BAnd b1 b2 -> BAnd (subs_b rho b1) (subs_b rho b2)
Now suppose we proceed to write another function that performs application. This function takes a mapping from variables to integer values, takes a Stmt tree, and returns a new tree with the variables replaced by those numbers.
At first glance, we seem to be able to write a more general function subs_s_var that takes a substitution function of the type String -> AExp to unify them.
But remember we are in an imperative language.
The substitution from variables to values should not happen to the target variable for an assignment statement (the so-called “left value”).
Therefore, the implementation is slightly different:
apply_s :: (String -> Int) -> Stmt -> Stmt
apply_s rho s = case s of
  SAssign v a -> SAssign v (apply_a rho a)
  SIte b s1 s2 -> SIte (apply_b rho b) (apply_s rho s1) (apply_s rho s2)

apply_a :: (String -> Int) -> AExp -> AExp
apply_a rho a = case a of
  ALit _ -> a
  AVar v -> ALit (rho v)
  AAdd a1 a2 -> AAdd (apply_a rho a1) (apply_a rho a2)

apply_b :: (String -> Int) -> BExp -> BExp
apply_b rho b = case b of
  BLit _ -> b
  BLe a1 a2 -> BLe (apply_a rho a1) (apply_a rho a2)
  BAnd b1 b2 -> BAnd (apply_b rho b1) (apply_b rho b2)
Notice that in the function apply_s, we leave the v in SAssign v a unchanged. In contrast, the function subs_s changes v to rho v.
Actually, in the application case, the only place where we replace the variable with a value is in apply_a: if we find an AVar v, we change it to ALit (rho v).
From these two sets of functions, we can see that the pattern we want to match is not “every appearance of a variable”, but “every appearance of a variable within an expression (either an AExp or a BExp)”.
This is an example of the need for nuanced patterns.
The pattern “every variable in an expression” can be abstracted from the common parts of subs_s and apply_s:
aexp_avar :: (String -> AExp) -> AExp -> AExp
aexp_avar rho a = case a of
  ALit _ -> a
  AVar v -> rho v
  AAdd a1 a2 -> AAdd (aexp_avar rho a1) (aexp_avar rho a2)

bexp_avar :: (String -> AExp) -> BExp -> BExp
bexp_avar rho b = case b of
  BLit _ -> b
  BLe a1 a2 -> BLe (aexp_avar rho a1) (aexp_avar rho a2)
  BAnd b1 b2 -> BAnd (bexp_avar rho b1) (bexp_avar rho b2)
These two functions together express the idea of pattern matching a variable inside an AExp or a BExp: when such a variable is found, it is changed into an AExp by applying rho, and a new tree is constructed accordingly.
The functions subs_s and apply_s can share the same pattern matching path when they go from AExp to AVar. So they only need to specify how to go from a Stmt to an AExp.
For subs_s, we may find AVar in the target and value of SAssign, as well as in the condition and clauses of SIte. On the other hand, for apply_s, we will go from a Stmt to an AExp by leaving out the assignment target part.
subs_s :: (String -> String) -> Stmt -> Stmt
subs_s rho s = case s of
  SAssign v a -> SAssign (rho v) (subs_a rho a)
  SIte b s1 s2 -> SIte (subs_b rho b) (subs_s rho s1) (subs_s rho s2)

apply_s :: (String -> Int) -> Stmt -> Stmt
apply_s rho s = case s of
  SAssign v a -> SAssign v (apply_a rho a)
  SIte b s1 s2 -> SIte (apply_b rho b) (apply_s rho s1) (apply_s rho s2)
The above functions specify how to go from a Stmt to an AExp or a BExp. Then we only need to specify how to go from an AExp or a BExp to an AVar, and what to do when such a pattern matching succeeds.
This can be easily done by:
subs_a :: (String -> String) -> AExp -> AExp
subs_a rho = aexp_avar (AVar . rho)
subs_b :: (String -> String) -> BExp -> BExp
subs_b rho = bexp_avar (AVar . rho)
apply_a :: (String -> Int) -> AExp -> AExp
apply_a rho = aexp_avar (ALit . rho)
apply_b :: (String -> Int) -> BExp -> BExp
apply_b rho = bexp_avar (ALit . rho)
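As a quick sanity check (GHCi-style, assuming the data types derive Show):

>>> subs_a (\s -> "new_" ++ s) (AAdd (AVar "x") (ALit 1))
AAdd (AVar "new_x") (ALit 1)
>>> apply_a (\_ -> 7) (AAdd (AVar "x") (ALit 1))
AAdd (ALit 7) (ALit 1)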
Let’s review what we’ve done here.
We break up the process of going from a Stmt to AVar into three steps:
- We specify how to go from a Stmt to AExp and BExp for each function, because we are taking different pattern matching paths to expressions.
- Then we abstract a common pattern to go from AExp and BExp to AVar. This pattern matching path is the same for both functions. So we come up with aexp_avar and bexp_avar to do just that; then subs_a, apply_a, and the other functions are implemented using these patterns.
- Finally, we plug the pieces back into subs_s and apply_s, to get the whole pattern matching path.
In this way, we can write less code when implementing subs_a and so on.
Furthermore, the actions are decoupled from the pattern matching process. aexp_avar and bexp_avar can be viewed as individual stored pattern matching paths. They leave out what to do if the pattern matching succeeds, as long as it gets an AExp in the end to replace the matched variable.
As with all decoupling, we also get benefits when the structure changes.
Suppose we want to add a new constructor to the definition of BExp:
data BExp = ...
          | BNeg BExp
We only need to change how we go from a BExp to an AExp:
bexp_avar :: (String -> AExp) -> BExp -> BExp
bexp_avar rho b = case b of
  ... -> ...
  BNeg b' -> BNeg (bexp_avar rho b')
All other functions remain the same.
After all, we are only adding a new way to go from a BExp to an AExp. This shouldn't change how to go from a Stmt to an AExp, or any of the other paths.
These “stored pattern matching paths” also combine with each other if the types match.
Say we want to do some evaluation, but we want to restrict the process to only the condition of each “if-then-else” block, maybe for some optimization reasons.
We already know how to get from an AExp to a variable; now we only need to figure out how to get from a Stmt to the AExp in if-then-else conditions.
stmt_ite_aexp :: (String -> AExp) -> Stmt -> Stmt
stmt_ite_aexp rho s = case s of
  SAssign _ _ -> s
  SIte b s1 s2 -> SIte (bexp_avar rho b) s1 s2
Notice how we “strung” the two paths together: from a Stmt to the condition BExp of the if-then-else block, then from that BExp down to AVar.
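For example, only the variable in the condition gets renamed; the occurrences in the branches are untouched (GHCi-style, assuming derived Show instances):

>>> stmt_ite_aexp (AVar . ("new_" ++)) (SIte (BLe (AVar "x") (ALit 0)) (SAssign "x" (ALit 1)) (SAssign "y" (ALit 2)))
SIte (BLe (AVar "new_x") (ALit 0)) (SAssign "x" (ALit 1)) (SAssign "y" (ALit 2))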
In conclusion, we have obtained a good way of doing pattern matching for our EDSL, with the following advantages:
- It is expressive enough for complex patterns.
- Many patterns can be reused, due to the decoupling of actions from the pattern matching process.
- When the syntax changes, the patterns can be changed in a small, predictable, and local scope.
- Patterns are composable.
However, this representation still has two shortcomings:
- We don’t have the concept of “failure” in the process of pattern matching. A new tree is returned, but we don’t know whether any pattern matching took place. We may need such information, for example, when trying to answer whether there are no more patterns to match, i.e. whether there are no applicable evaluation rules.
- The composition is not clear. In stmt_ite_aexp, it is not obvious that we are going from a Stmt to the condition BExp of an “if-then-else” block, and then from that BExp to an AExp. This is because the sub-pattern-matching is inserted in the middle of the function.
We now take the same idea, but use a new way to represent “pattern matching functions”, which aims to address these two problems.
Targeted language
To demonstrate the advantages of the representation, we expand the above toy language to a more complete yet still small EDSL.
We target a simple imperative language called IMP from the K framework's tutorial. The language, embedded in Haskell, looks like this:
data AExp v = ALit Int | AVar v
            | ANeg Int
            | ADiv (AExp v) (AExp v)
            | AAdd (AExp v) (AExp v)

data BExp v = BLit Bool
            -- less than or equal to
            | BLe (AExp v) (AExp v)
            | BNeg (BExp v)
            | BAnd (BExp v) (BExp v)

-- empty block is Nothing
type Block v = Maybe (Stmt v)

data Stmt v = SBlock (Block v)
            | SAssign v (AExp v)
            -- If-then-else
            | SIte (BExp v) (Block v) (Block v)
            | SWhile (BExp v) (Block v)
            | SSeq (Stmt v) (Stmt v)
            -- identity of SSeq
            | SUnit
Compared with the example from the introduction, IMP has a richer set of structures.
You can write quite a lot of programs with IMP.
For some example programs written with this embedded IMP, see this page.
The corresponding actual IMP programs, which can be interpreted by K-framework tools like kompile, can be found in the test files.
Method
Representation of pattern matchings
We will now proceed to define pattern matchings, and use a type to express our definition.
Definition. A pattern matching for an algebraic data type is a function that, when given a (possibly failing) substitution of a substructure, gives a substitution of the whole structure.
For example, when pattern matching on AVar in AExp, what we are saying is that, given a substitution for AVar, we know how to construct an AExp as a result for this substitution.
This obviously doesn’t cover every meaning of the term “pattern matching”, but here we consider it to be a good fit for our purpose of AST evaluation. We can use the following type to express our definition:
type MatchInto s t a b = (a -> Maybe b) -> s -> Maybe t
This type means that, given a possibly failing substitution of the substructure a to b (i.e. a function a -> Maybe b), we can construct a possibly failing substitution for the overall structure s, changing it into t.
For our evaluation process, an evaluation rule always preserves the type of the value (or more precisely, the reducible expression), i.e. no rule can change an AExp to a BExp.
Therefore, we can use a less general type:
type MatchInto' s a = MatchInto s s a a
type MatchSelf a = MatchInto' a a
It can be illuminating to expand the type definitions:
type MatchInto' s a = (a -> Maybe a) -> s -> Maybe s
type MatchSelf a = (a -> Maybe a) -> a -> Maybe a
The difference between the two types is that MatchInto' may move into a different type to continue the pattern matching, while MatchSelf says that the substructure has the same type as the matched pattern.
For example, an AExp in BLe AExp AExp will change the type of the pattern matching from a BExp to an AExp, while an AExp in AAdd AExp AExp will not.
This is quite handy when we discuss recursively pattern matching into substructures: since a successful pattern matching returns a value of the same type, we can continue to match into the substructures recursively.
Simple pattern matching functions
Let’s see some examples to get a hang of it.
aLit :: MatchSelf (AExp v)
aLit f e = case e of
  ALit _ -> f e
  _ -> Nothing
This means aLit is a pattern matching from an AExp v into an AExp v.
When the constructor is ALit, the matching succeeds and the substitution function f is applied, producing a result of type Maybe (AExp v). We use this substitution result to replace the matched AExp v in the whole tree.
Otherwise, the pattern matching fails and returns Nothing.
Note: it is not guaranteed that if the pattern matching succeeds, the substitution will also succeed and return a Just value.
The substitution may be partial, for example:
keep5 :: AExp v -> Maybe (AExp v)
keep5 (ALit 5) = Just (ALit 5)
keep5 _ = Nothing
keep5 will fail on ALit 6 when performing the pattern matching, i.e. aLit keep5 (ALit 6) will return a Nothing, although the pattern matching on the constructor succeeds.
One might be tempted to conclude that for each constructor of the sum type, we will produce exactly one pattern matching function. However, we are free to choose onto which substructure to apply the substitution function, for example, the whole structure, one of the arguments, or some subset of the arguments. An example of this distinction is as follows:
bNeg :: MatchSelf (BExp v)
bNeg f e = case e of
  BNeg _ -> f e
  _ -> Nothing

bNegArg :: MatchSelf (BExp v)
bNegArg f e = case e of
  BNeg e' -> BNeg <$> f e'
  _ -> Nothing
Although both functions match on the same constructor BNeg, the pattern bNeg applies the substitution to the whole structure, while bNegArg only applies it to the subexpression and wraps the result up with the same BNeg constructor.
The former function is more general because it can choose to change the constructor to any BExp v constructor, e.g.
negLit :: BExp v -> Maybe (BExp v)
negLit = bNeg $ \b -> case b of
  BNeg (BLit b') -> Just $ BLit (not b')
  _ -> Nothing
Then bNeg negLit (BNeg (BLit True)) returns Just (BLit False), while bNegArg negLit (BNeg (BLit True)) will return Nothing, because it expects to see a BNeg in the argument of the top structure, but gets a BLit instead.
On the other hand, bNegArg negLit (BNeg (BNeg (BLit True))) will succeed by returning Just (BNeg (BLit False)).
Composing pattern matching functions
Recall the type of MatchInto':
type MatchInto s t a b = (a -> Maybe b) -> s -> Maybe t
type MatchInto' s a = MatchInto s s a a
Observe the expanded type, type MatchInto' s a = (a -> Maybe a) -> (s -> Maybe s). We can easily see that if we have aToB :: MatchInto' a b and bToC :: MatchInto' b c, then the composed pattern aToC = aToB . bToC has the type MatchInto' a c, which will pattern match into the overall big structure a and get matches of the smallest substructure c.
For example, even if we don’t know the implementation, just by reading the types, we will be able to tell whether and how the pattern matchings will compose.
aAddArg :: MatchInto' (AExp v) (AExp v)
aLit :: MatchInto' (AExp v) (AExp v)
aAddArg . aLit :: MatchInto' (AExp v) (AExp v)
Supposedly, the composed pattern will match arithmetic literals in the arguments of AAdd. For example, (aAddArg . aLit) (\_ -> Just (ALit 0)) (AAdd (ALit 1) (AVar "x")) will return Just (AAdd (ALit 0) (AVar "x")).
The following composed pattern matching function will match an ADiv expression that appears in an argument of a BLe expression.
bLeArg :: MatchInto' (BExp v) (AExp v)
aDiv :: MatchInto' (AExp v) (AExp v)
bLeArg . aDiv :: MatchInto' (BExp v) (AExp v)
We can also compose more than two pattern matching functions to get a more complex pattern:
sIteCond :: MatchInto' (Stmt v) (BExp v)
bNeg :: MatchInto' (BExp v) (BExp v)
bLit :: MatchInto' (BExp v) (BExp v)
sIteCond . bNeg . bLit :: MatchInto' (Stmt v) (BExp v)
For those who are familiar with the concept, this should immediately remind you of the power of optics.
In fact, MatchInto is indeed similar to a kind of optics called prisms.
We will return to that in the end.
But right now, neither the definition nor the implementation requires any knowledge about optics.
And we have a quite good intuition of the meaning of each part.
Pattern matching combinators
It is straightforward to manually write out all the pattern matching functions for the entire syntax tree. However, there are some patterns that can be extracted into combinators to facilitate the process.
A basic example is a binary constructor, by which we mean a constructor with two substructures of the same type. This is only a contingent feature of the IMP language; it may not be useful for other EDSLs. However, it's easy to come up with new pattern matching paths and combine them together.
For binary constructors, what we want to do is to match into both of the substructures. If either of them succeeds, we replace the substructure that succeeded with the result of applying the substitution. The other substructure remains the same, and we construct a new tree using the same constructor.
binaryConstr :: (a -> Maybe a) -> (a -> a -> b) -> a -> a -> Maybe b
binaryConstr f constr a1 a2 =
  let aL = flip constr a2 <$> f a1
      aR = constr a1 <$> f a2
  in aL <|> aR
Here we are using the Alternative instance of Maybe.
The definition is as follows:
instance Alternative Maybe where
  empty = Nothing
  Nothing <|> r = r
  l <|> _ = l
In short, this means we go through some Maybe values; if we meet a Nothing, the search continues; otherwise, we have found a Just value, so we stop the search and return it. If all we have met along the way are Nothing values, we return a Nothing value in the end.
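For example, in GHCi (after importing Control.Applicative):

>>> Nothing <|> Just 1 <|> Just 2
Just 1
>>> Nothing <|> Nothing <|> Just 2
Just 2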
An example of binaryConstr is:
bLeArg :: MatchInto' (BExp v) (AExp v)
bLeArg f = \e -> case e of
  BLe e1 e2 -> binaryConstr f BLe e1 e2
  _ -> Nothing
bLeArg matches into the two arguments of a BLe constructor.
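For instance (GHCi-style, assuming derived Show instances):

>>> bLeArg (\_ -> Just (ALit 0)) (BLe (ALit 1) (AVar "x"))
Just (BLe (ALit 0) (AVar "x"))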
Remember the discussion about matching a thing itself and matching its substructures? If the matched substructure has the same type as the overall structure, then we can recursively match into it. For binary constructors, this provides a different combinator:
binaryRec :: (a -> Maybe a) -> (a -> a -> a) -> a -> a -> a -> Maybe a
binaryRec f constr a1 a2 a =
  let aL = flip constr a2 <$> f a1
      aR = constr a1 <$> f a2
      aS = f a
  in aL <|> aR <|> aS
We add the result of matching the term itself, aS = f a, to the possible results, by connecting aS to the other results with <|>.
If we really want, we can write such a pattern:
aAddWithArgs :: MatchSelf (AExp v)
aAddWithArgs f e = case e of
  AAdd e1 e2 -> binaryRec f AAdd e1 e2 e
  _ -> Nothing
However, as we will explain later, we choose to use a more general way for recursively pattern matching substructures.
For now, we only provide two patterns for the constructor AAdd:
aAdd :: MatchSelf (AExp v)
aAdd f e = case e of
  AAdd _ _ -> f e
  _ -> Nothing

aAddArg :: MatchSelf (AExp v)
aAddArg f e = case e of
  AAdd e1 e2 -> binaryConstr f AAdd e1 e2
  _ -> Nothing
aAdd matches the structure itself, while aAddArg matches exactly the two arguments, without touching the overall structure.
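The other simple patterns used below, such as aVar and aDivArg, follow exactly the same shapes. Their full definitions live in the accompanying repository; a sketch of these two would look like:

aVar :: MatchSelf (AExp v)
aVar f e = case e of
  AVar _ -> f e
  _ -> Nothing

aDivArg :: MatchSelf (AExp v)
aDivArg f e = case e of
  ADiv e1 e2 -> binaryConstr f ADiv e1 e2
  _ -> Nothing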
Recursive pattern matching
Recursively pattern matching into substructures can be done through the following steps:
1. Try to match the structure itself.
2. If 1 fails, find the substructures with the same type.
3. Recursively match into those substructures.
This combinator is provided as a typeclass.
class RecursiveMatch a where
  recursiveMatch :: MatchSelf a
  recursiveMatch f e = f e <|> subMatch (recursiveMatch f) e

  subMatch :: MatchSelf a
recursiveMatch does exactly steps 1-3.
For any RecursiveMatch instance, we have to specify how to find the substructures. For example, recall the definition of AExp:
data AExp v = ALit Int | AVar v
            | ANeg Int
            | ADiv (AExp v) (AExp v)
            | AAdd (AExp v) (AExp v)
For a substructure to also have the type AExp, it must belong to either the arguments of ADiv or the arguments of AAdd.
Therefore, the instance for AExp can be defined as:
instance RecursiveMatch (AExp v) where
  subMatch :: MatchSelf (AExp v)
  subMatch f s = aDivArg f s <|> aAddArg f s
And now we can use recursiveMatch from the typeclass to get all AExp in an AExp, at an arbitrary depth of unfolding (and also the expression itself).
This is a technique called anamorphism, which can be intuitively understood as a mechanism to unfold over a structure.
This is why we choose to provide only the patterns that match substructures, instead of using binaryRec.
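As a quick illustration, take a hypothetical substitution incrLit (not from the original post) that only succeeds on literals:

incrLit :: AExp v -> Maybe (AExp v)
incrLit (ALit n) = Just (ALit (n + 1))  -- succeed on literals
incrLit _ = Nothing                     -- fail everywhere else

With the instance above, recursiveMatch incrLit (AAdd (AVar "x") (ALit 1)) returns Just (AAdd (AVar "x") (ALit 2)): the match on the whole AAdd fails, so the search descends into the arguments.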
One last combinator in our toolbox: the possibly combinator tries pattern matching functions from a list; if any of them succeeds (i.e. returns a Just value), then the whole pattern matching succeeds. This is like concat from Data.Foldable, except we are using the Alternative instance, so what we want is asum:
possibly :: [MatchInto' s a] -> MatchInto' s a
possibly patterns f s = asum . map (\p -> p f s) $ patterns
Then the recursive matching turns into listing the substructure patterns:
instance RecursiveMatch (AExp v) where
  subMatch :: MatchSelf (AExp v)
  subMatch = possibly
    [ aDivArg
    , aAddArg
    ]

instance RecursiveMatch (BExp v) where
  subMatch :: MatchSelf (BExp v)
  subMatch = possibly
    [ bNegArg
    , bAndArg
    ]

instance RecursiveMatch (Stmt v) where
  subMatch :: MatchSelf (Stmt v)
  subMatch = possibly
    [ sBlock . block
    , sSeqFirst
    ]
Now we can combine simple patterns to create complex patterns. For example:
aExpAVar :: MatchInto' (AExp v) (AExp v)
aExpAVar = recursiveMatch . aVar
This pattern aExpAVar will match all variables inside an AExp, no matter how deep they are in the structure.
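For example, replacing the first variable the pattern reaches with a literal (GHCi-style, assuming derived Show instances):

>>> aExpAVar (\_ -> Just (ALit 0)) (AAdd (AVar "x") (ALit 1))
Just (AAdd (ALit 0) (ALit 1))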
Similarly, we can find all AExp inside a BExp by bExpAExp:
bExpAExp :: MatchInto' (BExp v) (AExp v)
bExpAExp = recursiveMatch . bLeArg
For more complex patterns, we may need to mix combinators and paths:
stmtBExp :: MatchInto' (Stmt v) (BExp v)
stmtBExp = recursiveMatch . possibly
  [ sIteCond
  ] . recursiveMatch
In the implementation of stmtBExp, since both Stmt and BExp can recursively match substructures of the same type, we add recursiveMatch to both ends to thoroughly explore the tree.
The recursiveMatch on the first line of the definition matches first. It uses the Stmt instance to recursively find all Stmt in a Stmt. And the middle pattern will lead us from a Stmt to a BExp.
Although it's unnecessary to use possibly here, we use a list to remind us to add more cases when the definition is expanded, because this pattern matching path is more likely to be subject to change.
Finally, the recursiveMatch on the last line of the definition will find all BExp inside a BExp.
In conclusion, stmtBExp will return all BExp inside a Stmt.
Similarly, we have stmtAExp to pattern match from a Stmt to all AExp inside it.
stmtAExp :: MatchInto' (Stmt v) (AExp v)
stmtAExp = recursiveMatch . possibly
  [ sAssignArg
  , stmtBExp . bExpAExp
  ] . recursiveMatch
And recall the example in the introduction, where we want to get from a Stmt to all the variables inside it.
Well, we can implement that now with:
stmtAVar :: MatchInto' (Stmt v) (AExp v)
stmtAVar = stmtAExp . aVar
That is, we match a Stmt to find all AExp, and then match them to find all AVar.
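For example, assuming sAssignArg focuses on the right-hand side of an assignment (as its use in stmtAExp suggests), the assignment target is left untouched, just as we wanted in the introduction (GHCi-style, assuming derived Show instances):

>>> stmtAVar (\_ -> Just (ALit 0)) (SAssign "x" (AAdd (AVar "y") (ALit 1)))
Just (SAssign "x" (AAdd (ALit 0) (ALit 1)))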
Interestingly, this is not the only way to match from AExp to AVar.
We can also write:
stmtAVar' :: MatchInto' (Stmt v) (AExp v)
stmtAVar' = stmtAExp . aExpAVar
That is, we match from a Stmt to AExp, and then go from AExp to all AVar.
The difference is that, inside an AExp, stmtAVar will give up if it sees that the top-level constructor is not AVar, because it is using the aVar pattern, while stmtAVar' will keep searching into substructures.
We are not leaving out any possible matches if we use the first approach, since stmtAExp will look into the substructures later. This can be proven by the following reasoning: for any AVar, stmtAExp is guaranteed to match that AVar as an AExp, so the last match must be on the constructor AVar.
In the case of a mismatch, stmtAExp will once again search into the substructures of the AExp, without knowing that aExpAVar has already failed.
So the first implementation is correct and more efficient.
This gives us an example of thinking about the search order and optimization when dealing with these patterns.
It takes a bit of effort to reason, so you should design your patterns carefully.
Evaluation
With these patterns ready at hand, the evaluation process is straightforward.
Suppose by evaluation, we only need to transform the AST (meaning no states are mutated), then an evaluation rule can be defined as:
type Rule v = Stmt v -> Maybe (Stmt v)
This is just half of the type of MatchInto.
The reason is that we defined a pattern matching to be a thing that asks what to do when the pattern matching succeeds, and tells you what will happen as a result. Well, an evaluation rule specifies what to do when a pattern matching succeeds, so the only thing left is what will happen.
The type basically means that if the pattern matching succeeds and the substitution function returns a Just value, we will get a new tree wrapped in Just.
Otherwise, we will get a Nothing.
For example, to evaluate integer addition:
ruleIntAdd :: Rule v
ruleIntAdd = stmtAExp $ \e -> case e of
  (AAdd (ALit i1) (ALit i2)) -> Just $ ALit (i1 + i2)
  _ -> Nothing
Of course, you can change the pattern to reflect different evaluation orders.
For example, BAnd can be made strict in only its first argument, to exhibit a “short circuit” behaviour:
ruleBoolAnd :: Rule v
ruleBoolAnd = stmtBExp . bAnd $ \e -> case e of
  BAnd (BLit True) e' -> Just e'
  BAnd (BLit False) _ -> Just $ BLit False
  _ -> Nothing
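The failure information is exactly what can drive an interpreter: keep trying rules until none applies. A minimal driver could look like the following sketch (the names step and evalSteps are illustrative assumptions, not from the original post):

-- Apply the first rule that matches; Nothing means no rule applies any more.
step :: [Rule v] -> Stmt v -> Maybe (Stmt v)
step rules s = asum (map ($ s) rules)

-- Keep rewriting until no rule applies, then return the final tree.
evalSteps :: [Rule v] -> Stmt v -> Stmt v
evalSteps rules s = maybe s (evalSteps rules) (step rules s)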
Of course, for an imperative language, it’s not enough to evaluate expressions. We need to maintain a state to be passed between statements. We can provide a pattern into the state-decorated expressions and combine it seamlessly with normal patterns. (For those familiar with optics, we obtain a prism by composing a lens with a prism.)
For a complete example of evaluation rules for IMP, see this page.
Relation to Prisms
A Prism is a kind of optic.
To some extent, it can be understood as a way to extract a value from a sum type.
This is very close to what we have defined to be a pattern matching process.
The definition of Prism, in the encoding used by the lens package, is as follows:
type Prism s t a b = forall p f. (Choice p, Applicative f)
                  => p a (f b) -> p s (f t)
As a matter of fact, (->) is an instance of Choice, and Maybe is an instance of Applicative.
If we instantiate p to (->) and f to Maybe, we get the exact type of MatchInto.
Therefore, although the type MatchInto is not a prism, it is composable with other prisms.
The reason is that a prism must work with every p and f that are instances of Choice and Applicative respectively. Then, a prism can always be instantiated to (->) and Maybe, so that it can compose with pattern matching functions.
For example,
_Just :: Prism (Maybe a) (Maybe b) a b
_Just . aLit :: (AExp v -> Maybe (AExp v))
             -> Maybe (AExp v) -> Maybe (Maybe (AExp v))
This allows us to apply a pattern matching function within a Maybe value; the pattern matching is only performed in the Just case.
I will not go into the details of the implementation of prisms. However, I hope the above introduction of pattern matching functions should at least help to build some intuition for prisms and their composition.
From this perspective, it should be possible to write prisms for the constructors of everyday sum types. In fact, for a class of sum types, some optics packages can use Template Haskell to automatically generate prisms that cover all the above pattern matching functions. To do this with the lens package, after bringing the relevant module into scope, you only need to add the following lines after the type definitions.
makePrisms ''AExp
makePrisms ''BExp
makePrisms ''Stmt
For a constructor like ALit, a prism with the name _ALit will be generated.
Not only can it perform everything we do with the pattern matching function aLit, it has the full power of a complete prism.
The implication is that we can use a lot of the functions from the lens package, to support a wider range of actions with these pattern matching paths.
_ALit :: Prism (AExp v) (AExp v) Int Int
We can query whether there is a matching value inside a structure with the ^? operator.
>>> (ALit 5) ^? _ALit
Just 5
>>> (ALit 5) ^? _AVar
Nothing
We can apply a function to the value pointed to by the prism with the function over:
>>> over (_AAdd . _1 . _ALit) (+1) (AAdd (ALit 5) (AVar "x"))
(AAdd (ALit 6) (AVar "x"))
Here _1 will select the first component of the arguments of _AAdd.
These may seem confusing if you are not familiar with optics. If that's the case, you can use what we introduced before as a reference. Because of their additional power, optics packages provide many more functions and operators to interact with them. To use those tools, one needs to get used to the specific symbols, but at a user level, it's not that magical. Basically, you need to specify three things:
- How to get to the place? Usually by composition of prisms and other optics.
- What to do after we get there? We can view the value, apply a function to change the value, and do many other things.
- In addition, prisms also allow you to choose how to pack the structure back. In our application, we only allow replacement to the same place, but for different optics, you may be able to keep the structure, fold it with a special typeclass instance, or make a list of all matched values.
Conclusions
In this blog post, we introduced a way to perform pattern matchings for AST evaluation.
The core of this representation lies in the type definition type MatchInto s t a b = (a -> Maybe b) -> (s -> Maybe t).
We applied this method to the IMP language and introduced some combinators that allow us to write complex patterns.
We then explained the relation between our method and prisms: prisms are more general structures that come with many tools, our method can be replaced by prisms, and the original definition composes with prisms to some extent.
A complete IMP interpreter that utilises the aforementioned method of pattern-matching-based tree evaluation can be found at this repository, along with some descriptions and tests.