One thing Pyfl lacked for a long time was pattern matching – the ability to define functions based on the structure of their arguments. For example, the function listsum
, which adds the elements of a list, can be defined with two patterns. The first is listsum
applied to the empty list, which returns 0. The second is listsum
applied to a list with head h and tail t. The result is h+listsum(t)
. No need for if-then-else and head and tail operators.
I tried for a long time to find a scheme involving multiple equations, for example
listsum([]) = 0
listsum(h::t) = h+listsum(t)
but I couldn’t make them work. Multiple equations broke the parser and formatter and the transformations that assumed each function has a single defining equation. Finally I solved the problem by introducing a “match” case statement in which the guards are patterns in a mini language of patterns.
Here is listsum
using match:
listsum(L) =
match L to
:0 ;
h | t : $h + listsum($t) ;
end;
This expression uses two simple patterns: the empty pattern, which matches the empty list, and h | t
, which matches any nonempty list. When h | t
is matched it has the side effect of binding the variable h
to the head of the list and binding the variable t
to the tail of the list.
In general match uses a mini-language of patterns generated by about a dozen primitives. The | operator is one of them.
The Basic Patterns
The simplest primitive is the variable. A variable matches anything and results in the variable being bound to whatever was matched. Thus
a b c
matches any list with exactly three elements. The match causes a, b
, and c
to be bound to these values.
As we have already seen, |
delimits the ‘rest’ of the list. Thus
a b c | d
matches any list with at least three elements. The match results in a, b
, and c
being bound to the first three elements and d
to the remaining part of the list.
If the value being matched is of no interest you can use the ‘don’t care’ pattern _
which matches anything without binding any variable.
Constants can appear in a sequence, and match only themselves. Words must be single quoted. For example
"add 2 3
matches exactly the list [add 2 3]
.
Special Patterns
There are special patterns that match individual integers, floats, strings, lists, sets and words and are invoked by the ^
operator. Thus ^int
matches any integer and ^string
any string. How do you know which integer? You use the = primitive. The pattern =v
always succeeds and has the side effect of binding v
to the previous match.
We can use these primitives to refine the listsum
function to add up only the integer components of its argument:
listsum(L) =
match L to
:0 ;
^int =h | t : $h + listsum($t) ;
_ | t : listsum($t) ;
end;
The Alternative Construct
Notice that in general the body of a match
is a sequence of alternatives, each consisting of a pattern (which we call the “guard”) and a corresponding Pyfl expression (which we call the “result”). When a match is evaluated the interpreter proceeds through the alternatives trying to match each guard until a match succeeds. When this happens, the corresponding result is evaluated with the dollared variables replaced by the values bound to them by the side effect of the match. If no guard succeeds, the match returns the word “fail”.
Finally, suppose we want any floating point numbers present in the list to be included in the sum. There is a built-in pattern float
that matches any floating point number, but we need a pattern that matches either int
or float
. As it happens there is a built-in pattern num
which matches int or float. But suppose we didn’t have it.
We can do without num using the alternative construct, in which a list of alternatives is enclosed in decorated angle brackets. In general <%
p0 p1 p2 … %>
succeeds iff some pi succeeds. The variable bindings are those of the first successful match.
Our listsum
definition is now
listsum(L) =
match L to
:0 ;
<% ^int ^float %> =h | t : $h + listsum($t) ;
_ | t : listsum($t) ;
end;
The $ Operator
Values already bound can also be recalled using the $
operator within a pattern. The pattern $v
refers to the most recent binding of v
. Thus we can define a test oc(L)
that returns true or false depending on whether the first element of L
occurs in the tail of L
:
oc(L) =
match L to
_ : false ;
x $x | _ : true ;
x _ | t : oc( $x :: $t)
end;
There is a special construct for matching sublists that are not just list constants. It uses the decorated list parentheses [%
and %]
. Thus [% "add a b %]
matches, for example, [add 2 3]
, with a
and b
bound to 2 and 3 respectively.
Escape to PyFL
Finally, there is an escape to PyFL. The operator ? before a list constant interprets the contents as a PyFL boolean expression that succeeds if the result is true, that fails otherwise. The expression can use the dollar operator to refer to current variable bindings. Here is the definition of a function nd
that tests if its argument is nondecreasing;
nd(L) =
match L to
: true ;
_ : true ;
x y ?[$x < $y] | t : nd($y :: $t) ;
_ _ | _ : false ;
end
The current PyFL interpreter supports all these patterns but is still an interpreter. There is a performance overhead which is especially serious since, as in the examples given, match is invoked for every recursive call. The solution is to compile the patterns into Python but for the time being that is too big a project.