Algebraic Pattern Matching [< 100 views]

One thing Pyfl lacked for a long time was pattern matching – the ability to define functions based on the structure of their arguments. For example, the function listsum, which adds the elements of a list, can be defined with two patterns. The first is listsum applied to the empty list, which returns 0. The second is listsum applied to a list with head h and tail t. The result is h+listsum(t). No need for if-then-else and head and tail operators.

I tried for a long time to find a scheme involving multiple equations, for example

listsum([]) = 0
listsum(h::t) = h+listsum(t)

but I couldn’t make them work. Multiple equations broke the parser and formatter and the transformations that assumed each function has a single defining equation. Finally I solved the problem by introducing a “match” case statement in which the guards are patterns in a mini language of patterns.

Here is listsum using match:

listsum(L) =
match L to
  :0 ;
  h | t : $h + listsum($t) ;
end;

This expression uses two simple patterns: the empty pattern, which matches the empty list, and h | t, which matches any nonempty list. When h | t is matched it has the side effect of binding the variable h to the head of the list and binding the variable t to the tail of the list.

In general match uses a mini-language of patterns generated by about a dozen primitives. The | operator is one of them.

The Basic Patterns

The simplest primitive is the variable. A variable matches anything and results in the variable being bound to whatever was matched. Thus

a b c

matches any list with exactly three elements. The match causes a, b, and c to be bound to these values.

As we have already seen, | delimits the ‘rest’ of the list. Thus

a b c | d

matches any list with at least three elements. The match results in a, b, and c being bound to the first three elements and d to the remaining part of the list.

If the value being matched is of no interest you can use the ‘don’t care’ pattern _ which matches anything without binding any variable.

Constants can appear in a sequence, and match only themselves. Words must be single quoted. For example

"add 2 3

matches exactly the list [add 2 3].

Special Patterns

There are special patterns that match individual integers, floats, strings, lists, sets and words and are invoked by the ^ operator. Thus ^int matches any integer and ^string any string. How do you know which integer? You use the = primitive. The pattern =v always succeeds and has the side effect of binding v to the previous match.

We can use these primitives to refine the listsum function to add up only the integer components of its argument:

listsum(L) =
match L to
  :0 ;
  ^int =h | t : $h + listsum($t) ;
  _ | t : listsum($t) ;
end;

The Alternative Construct

Notice that in general the body of a match is a sequence of alternatives, each consisting of a pattern (which we call the “guard”) and a corresponding Pyfl expression (which we call the “result”). When a match is evaluated the interpreter proceeds through the alternatives trying to match each guard until a match succeeds. When this happens, the corresponding result is evaluated with the dollared variables replaced by the values bound to them by the side effect of the match. If no guard succeeds, the match returns the word “fail”.

Finally, suppose we want any floating point numbers present in the list to be included in the sum. There is a built-in pattern float that matches any floating point number, but we need a pattern that matches either int or float. As it happens there is a built-in pattern num which matches int or float. But suppose we didn’t have it.

We can do without num using the alternative construct, in which a list of alternatives is enclosed in decorated angle brackets. In general <% p0 p1 p2 %> succeeds iff some pi succeeds. The variable bindings are those of the first successful match.

Our listsum definition is now

listsum(L) =
match L to
  :0 ;
  <% ^int ^float %> =h | t : $h + listsum($t) ;
  _ | t : listsum($t) ;
end;

The $ Operator

Values already bound can also be recalled using the $ operator within a pattern. The pattern $v refers to the most recent binding of v. Thus we can define a test oc(L) that returns true or false depending on whether the first element of L occurs in the tail of L:

oc(L) =
match L to
_ : false ;
x $x | _ : true ;
x _ | t : oc( $x :: $t)
end;

There is a special construct for matching sublists that are not just list constants. It uses the decorated list parentheses [% and %]. Thus [% "add a b %] matches, for example, [add 2 3], with a and b bound to 2 and 3 respectively.

Escape to PyFL

Finally, there is an escape to PyFL. The operator ? before a list constant interprets the contents as a PyFL boolean expression that succeeds if the result is true, that fails otherwise. The expression can use the dollar operator to refer to current variable bindings. Here is the definition of a function nd that tests if its argument is nondecreasing;

nd(L) =
match L to
: true ;
_ : true ;
x y ?[$x < $y] | t : nd($y :: $t) ;
_ _ | _ : false ;
end

The current PyFL interpreter supports all these patterns but is still an interpreter. There is a performance overhead which is especially serious since, as in the examples given, match is invoked for every recursive call. The solution is to compile the patterns into Python but for the time being that is too big a project.

About Bill Wadge

I am a retired Professor in Computer Science at UVic.
This entry was posted in Uncategorized and tagged , , , . Bookmark the permalink.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.