Type Checking as Calculation [860 views]

As I’ve said before, PyFL is functional programming for the rest of us. (It’s available at pyflang.com.)

PyFL now has type checking – without type declarations. Instead the type is produced by evaluating the program over the domain of types.

In PyFL gone are all the things that ordinary people find difficult or downright weird: monads, mandatory currying, post- or prefix notation, pattern matching, etc. Instead infix notation and f(x,y,z) syntax for function application. The weird stuff has it’s proponents but PyFL proves it’s not inherently part of functional programming.

Gone in particular are cumbersome, verbose type declarations. In most examples of programming in Haskell it all begins with these declarations. In PyFL you skip this stage.

That doesn’t mean you can e.g. multiply strings without hearing about it. PyFL is dynamically typed and checks at runtime if calculations make sense.

However runtime calculations mean you have to actually run the program to find these type errors – and be lucky enough to encounter them. They won’t necessarily reveal themselves every time out.

As one retired professional pointed out, you may release the software and then have tens of thousands of users running afoul of the dynamic type checker.

So there’s still a need for static analysis to avoid runtime errors. A need for compile time type checking.

But how do you type check a language with no type declarations? I really don’t want to add them – that would be a big retreat from the “for the rest of us” principle.

Fortunately there’s a solution – type inference. That means analyzing the program and deducing at least some of the types, without bothering the programmer.

For example in

a + f(b)
  where
   a = 4;
   b = a+9;
   f(n) = if n<2 then 1 else 3*f(n-1)+1 fi;
  end

it’s obvious that a is an integer, and from that that b is as well. These facts follow from the basic type rule that the sum of two integers is an integer. There is no need for a declaration b:int.

It also appears that f(b) is an integer although at this stage it’s not clear how you would formally justify it. That’s the problem we’ll solve.

Inferring a type is like calculation except you discard the actual data and combine types instead. Using rules like

int + int = int
num + int = num
int * int = int
int / int = num
num < num = bool
if bool then int else int fi = int

Of course you lose information; in particular, you don’t know which arm of a conditional will be selected. This means evaluating recursive programs is problematic because they use conditionals to trigger termination.

Subtypes

Notice that this type scheme has both int (integer) and num (numeric). And notice that int is also numeric, e..g. num+int=num. Here int is a subtype of num. There are two other subtypes: intlist is a subtype of numlist which in turn is a subtype of list. The head of an intlist is an int, the tail of a numlist is a numlist, and so on.

Implementing this partial order required a lot of coding and I skipped having separate stringlist and wordlist types, not to mention listlist. I wrote a function sb(p,q) which tests if p is a subtype of q, and a function lub(p,q) which gives the least upper bound of p and q.

Clearly my domain falls far short of that necessary to imitate the fine distinctions of Haskell type declarations. For example, in Haskell you can declare a variable to be a list of lists of integers. I’d need an infinite partial order. But it works on a broad range of programs and catches a broad range of errors.

Calculating types

So the plan I came up with is to take the PyFL evaluator and modify it so that instead of producing the actual data output by a program, it outputs the types produced by ‘running’ the program over the abstract domain of types. This is the basic idea of abstract interpretation and is hardly original with me.

One of the advantages of this scheme is that not only does it avoid programmer type declarations, it avoids function types. Here is a higher order variant of the program given above

a + f(b)
where
 db(g) = lambda (x) g(g(x)) end;
 inc(u) = u+2;
 f = db(inc);
 a = 4;
 b = 5;
end

It evaluates to int, in spite of the involvement of the second order function db.

Handling recursion

The problem with recursion is that the evaluation doesn’t terminate. This stumped me for a while.

For example, consider the program

f(7)
 where
  f(n) = if n<2 then 1 else n*f(n-1) fi;
 end

When run with the usual evaluator, it quickly gives the correct answer, 5040. The recursion terminated when the if-then-else-fi finally selected the first alternative. But when running with the type evaluator, the if condition was merely bool and both alternatives had to be explored. This resulted in an infinite recursion, the python runtime stack overflowed, there was a segmentation fault and python crashed. Needless to say, no type information was produced

In the meantime I’d added metrics to the evaluator, to see how much computation was generated. One of these metrics was the number of calls to the evaluation function. These additions were carried over to the alternate types-only evaluator.

Finally it dawned on me that I could force termination by putting a cap on this metric – say, 30. I could see what the result was, then increase the cap, look at the new result, increase the cap again, until everything settled down.

This worked even better than I expected. I tried it on a similar scheme for computing dimensionalities in pylucid and it turned out even small values of the cap gave correct answers.

The puzzle here is that this scheme seems obvious yet I’ve never seen it in print. Or any discussion of how you avoid nontermination while evaluating over an alternate domain. Yet evaluating over an alternate domain is the basic idea of abstract interpretation. If anyone has any insight, please share with me.

So what I did was set a cap of 30 on the number of calls to the evaluate function. It turned out that 30 is a lot and I could have gotten away with a much smaller cap, but so what?

Now, when I run the type evaluator, it quickly halts and produces … int, correct. It has succeeded in deducing that the factorial of an integer is an integer. Without programmer input.

The type none

The question arose, what does an evaluation return when it’s throttled? I guessed that it should return the bottom element of the type domain. Since there wasn’t a bottom element (yet) I added one: the type none. The type none is not the type of any data object. If you think of a type as a set (the set of all objects of that type), then none is the empty set.

I had to work out the rules for calculating with none. Since none is a subtype of, say, num, it can function as a num, so we have the rule none+num=num. (And none+none=none.)

Also if we are evaluating f(x,y) and f evaluates to none, the result should be none. I admit this was guesswork but it gives the right answers.

This required rewriting sb and lub, which got fairly complex. If I want to expand my domain I’ll have to come up with something more systematic.

Y, the ultimate test

The factorial program works because recursion is built in to PyFL (and Haskell), the evaluator evaluates the definition of fac in the same environment that it’s defined. But what if recursion wasn’t built in?Could we still define factorial? This was the problem facing the developers of the lambda calculus (which doesn’t have built-in recursion).

At first it seemed unlikely. But then Curry invented the Y combinator. Here it is

\f (\x f(x x)) (\x f(x x))

It’s just a small \-expression with the magic property that Yf reduces to f(Yf). It has a rather nice definition in PyFL notation, namely

Y(f) = g(g) where g(x) = f(x(x)) end;

In this form it’s easy to see that it works. We substitute g for x in the equation and get g(g) = f(g(g)).

This works fine in PyFL: the program

fac(7)
 where
  Y(f) = g(g) where g(x) = f(x(x)) end;
  a(f) = lambda (n) if n<2 then 1 else n*f(n-1) fi end;
  fac = Y(a);
 end

evaluates to 5040, the right answer.

Note that this program is nonrecursive: no variable is defined directly or indirectly in terms of itself.

This program cannot be written in Haskell. It uses self application which can’t be typed. Obviously we can write factorial in Haskell, but not this way. We have to resort to Haskell’s built-in recursion, say with the definition Y(f) = f(Y(f)).

Then the question arises, what about PyFL’s type checking? What happens when we evaluate the type of this program?

The answer is … int! Yes, PyFL has typed an untypable program!

To be honest, I wasn’t sure this would work. But in retrospect, if PyFL can handle the definition of Y over the integers, why not over types?

There’s more to type checking than Haskell’s rigid declarations.

2 Responses to Type Checking as Calculation [860 views]

carl says:

May 31, 2022 at 8:57 pm

I’ve tried different browsers, and a few guesses at a *new* url, But no go, I get a blank screen when I tried to download PyFL, can you check? thanks.

- Bill Wadge says:
  
  June 1, 2022 at 8:42 pm
  
  Fixed, thanks for the heads up