As I’ve said before, PyFL is functional programming for the rest of us. (It’s available at pyflang.com.)

PyFL now has type checking – without type declarations. Instead the type is produced by evaluating the program over the domain of types.

In PyFL gone are all the things that ordinary people find difficult or downright weird: monads, mandatory currying, post- or prefix notation, pattern matching, etc. Instead infix notation and *f(x,y,z) *syntax for function application. The weird stuff has it’s proponents but PyFL proves it’s not inherently part of functional programming.

Gone in particular are cumbersome, verbose type declarations. In most examples of programming in Haskell it all begins with these declarations. In PyFL you skip this stage.

That doesn’t mean you can e.g. multiply strings without hearing about it. PyFL is dynamically typed and checks at runtime if calculations make sense.

However runtime calculations mean you have to actually run the program to find these type errors – and be lucky enough to encounter them. They won’t necessarily reveal themselves every time out.

As one retired professional pointed out, you may release the software and then have tens of thousands of users running afoul of the dynamic type checker.

So there’s still a need for static analysis to avoid runtime errors. A need for compile time type checking.

But how do you type check a language with no type declarations? I really don’t want to add them – that would be a big retreat from the “for the rest of us” principle.

Fortunately there’s a solution – type inference. That means analyzing the program and deducing at least some of the types, without bothering the programmer.

For example in

```
a + f(b)
where
a = 4;
b = a+9;
f(n) = if n<2 then 1 else 3*f(n-1)+1 fi;
end
```

it’s obvious that *a* is an integer, and from that that *b* is as well. These facts follow from the basic type rule that the sum of two integers is an integer. There is no need for a declaration *b:int*.

It also appears that *f(b)* is an integer although at this stage it’s not clear how you would formally justify it. That’s the problem we’ll solve.

Inferring a type is like calculation except you discard the actual data and combine types instead. Using rules like

```
int + int = int
num + int = num
int * int = int
int / int = num
num < num = bool
if bool then int else int fi = int
```

Of course you lose information; in particular, you don’t know which arm of a conditional will be selected. This means evaluating recursive programs is problematic because they use conditionals to trigger termination.

**Subtypes**

Notice that this type scheme has both *int* (integer) and *num* (numeric). And notice that *int* is also numeric, e..g. *num+int=num*. Here *int* is a subtype of *num*. There are two other subtypes: *intlist* is a subtype of *numlist* which in turn is a subtype of *list*. The head of an *intlist* is an *int*, the tail of a *numlist* is a *numlist*, and so on.

Implementing this partial order required a lot of coding and I skipped having separate *stringlist* and *wordlist* types, not to mention *listlist*. I wrote a function *sb(p,q)* which tests if *p* is a subtype of *q*, and a function *lub(p,q)* which gives the least upper bound of *p* and *q*.

Clearly my domain falls far short of that necessary to imitate the fine distinctions of Haskell type declarations. For example, in Haskell you can declare a variable to be a list of lists of integers. I’d need an infinite partial order. But it works on a broad range of programs and catches a broad range of errors.

**Calculating types**

So the plan I came up with is to take the PyFL evaluator and modify it so that instead of producing the actual data output by a program, it outputs the types produced by ‘running’ the program over the abstract domain of types. This is the basic idea of abstract interpretation and is hardly original with me.

One of the advantages of this scheme is that not only does it avoid programmer type declarations, it avoids function types. Here is a higher order variant of the program given above

```
a + f(b)
where
db(g) = lambda (x) g(g(x)) end;
inc(u) = u+2;
f = db(inc);
a = 4;
b = 5;
end
```

It evaluates to *int*, in spite of the involvement of the second order function *db*.

**Handling recursion**

The problem with recursion is that the evaluation doesn’t terminate. This stumped me for a while.

For example, consider the program

```
f(7)
where
f(n) = if n<2 then 1 else n*f(n-1) fi;
end
```

When run with the usual evaluator, it quickly gives the correct answer, 5040. The recursion terminated when the *if-then-else-fi* finally selected the first alternative. But when running with the type evaluator, the *if* condition was merely *bool* and both alternatives had to be explored. This resulted in an infinite recursion, the python runtime stack overflowed, there was a segmentation fault and python crashed. Needless to say, no type information was produced

In the meantime I’d added metrics to the evaluator, to see how much computation was generated. One of these metrics was the number of calls to the evaluation function. These additions were carried over to the alternate types-only evaluator.

Finally it dawned on me that I could force termination by putting a cap on this metric – say, 30. I could see what the result was, then increase the cap, look at the new result, increase the cap again, until everything settled down.

This worked even better than I expected. I tried it on a similar scheme for computing dimensionalities in pylucid and it turned out even small values of the cap gave correct answers.

The puzzle here is that this scheme seems obvious yet I’ve never seen it in print. Or any discussion of how you avoid nontermination while evaluating over an alternate domain. Yet evaluating over an alternate domain is the basic idea of abstract interpretation. If anyone has any insight, please share with me.

So what I did was set a cap of 30 on the number of calls to the evaluate function. It turned out that 30 is a lot and I could have gotten away with a much smaller cap, but so what?

Now, when I run the type evaluator, it quickly halts and produces … *int*, correct. It has succeeded in deducing that the factorial of an integer is an integer. Without programmer input.

**The type none**

The question arose, what does an evaluation return when it’s throttled? I guessed that it should return the bottom element of the type domain. Since there wasn’t a bottom element (yet) I added one: the type *none*. The type *none* is not the type of any data object. If you think of a type as a set (the set of all objects of that type), then *none* is the empty set.

I had to work out the rules for calculating with *none*. Since *none* is a subtype of, say, *num*, it can function as a *num*, so we have the rule *none+num=num*. (And *none+none=none*.)

Also if we are evaluating *f(x,y)* and *f* evaluates to *none*, the result should be *none*. I admit this was guesswork but it gives the right answers.

This required rewriting *sb* and *lub*, which got fairly complex. If I want to expand my domain I’ll have to come up with something more systematic.

**Y, the ultimate test**

The factorial program works because recursion is built in to PyFL (and Haskell), the evaluator evaluates the definition of *fac* in the same environment that it’s defined. But what if recursion wasn’t built in?Could we still define factorial? This was the problem facing the developers of the lambda calculus (which doesn’t have built-in recursion).

At first it seemed unlikely. But then Curry invented the Y combinator. Here it is

`\f (\x f(x x)) (\x f(x x))`

It’s just a small \-expression with the magic property that Yf reduces to f(Yf). It has a rather nice definition in PyFL notation, namely

`Y(f) = g(g) where g(x) = f(x(x)) end;`

In this form it’s easy to see that it works. We substitute g for x in the equation and get g(g) = f(g(g)).

This works fine in PyFL: the program

```
fac(7)
where
Y(f) = g(g) where g(x) = f(x(x)) end;
a(f) = lambda (n) if n<2 then 1 else n*f(n-1) fi end;
fac = Y(a);
end
```

evaluates to 5040, the right answer.

Note that this program is nonrecursive: no variable is defined directly or indirectly in terms of itself.

This program cannot be written in Haskell. It uses self application which can’t be typed. Obviously we can write factorial in Haskell, but not this way. We have to resort to Haskell’s built-in recursion, say with the definition *Y(f) = f(Y(f))*.

Then the question arises, what about PyFL’s type checking? What happens when we evaluate the type of this program?

The answer is … *int*! Yes, PyFL has typed an untypable program!

To be honest, I wasn’t sure this would work. But in retrospect, if PyFL can handle the definition of Y over the integers, why not over types?

There’s more to type checking than Haskell’s rigid declarations.