Power concedes nothing without a demand.
When the late Ed Ashcroft and I invented Lucid, we had no idea what we were getting in for.
La dee dah
At first we (or at least I) thought it would be pretty straight forward. The idea was to replace assignments like
I := 1; ... while ... I := I+1; ... end ...
with equations like
first(I) = 1; next(I) = I+1;
I = 1 fby I+1;
The original motivation was that proving properties of programs would be easier if the program statements were equations.
As for implementing the language, the idea was that we could compile the equational form into machine code using conventional technology.
However the devil was in the dots … . It dawned on us/me that since the semantics was based on infinite sequences (which we thought of as histories of computations) this implied that the computations went on forever.
That was fine as long as it was supposed to be a continuously operating program, for example to list the squares of the natural numbers, But what if it is supposed to eventually halt and produce a single result?
For example, the following imperative program calculates an approximation to the square root of 2, outputs it, and terminates.
a := 1; err := 1; while err > 0.0001 err := abs(2-a*a); a := (a+2/a)/2; end; root2 := a; write(root2);
The equations for a and err are obviously
a = 1 fby (a+2/a)/2; err = abs(2-a*a);
but what about root2? What is its equation? And it’s not even a stream, it’s a single value.
Extract the root
Eventually we devised an operator as_soon_as that extracts a value from a stream. The operator as_soon_as (shortened to asa) takes two arguments and returns the value of its first argument that corresponds to the first time its second argument is true. Thus if X is <x0,x1,x2,x3,…> and P is <f,f,f,t,…> then X asa P is <x3,x3,x3,x3,…>.
The full Lucid program was
I = 1 fby I+1; a = 1 fby (a+2/a)/2; err = abs(2-a*a); root2 = a așa err<0.0001;
(and the order of the equations doesn’t matter).
We were therefore forced to resort to lazy evaluation: you compute the value of an expression only if you need it. In particular, you don’t keep computing values of the sequence denoted by a.
Compiling is out
However compiling became very complicated. You had to analyze the program to figure out which values will be actually needed. We just waved our hands and said it could be done. (We also needed to analyze the program to figure out that variables like root2 are ‘really’ just constants and need be output only once. We only recently solved this problem.)
So we tried another strategy, namely compiling the program into a network of dataflow filters connected by pipelines. David May (yes, that David May, then a grad student) thought this was a great idea and began working on an implementation along these lines. Then one fatal Monday we met in the University of Warwick (UK) Arts Centre cafe for a cheese sandwich lunch. “It doesn’t work” he announced.
Dataflow is out
The problem was if-then-else-fi and other primitives that don’t need (and may discard) some inputs. Pipeline dataflow filters wait for data tokens to arrive on all input lines, consume one taken from each line, then produce an output token.
This causes problems if there is an if-then-else-fi filter. If the input streams are P, X, and Y, the filter should wait for tokens pn, xn, and yn to show up, then iff pn is true, send on xn and discard yn, otherwise send on yn and discard xn.
Sounds simple enough but there is a fatal issue: waiting for values that you don’t need and will discard.
What if the unneeded values show up late, or not at all? Then we will have delayed the computation for no reason. We can tweak the operation so that as soon as the needed token shows up, it is passed on. But we’re still stuck waiting for the unneeded value and there’s no way to skip it. If it never shows up (because of deadlock upstream) we’re in trouble because our own output deadlocks prematurely.
The problem is that the semantics of Lucid requires
if t then x else y fi = x
if f then x else y fi = y
If we let ⊥ denote a nonterminating (deadlocking) stream, we have
if t then x else ⊥ fi = ⊥
if f then ⊥ else y fi = ⊥
which violate the basic rules for if-then-else-fi given above.
Furthermore, if we adopt wait-and-discard we may waste computing resources (those spent computing unneeded values) and these resources could be significant.
Kludges are out
There are various kludges we could try, like sending ‘kill’ tokens upstream to cancel unneeded computations, but these run in to trouble if there are cycles in the network upstream. All a giant headache.
For that reason all simple-minded dataflow models lack a three-input one-output conditional filter. Instead they typically have a one-input two-output filter that sends a token down one of the output lines. It’s unnatural to program with such a primitive and anyway the result is two lines with different and unpredictable data rates.
In other words, David May was perfectly justified in declaring that pipeline dataflow “doesn’t work” as an implementation of Lucid. Luckily, he had a solution.
His solution (Tom Cargill and others independently came up with the same idea) was to systematically use demand-driven evaluation. The interpreter demands the value of output, which generates a demand for the value of root2, which in turn generates demands for the values of a and err.
We can demand the value of root2, because it’s just one number, but a and err are (potentially) infinite sequences that can’t be returned as the answer to a single demand. The clever idea is to allow us to demand specific indexed components of these streams, e.g. the value of a when time=2 or the value of err when time=3.
Indexed demands propagate, so that the demand for a component of one variable at a given timepoint generates demands for possibly different variables at possibly different timepoints.
The propagation rules are very simple:
A demand for A+B at time t generates demands for A at time t and B at time t,
and the result is the sum of the two results (other data operations work similarly)
A demand for first A at time t produces a demand for A at time 0, and the result is the answer.
A demand for next A at time t generates a demand for A at time t+1, and returns the answer as the result.
A demand for A fby B at time 0 returns the answer to a demand for A at time 0; while a demand for A fby B at time t+1 returns the answer to a demand for B at time t.
A demand for X asa P at time t generates demands for P at times 0, 1, 2, … until the answer is true (say at time r), and then returns the answer to the demand for X at time r.
First notice that all four primitives discard data … pipeline dataflow doesn’t implement any of them safely or efficiently.
Also, ‘time’ is just a formal parameter, it has no necessary connection to wall clock time. The time values do not necessarily increase as the computation proceeds. Thus we may demand the time 8 value of a variable, then the time 5 value, even of the same variable.
In fact it is possible to write programs that recurse into the ‘future’, like the following that computes the factorial of 7:
first f where n = 7 fby n-1; f = if n<1 then 1 else n * next f fi; end
The variable f is defined in terms of its own future yet the demand-driven interpreter produces 5040, the right answer.
It can be shown that the demand driven interpreter is 100% faithful to the statements-as-equations semantics. Every set of equations has a unique minimum solution and that’s what the interpreter computes. Usually there is a unique solution, but “minimum” means having the ‘most’ ⊥’s – in other words, the least ‘defined’ solution. Another way of putting it is that no actual value appears out of nowhere.
For example, the equation I = 1 fby next I has I = <1,7,7,7,…> as a solution but where did 7 come from? The minimum solution is I = <1,⊥,⊥,⊥,…>, and this is what the interpreter produces. In other words, if you demand I at time 0 you get 1, but if you demand I at time t>0, the computation fails to terminate. As it should, if it’s faithful to the semantics.
Ed Ashcroft loved words and wordplay. One day, browsing through a dictionary or thesaurus, he came across the word “eduction”. I remember the definition he found was something like “the act of drawing forth or eliciting … results … from the data …”. Perfect! This is what we should call it! And we did.
The question arose, is eduction dataflow? Not pipeline data-push dataflow, that’s for sure. We decided to stake a claim and defined eduction as “tagged demand-driven dataflow”. Eduction is briefly described (but not so named) in the Lucid, the Dataflow Language (1985) book.
Nevertheless the book does, on the other hand, explain the pipeline dataflow model even though it admits that it cannot serve as a general implementation scheme. There are two reasons for this.
First, pipeline dataflow serves as an excellent heuristic for writing and understanding programs. The eduction model is not usually a very good guide – everything seems to happen backwards.
Secondly, pipeline dataflow works fine for many unsophisticated programs that process data in a straightforward way. It’s much simpler than eduction and has much lower overhead. Program analysis could automatically identify programs that are eligible for a pipeline implementation.
Nevertheless, eduction is a general technique so we need to investigate what’s required to make it work.
The first thing we notice its that it doesn’t use storage (apart from cells on the invisible stack that implements recursive calls to the evaluation routine).
Not using storage is a bad idea, because it means that the interpreter recomputes demands that are repeated. This can get very expensive in time if not in space.
The solution is a cache, which we traditionally call the warehouse. Every time we calculate the value of variable V at time t, we record this fact. The warehouse is an associative store indexed by the pair (V,t). In modern languages like Python or Swift the warehouse can be implemented in a straightforward yet efficient manner using the built-in dictionary primitive.
The second problem is that the warehouse can in theory fill up as the computation proceeds. This is less of an issue with modern computers – even consumer laptops – that have ridiculous amounts of storage. For example, the older MacBook Air I’m using can store over a billion numbers. The PyLucid interpreter stores everything and never runs into trouble with the modest programs that appear in this blog.
Nevertheless a completely general implementation needs to manage warehouse storage. Tony Faustini and I came up with an effective heuristic we called the retirement plan. Briefly, it sweeps the warehouse periodically and discards values that haven’t been used recently.
The next complication involves storing and fetching data. In the simple root 2 program, we generate a demand for root2 at time 0 and store the value with tag (root2,0). So far so good.
Now suppose we extend the program and when we evaluate it we get demands for root2 at times 3 and 5. What do we do?
We can store the same approximation to √2 with tags (root2,3) and (root2,5). Now we’re wasting warehouse space by storing the same data just with different tags. Depending on the extended program, this could be very expensive in terms of space. We were lucky originally that only time 0 was demanded. But we can’t count on luck.
Now suppose we have a demand for root2 at time 7. We look in the warehouse using tag (root2,7) and find nothing. As a result we recompute root2. This is wasteful of time.
The only way we can avoid wasting space or time is to find out that root2 is constant – is insensitive to the time parameter. This requires static program analysis as described in the time sensitivity blog post.
One of the advantages of treating the time index as a formal parameter is that it suggests other dimensions can also be treated as formal parameters. In other words, eduction opens the door to multidimensional dataflow.
PyLucid has two dimensions, time (t) and space (s). As that post explains, it means we can write programs employing time-varying arrays. We can still use the pipeline heuristic, but we must imagine infinite arrays travelling down the pipes.
Eduction has no trouble handling multiple dimensions. In the simplest case, we just have slightly more elaborate demands, say (X,t=3,s=4). However with many dimensions, passing around coordinates is cumbersome. Instead we have ‘registers’ (special global variables) that hold the coordinates. Then to evaluate next X, for example, we increment the time register by one, demand the value of X, then decrement the time regiser by one.
A multidimensional warehouse
There is one catch, and that involves accessing the warehouse. Suppose we demand the value of X and get 27 as the result. With what keys do we store 27 in the warehouse?
We could attach the values of all the registers but that in general would result in wasting space on duplicate entries – the same problem as with time sensitivity, as described above, but much more serious.
We could keep track of the registers actually examined in computing the demanded value of X, and use them as the keys, but what about on the other end when we have to search for a demanded value of X? A priori we have no way of knowing what dimensions entered into producing the value we are looking for.
The only general solution is dimensional analysis, the process of discovering which dimensions might enter into the production of a given variable. Upper bounds are enough. For example, we may discover that dimensions s and t are enough to get aa value for X, but that Y may need dimension h as well. Then when we search for a value of X, we use the current contents of registers s and t as keys. But for a value of Y, we also include the current contents of the h register.
Dimensional analysis was the main technical obstacle holding back the development of Lucid, and it is solved in Shennat’s dissertation.
User defined functions
So far we have talked about only 0-order programs – programs that have no functions other than built-ins like next. What does it mean to demand, say, the value of fac(n+1) where the user has written their own definition of fac. (David May’s implementation did not support user defined functions.)
This caused some head scratching till we got a hold of Calvin Ostrum’s interpreter, which did support user defined functions. Upon examining the code we discovered that he introduced an extra dimension, that we called the place dimension, that specified where in the function calling tree a demand was being made. Ali Yaghi, then a PhD student at Warwick, revised and extended Ostrum’s scheme and formalized it in terms of intensional logic. The result is what we call Yaghi Code.
The point of Yaghi code is that it magically reduces a first order program to a 0-order program, to which we can apply eduction. The only cost is two extra intensional operators, call and actuals, and an extra “place” dimension. We’ve already seen that extra dimensions are not a serious problem for eduction.
Higher order functions
For a long time Lucid was strictly first order and this worried us because we liked to call it a functional dataflow language. For a long time we couldn’t see how to extend it. Then P. Rondogiannis and I came up with a solution that in hindsight seems obvious: more dimensions!
The idea is that one place dimension reduces a first order program to a 0-order program. The same procedure can reduce a second order program to a first order program, then adding another place dimension produces a 0-order program, which can be educed.
In general, some – some – nth order programs can be translated into 0-order programs that use n place dimensions and n families of call/actuals operators.This is not a general solution because only programs with certain function types can be translated. In particular, we cannot translate programs that employ partial application; in other words, with functions that return other functions as result.
A number of smart people have tried to fix this, without success. My hunch is that it can’t be done, though I don’t know why.
At this point the reader might start wondering, what is the point of all this? Programmers often find the side effects-free style of Lucid programming constraining, because they can’t just tell the computer what to do. Furthermore, implementing Lucid is quite a challenge because you can’t simply turn Lucid code into machine code.
In fact there are huge advantages to writing in Lucid and implementing the program with eduction. To begin with
You can understand programs.
The statements in a Lucid program really are mathematical equations. Inside a where clause the order is unimportant and the result of a where clause is derived from the (usually unique) solution of these equations. Lucid has evolved but the statements-as-equations principle has remained nonnegotiable. For example, we do not allow compound expressions on the left hand side because that can undermine the basic principle.
For a start we can apply the rules of algebra exactly because there are no side effects. If X = A+B and A and B are both small integers, we can conclude that X is also an integer. And if A and B both increase with time, we can conclude that so does X. Static analysis of Lucid programs is vastly simpler than that of imperative languages like Python.
We can transform programs
Also, we can safely apply the transformation rules of conventional algebra. For example, if Y = P*Q+R we can add an equation V=P*Q (V not already in use) and change the definition of Y to Y = V+R. The expression X*X can be replaced by X**2; no side effects. Or if F(R,S) is defined to be R – 2*S, then F(G+H,G-H) can be replaced by (G+H) – 2*(G-H) and then by 3*H-G.
The PyLucid compiler proceeds by applying meaning-preserving transformations. In the end (after introducing Yaghi code) the entire program is reduced to a (large) unordered set of ‘atomic’ equations By ‘atomic’ we mean that each equation consists of a variable equated to an expression which is either a data constant or a single operation applied to variables.
The atomic form is ready for eduction. But it is still a Lucid program, is human readable and can be saved in a simple text file. It is semantically equivalent to the original. The atomic form is also amenable to program analysis, for example determining dimensionalities.
Eduction can be distributed
Once we have the atomic form of the program, we can store it on different machines and have a number of warehouses also on different machines. We could divide up the work according to variables, e.g. have one machine evaluating A, B, and C, and another X, Y, and Z.
Of course a demand for say, B could generate a demand for, say, Z but this demand could be sent across the network. Program analysis could tell us how to split up the work so as to minimize network traffic.
There would be no problem duplicating warehouse entries because you are never going to have discrepancies – you can use whichever warehouse you want.
It’s fault tolerant
With eduction the program does not change – unlike systems based, say, on combinator reduction. If a value goes missing for whatever reason, it can be recomputed (although this may be expensive in terms of time). For this reason the warehouse strategy can be only a heuristic, like the retirement plan.
Fault tolerance is vital for a distributed implementation because it means communications don’t have to be 100% reliable.
It could be very fast
Eduction has plenty of provision for parallelism. There is no inherent contention between demanding a value of X and demanding a value of Y, unless (say) the demand for X generates a demand for Y (at the same coordinate). There are no races because there are no side effects.
The GLU project used Lucid as a coordination language to link pieces of legacy C code. They achieved modest speed up, typically an order of magnitude. This was back in the 90s when computers were pathetically weak (in terms of speed and storage) compared to today. Surely we can do much better.
Pingback: Tech roundup 151: a journal published by a bot - Javi López G.