The axiom of choice (AC) seems harmless enough. It says that given a family of non-empty sets, there is a *choice function* that assigns to each set an element of that set.

AC is practically indispensable for doing modern mathematics. It is an existential axiom that implies the existence of all kinds of objects. But it gives no guidance on how to find examples of these objects. So in what sense do they exist?

**A voting scheme**

*Don’t buy a single vote more than necessary. – Douglas William Jerrold*

Voting is in the news today, as are various voting schemes. However, when it comes to voting a proposal up or down, there’s only one simple criterion: majority rule. If the voters are v0, v1, v2, …, vn-1 then the proposal is accepted if more than n/2 are in favour.

But what if there are infinitely many voters v0, v1, v2, … ? What does it mean for there to be a majority in favour? AC implies that there exists an infinitary voting scheme (this is not obvious) but supplies not a hint about how it could work.

One possibility is to pass the proposal if infinitely many voters voted Aye, but it’s possible that at the same time infinitely many voted Nay.

A vote is an infinite sequence, e.g. Aye, Nay, Nay, Aye, …, and a voting scheme is a function that assigns a result, Aye or Nay, to each possible vote. A *majority* is a set of voters such that any motion for which they all vote Aye passes.

Then we require the following properties to hold:

Given any set of voters, either it or its complement is a majority, but not both

If everyone switches their vote, the result switches

The result of adding any number of voters to a majority is a majority

The result of removing a single voter from a majority is still a majority.

The intersection of two majorities is a majority
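
To get a feel for how demanding these properties are, here is a quick finite sanity check (my own sketch, not from the post): with an odd number of voters, plain majority rule satisfies the complement property, but the intersection property already fails.

```python
from itertools import combinations

voters = set(range(5))

def is_majority(s):
    return len(s) > len(voters) / 2   # plain majority rule

# Complement property holds: exactly one of S and its complement is a majority.
for r in range(len(voters) + 1):
    for s in combinations(voters, r):
        assert is_majority(set(s)) != is_majority(voters - set(s))

# Intersection property fails: two majorities can meet in a single voter.
a, b = {0, 1, 2}, {2, 3, 4}
print(is_majority(a), is_majority(b), is_majority(a & b))   # True True False
```

No finite analogue works either way: with finitely many voters, the whole (finite) set of voters would have to be both a majority and, by the "any finite set is a minority" property, a minority.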

There are a number of consequences of these properties we can immediately derive. Let’s call the complement of a majority a *minority*.

A motion passes iff the set of those who voted Aye is a majority and fails if it is a minority

The set of all voters is a majority and the empty set is a minority

Any finite set is a minority

Any cofinite set is a majority

If the vote is unanimous, the result follows the vote

The union of two minorities is a minority

Sounds simple? Well, don’t try to invent a particular voting scheme, because you won’t succeed. Any scheme that can be defined concretely will fail at least one of these properties. In fact it’s been shown that it’s consistent with the axioms of set theory and a weak form of the axiom of choice that there is no voting scheme.

Here is a simple argument against the existence of a voting scheme. Let E be the even numbers and O the odd numbers. Either E or O is a majority; say E. But in what sense is O smaller than E? They are isomorphic: if the evens and odds swap votes, we get the same partition, but now the former majority voters are in the minority. How can that be?

Nevertheless full AC says a voting scheme exists. The question is, exists in what sense?

**A non-measurable set**

You’ve probably heard of the Banach-Tarski result. AC implies that it is possible to divide a unit sphere into five pieces and reassemble them into two unit spheres (using only translations and rotations, which should preserve volume). This is like proving that 1=1+1, except that four of the pieces can’t be assigned volumes: they are non-measurable sets.

AC implies the existence of non-measurable sets – e.g. solids that don’t have a volume – but don’t ask to see an example of one. Any set you can describe precisely will be measurable – for example, any Borel set. Analysis texts devote a lot of space to proving that the results of various operations, like countable union, preserve measurability.

These texts could be greatly simplified if they just assumed that all sets are measurable. Of course if they also assumed AC they’d be in trouble, but you can do most of analysis with weaker forms of choice, like countable choice, that don’t imply the existence of non-measurable sets.

**An indeterminate game**

You’re all familiar with finite discrete games like checkers and chess. Simplifying a bit, they have the following properties:

Players I and II alternate, I moving first, until the game ends

On each move the mover has a finite choice of moves

On each move the mover plays a natural number

Each player knows the sequence of moves up to the current position

If the last move produces a winning position, the mover wins

If the last move produces a losing position, the mover loses

If the last move produces neither, the game ties

It’s actually quite tricky to make this precise, and I’ll skip the details. The crucial concept is that of a *strategy*, which is simply a function which given the moves so far, gives the next move for the player concerned. A *winning* strategy is one that always eventually puts its user in a winning position.

Zermelo showed more than a century ago that if a game always ends, then either one player has a winning strategy, or both have a tying strategy.
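
Zermelo’s result is just backward induction, which is easy to mechanize on a toy game (a hypothetical example, not one from the post):

```python
from functools import lru_cache

# Backward induction on a toy game: one pile of stones, a move removes
# 1 or 2 stones, and taking the last stone wins. Zermelo: every position
# is a win either for the mover or for the opponent.
@lru_cache(maxsize=None)
def mover_wins(stones):
    if stones == 0:
        return False    # no move possible: the previous player took the last stone
    return any(not mover_wins(stones - k) for k in (1, 2) if k <= stones)

# Positions where the *second* player holds the winning strategy:
print([n for n in range(1, 10) if not mover_wins(n)])   # [3, 6, 9]
```

In this game the losing positions are exactly the multiples of 3; everywhere else the mover has a winning strategy.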

But we can also define infinite games. In fact it’s simpler. Given any subset G of the Baire space (set of sequences of natural numbers):

Players I and II alternate, I moving first.

On each move each player plays a natural number

If the resulting infinite sequence a0,b0,a1,b1,a2,b2, … is in G, II wins, otherwise I wins.

A winning strategy for II is a function which takes a0,b0,…an and yields bn. A strategy for I takes a0,b0,…an,bn and yields an+1.

Since there are no ties we might expect that one of the players has a winning strategy (in which case the game is said to be *determinate*). Not so fast.

AC implies the existence of an indeterminate game. An example? Don’t look for one, for the usual reasons. Any game you can define will be determinate. In particular, if G is a Borel set, then G is determinate.

There are a number of weaker versions of AC that can be proved directly in ZF. Suppose, for example, that each choice set has exactly two reals. Then choose the smaller! More generally the same idea works if each choice set is a finite set of reals.

**A well ordering of the reals**

But what if a choice set has an infinite number of reals? Say, all those greater than 0? There is no smallest.

However, we could use another ordering of the reals, in which any non-empty subset has a least element. This is called a well-ordering. Then any family of sets of reals would have a choice function.

The natural numbers are well ordered (by arithmetic ordering), so every family of nonempty sets of natural numbers has a choice function.
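
That is why choice comes for free once you have a well-ordering; for the naturals the choice function is simply “take the least element”. A trivial sketch:

```python
# For sets of naturals, 'least element' is a perfectly concrete choice
# function -- no AC needed, because < well-orders the naturals.
def choose(s):
    return min(s)

family = [{3, 7}, {42}, {5, 1, 9}]
print([choose(s) for s in family])   # [3, 42, 1]
```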

Is there a well-ordering of the reals? AC implies there is, but don’t try to find an example. No ordering you can define will be a well-ordering; otherwise you could prove AC in ZF, which has been shown to be impossible (if ZF is consistent, so is ZF+¬AC).

Yet another mathematical object that does not exist in any practical sense.

**An infinitesimal**

One important application of AC is to prove the *compactness* property of first order logic. This says that if every finite subset of a set of first order formulas is consistent (has an interpretation), the whole set has an interpretation (this is nontrivial because different finite subsets may have different interpretations).

One application is to prove the possible existence of *infinitesimals*. An infinitesimal is a number that is greater than 0 but less than 1/n for every natural number n.

For a long time calculus was based on infinitesimals. The derivative f’ of a function f was ‘defined’ as (f(x+ε)-f(x))/ε. Engineers still think in terms of infinitesimals dx and dy.
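
As an aside, one way to make “calculate with ε” concrete on a computer is dual numbers, where ε² = 0. This is a toy model (not Robinson’s hyperreals, which come later) but it mechanizes the quotient (f(x+ε)-f(x))/ε:

```python
# Dual numbers: values of the form a + b*e with e*e = 0. The coefficient
# of e after evaluating f(x + e) is exactly f'(x).
class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b              # represents a + b*e

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 e)(a2 + b2 e) = a1 a2 + (a1 b2 + b1 a2) e, since e*e = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def derivative(f, x):
    return f(Dual(x, 1.0)).b               # coefficient of e

print(derivative(lambda x: x * x * x, 2.0))   # 12.0
```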

In the 1800s infinitesimals were declared inconsistent and abolished, replaced by the ε-δ formalism.

However in the 1960s Abraham Robinson pointed out that compactness implies the existence of an extension of the reals with infinitesimals. If we take all true first order properties of the reals (such as x+y=y+x or log(xy) = log(x) + log(y)) and add the formulas

ε<1

ε<1/2

ε<1/3

ε<1/4

…

Then every finite subset is consistent. Hence, by compactness, the whole set is consistent. Robinson called the resulting structure the hyperreals and it’s a model of the first order theory of the reals, with infinitesimals.

The only snag is, what is ε, the infinitesimal whose existence is guaranteed by compactness? Don’t ask, because there is no hope of defining it. If ε works so does ε/2, or ε², or √ε. There is no infinitesimal distinguished in the same sense that i and -i are distinguished in the complex numbers. So the hyperreals have infinitesimals but don’t try to choose a particular one.

**Degrees of Existence**

One thing is clear: existence is not a straightforward binary property. It’s a spectrum. At one end there’s the existence of integers like 42 and recursively defined functions like factorial. At the other end is the voting scheme, which seems like pure vapourware.

AC by itself implies only vapourware. However these zombie-like objects like the voting scheme are necessary for the smooth functioning of the mathematical universe. Without them the universe is chaotically irregular.

The universe with AC is like an all-conquering army with straight ranks but ranks filled in part with zombies. Their presence fills us with discomfort but without them we’re lost.


*[All the images in this post were produced with generative AI – Midjourney, DALL-E 2, Stable Diffusion. Most by Paul DelSignore, not by me]*

I used to teach the AI course at the University of Victoria – thank God I’m retired. I couldn’t have kept up with the breakthroughs in translation, game playing, and especially generative AI.

When I taught AI, it was mainly Good Old Fashioned AI (GOFAI). I retired in 2015, just before the death of GOFAI. I dodged a bullet.

I am in awe of NFAI (New-Fangled AI) yet I still don’t understand how it works. But I do understand GOFAI and I’d like to share my awe of NFAI and my understanding of why GOFAI is not awe-full.

**Seek and Ye Shall Find**

For a long time AI was almost a joke amongst non-AI computer scientists. There was so much hype, but the hyped potential breakthroughs never materialized. One common quip was that AI was actually natural stupidity.

Many departments, like my own, basically boycotted the subject, maybe only offering a single introductory course.

The heart of GOFAI is searching – of trees and, more generally, graphs. For many decades the benchmark for tree searching was chess. Generations (literally) of AI researchers followed the program first proposed by Norbert Wiener in the 40s, based on searching the chess game tree. Every ten years AI evangelists would promise that computer chess mastery was only ten years away.

Wiener’s idea, described in his pioneering book *Cybernetics,* was a min/max search of the game tree, resorting to a heuristic to evaluate positions when the search got too deep.

The chess game tree gets big very quickly and it wasn’t until decades later (the late 1990s) that IBM marshalled the horsepower to realize Wiener’s dream. They built a special purpose machine, Deep Blue, capable of examining 100 million positions per second. Deep Blue eventually won, first a game, then a whole match, against Garry Kasparov, the world champion.

Deep Blue was the high water mark of GOFAI and there was no real followup. Deep Blue’s successor, Watson, could win at *Jeopardy!* but commercial applications never materialized.

**AlphaGo and AlphaZero**

I was impressed by Deep Blue but wondered about the game of Go (Baduk, Wei-chi). The board is 19×19 and the game tree is incomparably bigger than that of chess. If you’d asked me at the time I would have said Go mastery was inconceivable (which, if we had to use GOFAI, was true).

Then in 2016 the unthinkable occurred: a program, called “AlphaGo”, started beating Go champions. It did not use Wiener’s approach; instead it used Machine Learning (ML) (don’t ask me how that works).

AlphaGo trained by playing millions of games against itself. Originally it was given hundreds of thousands of expert level human games, but its successor AlphaZero dispensed with them and simply taught itself. It took only a few hours to reach expert level, which took humans hundreds of years. Variants of the software mastered chess and shogi in a similar fashion.

**Google Translate**

About the same time, users of Google Translate noticed a sudden dramatic increase in the quality of its translations, although Google at the time said nothing. In fact Google had switched from statistical methods to neural networks (don’t ask).

Currently Google’s translations are spectacularly good (and DeepL’s even better). In translating into French, for example, it usually gets the pronouns right as well as the gender/number agreements. If you want perfect output you still have to at least proofread it, because occasionally there are mistakes, usually minor.

Here is a translation from today’s *Le Monde*. It’s perfect.

Since the beginning of the conflict, Russia has refused to describe its invasion of Ukraine as a war, calling it a “special military operation”. Shortly after the conflict broke out, the authorities banned the Russian media from using any information other than official statements on the subject and banned the use of words such as “war” and “invasion”, under penalty of being punished by the law.

GOFAI never produced really good translations, relying as it did on facts and rules. I remember the whole class getting a good laugh out of the mistakes. Once, translating a news article about Iraq from German, it claimed Saddam “warned the US before an attack” when what it should have said was “Saddam warned the US *of* an attack”. Modern systems would never make such a basic mistake.

**Generative Speech**

Then came GPT-3 and the other systems that generate text. Originally they were merely predictors that would complete incomplete sentences. But now they run on and produce whole articles, which (usually) make sense. Here’s an example:

**Author Prompt:** *Write me an article about creativity please*

**GPT-3 Response:** *I’ll be honest. I don’t know much about creativity. I’m not a psychologist and I’m not a creative myself. I’m just a guy who likes to read about psychology and creativity. What I can do, though, is write you an article about creativity that’s based on what I have read. That way you get a proper article, but you don’t have to pay for the expertise of a psychologist or a creative person.*

GOFAI never achieved anything along these lines. But then my mind was well and truly boggled by …

**Generative Art**

Along came DALL-E and DALL-E 2. But it wasn’t till Stable Diffusion was released that I started paying attention. Of course there were the pictures of astronauts on horseback and cats wearing sunglasses. But what really impressed me was pictures *in the style of* well known artists. Here are two of my favourites:

The first is an abstract image in the style of Picasso. I can’t find the original but Midjourney’s version is just marvellous. I wouldn’t hesitate to print it, frame it, and hang it on my wall.

My second favourite is a wonderful portrait of Superman – ‘by’ Rembrandt! As one observer commented, “those eyes have seen some shit!”

But even the cheesy astronaut image is impressive.

The striking fact is that *you can’t see the astronaut’s left leg*. The image generator seems to understand that you can’t see through opaque objects (namely, the horse).

GOFAI would need literally hundreds of rules just about what to do when bodies overlap: what to show, which objects are transparent and to what degree, etc.

**On reflection**

OK, let’s go all in – let’s look at a cat wearing sunglasses. Ew, cheesy – but there’s something remarkable about the image.

It’s the reflections in the lenses of the sunglasses. Not only are they visible, but the reflections are, correctly, the same. How does Midjourney coordinate the images in separate parts of the picture?

**A closer look**

When I see this image I have to ask, where did all this come from? Midjourney is trained on 5 billion images but condenses this training to 5 GB – about one byte per image. So there’s not enough room to include exact copies of images found in the training set. We can assume that this (apparent) photo does not exist as is on the internet.

In particular, what about the blue feathers on either side of the subject’s neck (they are not mirror images)? Where did they come from? Did one of the training images have them?

The mystery is that this image is the result of combining training set images, but how are they put together? The best GOFAI could do is chop up the training images and fit them together like a badly fitting jigsaw puzzle, with visible seams and limited symmetry. I’m baffled.

**The social implications of AI technology**

*It is questionable if all the mechanical inventions yet made have lightened the day’s toil of any human being.*

~ *John Stuart Mill*

There is a lot of controversy about Midjourney and other generative image programs.

The first question is, are these images art? I think some of the images presented here are definitely art, even good art. If you’re not convinced, have another ‘Rembrandt’.

The second question is, is imitating the *style* of certain artists fair? I don’t know, but there seems no way to stop it. Currently nothing stops a human artist from studying living artists and imitating their styles. Midjourney etc are just especially good at this.

In a sense, this imitation broadens the exposure of the imitated artists. Now everyone can have, say, a Monet of their own.

Finally, a vital question is, how will this affect today’s working artists? Here the answer is not so optimistic.

Generative AI is not the first disruptive technology. There’s photography, the closest analog, digital art in general, the telephone, the automobile, the record player, the printing press, and so on.

Each of these had the effect of obsoleting the skills of whole professions. They didn’t wipe those professions out, but the vast increase in productivity put large numbers out of work. And those who remained had to acquire and use the new tools. Because of economic competition they had to work harder than ever to keep up.

Labor-saving technology inevitably becomes profit-saving technology. The tractor is an example. Initially it (and farm machinery in general) was marketed as labor saving. But eventually competition forced every farmer to get machinery or sell out (which most had to do). The result was the same or more food produced by a fraction of the former number of farmers, working their butts off.

So I predict AI will shrink the number of artists and force them to use Midjourney etc. For art consumers, it will be good news – like drinking from a firehose. A new individual Monet every week. Do-it-yourself illustrations for personal blogs. But no change in society as a whole.

When the late Ed Ashcroft and I invented Lucid, we had no idea what we were in for.

**La dee dah**

At first we (or at least I) thought it would be pretty straightforward. The idea was to replace assignments like

```
I := 1;
...
while ...
I := I+1;
...
end
...
```

with equations like

```
first(I) = 1;
next(I) = I+1;
```

then, later,

`I = 1 fby I+1;`
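
These equations read naturally as definitions of infinite streams. A quick generator sketch of *fby* in Python (my own encoding, not how PyLucid works):

```python
from itertools import islice

# A Lucid variable as an infinite stream: fby prepends a first value
# to the rest of the stream.
def fby(first_val, rest):
    yield first_val
    yield from rest

# I = 1 fby I+1
def I():
    yield from fby(1, (i + 1 for i in I()))

print(list(islice(I(), 5)))   # [1, 2, 3, 4, 5]
```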

The original motivation was that proving properties of programs would be easier if the program statements were equations.

As for implementing the language, the idea was that we could compile the equational form into machine code using conventional technology.

**Oh, shoot**

However the devil was in the dots … . It dawned on us/me that since the semantics was based on infinite sequences (which we thought of as histories of computations) this implied that the computations went on forever.

That was fine as long as it was supposed to be a continuously operating program, for example to list the squares of the natural numbers. But what if it is supposed to eventually halt and produce a single result?

For example, the following imperative program calculates an approximation to the square root of 2, outputs it, and terminates.

```
a := 1;
err := 1;
while err > 0.0001
err := abs(2-a*a);
a := (a+2/a)/2;
end;
root2 := a;
write(root2);
```

The equations for *a* and *err* are obviously

```
a = 1 fby (a+2/a)/2;
err = abs(2-a*a);
```

but what about *root2*? What is its equation? And it’s not even a stream, it’s a single value.

**Extract the root**

Eventually we devised an operator *as_soon_as* that *extracts* a value from a stream. The operator *as_soon_as* (shortened to *asa*) takes two arguments and returns the value of its first argument corresponding to the first time its second argument is true. Thus if *X* is *&lt;x0,x1,x2,x3,…&gt;* and *P* is *&lt;f,f,f,t,…&gt;* then *X asa P* is *&lt;x3,x3,x3,x3,…&gt;*.
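
As a stream operation, *asa* is easy to sketch with Python generators (again my own encoding, not the PyLucid implementation):

```python
from itertools import islice

# X asa P: the constant stream of the first x whose corresponding p is true.
def asa(xs, ps):
    for x, p in zip(xs, ps):
        if p:
            while True:       # found it: repeat that value forever
                yield x

X = iter([10, 11, 12, 13, 14])
P = iter([False, False, False, True, True])
print(list(islice(asa(X, P), 4)))   # [13, 13, 13, 13]
```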

The full Lucid program was

```
I = 1 fby I+1;
a = 1 fby (a+2/a)/2;
err = abs(2-a*a);
root2 = a asa err<0.0001;
```

(and the order of the equations doesn’t matter).

We were therefore forced to resort to lazy evaluation: you compute the value of an expression only if you need it. In particular, you don’t keep computing values of the sequence denoted by *a.*

**Compiling is out**

However, compiling became very complicated. You had to analyze the program to figure out which values would actually be needed. We just waved our hands and said it could be done. (We also needed to analyze the program to figure out that variables like root2 are ‘really’ just constants and need be output only once. We only recently solved this problem.)

So we tried another strategy, namely compiling the program into a network of dataflow filters connected by pipelines. David May (yes, that David May, then a grad student) thought this was a great idea and began working on an implementation along these lines. Then one fatal Monday we met in the University of Warwick (UK) Arts Centre cafe for a cheese sandwich lunch. “It doesn’t work” he announced.

**Dataflow is out**

The problem was *if-then-else-fi* and other primitives that don’t need (and may discard) some inputs. Pipeline dataflow filters wait for data tokens to arrive on all input lines, consume one token from each line, then produce an output token.

This causes problems if there is an *if-then-else-fi* filter. If the input streams are P, X, and Y, the filter should wait for tokens pn, xn, and yn to show up; then, if pn is true, send on xn and discard yn, otherwise send on yn and discard xn.

Sounds simple enough but there is a fatal issue: waiting for values that you don’t need and will discard.

What if the unneeded values show up late, or not at all? Then we will have delayed the computation for no reason. We can tweak the operation so that as soon as the needed token shows up, it is passed on. But we’re still stuck waiting for the unneeded value and there’s no way to skip it. If it never shows up (because of deadlock upstream) we’re in trouble because our own output deadlocks prematurely.

The problem is that the semantics of Lucid requires

*if t then x else y fi = x*
*if f then x else y fi = y*

If we let ⊥ denote a nonterminating (deadlocking) stream, then under wait-for-all pipeline evaluation we have

*if t then x else ⊥ fi = ⊥*
*if f then ⊥ else y fi = ⊥*

which violate the basic rules for *if-then-else-fi* given above.

Furthermore, if we adopt wait-and-discard we may waste computing resources (those spent computing unneeded values) and these resources could be significant.

**Kludges are out**

There are various kludges we could try, like sending ‘kill’ tokens upstream to cancel unneeded computations, but these run into trouble if there are cycles in the network upstream. All a giant headache.

For that reason all simple-minded dataflow models lack a three-input one-output conditional filter. Instead they typically have a one-input two-output filter that sends a token down one of the output lines. It’s unnatural to program with such a primitive and anyway the result is two lines with different and unpredictable data rates.

In other words, David May was perfectly justified in declaring that pipeline dataflow “doesn’t work” as an implementation of Lucid. Luckily, he had a solution.

**Demand results**

His solution (Tom Cargill and others independently came up with the same idea) was to systematically use demand-driven evaluation. The interpreter demands the value of *output*, which generates a demand for the value of *root2,* which in turn generates demands for the values of *a* and *err*.

We can demand the value of *root2*, because it’s just one number, but *a* and *err* are (potentially) infinite sequences that can’t be returned as the answer to a single demand. The clever idea is to allow us to demand specific indexed components of these streams, e.g. the value of *a* when time=2 or the value of *err* when time=3.

Indexed demands propagate, so that the demand for a component of one variable at a given timepoint generates demands for possibly different variables at possibly different timepoints.

The propagation rules are very simple:

A demand for *A+B* at time t generates demands for *A* at time t and *B* at time t, and the result is the sum of the two results (other data operations work similarly).

A demand for *first A* at time t produces a demand for A at time 0, and the result is the answer.

A demand for *next A* at time t generates a demand for *A* at time t+1, and returns the answer as the result.

A demand for *A fby B* at time 0 returns the answer to a demand for *A* at time 0; while a demand for *A fby B* at time t+1 returns the answer to a demand for *B* at time t.

A demand for *X asa P* at time t generates demands for *P* at times 0, 1, 2, … until the answer is *true* (say at time r), and then returns the answer to the demand for *X* at time r.
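
The rules above translate almost line-for-line into a tiny demand-driven evaluator. Here is a sketch with expressions encoded as nested tuples (a hypothetical encoding, not the actual interpreter):

```python
# A minimal demand-driven evaluator for the propagation rules above.
def demand(expr, t, env):
    op = expr[0]
    if op == 'const':
        return expr[1]
    if op == 'var':                      # look the variable up and demand it
        return demand(env[expr[1]], t, env)
    if op == '+':                        # demand both operands at the same time
        return demand(expr[1], t, env) + demand(expr[2], t, env)
    if op == 'first':                    # first A: demand A at time 0
        return demand(expr[1], 0, env)
    if op == 'next':                     # next A: demand A at time t+1
        return demand(expr[1], t + 1, env)
    if op == 'fby':                      # A fby B: A at time 0, else B at t-1
        return demand(expr[1], 0, env) if t == 0 else demand(expr[2], t - 1, env)
    raise ValueError(op)

# I = 1 fby I+1
env = {'I': ('fby', ('const', 1), ('+', ('var', 'I'), ('const', 1)))}
print([demand(('var', 'I'), t, env) for t in range(5)])   # [1, 2, 3, 4, 5]
```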

**Some observations**

First notice that all four primitives discard data … pipeline dataflow doesn’t implement any of them safely or efficiently.

Also, ‘time’ is just a formal parameter; it has no necessary connection to wall-clock time. The time values do not necessarily increase as the computation proceeds. Thus we may demand the time 8 value of a variable, then the time 5 value, even of the same variable.

**Anachronistic programs**

In fact it is possible to write programs that recurse into the ‘future’, like the following that computes the factorial of 7:

```
first f
where
n = 7 fby n-1;
f = if n<1 then 1 else n * next f fi;
end
```

The variable f is defined in terms of its own future, yet the demand-driven interpreter produces 5040, the right answer.
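
To see why, transcribe the two equations into lazy time-indexed Python functions (my own sketch, not PyLucid):

```python
# n = 7 fby n-1;  f = if n<1 then 1 else n * next f fi
# Each variable is a function of time; 'next f' is just f at time t+1.
env = {}
env['n'] = lambda t: 7 if t == 0 else env['n'](t - 1) - 1
env['f'] = lambda t: 1 if env['n'](t) < 1 else env['n'](t) * env['f'](t + 1)

print(env['f'](0))   # first f -> 5040
```

The demand for f at time 0 chases demands for f at times 1, 2, … until n drops below 1, then the products unwind: 7·6·5·4·3·2·1.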

**Minimum solution**

It can be shown that the demand-driven interpreter is 100% faithful to the statements-as-equations semantics. Every set of equations has a unique minimum solution and that’s what the interpreter computes. Usually there is a unique solution, but “minimum” means having the ‘most’ ⊥’s – in other words, the least ‘defined’ solution. Another way of putting it is that no actual value appears out of nowhere.

For example, the equation *I = 1 fby next I* has I = &lt;1,7,7,7,…&gt; as a solution, but where did 7 come from? The minimum solution is I = &lt;1,⊥,⊥,⊥,…&gt;, and this is what the interpreter produces. In other words, if you demand I at time 0 you get 1, but if you demand I at time t&gt;0, the computation fails to terminate. As it should, if it’s faithful to the semantics.

**Eduction**

Ed Ashcroft loved words and wordplay. One day, browsing through a dictionary or thesaurus, he came across the word “eduction”. I remember the definition he found was something like “the act of drawing forth or eliciting … results … from the data …”. Perfect! This is what we should call it! And we did.

The question arose: is eduction dataflow? Not pipeline data-push dataflow, that’s for sure. We decided to stake a claim and defined eduction as “tagged demand-driven dataflow”. Eduction is briefly described (but not so named) in the book *Lucid, the Dataflow Language* (1985).

Nevertheless the book does explain the pipeline dataflow model, even though it admits that it cannot serve as a general implementation scheme. There are two reasons for this.

First, pipeline dataflow serves as an excellent *heuristic* for writing and understanding programs. The eduction model is not usually a very good guide – everything seems to happen backwards.

Secondly, pipeline dataflow works fine for many unsophisticated programs that process data in a straightforward way. It’s much simpler than eduction and has much lower overhead. Program analysis could automatically identify programs that are eligible for a pipeline implementation.

Nevertheless, eduction is a general technique so we need to investigate what’s required to make it work.

**Implementing eduction**

The first thing we notice is that it doesn’t use storage (apart from cells on the invisible stack that implements recursive calls to the evaluation routine).

Not using storage is a bad idea, because it means that the interpreter recomputes demands that are repeated. This can get very expensive in time if not in space.

The solution is a cache, which we traditionally call the *warehouse*. Every time we calculate the value of variable V at time t, we record this fact. The warehouse is an associative store indexed by the pair (V,t). In modern languages like Python or Swift the warehouse can be implemented in a straightforward yet efficient manner using the built-in *dictionary* primitive.
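
A minimal sketch of the warehouse in Python (assumed design, not the PyLucid code): a dict keyed by (variable, time), consulted before any recomputation.

```python
# The warehouse: memoize (variable, time) pairs so repeated demands
# are looked up rather than recomputed.
warehouse = {}

def demand(var, t, env):
    key = (var, t)
    if key not in warehouse:
        warehouse[key] = env[var](t, env)   # compute once ...
    return warehouse[key]                   # ... reuse thereafter

env = {
    # I = 1 fby I+1
    'I': lambda t, e: 1 if t == 0 else demand('I', t - 1, e) + 1,
}
print(demand('I', 100, env), len(warehouse))   # 101 101
```

Without the warehouse, a program that repeatedly demands the same (V,t) pair can blow up exponentially; with it, each pair is computed exactly once.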

The second problem is that the warehouse can in theory fill up as the computation proceeds. This is less of an issue with modern computers – even consumer laptops – that have ridiculous amounts of storage. For example, the older MacBook Air I’m using can store over a *billion* numbers. The PyLucid interpreter stores everything and never runs into trouble with the modest programs that appear in this blog.

Nevertheless a completely general implementation needs to manage warehouse storage. Tony Faustini and I came up with an effective heuristic we called the retirement plan. Briefly, it sweeps the warehouse periodically and discards values that haven’t been used recently.

**Tags**

The next complication involves storing and fetching data. In the simple root2 program, we generate a demand for *root2* at time 0 and store the value with tag *(root2,0)*. So far so good.

Now suppose we extend the program and when we evaluate it we get demands for *root2* at times 3 and 5. What do we do?

We can store the same approximation to √2 with tags *(root2,3)* and *(root2,5)*. Now we’re wasting warehouse space by storing the same data just with different tags. Depending on the extended program, this could be very expensive in terms of space. We were lucky originally that only time 0 was demanded. But we can’t count on luck.

Now suppose we have a demand for *root2* at time 7. We look in the warehouse using tag *(root2,7)* and find nothing. As a result we recompute *root2*. This is wasteful of time.

The only way we can avoid wasting space or time is to find out that *root2* is constant – is insensitive to the time parameter. This requires static program analysis as described in the time sensitivity blog post.

**Extra dimensions**

One of the advantages of treating the time index as a formal parameter is that it suggests other dimensions can also be treated as formal parameters. In other words, eduction opens the door to multidimensional dataflow.

PyLucid has two dimensions, time (t) and space (s). As that post explains, it means we can write programs employing time-varying arrays. We can still use the pipeline heuristic, but we must imagine infinite arrays travelling down the pipes.

Eduction has no trouble handling multiple dimensions. In the simplest case, we just have slightly more elaborate demands, say (X,t=3,s=4). However, with many dimensions, passing around coordinates is cumbersome. Instead we have ‘registers’ (special global variables) that hold the coordinates. Then to evaluate next X, for example, we increment the time register by one, demand the value of X, then decrement the time register by one.
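
The register mechanism can be sketched like this (hypothetical names, following the description above):

```python
# Dimension 'registers' as globals: a shift bumps a coordinate, makes the
# demand, then restores the coordinate on the way out.
registers = {'t': 0, 's': 0}

def with_shift(dim, delta, thunk):
    registers[dim] += delta
    try:
        return thunk()
    finally:
        registers[dim] -= delta   # always restore the register

# I = 1 fby I+1, reading the time coordinate from the register
def demand_I():
    if registers['t'] == 0:
        return 1
    return with_shift('t', -1, demand_I) + 1   # 'previous I' + 1

print(with_shift('t', 5, demand_I))   # value of I at time 5 -> 6
```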

**A multidimensional warehouse**

There is one catch, and that involves accessing the warehouse. Suppose we demand the value of X and get 27 as the result. With what keys do we store 27 in the warehouse?

We could attach the values of all the registers but that in general would result in wasting space on duplicate entries – the same problem as with time sensitivity, as described above, but much more serious.

We could keep track of the registers actually examined in computing the demanded value of *X*, and use them as the keys, but what about on the other end when we have to search for a demanded value of X? A priori we have no way of knowing what dimensions entered into producing the value we are looking for.

The only general solution is *dimensional analysis*, the process of discovering which dimensions might enter into the production of a given variable. Upper bounds are enough. For example, we may discover that dimensions s and t are enough to get a value for X, but that Y may need dimension h as well. Then when we search for a value of X, we use the current contents of registers s and t as keys. But for a value of Y, we also include the current contents of the h register.
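A small illustration, with invented names, of how those upper bounds make cache keys work:

```python
# Illustrative only: suppose dimensional analysis has produced these
# upper bounds on the relevant dimensions of X and Y.
relevant = {"X": ("s", "t"), "Y": ("h", "s", "t")}

registers = {"s": 4, "t": 3, "h": 7}
warehouse = {}

def tag(var):
    """Build a warehouse key from only the relevant dimensions of var."""
    return (var,) + tuple((d, registers[d]) for d in relevant[var])

warehouse[tag("X")] = 27
registers["h"] = 99                   # the h coordinate changes...
assert warehouse.get(tag("X")) == 27  # ...but X's entry is still found,
                                      # because h is not relevant to X
```

Had we keyed on all the registers, the lookup above would miss and force a recomputation — exactly the waste dimensional analysis avoids.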

Dimensional analysis was the main technical obstacle holding back the development of Lucid, and it is solved in Shennat’s dissertation.

**User defined functions**

So far we have talked about only 0-order programs – programs that have no functions other than built-ins like *next*. What does it mean to demand, say, the value of *fac(n+1)* where the user has written their own definition of *fac*? (David May’s implementation did not support user defined functions.)

This caused some head scratching till we got a hold of Calvin Ostrum’s interpreter, which did support user defined functions. Upon examining the code we discovered that he introduced an extra dimension, that we called the *place* dimension, that specified where in the function calling tree a demand was being made. Ali Yaghi, then a PhD student at Warwick, revised and extended Ostrum’s scheme and formalized it in terms of intensional logic. The result is what we call Yaghi Code.

The point of Yaghi code is that it magically reduces a first order program to a 0-order program, to which we can apply eduction. The only cost is two extra intensional operators, *call* and *actuals*, and an extra “place” dimension. We’ve already seen that extra dimensions are not a serious problem for eduction.

**Higher order functions**

For a long time Lucid was strictly first order and this worried us because we liked to call it a *functional* dataflow language. For a long time we couldn’t see how to extend it. Then P. Rondogiannis and I came up with a solution that in hindsight seems obvious: more dimensions!

The idea is that one place dimension reduces a first order program to a 0-order program. The same procedure can reduce a second order program to a first order program, then adding another place dimension produces a 0-order program, which can be educed.

In general, some – but only some – nth order programs can be translated into 0-order programs that use n place dimensions and n families of *call/actuals* operators. This is not a general solution because only programs with certain function types can be translated. In particular, we cannot translate programs that employ partial application; in other words, functions that return other functions as results.

A number of smart people have tried to fix this, without success. My hunch is that it can’t be done, though I don’t know why.

**Advantages**

At this point the reader might start wondering, what is the point of all this? Programmers often find the side-effect-free style of Lucid programming constraining, because they can’t just tell the computer what to do. Furthermore, implementing Lucid is quite a challenge because you can’t simply turn Lucid code into machine code.

In fact there are huge advantages to writing in Lucid and implementing the program with eduction. To begin with

**You can understand programs**.

The statements in a Lucid program really are mathematical equations. Inside a *where* clause the order is unimportant and the result of a where clause is derived from the (usually unique) solution of these equations. Lucid has evolved but the statements-as-equations principle has remained nonnegotiable. For example, we do not allow compound expressions on the left hand side because that can undermine the basic principle.

For a start we can apply the rules of algebra exactly because there are no side effects. If X = A+B and A and B are both small integers, we can conclude that X is also an integer. And if A and B both increase with time, we can conclude that so does X. Static analysis of Lucid programs is vastly simpler than that of imperative languages like Python.

**We can transform programs**

Also, we can safely apply the transformation rules of conventional algebra. For example, if *Y = P*Q+R* we can add an equation *V = P*Q* (V not already in use) and change the definition of Y to *Y = V+R*. The expression *X*X* can be replaced by *X**2*; no side effects. Or if *F(R,S)* is defined to be *R – 2*S*, then *F(G+H,G-H)* can be replaced by *(G+H) – 2*(G-H)* and then by *3*H-G*.
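Assuming a toy string representation of expressions (real transformations operate on parse trees, not strings), the F example can be checked mechanically:

```python
# Toy string-level check of the F(G+H, G-H) transformation.
def inline_F(r, s):
    """Replace a call F(r, s) by the body R - 2*S with arguments substituted."""
    return f"({r}) - 2*({s})"

expr = inline_F("G+H", "G-H")             # "(G+H) - 2*(G-H)"

# Check numerically that the result equals the simplified form 3*H - G:
for G in range(-3, 4):
    for H in range(-3, 4):
        assert eval(expr) == 3*H - G
```

Because there are no side effects, substitution of a call by its body is always safe – the check above would fail for an imperative function whose body touched global state.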

The PyLucid compiler proceeds by applying meaning-preserving transformations. In the end (after introducing Yaghi code) the entire program is reduced to a (large) unordered set of ‘atomic’ equations. By ‘atomic’ we mean that each equation consists of a variable equated to an expression which is either a data constant or a single operation applied to variables.

The atomic form is ready for eduction. But it is still a Lucid program: it is human readable and can be saved in a simple text file. It is semantically equivalent to the original. The atomic form is also amenable to program analysis, for example determining dimensionalities.
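As an illustration (not the actual PyLucid compiler), a few lines of Python suffice to flatten an expression tree into atomic equations:

```python
import itertools

_fresh = itertools.count()

def atomize(expr, equations):
    """Flatten an expression tree into atomic equations: each equation
    defines a variable as a constant or as one operation applied to
    variables. (A sketch, not the actual PyLucid compiler.)"""
    if isinstance(expr, (int, float, str)):    # constant or variable name
        return expr
    op, *args = expr
    arg_vars = [atomize(a, equations) for a in args]
    v = f"_t{next(_fresh)}"                    # fresh variable, like V above
    equations[v] = (op, *arg_vars)
    return v

# Y = P*Q + R, as a tree:
eqs = {}
top = atomize(("+", ("*", "P", "Q"), "R"), eqs)
eqs["Y"] = eqs.pop(top)                        # name the top-level result Y
print(eqs)   # {'_t0': ('*', 'P', 'Q'), 'Y': ('+', '_t0', 'R')}
```

The output is exactly the *Y = V+R*, *V = P*Q* decomposition described earlier, with `_t0` playing the role of V.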

**Eduction can be distributed**

Once we have the atomic form of the program, we can store it on different machines and have a number of warehouses also on different machines. We could divide up the work according to variables, e.g. have one machine evaluating A, B, and C, and another X, Y, and Z.

Of course a demand for say, B could generate a demand for, say, Z but this demand could be sent across the network. Program analysis could tell us how to split up the work so as to minimize network traffic.

There would be no problem duplicating warehouse entries because you are never going to have discrepancies – you can use whichever warehouse you want.

**It’s fault tolerant**

With eduction the program does not change – unlike systems based, say, on combinator reduction. If a value goes missing for whatever reason, it can be recomputed (although this may be expensive in terms of time). For this reason the warehouse strategy can be only a heuristic, like the retirement plan.

Fault tolerance is vital for a distributed implementation because it means communications don’t have to be 100% reliable.

**It could be very fast**

Eduction has plenty of provision for parallelism. There is no inherent contention between demanding a value of X and demanding a value of Y, unless (say) the demand for X generates a demand for Y (at the same coordinate). There are no races because there are no side effects.

The GLU project used Lucid as a coordination language to link pieces of legacy C code. They achieved modest speed-ups, typically an order of magnitude. This was back in the 90s when computers were pathetically weak (in terms of speed and storage) compared to today. Surely we can do much better.

Pascal had *while* loops and we managed to do iteration with equations. However in Pascal you can nest *while* loops and have iterations that run their course while enclosing loops are frozen. This was a problem for us.

To see the problem consider the following Pascal-like program, which repeatedly inputs a number, computes an approximation to its square root using Newton’s method, and prints it out.

```
N = input() ;
while N ne eod
  a = 1 ;
  err = abs(N-1) ;
  while err > 0.0001
    a = (a+N/a)/2 ;
    err = abs(N-a*a) ;
  end
  print(a) ;
  N = input() ;
end
```

We can naively rewrite this as Lucid giving

```
output
where
  N = input;
  output =
    a asa err < 0.0001
    where
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

All very good but it doesn’t work – its output is garbage. The problem is that N continues changing in sync with the approximation a. The Newton iteration is chasing a moving target and may not even terminate.

The difference between Lucid and Pascal is that with Pascal by default nothing changes whereas with Lucid everything changes by default.

Obviously, if Lucid wants to be a general purpose language it needs nesting. We needed some way to ‘freeze’ the current value of N while the inner loop is running. We came up with the “*is current*” declaration, an ad hoc solution.

The program becomes

```
output
where
  n = input;
  output =
    a asa err < 0.0001
    where
      N is current n;
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

There were many problems with this solution, starting with the fact that N is no longer defined by an equation. The *is current* statement was unpopular with Lusers (Lucid users) and tended to be dropped. GLU did not have nesting and instead used temporary extra ‘throw away’ dimensions.

Can we do better? I think so, and I’m going to outline a proposal. The idea is to use operations (I call them “hyperfilters”) that work on streams (I call them “hyperstreams”) that are functions of a whole sequence t0,t1,t2,… of time parameters, not just t0 as with ordinary streams. The idea is that t0 is inner, local time, t1 is time in the enclosing loop, t2 time in the second outer loop, and so on.

I’m also going to correct original Lucid’s biggest mistake, which was to try to get away with only one type of *where* clause. The semantics were a mess. Instead we have a plain *where* clause which simply hides definitions, and *whereloop*, used for nesting. Then we’ll define *whereloop* in terms of a simple translation into conventional *where*.

I’m not satisfied with simply reviving the original simple ‘freezing’ form of nesting. I’m proposing a more general framework that allows, for example, several rounds of the inner loop to produce one value for the outer loop. This framework uses two general-purpose multi-(time)-dimensional operators, *active* and *contemp*(orary).

Although these two hyperfilters act on infinite dimensional hyperstreams we can understand them in terms of one- and two-dimensional streams extended pointwise to the other dimensions. A *whereloop* implicitly applies *active* to all its globals and *contemp* to its result.

The operator *active* takes a stream and duplicates it over t0.

In other words, if a is the argument stream, *active(a)* repeatedly starts from scratch in each invocation of the inner loop.

The operator *contemp*, on the other hand, takes a two-dimensional stream and samples it.

In other words, *contemp(w)* is the contemporary value of w – the value at the current enclosing time.

It’s easy to check that *contemp* and *active* are dual: *contemp(active(a)) = a*.

If we want simple freezing we use the operator *current*, defined as

current(<g0,g1,g2,…>) = <<g0,g0,g0,…>,<g1,g1,g1,…>,<g2,g2,g2,…>,…>
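One way to make these definitions concrete is to model hyperstreams as finite lists of lists. The sketch below is my own finite reading (with *contemp* sampling at inner time equal to the enclosing time); it satisfies the duality and reproduces the freezing behaviour of *current*:

```python
# Finite-list sketch of the hyperfilters: a one-dimensional stream is a
# list, a two-dimensional stream a list of rows (row index = enclosing
# time t1, inner index = local time t0).
def active(a):
    """Duplicate a stream over t0: every row replays a from scratch."""
    return [list(a) for _ in range(len(a))]

def contemp(w):
    """Sample a two-dimensional stream at inner time equal to the
    enclosing time -- 'the contemporary value'."""
    return [w[t1][t1] for t1 in range(len(w))]

def current(g):
    """Freeze: row t1 repeats g[t1] forever (here, len(g) times)."""
    return [[g[t1]] * len(g) for t1 in range(len(g))]

a = [2, 9, 3, 7]
assert contemp(active(a)) == a           # the duality contemp(active(a)) = a
assert current(a)[1] == [9, 9, 9, 9]     # the frozen row <9,9,9,...>
```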

The square root program becomes

```
output
where
  n = input;
  output =
    a asa err < 0.0001
    whereloop
      N = current n;
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

After translating the whereloop we get

```
output
where
  n = input;
  output =
    contemp(a asa err < 0.0001)
    where
      N = current active(n);
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

Inside the inner where, n might look like <2,9,3,…>, so *active n* will be

<<2,9,3,…>,<2,9,3,…>,<2,9,3,…>,…>

and *current active n* will be

<<2,2,2,…>,<9,9,9,…>,<3,3,3,…>,…>

and now a is no longer chasing a moving target. The hyperstream a might look like

<<1,1.5,1.4167,…,1.414…,1.414…,…>,<1,5,3.4,…,3,3,3,…>,<1,2,1.75,…,1.73…,1.73…,…>,…>

and *a asa err<0.0001* will be

<<1.414…,1.414…,1.414…,…>,<3,3,3,…>,<1.73…,1.73…,1.73…,…>,…>

and *contemp(a asa err<0.0001)* is

<1.414…,3,1.73…,…>

which is what we want.

So much for the simplest form of nesting, where outer values are frozen during the inner computation. Can we do better? Yes.

Suppose we want to produce a single value that repeatedly combines several values from the outer loop (not possible in traditional Lucid). To be specific, suppose that m is a series of positive integers interrupted by 0’s. We want to produce the stream of sums of the numbers up to each 0. For example, if m is of the form

<3,2,4,0,6,0,3,8,5,0,…>

then the value of the loop should be of the form <9,6,16,…>.

Here is the program

```
n
where
  m = input;
  n =
    sum asa y eq 0
    whereloop
      sum = 0 fby sum+y;
      y = m Fby y after y eq 0;
    end;
end
```

It uses the (ordinary) filter *after* and the hyperfilter *Fby*. The former returns the part of its first argument after its second argument is true for the first time. For example, if P begins <f,f,f,t,…> and X begins <x0,x1,x2,x3,x4,…> then *X after P* is <x4,x5,x6,…>. The definition of *after* is

```
X after P = if first P then next X
            else next X after next P
            fi;
```
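For readers who prefer executable code, here is a Python generator transcription of *after* (streams modelled as iterables, which is my assumption, not part of Lucid):

```python
from itertools import count, islice

def after(x, p):
    """Yield the part of stream x strictly after p is first true.
    A generator reading of the recursive Lucid definition above."""
    x, p = iter(x), iter(p)
    for px in p:
        next(x)          # consume x in step with p
        if px:
            break        # p was true: everything left in x is the answer
    yield from x

# P begins <f,f,f,t,...> and X = <0,1,2,3,4,...>:
x_after_p = after(count(), [False, False, False, True])
print(list(islice(x_after_p, 3)))   # [4, 5, 6]
```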

This is standard one dimensional Lucid. But *Fby* is two dimensional; it’s *fby* in the time dimension t1. If V is <<x0,x1,x2,…>,<y0,y1,y2,…>,<z0,z1,z2,…>,…> and W is <<a0,a1,a2,…>,<b0,b1,b2,…>,<c0,c1,c2,…>,…> then *V Fby W* is

<<x0,x1,x2,…>,<a0,a1,a2,…>,<b0,b1,b2,…>,<c0,c1,c2,…>,…>

(Incidentally there does not seem to be any way to define *Fby* in terms of simpler primitives.)
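On finite prefixes, though, it is easy to model *Fby* at the meta level (rows as Python lists; this is a sketch outside Lucid, not a definition within it):

```python
def Fby(v, w):
    """fby in the t1 dimension, on finite prefixes: the first row of v
    followed by all the rows of w. (A meta-level model; as noted, Fby is
    not definable from simpler Lucid primitives.)"""
    return [v[0]] + list(w)

V = [["x0", "x1"], ["y0", "y1"], ["z0", "z1"]]
W = [["a0", "a1"], ["b0", "b1"]]
assert Fby(V, W) == [["x0", "x1"], ["a0", "a1"], ["b0", "b1"]]
```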

Once we translate the whereloop we get

```
n
where
  m = input;
  n =
    contemp(sum asa y eq 0)
    where
      sum = 0 fby sum+y;
      y = active(m) Fby y after y eq 0;
    end;
end
```

If m is as above then y is

<<3,2,4,0,6,0,3,8,5,0,…>,<6,0,3,8,5,0,…>,<3,8,5,0,…>,…>

*sum* is

<<0,3,5,9,9,15,15,…>,<0,6,6,9,17,…>,<0,3,11,16,16,…>,…>

*sum asa y eq 0* is

<<9,9,9,…>,<6,6,6,…>,<16,16,16,…>,…>

and *contemp(sum asa y eq 0)* is <9,6,16,…>, as required.

On the other end of the spectrum, we want a loop which produces a number of values for every step in the enclosing loop. Suppose that n is a stream of natural numbers, say <2,5,8,3,…> and we want to produce a stream m that enumerates the binary digits of the components of n separated by “;”, so that m will be of the form

<0,1,”;”,1,0,1,”;”,0,0,0,1,”;”,1,1,”;”,…>

The following program does the job

```
m
where
  n = input;
  m =
    (";" fby digit) Until (false fby k eq 0)
    whereloop
      k = current n fby k/2;
      digit = k mod 2;
    end;
end
```

The operator Until is defined recursively as

```
x Until p = if First first p then (Next x Until Next p)
            else First first x fby (next x Until next p)
            fi
```

Here *First* and *Next* are the t1 versions of first and next.

Does this work? I believe so, but I leave it to you as a (nontrivial) exercise to check it. Let me know if it doesn’t work out.

It all looks hunky-dory but there is a problem: global functions. I explained that *whereloop* is translated by applying *active* to all the global (individual) variables. But what about a function f that is defined outside the *whereloop* but is called inside? The hitch is that the definition of f may contain a global g that normally should be *active*-ated. If we do nothing we end up ‘smuggling’ g inside the *whereloop* with unpredictable consequences because we have not turned it into a two dimensional stream.

We can’t just ignore global function calls. What do we do with them? We can try Yaghi code, which seems to work, but I’ll leave the details for a later time. In the meantime, there’s no problem using globally defined functions that have no globals.

I’m going to set things right by releasing an up to date version of PyLucid (Python-based Lucid) and up to date instructions on how to use it to run PyLucid programs. The source can be found at pyflang.com and this blog post is the instructions. (The source also has a brief README text file.)

Go to pyflang.com and download the file pylucid.1.2.zip and unzip it. There will appear three objects: a directory *source*, a command *repl*, and a file *README.txt*. This is all you need.

The source directory contains about 30 .py files (python code). You don’t have to know anything about them if all you want to do is run PyLucid programs.

The README.txt is basically a condensed version of this post.

The repl command launches the PyLucid Read-Evaluate-Print-Loop. You repeatedly enter one-character commands (possibly with arguments) and that’s how you interact with the interpreter. No UNIX commands other than repl itself.

Actually, “evaluate” is a bit misleading because you don’t enter PyLucid code on the command line. Instead you manipulate a program stored in an invisible buffer. The program stored in the buffer is called the current “import”, and when you first launch the repl it will inform you that the current import is “etothex”. It’s a simple program that calculates e using the power series for e**x with x=1.

To view (the program in) the buffer, use the “b” command. To edit it, use “v” (which launches vi). To evaluate (run) the program, use “e”.

The e command expects the buffer to contain a *where* clause, which consists of a *subject* (the value of the clause) and an (unordered) set of equations defining variables and functions. The variables denote two-dimensional datasets and the functions denote transformations (filters) on such datasets.

The e command evaluates the subject in the context of the definitions (the *body*) of the where clause. Note that the right hand sides of the definitions can be arbitrarily complex and may, for example, contain nested where clauses.

The e command evaluates the program and displays the results. However in PyLucid the value of a program is in general two dimensional – varying in both space and time. The interpreter uses the horizontal dimension for space, and the vertical dimension for time. Both dimensions are a priori infinite so it is necessary to limit the display in both directions.

The simplest way to do this is by defining the *parameters* (distinguished variables) “rows” and “columns”. For example, if the body includes the definitions “rows=10” and “columns=3” the display will be 10 (vertically) by 3 (horizontally), showing the values for time going from 0 to 9 and space going from 0 to 2.

More sophisticated effects can be achieved using the end-of-data and end-of-space special values.

The remaining commands are straightforward and are documented by the h[elp] command. However it may not be obvious how to create a new program, import it, and then remove it.

Suppose you want to create a program called “bleen”. The command “v bleen” will create it and “i bleen” will import it. At the moment there’s no way to remove a program but that will soon be fixed.

Finally, I should mention that there are two features currently not supported because bugs showed up during testing. One is nested iteration, the other is variable binding operators. I’m working on fixing them but in the meantime I thought it was more important to get PyLucid out there. These features and others will be available in future releases Real Soon Now.

These future releases will have new features, not just bug fixes. There will be more example programs. And I will be upgrading I/O so that in particular you can specify input and output prompts.

*The problem he tackled was that of dimensional analysis of multidimensional Lucid programs. This means determining, for each variable in the program, the set of relevant dimensions, those whose coordinates are necessary for evaluating individual components.*

**Objective:** to design Dimensional Analysis (DA) algorithms for the multidimensional dialect PyLucid of Lucid, the equational dataflow language.

*Dataflow is hardly an unknown concept but most dataflow systems are stream based – there is only one dimension, time. Lucid, by contrast, allows multiple dimensions. Evaluation is demand-driven: demands for values of variables at given coordinates generate demands for possibly different variables at possibly different coordinates. Values computed are cached, labeled by the coordinates needed.*

**Significance:** DA is indispensable for an efficient implementation of multidimensional Lucid and should aid the implementation of other dataflow systems, such as Google’s TensorFlow.

*DA is indispensable because to retrieve a value from the cache we need to know which coordinates will form the label to be searched for. Without DA the cache would fill with duplicate entries labeled by irrelevant dimensions.*

Dataflow is a form of computation in which components of multidimensional datasets (MDDs) travel on communication lines in a network of processing stations. Each processing station incrementally transforms its input MDDs to its output, another (possibly very different) MDD.

MDDs are very common in Health Information Systems and data science in general. An important concept is that of *relevant dimension.* A dimension is relevant if the coordinate of that dimension is required to extract a value. It is very important that in calculating with MDDs we avoid non-relevant dimensions, otherwise we duplicate entries (say, in a cache) and waste time and space.

Suppose, for example, that we are measuring rainfall in a region. Each individual measurement (say, of an hour’s worth of rain) is determined by location (one dimension), day (a second dimension), and time of day (a third dimension). All three dimensions are *a priori* relevant.

Now suppose we want the total rainfall for each day. In this MDD (call it N) the relevant dimensions are location and day, but time of day is no longer relevant and must be removed. Normally this is done manually.

**Research question:** can this process be automated?

We answer this question affirmatively by devising and testing algorithms that produce useful and reliable approximations (specifically, upper bounds) for the dimensionalities of the variables in a program. By *dimensionality* we mean the set of relevant dimensions. For example, if M is the MDD of raw rain measurements, its dimensionality is {location, day, hour}, and that of N is {location, day}. Note that the dimensionality is more than just the *rank*, which is simply the number of dimensions.

**Background**: There is extensive research on dataflow itself, which we summarize. However, an exhaustive literature search uncovered no relevant previous DA work other than that of the GLU (Granular LUcid) project in the 90s. Unfortunately the GLU project was funded privately and remains proprietary – not even the author has access to it.

*The GLU project was funded at SRI in Stanford by Mitsubishi. The GLU project did carry out some DA.*

**Methodology**: We proceeded incrementally, solving increasingly difficult instances of DA corresponding to increasingly sophisticated language features. We solved the case of one dimension (time), two dimensions (time and space), and multiple dimensions.

*These algorithms proceed by accumulating approximations. For example, to determine which variables are time sensitive, we start with those that obviously are: those defined by a fby expression. Then we add those that are defined in terms of a time sensitive variable by a data operation or a next operator. Any variable defined by first is definitively not time sensitive.*

*The accumulation process continues until it settles down and no new time sensitive variables are discovered. At that point we have to assume that all variables discovered are actually time sensitive but we can be sure that variables not added are time constant.*
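The accumulation can be sketched as a fixpoint computation in Python; the example program and all names below are invented for illustration:

```python
# Invented example program: each variable maps to (operator, operands).
equations = {
    "i": ("fby", "one", "i_plus_one"),   # i = 1 fby i+1  -> seeds the set
    "i_plus_one": ("+", "i", "one"),     # data op on a sensitive variable
    "one": ("const",),                   # a constant
    "j": ("first", "i"),                 # first i        -> time constant
    "k": ("+", "j", "one"),              # built only from constants
}

def time_sensitive(eqs):
    """Accumulate time-sensitive variables until the set settles down."""
    sensitive = {v for v, rhs in eqs.items() if rhs[0] == "fby"}
    changed = True
    while changed:
        changed = False
        for v, (op, *args) in eqs.items():
            if op in ("fby", "first", "const") or v in sensitive:
                continue   # fby already counted; first blocks propagation
            if any(a in sensitive for a in args):
                sensitive.add(v)   # data ops (and next) propagate sensitivity
                changed = True
    return sensitive

print(sorted(time_sensitive(equations)))   # ['i', 'i_plus_one']
```

As the post says, the variables left out (here j and k) are guaranteed time constant, while the accumulated set is only an upper bound on the truly time sensitive ones.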

We also solved the difficult problem (which the GLU team never solved) of determining the dimensionality of programs that include user defined functions, including recursively defined functions. We do this by adapting the PyLucid interpreter (to produce the DAM interpreter) to evaluating the entire program over the (finite) domain of dimensionalities.

*This is tricky because this evaluation normally will not terminate. So we instrument the evaluator by e.g. counting calls to the evaluation function. Then we cap this count, evaluate, increase the cap, re-evaluate, etc., until the values settle down. (In practice very small caps suffice.)*

**Results**: Experimentally validated algorithms that produce useful upper bounds for the dimensionalities of the variables in multidimensional PyLucid programs, including those with user defined functions.

*Our results are purely experimental; we do not provide formal proofs. But the experiments were 100% successful.*

I like to take pictures (you’ll see some here). Some of them turn out good but I’m not in the same league as the real professionals.

I’m very curious about what makes a good picture and am amused by newbie mistakes. Like not getting close enough. Or having the sun at your back so your subjects are squinting.

Or taking all your pictures in landscape mode.

You can hardly blame the newbies, cameras are set up to be used in landscape mode. To take a portrait you need the awkward hand-over-the-camera maneuver pictured above. Practically a sign of a better-than-average photographer.

I love portrait mode. Recently my wife gave me an Aura Frames smart photo frame. It’s brilliant, starting with the images. (I have no investment in AF.) Your images are stored in the cloud, so you can have tons of them. They are presented as a slide show, with you controlling the interval.

One of their best ideas is to allow the frame to operate in portrait mode – just place it upright on the surface, as in the picture on the left. And that brought up the question, which mode to use?

Initially I set it up in landscape mode and loaded a bunch of my pictures into the cloud. This was disappointing since pictures that weren’t landscape were crudely cropped and people’s feet and heads disappeared into the edge. Same with portrait: this time arms disappeared.

Aura allows you to upload entire Mac Photos albums, so my next step was to create two albums, with pictures cropped properly to landscape and portrait modes, respectively.

There were good pictures in both albums, but I soon preferred the portrait album. There was something more impressive or active about them, whereas the landscape album was laid back.

For example, political posters (and posters for movies and concerts) are invariably in portrait mode. They are designed to be energizing.

So I decided to stick with portrait only for my frame and began going through my pictures and editing them to portraits (mainly copying and resizing). I even ransacked the landscape album and was able to convert most of them to decent portrait images.

**Portrait mode in history**

It has to be said that, historically, portrait mode has been dominant. The earliest writing was done on stone or clay tablets, and they were much taller than they were wide.

Throughout history important documents designed to impress have been produced in portrait mode. Hollywood aside, we don’t know about the Ten Commandments but we know for sure about the Gutenberg Bible, which, like almost all books, is made of portrait pages.

Famous declarations, like the French revolutionary Rights of Man and Citizen, were portraits.

Why did portrait mode dominate for so long? A partial explanation is that text is much easier to read in portrait, because the lines are shorter. With landscape you have to follow the long lines carefully and have trouble going back to find where the next line begins. The Gutenberg Bible is not only portrait; each page has two tall columns.

For that reason letters, reports, academic papers and in general Word documents are portrait.

**The Changing Landscape**

Why did landscape emerge as such a strong contender? The short answer is TV, in the early 1950s.

Portrait is good for showing one or two people, but if you want to show a group you need landscape, because groups of humans spread sideways. TV shows are almost all about groups of humans. Also TV shows have a lot of literal landscapes (like the westerns that were popular in the 50’s).

So there was really no choice about which mode to use for television, and that had a knock-on effect for other media. The same can probably be said about cameras, which in the consumer market are mainly used to take pictures of groups of people (family).

It’s now hard to imagine, but the first personal computers (like the Xerox PARC Alto) had portrait screens. They were designed to produce, edit, and display documents, so portrait was the obvious choice.

This started changing when computers began being used for other purposes, like games and, later, video. For a while there were even ‘bimodal’ monitors that could be rotated between landscape and portrait but they soon disappeared. Nowadays we’re surrounded by rigid landscape screens, with Aura Frames being an outstanding exception.

**The Aesthetics of Landscape and Portrait**

As I said, I prefer portrait. I noticed that when I edited a picture into portrait, I got better images than when I edited them to landscape. In landscape, there tended to be uninteresting spaces on either side.

Hold on then, aren’t there wasted spaces in portrait images? On the whole, no. Top and bottom are equally important. Sometimes newbies waste the top, typically by pointing the camera up so that, for example, the subjects’ faces appear in the middle of the image. (This is a very common error.)

Sometimes the newbies point the camera down, so that the bottom of the image is empty foreground (sometimes they achieve both).

The key to successful portrait mode pictures is to have the interesting parts in the upper half but also have something to look at in the lower part so that it’s not totally barren (there are other patterns as well).

For example, in the image on the right, the important part, the flower bowl, is at the top, while the pillar beneath it, with the flowers at the base, is also attractive if not the main event.

Similarly, in my brother’s flowers picture the important part, the blossoms, are at the top but the vase and the drooping stems keep the bottom interesting.

I don’t know of any such formula to avoid boring parts of landscape photos.

**Inside Every Portrait Image …**

… is a landscape image trying to get out. After watching my portrait photo slide show over and over I realized that this pattern was very common (and violated in images that I felt, for the most part, were badly composed).

For example, this picture of Berkeley’s Sather Gate looks a bit off. The top half is mostly empty blue sky, of no interest. The interesting part is in the bottom half.

Eventually I became curious about the result of cropping the image: removing the bottom, less interesting part and keeping the top part, which should work in landscape mode. The result, it turned out, was in general a reasonable landscape image (though not necessarily a better one).

Here are the landscape images lurking in the column and flowers pictured above.

What about Sather Gate? Extracting the top half would be pointless. The bottom half doesn’t work out either, because the top of the Gate gets clipped. We need to crop a bit short of the bottom.

The result is a much better image.

There is no wasted space in the image.

In this case the cropping improves the image.

The question arises, what will be the aspect ratio of the cropped landscape image, assuming we take exactly 1/2 of it?

That depends on the original aspect ratio. If it’s 4:3, like my Aura frame, the new ratio is 3:2, or 4.5:3, close enough. If we want the exact same ratio, a little high school algebra tells us the ratio should be √2 : 1. This is very close to 7:5, a ratio offered by most image editing software.
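Spelling out the high school algebra (writing r for the portrait’s height-to-width ratio):

```latex
% Portrait of width w and height h, ratio r = h/w.
% Cropping to the top half gives a landscape of width w and height h/2,
% whose long-to-short ratio is w/(h/2) = 2w/h = 2/r.
% Demanding that the cropped ratio equal the original:
\frac{2}{r} = r
\;\Longrightarrow\; r^2 = 2
\;\Longrightarrow\; r = \sqrt{2} \approx 1.414 \approx \tfrac{7}{5}
```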

Note: this is NOT the golden ratio – the golden ratio is 1.618, whereas √2 is 1.414. Any number of books and articles will tell you the golden ratio is the key to artful images, but I’m suggesting something different. Which ratio is used in the Monna Lisa? Read on …

**Inside every landscape image …**

… is a portrait image struggling to get out.

As I said, I don’t view landscape images on my Aura, but landscape images are omnipresent on the web. I was looking at the (all landscape) images in the New York Times. I was struck by the fact that almost all of them would benefit from a 1/2 crop to portrait.

In cropping a portrait, you by default take the upper half. With landscape, the default is the middle half (with 1/4 removed on either side). But this may vary if the interesting part is not in the middle.

Here is the result of taking the image of the photographer at the beginning cropped down to the middle half.

I think it’s a better image; it really focuses on the camera and gives more prominence to the hands.

I could give dozens of examples of applying just the default rules but you get the idea.

**Testing on the Monna Lisa**

No discussion of images is complete without talking about the Monna Lisa (correct Italian spelling), the most famous image in the world. How do my theories hold up? Quite well, as it turns out.

For a start, it’s in portrait mode. Can you think of a landscape image that is both famous and the best landscape image in the world? I can’t.

And consider its aspect ratio. Recall the ideal ratio for cropping is 7:5. The Monna Lisa is 77cm x 53cm. Its aspect ratio is 1.45, while 7/5 is 1.40 and √2 is 1.41. This is NOT the golden ratio.

Notice how the centre of attention, Monna Lisa’s face, is solidly in the upper half. But the bottom half is not empty. We have a hint of cleavage, the robe, and the hands. It follows the formula exactly.

So let’s extract the upper half. The result is on the right.

I sort of like it. Her eyes and smile stand out more. Yet you never see this image, only the full portrait.

**Inside every portrait …**

… is a smaller portrait struggling to get out. And inside every landscape is another landscape …

You’ve probably noticed an interesting corollary of my rules. Portraits can be cropped to landscapes, but they in turn can be cropped to portraits. In other words, two stages of cropping can reduce a portrait to a smaller portrait.

Let’s try that with the cropped Monna Lisa landscape. The result is on the left. I like it too, again because it focuses on the eyes and the smile even more. I feel she’s looking right at me.

Let’s try 1/2 cropping on the portrait version of the photographer. It gives us an interesting take.

In fact we can 1/2 crop again, and we get the second portrait on the left.

The image is starting to get square because the original aspect ratio was not close to √2. The aspect ratios of the crops oscillate.
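A quick illustrative check of that oscillation in Python: starting from a 4:3 frame, repeated half-cropping bounces between two ratios rather than settling down, precisely because the starting ratio is not √2.

```python
def half_crop(r):
    # aspect ratio (long:short) after cropping away half the long side
    return 2 / r

r = 4 / 3  # not sqrt(2), so the crops never settle down
seq = []
for _ in range(6):
    r = half_crop(r)
    seq.append(round(r, 3))
print(seq)  # [1.5, 1.333, 1.5, 1.333, 1.5, 1.333]
```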

**Take better pictures**

These considerations aren’t just of intellectual interest. They can help you take better pictures (or drawings, or paintings).

For portrait mode,

- decide what is the interesting part of the image
- get close enough so that the interesting part almost fills the frame from right to left
- point camera down so that the interesting part is right at the top
- make sure there is something interesting in the bottom half (foreground)

For landscape

- place the interesting part by default in the center
- make sure it fills the frame from top to bottom (get close)
- have an interesting background (e.g. a hedge or a sea view)

These rules aren’t logically complete – you can take good pictures not following them. Also, I’ve been told that the real pros don’t need rules, they just go by gut feel. But for the gutless rest of us, rules are better than nothing.

In other words, (seemingly) long, and boring. Like so many people’s technical talks. What can you do?

What you can do is follow these simple rules I’m going to give. They’re not all my own, you can find most elsewhere. The problem is, most people think they’re impractical and don’t follow them. Result? Bo-ring!

Let’s call my student “Sam” (not their real name). The first mistake Sam made was to waste a slide giving their name and affiliation, which everybody knew. So the first rule is:

**1 – Don’t waste the audience’s time telling them things they already know.**

“But”, I hear you say, “slides are cheap!”. No they’re not. You pay in terms of the audience’s *attention*. You start with a certain amount, and if it drops to 0, they stop listening.

My late dear friend Ed Ashcroft told people about “Ashcroft’s constant”, which, he’d tell us, was eight. Eight whats? Eight is the maximum number of slides you should give in a talk, whether 20 minutes or 50 minutes.

When I told Sam about this you should have seen the look on their face. I’m sure they were thinking of 20-30 slides. Think of the preparation!

Actually, if you follow my rules you might get away with 12 at the most. So the second rule is:

**2 – Present at most 12 slides, no matter the length of the talk**

Again, this seems impractical, but the rules will eliminate many slides.

The second mistake Sam made was to waste another slide with an overview of their talk. You’ve all seen them, “Introduction, Background, Objectives … Conclusions”. There’s no real information here, it cost Sam a lot of attention, and sent the message “this talk is going to be tedious”. So

**3 – Do not have an overview slide.**

To make things worse, Sam then proceeded to *read* the overview slide (while people started looking at their watches). This violates the rule about telling the audience what they already know – the overview is right in front of them. Also, it violates a more general (and very important) rule, namely

**4 – Do not read text from a slide.**

There are many reasons why this is a very bad idea. It means telling people what they (now) already know. It means giving the audience two forms of the same information. Invariably, their brains will compare the two forms looking for a difference. Or ignore them both. It means you as a speaker will invariably look away from the audience and look at the slide. Disastrous.

Sam’s talk was about analyzing Lucid programs. The next slide should have been an actual program. Instead it was more text about the analysis – read by Sam, of course. The relevant rule is

**4.5 – Visual, visual, visual**

Pay no attention to people who claim “I’m not a visual learner”. Everyone with functioning eyesight is a visual learner.

For example, Sam could have left the program up and talked about various properties of Lucid. And explained the statements. And the kind of analysis required. And the difficulties involved.

The best part of Sam’s presentation was a slide of a dataflow network (a simple one, to generate the Fibonacci numbers) which Sam explained, including the operation of *first*, *next*, and *fby*.

Given that you don’t want to present a lot of text, my general rule of thumb is

**5 – Make each slide an image or a quote, possibly with captions.**

In general, people prefer (rightly so) examples over general rules. So:

**6 – Present examples before general rules**

And for heaven’s sake, keep the examples really simple.

**7 – No complex slides, whether text or images**

The worst offenders are slides of mathematical formulas and equations. One math slide is OK, if the formula or equation is simple and in big type. Of course, if the talk is about mathematical research, more simple math slides are allowable.

The last mistake Sam made was to go for 24 minutes, when they were allotted only 20 minutes.

**8 – Do not ever, EVER, go over time.**

It’s futile because your audience will be looking at their watches, wondering when you’re going to stop, rather than paying attention to you. Even if you’ve done well up to now, leave them wanting more. Did I mention, don’t go over time?

Well, I eat my own dog food. If this were a talk, every rule would be a slide, and that’s eight. Of course these rules aren’t 100% rigid, but violate them only if you know what you’re doing.

My last word is, don’t assume the audience’s attention is infinite, don’t waste it, increase it.

And don’t assume a technical talk has to be boring!

— Ted Nelson

Ted Nelson invented hypertext but not the web. He thinks it hasn’t fulfilled its real potential, and I agree.

One of his good ideas that the web doesn’t really support is stretchtext – text that expands or contracts in response to the reader’s (dis)interest.

In my opinion it’s sorely needed. I spend a lot of time reading on the web and most of it is skimming/speedreading. For example, I often scan down text reading only the first sentences of each paragraph, unless I hit something interesting.

I won’t try to present Nelson’s exact ideas; I won’t try to speak for him. Instead I’ll take the basic idea of stretch text as a starting point and propose various ideas of my own. If some of them are poor, it’s my fault, not Nelson’s.

**Drop text**

Probably the simplest and most obvious form of stretch text is what I call *drop text.* You sometimes see it on the Web. You have a text divided into sections, each with a section heading. But you don’t see the sections, just the headings. However at the beginning of each heading there is (say) an arrow pointing to the right. If you click the arrow, the arrow points down and the section appears (drops).

Drop text saves you having to read/skim the hidden sections — the visible title should be enough to determine whether you’re interested. It could also save you from having to look at material that is clearly not relevant e.g. “Obtaining a work visa” if you already have one.

You occasionally see variants of drop text, most commonly instead of an arrow a plus in a box (when the section is hidden) or a minus in a box when the section has dropped.

There are variations of drop text (currently very rare) that would be useful. For example, clicking on the arrow could rotate it down to only 45 degrees, causing only the first line or two of the section to appear. This would give you a better idea of whether you want to see the whole section. If you do, you click (say) below the arrow to turn it down to 90 degrees. Otherwise you click above it to send it back and hide the section.

**Kill text**

Sometimes even just the headlines are distracting and you’d like to make them go away. That’s the idea of *kill text.*

To the left of the drop text arrow we could have (say) a big X. Clicking on the X would cause the whole headline to disappear, along with the arrow and the X itself. Nothing left.

When you come across a list of headings you can quickly zap the ones you’re not interested in, then try out the remaining ones. Clearing out the irrelevant (to you) material saves you time and attention and allows you to focus on what’s important. Furthermore, once you’re finished reading a section you could kill it too. Eventually you could kill off everything, for one reason or another, leaving nothing left – you’d know you’re done with that (say) page.

Images could also be killable (or shrinkable, while we’re at it). Ads too, though there would be strong resistance to such a feature. It would be useful to be able to kill off individual paragraphs in an article, or delete everything up to the end.

Using kill text on an article is the literary equivalent of cropping a photo. As experienced photographers know, cropping often makes a major improvement in the quality of a picture, by removing uninteresting parts. The same could be true of kill text.

**Killing and growing**

If stretch text means making unwanted text disappear, kill text is its logical conclusion. Another way to use it is while reading. Once you’ve read a paragraph, zap it and let the remaining part of the article scroll up.

Of course you don’t have to read the whole paragraph before killing it. They could all be preset at 45 degrees, giving you the choice of expanding each one or killing it. This is basically what happens when you skim an article: you decide paragraph by paragraph whether to read or skip.

There is another notion complementary to kill text, namely what I call *grow text*. At the bottom of the page you have a logo, say the image of a seed. If you click it, it ‘grows’ into a paragraph or so, with another seed below it. You could click away at the seeds till you’ve lined up some reading material, then kill it off gradually as you read it. This could be an interesting way to present a lecture (on a big screen with stretch text enabled).

Of course you don’t have to kill off everything. You could leave important parts behind, perhaps temporarily collapsed. By so doing you create your own summary of the document that you could save for future reference.

**Shallow text**

One idea Nelson had in mind was text that could be expanded or contracted by changing a ‘depth’ lever. At one end of its range you would get the full document, at the other a brief summary.

We could perhaps enlist AI to do the summarizing but it’s more realistic to expect the author to prepare the summarization. One idea is to use special parentheses (e.g. curly braces) to delimit text to be omitted in the shorter versions. So if they write

*Kill text {and grow text} can make reading easier*

that specifies that the phrase *and grow text* be omitted from the shorter version. One refinement is to allow replacement text in the shorter version, e.g.

*Kill text {etc | and grow text} can make reading easier*

specifies that the shorter version is

*Kill text etc can make reading easier.*

Another idea is to allow nested curly brackets to specify more than one level of shortening (note that the parenthesized phrases must be collapsed from the inside out).
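As a proof of concept, here is a small Python sketch of this markup (the brace syntax is the proposal above; the function names are my own). One call performs one level of shortening, rewriting innermost brace groups first, so nested brackets collapse from the inside out under repeated calls:

```python
import re

# An innermost brace group: one with no nested braces inside it
INNER = re.compile(r'\{([^{}]*)\}')

def shorten(text):
    """One level of shortening: {short | long} becomes short,
    and a bare {long} is simply omitted."""
    def repl(m):
        parts = m.group(1).split('|', 1)
        return parts[0].strip() if len(parts) == 2 else ''
    # collapse any doubled spaces left by omitted phrases
    return re.sub(r'  +', ' ', INNER.sub(repl, text)).strip()

print(shorten("Kill text {etc | and grow text} can make reading easier"))
# Kill text etc can make reading easier
print(shorten("Kill text {and grow text} can make reading easier"))
# Kill text can make reading easier
```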

Length is just one measure of shallowness/depth. We could have other dimensions, like reading level or technical content. Thus you could order the full-length article but with lower technical content (e.g., omitting math formulas). Authoring such multidimensional texts could be a problem.

**Persistence**

Can you save transformed texts? I think that’s very important.

Suppose a Physics student has a stretch text enabled version of the book for the course. As she reads it she kills and shrinks. The first to go is the foreword (one reading is enough) and then the introduction. Then she zaps background material (like Newton’s laws) that she already knows, together with the pictures of Newton and his famous tree.

She has a list of topics for the course so she kills off whole chapters that aren’t required. As she works through the remaining chapters she reads then eliminates the informal explanations and some of the diagrams. But keeps all the formulas.

In this way she ends up with a much more concise document that she wants to retain. It’s vital that she be able to save it, because it’s her own personal summary, without all the dross between the important stuff.

She’d also keep a selection of the harder exercises, each terminated with a seed that grows into the answer.

**Once I built a railroad … [… buddy, can you spare a dime?]**

The obvious question is, why don’t I eat my own dog food? This article is rigid, unstretchable, unzappable, when it could clearly benefit from some of the features described.

The problem is that currently none of this is implemented. It used to be – I had an authoring system using the MMP macro processor. Some of the macros were pretty intricate to write but easy to use and you could quickly mark up drop text, kill text, grow text and shallow text. However the machine it all ran on was eventually put out to pasture, and now I got nuthin’.

One of the important features of MMP was versioning. As the reader kills, grows, drops etc these decisions are recorded in a version expression, essentially an (eventually) large collection of parameter settings. As long as you have the original document and the version, you can recreate the final result of the reading process. Thus only the version need be saved.

Well, I plan to resurrect MMP somehow and then we can try out these Ted Nelson inspired schemes.

PyFL now has type checking – without type declarations. Instead the type is produced by evaluating the program over the domain of types.

In PyFL, gone are all the things that ordinary people find difficult or downright weird: monads, mandatory currying, post- or prefix notation, pattern matching, etc. Instead there is infix notation and *f(x,y,z)* syntax for function application. The weird stuff has its proponents, but PyFL proves it’s not inherently part of functional programming.

Gone in particular are cumbersome, verbose type declarations. In most examples of programming in Haskell it all begins with these declarations. In PyFL you skip this stage.

That doesn’t mean you can e.g. multiply strings without hearing about it. PyFL is dynamically typed and checks at runtime if calculations make sense.

However, runtime checks mean you have to actually run the program to find these type errors – and be lucky enough to encounter them. They won’t necessarily reveal themselves every time out.

As one retired professional pointed out, you may release the software and then have tens of thousands of users running afoul of the dynamic type checker.

So there’s still a need for static analysis to avoid runtime errors. A need for compile time type checking.

But how do you type check a language with no type declarations? I really don’t want to add them – that would be a big retreat from the “for the rest of us” principle.

Fortunately there’s a solution – type inference. That means analyzing the program and deducing at least some of the types, without bothering the programmer.

For example in

```
a + f(b)
where
a = 4;
b = a+9;
f(n) = if n<2 then 1 else 3*f(n-1)+1 fi;
end
```

it’s obvious that *a* is an integer, and from that that *b* is as well. These facts follow from the basic type rule that the sum of two integers is an integer. There is no need for a declaration *b:int*.

It also appears that *f(b)* is an integer although at this stage it’s not clear how you would formally justify it. That’s the problem we’ll solve.

Inferring a type is like calculating, except you discard the actual data and combine types instead, using rules like:

```
int + int = int
num + int = num
int * int = int
int / int = num
num < num = bool
if bool then int else int fi = int
```
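In Python terms these rules amount to a lookup table keyed by operator and operand types, with *int* promoted to *num* when no exact rule matches. This is a sketch of the idea only (the names `RULES` and `combine` are mine, not PyFL’s):

```python
# A few of the operator type rules as a lookup table
RULES = {
    ('+', 'int', 'int'): 'int',
    ('+', 'num', 'int'): 'num',
    ('+', 'int', 'num'): 'num',
    ('+', 'num', 'num'): 'num',
    ('*', 'int', 'int'): 'int',
    ('/', 'int', 'int'): 'num',   # division may not be exact
    ('<', 'num', 'num'): 'bool',
}

def combine(op, left, right):
    # int is a subtype of num, so promote when an exact match is missing
    for l in (left, 'num' if left == 'int' else left):
        for r in (right, 'num' if right == 'int' else right):
            if (op, l, r) in RULES:
                return RULES[(op, l, r)]
    return 'error'

print(combine('+', 'int', 'int'))  # int
print(combine('/', 'int', 'int'))  # num
print(combine('<', 'int', 'int'))  # bool (int promoted to num)
```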

Of course you lose information; in particular, you don’t know which arm of a conditional will be selected. This means evaluating recursive programs is problematic because they use conditionals to trigger termination.

**Subtypes**

Notice that this type scheme has both *int* (integer) and *num* (numeric). And notice that *int* is also numeric, e.g. *num+int=num*. Here *int* is a subtype of *num*. There are two other subtypes: *intlist* is a subtype of *numlist*, which in turn is a subtype of *list*. The head of an *intlist* is an *int*, the tail of a *numlist* is a *numlist*, and so on.

Implementing this partial order required a lot of coding and I skipped having separate *stringlist* and *wordlist* types, not to mention *listlist*. I wrote a function *sb(p,q)* which tests if *p* is a subtype of *q*, and a function *lub(p,q)* which gives the least upper bound of *p* and *q*.
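Here is roughly what those two functions look like over this small partial order. This is my reconstruction in Python, not the actual PyFL code, and it omits *bool*, the string types, and the bottom element:

```python
# The subtype chains: int <= num, and intlist <= numlist <= list
PARENT = {'int': 'num', 'intlist': 'numlist', 'numlist': 'list'}

def ancestors(p):
    # p itself plus every supertype above it
    chain = [p]
    while p in PARENT:
        p = PARENT[p]
        chain.append(p)
    return chain

def sb(p, q):
    """Is p a subtype of q?"""
    return q in ancestors(p)

def lub(p, q):
    """Least upper bound, if the two chains meet."""
    for t in ancestors(p):
        if sb(q, t):
            return t
    return None  # incomparable, e.g. int and list

assert sb('int', 'num') and not sb('num', 'int')
assert lub('int', 'num') == 'num'
assert lub('intlist', 'list') == 'list'
```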

Clearly my domain falls far short of that necessary to imitate the fine distinctions of Haskell type declarations. For example, in Haskell you can declare a variable to be a list of lists of integers. I’d need an infinite partial order. But it works on a broad range of programs and catches a broad range of errors.

**Calculating types**

So the plan I came up with is to take the PyFL evaluator and modify it so that instead of producing the actual data output by a program, it outputs the types produced by ‘running’ the program over the abstract domain of types. This is the basic idea of abstract interpretation and is hardly original with me.

One of the advantages of this scheme is that not only does it avoid programmer type declarations, it avoids function types. Here is a higher order variant of the program given above

```
a + f(b)
where
db(g) = lambda (x) g(g(x)) end;
inc(u) = u+2;
f = db(inc);
a = 4;
b = 5;
end
```

It evaluates to *int*, in spite of the involvement of the second order function *db*.

**Handling recursion**

The problem with recursion is that the evaluation doesn’t terminate. This stumped me for a while.

For example, consider the program

```
f(7)
where
f(n) = if n<2 then 1 else n*f(n-1) fi;
end
```

When run with the usual evaluator, it quickly gives the correct answer, 5040. The recursion terminated when the *if-then-else-fi* finally selected the first alternative. But when running with the type evaluator, the *if* condition was merely *bool* and both alternatives had to be explored. This resulted in an infinite recursion: the Python runtime stack overflowed, there was a segmentation fault, and Python crashed. Needless to say, no type information was produced.

In the meantime I’d added metrics to the evaluator, to see how much computation was generated. One of these metrics was the number of calls to the evaluation function. These additions were carried over to the alternate types-only evaluator.

Finally it dawned on me that I could force termination by putting a cap on this metric – say, 30. I could see what the result was, then increase the cap, look at the new result, increase the cap again, until everything settled down.

This worked even better than I expected. I tried it on a similar scheme for computing dimensionalities in pylucid and it turned out even small values of the cap gave correct answers.

The puzzle here is that this scheme seems obvious yet I’ve never seen it in print. Or any discussion of how you avoid nontermination while evaluating over an alternate domain. Yet evaluating over an alternate domain is the basic idea of abstract interpretation. If anyone has any insight, please share with me.

So what I did was set a cap of 30 on the number of calls to the evaluate function. It turned out that 30 is a lot and I could have gotten away with a much smaller cap, but so what?

Now, when I run the type evaluator, it quickly halts and produces … *int*, correct. It has succeeded in deducing that the factorial of an integer is an integer. Without programmer input.

**The type none**

The question arose, what does an evaluation return when it’s throttled? I guessed that it should return the bottom element of the type domain. Since there wasn’t a bottom element (yet) I added one: the type *none*. The type *none* is not the type of any data object. If you think of a type as a set (the set of all objects of that type), then *none* is the empty set.

I had to work out the rules for calculating with *none*. Since *none* is a subtype of, say, *num*, it can function as a *num*, so we have the rule *none+num=num*. (And *none+none=none*.)

Also if we are evaluating *f(x,y)* and *f* evaluates to *none*, the result should be *none*. I admit this was guesswork but it gives the right answers.
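Putting the pieces together, here is a toy version of the capped type evaluation, specialized to the factorial example. This is my reconstruction with assumed names; the real evaluator is generic over PyFL programs. The condition is only *bool*, so both arms are joined with *lub*, and the call cap returns *none* when exceeded:

```python
# The types used here form a chain: none <= int <= num
ORDER = ['none', 'int', 'num']

def lub(p, q):
    return ORDER[max(ORDER.index(p), ORDER.index(q))]

CAP = 30
calls = 0

def fac_type(n_type):
    """Abstract factorial: explore both arms of the conditional and
    join them; past the cap, return the bottom type none."""
    global calls
    calls += 1
    if calls > CAP:
        return 'none'                       # no information
    then_arm = 'int'                        # the literal 1
    # n * fac(n-1): int combined with the recursive result,
    # where none acts as a subtype of int
    else_arm = lub('int', fac_type(n_type))
    return lub(then_arm, else_arm)

result = fac_type('int')
print(result)  # int
```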

This required rewriting *sb* and *lub*, which got fairly complex. If I want to expand my domain I’ll have to come up with something more systematic.

**Y, the ultimate test**

The factorial program works because recursion is built in to PyFL (and Haskell); the evaluator evaluates the definition of *fac* in the same environment that it’s defined in. But what if recursion wasn’t built in? Could we still define factorial? This was the problem facing the developers of the lambda calculus (which doesn’t have built-in recursion).

At first it seemed unlikely. But then Curry invented the Y combinator. Here it is:

`\f (\x f(x x)) (\x f(x x))`

It’s just a small \-expression with the magic property that Yf reduces to f(Yf). It has a rather nice definition in PyFL notation, namely

`Y(f) = g(g) where g(x) = f(x(x)) end;`

In this form it’s easy to see that it works. We substitute g for x in the equation and get g(g) = f(g(g)).
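The same definition can be transcribed into Python (eta-expanded, since Python evaluates arguments eagerly and the raw form would loop forever):

```python
# Y(f) = g(g) where g(x) = f(x(x)), with x(x) wrapped in a lambda
# so that strict evaluation doesn't unfold it prematurely
Y = lambda f: (lambda g: g(g))(lambda x: f(lambda n: x(x)(n)))

a = lambda f: lambda n: 1 if n < 2 else n * f(n - 1)
fac = Y(a)
print(fac(7))  # 5040
```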

This works fine in PyFL: the program

```
fac(7)
where
Y(f) = g(g) where g(x) = f(x(x)) end;
a(f) = lambda (n) if n<2 then 1 else n*f(n-1) fi end;
fac = Y(a);
end
```

evaluates to 5040, the right answer.

Note that this program is nonrecursive: no variable is defined directly or indirectly in terms of itself.

This program cannot be written in Haskell. It uses self application which can’t be typed. Obviously we can write factorial in Haskell, but not this way. We have to resort to Haskell’s built-in recursion, say with the definition *Y(f) = f(Y(f))*.

Then the question arises, what about PyFL’s type checking? What happens when we evaluate the type of this program?

The answer is … *int*! Yes, PyFL has typed an untypable program!

To be honest, I wasn’t sure this would work. But in retrospect, if PyFL can handle the definition of Y over the integers, why not over types?

There’s more to type checking than Haskell’s rigid declarations.
