~ Georg Cantor

Recently NETFLIX released a documentary on the mathematical concept of infinity, titled *A Trip to Infinity*. NETFLIX’s trip is a bad one.

The first twenty minutes or so are reasonable but after that it descends into mysticism, sensationalism, speculation and mathematical nonsense. The average intelligent viewer will be left confused and intimidated and will have a poorer understanding of infinity than they had before watching *Trip.*

The worst mistake they make is referring to infinity in the singular, as if there is only one infinity. In fact in the first twenty minutes they make it clear that there is more than one, then return to talking about “infinity” instead of “infinities”.

**Many infinities**

When people talk about “infinity” they think first of the natural numbers, the set {0,1,2,3,…}. But there is also the set of points on the real line between 0 and 1. It’s not obvious, but these infinities are not the same.

The study of infinities was initiated by Georg Cantor more than 100 years ago. He was the first to realize that there are many infinities and that they are linearly ordered by size – given two infinities, one is at least as big as the other.

He had a very simple rule for comparing infinite sets: S1 ≤ S2 (S2 has at least as many elements as S1) if there is a one-to-one correspondence between S1 and a subset of S2. In particular, he proved that if S1 ≤ S2 and S2 ≤ S1, there is a one-to-one correspondence between the elements of S1 and S2.

For example, if *E* is the set of even numbers and *S* the set of square numbers, then *E* ≤ *S* because *2n ⟷ n^{2}* is a one-to-one correspondence between *E* and *S*.

Similar arguments show that *S* and *E* have the same number of elements as {0,1,2,3,…} (this set is called 𝜔). In fact any infinite set whose elements can be enumerated has the same number of elements as 𝜔. The number of elements of a set is called its *cardinality*. The cardinality of 𝜔 is the smallest infinity and is called ℵ_{0}.
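These correspondences are easy to exhibit concretely. Here’s a minimal sketch (finite prefixes only, of course – the correspondence itself is the rule, not the list):

```python
# Witnessing one-to-one correspondences between the naturals,
# the evens, and the squares (finite prefixes of infinite sets).

def evens(n):
    """The correspondence k <-> 2k between {0,1,2,...} and the evens."""
    return [2 * k for k in range(n)]

def squares(n):
    """The correspondence k <-> k^2 between {0,1,2,...} and the squares."""
    return [k * k for k in range(n)]

# Composing the two gives the correspondence 2k <-> k^2 between E and S.
pairs = list(zip(evens(5), squares(5)))
print(pairs)  # [(0, 0), (2, 1), (4, 4), (6, 9), (8, 16)]
```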

Is it the only one? It’s not hard to see that any other infinity must be larger than ℵ_{0}.

This is where *Trip* goes off the rails. We need to compare 𝜔 and the set ℛ of real numbers.

**The cardinality of ℛ**

Cantor proved that the cardinality of ℛ is greater than ℵ_{0} with a famous proof that has been modified and adapted many times.

Suppose we could enumerate the elements of ℛ. We could lay out the enumeration in a two-dimensional table, like this:

1  .3 1 4 1 5 …

2  .2 7 1 8 2 …

3  .3 3 3 3 3 …

4  .1 4 1 5 9 …

5  .6 9 3 1 4 …

…

Now consider the diagonal number .37354… . This number might be somewhere in the list. But let’s take it and change each digit, say by adding 1 mod 10, giving .48465… . This number can’t be in the list because it differs from the n^{th} number in the n^{th} digit (we have to take care of repeating 9s). So the set of real numbers between 0 and 1 can’t have cardinality ℵ_{0}.
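The construction is mechanical enough to sketch in code. Digit strings stand in for real expansions here, and the repeating-9s caveat is ignored:

```python
# The diagonal construction on a (finite prefix of a) purported list:
# build a number that differs from the n-th entry in the n-th digit
# by adding 1 mod 10 to each diagonal digit.

def diagonal_escape(rows):
    """rows: digit strings, the expansions after the decimal point."""
    digits = [(int(row[i]) + 1) % 10 for i, row in enumerate(rows)]
    return "." + "".join(str(d) for d in digits)

table = ["31415", "27182", "33333", "14159", "69314"]
print(diagonal_escape(table))  # .48465 -- differs from row n at digit n
```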

This is Cantor’s diagonal argument and it shows ℵ_{0} < the cardinality of ℛ. There are at least two infinities. The cardinality of ℛ is called ℶ_{1}.

By this point *Trip* goes back to talking about “infinity” and the experts are staring at what looks like a billiard ball which is supposedly the universe … or something.

Meanwhile, back in mathland, it gets even more interesting. ℵ_{0} and ℶ_{1} are the cardinalities of very many sets. The set of pairs of natural numbers, and in fact the set of finite sequences of natural numbers, has cardinality ℵ_{0}. Thus there are only ℵ_{0} many polynomials with integer coefficients.
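The standard witness for pairs is Cantor’s pairing function, which enumerates ℕ×ℕ along the diagonals x + y = 0, 1, 2, … A quick sketch:

```python
# Cantor's pairing function: a one-to-one map from N x N into N
# that walks the diagonals x + y = 0, 1, 2, ...

def pair(x, y):
    return (x + y) * (x + y + 1) // 2 + y

# Distinct pairs always get distinct codes:
codes = {pair(x, y) for x in range(10) for y in range(10)}
print(len(codes))  # 100 codes for 100 pairs -- no collisions
```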

The same is true of the set *Q* of rational numbers and the set of finite sequences of rational numbers. That means there are only ℵ_{0} many polynomials with rational coefficients.

Similar results hold for ℛ. There are ℶ_{1} polynomials with real coefficients.

**The cardinals ℶ_{2}, ℶ_{3}, ℶ_{4}, …**

But now consider drawings on the plane. If every pair of rationals is a pixel which can be black or white, there are ℶ_{1} possible images. And if every pair of reals is a pixel, there are more than ℶ_{1} possible images.

How many more? Cantor’s diagonal argument can be generalized to show that the set of subsets of a set has a bigger cardinality than the set itself. If a set *S* has cardinality 𝜅 then the powerset *℘(S)* (the set of all subsets of *S*) has cardinality 2^{𝜅}, which is bigger than 𝜅.
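The diagonal set behind this generalization can be exhibited on a finite example (the map f below is an arbitrary illustration):

```python
# Cantor's theorem in miniature: for any map f from S into its powerset,
# the "diagonal" set D = {x in S : x not in f(x)} is missed by f,
# so no f can be onto -- the powerset is strictly bigger.

S = {0, 1, 2}
f = {0: {0, 1}, 1: set(), 2: {2}}   # an arbitrary illustrative map S -> P(S)

D = {x for x in S if x not in f[x]}
print(D)                 # {1}: 0 is in f(0), 1 is not in f(1), 2 is in f(2)
print(D in f.values())   # False -- D differs from every f(x) at the point x
```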

An image on the plane is a subset of ℛ×ℛ and the set of such images has cardinality 2^{ℶ1}. So now there are three infinities. (2^{ℶ1} is called ℶ_{2}.)

We can continue taking power sets and generate a sequence ℶ_{1}, 2^{ℶ1} (=ℶ_{2}), 2^{ℶ2}(=ℶ_{3}), … of bigger cardinalities. So there are at least ℵ_{0} many infinities.

None of this, mind-boggling as it is, is described in *Trip*. Instead they’re speculating about an orange in a box … which supposedly disintegrates then reassembles itself.

**The continuum hypothesis**

Back to the math, which really gets interesting. When Cantor discovered the power-of-two series ℶ he naturally started wondering if they were all the cardinalities. In particular, he wondered if there is a set of reals whose cardinality is greater than ℵ_{0} but less than ℶ_{1}. (Incidentally ℶ_{1} = 2^{ℵ0}).

He tried for many years to settle the question but never succeeded, neither finding such a set nor proving that none exists. He was forced to leave what he called the “continuum hypothesis” (CH) unresolved.

It wasn’t till 1938 that any progress was made. The famous logician Kurt Gödel proved that it’s not possible to refute the continuum hypothesis. So it’s either true or else …

… the *or else* was demonstrated in 1963 by Paul Cohen. He showed that it is not possible to *prove* the continuum hypothesis either. In other words, CH is *independent* of the usual axioms of set theory (which are taken to be the axioms of modern math).

A lot of progress was made after Cohen’s proof but nothing decisive. It turns out that the axioms of math have little to say about the cardinalities between ℵ_{0} (=ℶ_{0}) and ℶ_{1}. There could be a couple, or ℵ_{0} many, or more.

This is interesting, even mind boggling, but none of it shows up in *Trip*. At this point they’re speculating that eventually the heat death of the universe will kill off humanity.

**Constructibility**

Since Cantor, mathematicians have been searching for a plausible extra axiom that will settle the continuum hypothesis. There is no consensus.

In my opinion, however, there is one obvious candidate, namely the *axiom of constructibility*.

This axiom, sometimes written “V=L”, says that every set is “constructible”. Roughly speaking, the constructible sets are those that are definable in terms of simpler constructible sets, and aren’t just introduced arbitrarily.

Gödel introduced the constructible sets to prove his partial independence results. V=L implies the general continuum hypothesis (ℵ_{𝛂} = ℶ_{𝛂} for all 𝛂) and the axiom of choice plus a whole lot of other results.

My championing of V=L is probably due to my computer science background, where we encounter many recursive definitions. As a general rule we take the meaning of a recursive definition X = f(X) to be the *least fixed point* of the equation. And we calculate the least fixed point by starting with nothing (∅) and iterating ∅,f(∅),f(f(∅)),f(f(f(∅))),… then taking the limit.
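In code the iteration looks like this. The operator f below is a made-up, illustrative “definability” rule (not Gödel’s 𝒟), chosen so the iteration terminates:

```python
# Least-fixed-point iteration: start from the empty set and apply f
# until nothing changes, i.e. compute the limit of the chain
# empty, f(empty), f(f(empty)), ...
# Here f is an (assumed, illustrative) monotone operator:
# 0 is always defined, and n+1 is defined whenever n is, up to 5.

def f(X):
    return {0} | {n + 1 for n in X if n < 5}

X = set()
while f(X) != X:          # iterate until we hit a fixed point
    X = f(X)
print(sorted(X))          # [0, 1, 2, 3, 4, 5] -- the least fixed point
```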

The ordinary axioms of math imply that any family F of sets is closed under definitions; so the family V of all sets satisfies the recursive definition V = ∅ ∪ 𝒟(V), where 𝒟(X) = sets definable from X. If we apply the iterative/cumulative procedure described above, we get L, the family of constructible sets.

I’m sure that this is the ‘right’ thing to do and that the axiom of constructibility should be considered as ‘true’.

**Beware of Pop Science**

So *A Trip to Infinity* is bad news. Unfortunately it’s not alone. *Trip* veered into physics, and almost every pop explanation of physics is just as bad.

Speculation, sensationalism, paradoxes, misinformation. Black holes, wormholes, time travel and the like.

One frequently repeated example of false facts is the claim that bodies are collections of atoms with nothing in between. In reality the space between atoms is filled with *fields*: electromagnetic, gravitational, who knows what else. Every physicist knows this, but all too many are willing to go before the public and declare otherwise.

This rotten pop science is basically a plot to make people feel stupid. They’re intimidated because what they see doesn’t make sense and they conclude they’re not smart enough for science.

A good example in *Trip* comes when they discuss cardinal arithmetic (cardinals can be added, multiplied, etc. like integers). They show the equation ∞ + 1 = ∞, which is true if ∞ is an infinite cardinal, then proceed to subtract ∞ from both sides, giving 0 = 1, a howling contradiction. And that’s where they leave it.

What is the viewer supposed to make of this? That infinity is a contradiction?

In fact, all it means is that not all the rules of finite arithmetic apply to cardinal arithmetic. No big deal. There’s nothing wrong with your brain and 0 is not equal to 1.
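The equation ∞ + 1 = ∞ just records a bijection: adjoin one new element to the naturals and the enlarged set still pairs off with the naturals by shifting everything along. Nothing about that licenses subtraction. A sketch (finite prefix only, with a made-up extra element):

```python
# Why aleph_0 + 1 = aleph_0: adjoin one extra element to the naturals
# and the enlarged set still pairs off one-to-one with the naturals.

def pair_off(extra, n=5):
    """Pair {extra, 0, 1, ..., n-1} with {0, 1, ..., n} by shifting."""
    enlarged = [extra] + list(range(n))
    return list(zip(enlarged, range(n + 1)))

print(pair_off("*"))  # [('*', 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```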

Pop science has tried to scramble your brains.


*[All the images in this post were produced with generative AI – Midjourney, DALL-E 2, Stable Diffusion.]*

I’d like to give you my thoughts on the recent amazing developments in AI (Artificial Intelligence).

I’m a retired (emeritus) professor of computer science at the University of Victoria, Canada. I ought to know a bit about AI because I taught the Department’s introduction to AI course many times.

All I can say is thank God I’m retired. I couldn’t have kept up with the breakthroughs in translation, game playing, and especially generative AI.

When I taught AI, it was mainly Good Old Fashioned AI (GOFAI). GOFAI is largely searching of trees and graphs. I retired in 2015, just before the death of GOFAI. I dodged a bullet.

I am in awe of NFAI (New-Fangled AI) yet I still don’t completely understand how it works. But I do understand GOFAI and I’d like to share my awe of NFAI and my understanding of why GOFAI is not awesome.

**Seek and Ye Shall Find**

For a long time AI was almost a joke amongst non-AI computer scientists. There was so much hype but the hyped potential breakthroughs never materialized. One common quip was that AI was actually natural stupidity.

Many departments, like my own, basically boycotted the subject, maybe only offering a single introductory course.

The heart of GOFAI is searching – of trees and, more generally, graphs. For many decades the benchmark for tree searching was chess. Generations (literally) of AI researchers followed the program first proposed by Norbert Wiener in the 1940s, based on searching the chess game tree. Every ten years AI evangelists would promise that computer chess mastery was only ten years away.

Wiener’s idea, described in his pioneering book *Cybernetics,* was a min/max search of the game tree, resorting to a heuristic to evaluate positions when the search got too deep.
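Wiener’s scheme fits in a few lines. Here’s a sketch, with an invented toy game standing in for chess (the moves and evaluate functions are illustrative stand-ins, not a real chess engine):

```python
# Min/max game-tree search with a depth cutoff: when the search gets
# too deep (or the game ends), fall back on a heuristic evaluation.

def minimax(state, depth, maximizing, moves, evaluate):
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)          # heuristic at the search horizon
    values = (minimax(c, depth - 1, not maximizing, moves, evaluate)
              for c in children)
    return max(values) if maximizing else min(values)

# Toy game: a state is a number; a move adds 1 or 2; higher favours Max.
best = minimax(0, 3, True,
               moves=lambda s: [s + 1, s + 2] if s < 4 else [],
               evaluate=lambda s: s)
print(best)  # 4
```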

The chess game tree gets big very quickly and it wasn’t until decades later (the late 1990s) that IBM marshalled the horsepower to realize Wiener’s dream. They built a special purpose machine, Deep Blue, capable of examining 200 million positions per second. Deep Blue eventually won, first a game, then a whole match, against Garry Kasparov, the world champion.

Deep Blue was the high water mark of GOFAI and there was no real followup. Deep Blue’s successor, Watson, could win at Jeopardy! but commercial applications never materialized.

**AlphaGo and AlphaZero**

I was impressed by Deep Blue but wondered about the game of Go (Baduk, Wei-chi). The board is 19×19 and the game tree is incomparably bigger than that of chess. If you’d asked me at the time I would have said Go mastery was inconceivable (which, if we had to use GOFAI, was true).

Then in 2016 the unthinkable occurred: a program, called “AlphaGo”, started beating human Go champions. It did not use Wiener’s approach; instead it used Machine Learning (ML).

AlphaGo trained by playing millions of games against itself. Originally it was given hundreds of thousands of expert-level human games but its successor, AlphaZero, dispensed with them and simply taught itself. It took only a few hours to reach expert level, which for humans took hundreds of years. Variants of the software mastered chess and shogi in a similar fashion.

**Google Translate**

About the same time users of Google Translate noticed a sudden dramatic increase in the quality of its translations, although Google at the time said nothing. It turned out that Google had silently switched from statistical translation to a neural net based approach.

In statistical translation there are typically many possibilities for translating a particular phrase. The machine uses the neighbouring phrases to compute the probability of each translation, then takes the most likely.

The problem is that the information which determines which translation is correct may lie far away, and the local phrases may be irrelevant. For example, there are four French translations of the English word *important*: *important, importante, importants, importantes.* Which one to choose depends on what the adjective refers to, and the reference may be far away. Statistical translation never produced reliably good results.

Neural networks, on the other hand, are connected in a sequence of layers, and every node on each layer can in principle be connected to any node in the previous layer. Thus the translation of any phrase can be influenced by any words anywhere in the text, not just neighbouring words. At least that’s how I understand it.

Currently Google’s translations are spectacularly good (and DeepL’s even better). In translating into French, for example, it usually gets the pronouns right as well as the gender/number agreements. If you want perfect output you still have to at least proofread, because occasionally there are mistakes, usually minor.

Here is a translation from *Le Monde*. It’s perfect.

Since the beginning of the conflict, Russia has refused to describe its invasion of Ukraine as a war, calling it a “special military operation”. Shortly after the conflict broke out, the authorities banned the Russian media from using any information other than official statements on the subject and banned the use of words such as “war” and “invasion”, under penalty of being punished by the law.

GOFAI never produced really good translations, relying as it did on facts and rules and probabilities. I remember the whole class getting a good laugh out of the mistakes. Once, translating a news article about Iraq from German, it claimed Saddam “warned the US before an attack” when what it should have said was “Saddam warned the US *of* an attack”. Modern systems would never make such a basic mistake.

**Generative Speech**

Then came GPT-3 (and more recently, GPT-4) and the other systems that generate speech. Originally they were merely predictors that would complete incomplete sentences. But now they run on and produce whole articles, which (usually) make sense. Here’s an example:

*Author Prompt: Write me an article about creativity please*

*GPT-3 Response: I’ll be honest. I don’t know much about creativity. I’m not a psychologist and I’m not a creative myself. I’m just a guy who likes to read about psychology and creativity. What I can do, though, is write you an article about creativity that’s based on what I have read. That way you get a proper article, but you don’t have to pay for the expertise of a psychologist or a creative person.*

You often hear that generative text is just sophisticated auto-complete, that something like GPT-3 is repeatedly predicting the next few words. This can’t be the whole story. GPT-3 texts typically have a beginning, middle, and end, and a blinkered what-comes-next algorithm wouldn’t produce this kind of structure.

What’s misleading is that readers may assume that the next few words are produced based on the *last few words* but that’s not how the AI chatbots work. The next few words are based on the whole document so far and the neural network allows distant parts to be taken into account.

GOFAI would content itself with using only the last few words and never achieved anything along these lines. But then my mind was well and truly boggled by …

**Generative Art**

Along came DALL-E and DALL-E 2. But it wasn’t till Stable Diffusion was released that I started paying attention. Of course there were the pictures of astronauts on horseback and cats wearing sunglasses. But what really impressed me were pictures *in the style of* well known artists. Here are two of my favourites:

The first is an abstract image in the style of Picasso. I can’t find the original but Midjourney’s version is just marvellous. I’d have no hesitation to print it, frame it, and hang it on my wall.

My second favourite is a wonderful portrait of Superman – ‘by’ Rembrandt! As one observer commented, “those eyes have seen some stuff!”

But even the cheesy astronaut image is impressive.

The striking fact is that *you can’t see the astronaut’s left leg*. The image generator seems to understand that you can’t see through opaque objects (namely, the horse).

GOFAI would need literally hundreds of rules just about what to do when bodies overlap, what to show, what objects are transparent and to what degree etc etc.

**On reflection**

OK let’s go all in – let’s look at a cat wearing sunglasses. Ew, cheesy – but there’s something remarkable about the image.

It’s the reflections in the lenses of the sunglasses. Not only are they visible, but the reflections are, correctly, the same. How does Midjourney coordinate the images in separate parts of the picture?

My guess is it’s for the same reason neural net translators can coordinate different parts of a text – neural nets can combine different parts of an image. They’re not limited to purely local computations.

**A closer look**

When I see this image I have to ask, where did all this come from? Midjourney is trained on 5 billion images but condenses this training to 5 GB. So there’s not enough room to include exact copies of images found in the training set. We can assume that this (apparent) photo does not exist as-is on the internet.

In particular, what about the blue feathers on either side of the subject’s neck (they are not mirror images)? Where did they come from? Did one of the training images have them?

The mystery is that this image is the result of combining training set images, but how are they put together? The best GOFAI could do is chop up the training images and put them together like a badly fitting crossword puzzle with visible seams and limited symmetry.

**The social implications of AI technology**

*It is questionable if all the mechanical inventions yet made have lightened the day’s toil of any human being.*

~ *John Stuart Mill*

There is a lot of controversy over Midjourney and other generative image programs.

The first question is, are these images art? I think some of the images presented here are definitely art, even good art. If you’re not convinced, have another ‘Rembrandt’.

The second question is, is imitating the *style* of certain artists fair? I don’t know, but there seems no way to stop it. Currently nothing stops a human artist from studying other human artists and imitating their styles. Midjourney etc are just especially good at this.

In a sense, this imitation broadens the exposure of the imitated artists. Now everyone can have, say, a Monet of their own.

Finally, a vital question is, how will this affect today’s working artists? Here the answer is not so optimistic.

Generative AI is not the first disruptive technology. There’s photography, the closest analog, digital art in general, the telephone, the automobile, the record player, the printing press, and so on.

Each of these had the effect of obsoleting the skills of whole professions. It didn’t wipe them out, but the vast increase in productivity put large numbers out of work. And those that remained had to acquire and use the new tools. Because of economic competition they had to work harder than ever to keep up.

Labor-saving technology inevitably becomes profit-generating technology. The tractor is an example. Initially it (and farm machinery in general) was marketed as labor-saving. But eventually competition forced every farmer to get machinery or sell out (which most had to do). The result was the same or more food produced by a fraction of the former number of farmers, working their butts off.

So I predict generative AI will indeed be a real threat to the careers and livelihoods of working commercial artists. Why should an editor commission an artist to produce an illustration for an article on Superman when typing in a prompt – a short paragraph – can produce an image like the one above? Almost instantly. For free.

I know this sounds pessimistic but it’s totally in line with the history of other disruptive technologies (aren’t all new technologies disruptive?). When the camera was perfected, it put portrait painters out of work. Why commission a portrait when a semiskilled person can just point a camera and press a button?

But it’s not all bad news. Photography was not as simple as it seemed. Soon everyone realized you needed skilled photographers to take really good pictures. Similarly generative art is not that simple – you need people with a flair for prompts to produce good results.

Furthermore, photography made it possible for every family to have portraits, not just well-off families. Soon everyone could take pictures, and some got very good at it. Painting did not die out, many artists repurposed to landscapes and the like, meant for the general public. The result was more good art for everyone.

In short the downside is that many existing artists will be forced out of the profession while those that stay will be forced to learn the new tools and inevitably work harder with them.

The upside is vastly more art to enjoy, most of it original. It will be like drinking from a firehose.

**The Stone Age of computing**

When I got into computing, in 1965 at UBC in Vancouver, it was all very primitive. I took a numerical analysis course while doing a math degree.

I learned Fortran and later, that summer, IBM 7040 assembler, which I got pretty good at. However when I moved on to Berkeley to do a math PhD I almost completely dropped computing because I thought it was primitive – punched cards and Fortran – and not really going anywhere. But I kept my hand in.

**LISP**

While I was at Berkeley I was introduced to LISP and again my mind was boggled. I was very taken by recursively defined algorithms working on hierarchical structures. FORTRAN didn’t support recursion nor did it have hierarchical data structures.

Given that FORTRAN was my first language these omissions could have scarred me for life but instead I really took to LISP. In retrospect LISP was a major milestone because it took recursion into the mainstream.

However the first LISP systems were pretty sluggish and I mistakenly dismissed LISP as being impractical. I continued my math research into what would be known as Wadge degrees.

I was wise to pursue my interest in computing because by the time I finished my PhD the math job market had collapsed. My first teaching position was in computer science, at the University of Waterloo. In those days there weren’t enough CSC PhDs to staff a department so they hired mathematicians, physicists, engineers etc.

**Time Sharing**

Between my time at UBC and my arrival at Waterloo there was one big milestone – time sharing. Dozens of people could share connections to mainframes, which had gotten bigger and more powerful. This was mind-boggling, because before you had to wait your turn for computer access. At UBC I punched my program on to cards and left the deck at reception. I’d come back one or two hours later (the “turnaround time”) when the programs in the “batch” had been run and get the output. Which would typically contain an error report.

It took forever to produce a correct program. Time sharing cut turnaround time to seconds and greatly speeded up the software development process. Thus was born “interactive” computing.

Time sharing needed a lot of supporting technology: an operating system with files, terminals for users, file browsers, compilers, and file editors.

**UNIX**

All this technology quickly arrived with the UNIX operating system, an unofficial project at Bell Labs developed on an abandoned computer. UNIX conquered the world and still dominates to this day.

When UNIX arrived in a department it was like Christmas because there was a whole tape full of goodies. You got the C language and its compiler, the vi screen editor, and utilities like yacc, awk, sed, and grep. UNIX was a huge leap forward and an unforgettable milestone.

Soon there was a whole ecosystem of software written in C by UNIX users. Unix was a real game changer.

**Terminals**

One of the UNIX utilities was the shell (sh), a simple CLI. It was easy to set up a terminal, like a vt100, to run the shell. The shell allowed the user to define their own shell commands, and allowed recursive definitions. An irreplaceable command was vi, the screen editor. It replaced line editors, which were very difficult to use.

The end result was that anyone with a UNIX terminal had at their disposal the equivalent of their own powerful computer. At first the terminals were put in terminal rooms but gradually they were moved into individual offices.

**Digital typesetting**

About the time UNIX showed up, departments started receiving laser printers that could print any image. In particular they could print beautifully typeset documents, including ones with mathematical formulas. The only question was, how to author them?

UNIX had the answer, a utility called (eventually) troff. To author a document you created a file with a mixture of your text and what we now call markup. You ran the file through troff and sent the output to the printer.

The only drawback was that you had to print the document to see it – a vt100 could only display characters. This problem was solved by the next milestone, namely the individual (personal) computer.

**Workstations**

A personal computer (often called a “workstation”) was a stand-alone computer designed to be used by one individual at a time – a step back from timesharing. The workstations were still networked and usually ran their own UNIX. The crucial point is that they had a graphics screen and could display documents with graphs, mathematics, and even photographs and other images.

As workstations became more common, timesharing decreased in importance till only communication between stations was left.

**EMAIL**

Originally communication was by file transfer but this was quickly replaced by email as we know it. At first senders had to provide a step-by-step path to the receiver but then the modern system of email addresses was introduced.

At this point many computer people relaxed, thinking we’d finally reached a point where there were few opportunities for innovating – boy were they wrong!

**The Web**

At CERN in Switzerland Tim Berners-Lee decided that email lists were inefficient for distributing abstracts and drafts of physics papers. He devised a system whereby each department could mount a sort of bulletin board. This was the origin of the Web.

Unfortunately at first no one used it, until Berners-Lee put the CERN phone directory on the Web and its popularity took off. Soon there were thousands of web sites opening every day.

If anything the Web was a more significant milestone than even UNIX or timesharing. I remember playing around and discovering Les Très Riches Heures du Duc de Berry – medieval images in glorious color on the Web (via a Sun workstation). My mind was, of course, boggled, and obviously not for the first time.

**Minor milestones**

After timesharing, UNIX and the Web there are other important milestones that seem almost minor by comparison. There’s Microsoft’s Office suite including the spreadsheet Excel. Then there’s the iPhone and other smartphones. People with a phone carry around a computer thousands of times more powerful than the old 7040 I learned computing on.

And yet the innovation doesn’t stop. For a long time AI was a bit of a joke amongst computer scientists – we called it “natural stupidity”. The translations were of poor quality and although the programs mastered chess, a more complex game like Go (Baduk) was beyond their reach. Periodically industries and granting agencies would get discouraged and an “AI winter” would set in.

**AI Summer**

Then a very few years ago this changed dramatically. Go was mastered and the translations became almost perfect (at least between European languages). What I call the “AI Summer” had arrived.

This was all thanks to employing new strategies (ML, neural nets) and using the vast stores of data on the internet.

And now we have Midjourney etc and chatGPT. Very significant milestones but not, as we have seen, the first – or the most significant.

Everyone’s heard about chatGPT, the latest and most sophisticated chatbot to date. We all know it can BS proficiently about ‘soft’ topics like English literature. I decided to quiz it about a hard topic: mathematics. As you probably know, I have a PhD in math, so I won’t go easy.

Let’s start with some basic high school maths.

What is the Euclidean distance between two points (a,b) and (x,y)?

*In Euclidean geometry, the distance between two points (a,b) and (x,y) is given by the square root of the difference of the squares of the x-coordinates plus the difference of the squares of the y-coordinates. This can be expressed mathematically as:*

`√((x - a)^2 + (y - b)^2)`

*For example, the distance between the points (3,4) and (6,8) would be:*

`√((6 - 3)^2 + (8 - 4)^2) = √(3^2 + 4^2) = √25 = 5`

*It is important to note that this formula only applies in Euclidean geometry, where the notion of distance is based on the Pythagorean theorem. In other types of geometry, the concept of distance may be different, and the formula for calculating distance will vary accordingly.*

Should be “squares of the differences.” Otherwise OK. It even came up with an example.

Calculate an approximation to the square root of 10 using Newton’s method.

*To find the square root of 10 using Newton’s method, we can follow the steps outlined above. Here is an example of how this might work:*

*Start with an initial guess for the square root of 10. Let’s say we guess that the square root of 10 is 3.*

*Use the initial guess to calculate a new approximation for the square root of 10. The formula for this is: x1 = (x0 + (n / x0)) / 2*

*Substituting our values into this formula, we get: x1 = (3 + (10 / 3)) / 2 = 5*

It screwed up the arithmetic. Not 5, but 3.166.

*Use the new approximation to calculate a better approximation for the square root of 10. The formula for this is the same as before: x2 = (x1 + (n / x1)) / 2*

*Substituting our values into this formula, we get: x2 = (5 + (10 / 5)) / 2 = 3.4*

*Repeat this process until the desired accuracy is achieved. In this case, we can stop after just two iterations, since our approximation is already accurate to within 0.1.*

*Therefore, our final approximation for the square root of 10 is 3.4. This is only an approximation, but it is a very good one, and it is much faster and easier to calculate than the exact square root of 10.*

3.4 is not a very good approximation.
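For the record, here’s what the iteration ChatGPT quoted actually produces when the arithmetic is done correctly (a quick sketch):

```python
# Newton's method for sqrt(n), using the same update ChatGPT quoted:
# x_{k+1} = (x_k + n / x_k) / 2, starting from the same guess of 3.

def newton_sqrt(n, x, steps):
    for _ in range(steps):
        x = (x + n / x) / 2
    return x

print(newton_sqrt(10, 3, 1))  # 3.1666... (not 5)
print(newton_sqrt(10, 3, 4))  # 3.16227766... ~ sqrt(10), already very close
```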

What is the integral of 1/x for x=1 to x=2?

*The integral of 1/x for x=1 to x=2 is the area under the curve of the function 1/x from x=1 to x=2. This area can be found by using the definite integral, which is a mathematical tool used to calculate the area under a curve.*

*The definite integral of a function from a to b is defined as the limit of a sum of areas of rectangles under the curve of the function as the widths of the rectangles approach 0. In other words, it is the limit of the sum of the areas of many very thin rectangles that cover the area under the curve of the function from a to b.*

*In order to find the definite integral of 1/x from x=1 to x=2, we can use the formula for the definite integral:*

*∫f(x)dx = lim(Δx→0) ∑f(x)Δx*

*where f(x) is the function we are finding the integral of, x is the variable, and Δx is the width of the rectangles.*

*In this case, our function is 1/x, so we can plug this into the formula to get:*

*∫(1/x)dx = lim(Δx→0) ∑(1/x)Δx*

*We can then use this formula to calculate the definite integral of 1/x from x=1 to x=2 by summing the areas of many very thin rectangles under the curve of 1/x from x=1 to x=2. The result of this calculation is the exact value of the definite integral of 1/x from x=1 to x=2.*

*In general, the definite integral of 1/x from x=1 to x=2 is equal to the natural logarithm of 2, which is approximately 0.693. This means that the area under the curve of 1/x from x=1 to x=2 is approximately 0.693.*

The explanation is irrelevant but the answer is correct.
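The rectangle description can actually be carried out numerically. A quick sketch (the midpoint Riemann sum; `riemann` is an illustrative name) confirms the value ln 2 ≈ 0.693:

```python
# The definite integral as a limit of rectangle sums, done numerically:
# the midpoint Riemann sum for 1/x on [1, 2] approaches ln 2.
import math

def riemann(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))

print(riemann(lambda x: 1 / x, 1, 2, 10_000))   # ≈ 0.6931
print(math.log(2))                              # 0.6931471805599453
```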

What is the integral of 1/(1+x^2)?

*The integral of 1/(1+x^2) is the inverse tangent function, denoted as atan(x) or tan^(-1)(x). To see why this is true, we can start by rewriting the fraction as the difference of two squares:*

1/(1+x^2) = (1 – x^2)/(1+x^2)

(This is just wrong.)

*Next, we can use the formula for the derivative of the inverse tangent function to simplify this further:*

d/dx atan(x) = 1/(1+x^2)

*Therefore, the integral of 1/(1+x^2) is the inverse tangent function.*

This is correct.

What is a measure? (Not on the high school curriculum.)

*In the context of mathematics, a measure is a way of assigning a numerical value to a set in order to capture its size or extent. For example, the measure of a line segment might be its length, the measure of a rectangle might be its area, and the measure of a set of points might be the number of points it contains. Measures are often used in integration and probability theory, among other branches of mathematics.*

Totally correct.

Does every set have a measure?

*Not every set has a measure. In order for a set to have a measure, it must satisfy certain properties, such as being countably infinite or having a well-defined notion of “size” or “length.” For example, the set of real numbers has a measure (namely, its Lebesgue measure), but the set of all subsets of the real numbers does not have a measure. In general, only certain types of sets can be assigned a measure in a meaningful way.*

Completely correct.

What is the measure of the union of two sets?

*The measure of the union of two sets is the sum of the measures of the individual sets. This is true because the union of two sets can be thought of as the sum of the elements in the sets, where each element is counted only once, even if it appears in both sets. For example, if the measure of set A is 3 and the measure of set B is 5, then the measure of the union of A and B is 8.*

This is wrong.

Only if they are disjoint

*The statement “the measure of the union of two sets is the sum of the measures of the individual sets” is only true if the two sets are disjoint. If the two sets have one or more elements in common, then the measure of their union will be the sum of the measures of the individual sets minus the measure of the intersection of the sets. This is because the intersection of the sets will be counted twice in the union, so we need to subtract it out in order to avoid overcounting. For example, if the measure of set A is 3, the measure of set B is 5, and the measure of the intersection of A and B is 2, then the measure of the union of A and B is 6 (the sum of the measures of the individual sets minus the measure of the intersection).*

GPT corrects itself. This is right.
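The corrected rule is just inclusion-exclusion, and it is easy to sanity-check with counting measure on finite sets (a toy stand-in; the rule holds for any measure):

```python
# Inclusion-exclusion: m(A ∪ B) = m(A) + m(B) - m(A ∩ B).
# Sets chosen to match GPT's numbers: m(A)=3, m(B)=5, m(A ∩ B)=2.
A = {1, 2, 3}
B = {2, 3, 4, 5, 6}
assert len(A | B) == len(A) + len(B) - len(A & B)   # 6 == 3 + 5 - 2
print(len(A | B))   # 6
```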

What is the measure of the set of real numbers between 0 and 1?

*The measure of the set of real numbers between 0 and 1 is 1. This is because the set of real numbers between 0 and 1 is a continuous interval on the number line, and the measure of any continuous interval is equal to the difference between its upper and lower bounds. In this case, the upper bound is 1 and the lower bound is 0, so the measure of the set of real numbers between 0 and 1 is 1.*

Absolutely right.

What is the measure of the set of irrational numbers between 0 and 1?

*The measure of the set of irrational numbers between 0 and 1 is also 1. This is because the set of irrational numbers between 0 and 1 is a subset of the set of real numbers between 0 and 1, and the measure of a subset is always less than or equal to the measure of the set it is a subset of. In this case, the measure of the set of irrational numbers between 0 and 1 is equal to the measure of the set of real numbers between 0 and 1, which is 1.*

Right, although the reasoning is incorrect.

What is the measure of the set of rational numbers between 0 and 1?

*The measure of the set of rational numbers between 0 and 1 is 0. This is because the set of rational numbers between 0 and 1 is a countable set, which means that it can be put into one-to-one correspondence with the set of natural numbers (i.e. the set of whole numbers including 0: {0, 1, 2, 3, …}). Because the set of natural numbers has measure 0 (it is a countable set and the measure of any countable set is 0), the measure of the set of rational numbers between 0 and 1 must also be 0.*

Right again.

What is the measure of the set of algebraic numbers between 0 and 1

*The measure of the set of algebraic numbers between 0 and 1 is 0. This is because the set of algebraic numbers is a countable set, which means that it can be put into one-to-one correspondence with the set of natural numbers (i.e. the set of whole numbers including 0: {0, 1, 2, 3, …}). Because the set of natural numbers has measure 0 (it is a countable set and the measure of any countable set is 0), the measure of the set of algebraic numbers between 0 and 1 must also be 0.*

Darn! It knows about the algebraic numbers! (A number is algebraic if it’s a root of a polynomial with rational coefficients.)

What is the measure of the intersection of two sets of measure 0?

*The measure of the intersection of two sets of measure 0 is 0. This is because the measure of the intersection of two sets is always less than or equal to the measure of each individual set. Since the measure of each individual set is 0, the measure of their intersection must also be 0*

Perfect reasoning!

What is the measure of the intersection of two sets each of measure 1/2?

*The measure of the intersection of two sets is the size of the overlap between the two sets. The measure of a set is a number that indicates the size or magnitude of the set. In the case of two sets each with a measure of 1/2, the measure of their intersection would be 1/2, since the measure of a set is a property of the set itself and is not affected by the presence of other sets.*

Wrongity wrong wrong wrong. If the sets are disjoint, the measure of the intersection is 0. If they overlap at all, the measure could be anything between 0 and 1/2.

Well, I’m going to quit while I’m ahead. Chat GPT did much better than I expected, although it’s obviously not infallible. I was impressed by two things: the way it manipulated the variables in the distance example, and several instances of apparently using reasoning. For example, it seems to have reasoned as follows: the algebraic numbers are countable; any countable set has measure 0; therefore the algebraic numbers have measure 0.

Based on my teaching experience, I’d say chatGPT scored like a second year math major. Except for the arithmetic errors! Two cheers for chatGPT!

The axiom of choice (AC) seems harmless enough. It says that given a family of nonempty sets, there is a *choice function* that assigns to each set an element of that set.

AC is practically indispensable for doing modern mathematics. It is an existential axiom that implies the existence of all kinds of objects. But it gives no guidance on how to find examples of these objects. So in what sense do they exist?

**A voting scheme**

*Don’t buy a single vote more than necessary. – Douglas William Jerrold*

Voting is in the news today, as are various voting schemes. However when it comes to voting a proposal up or down, there’s only one simple criterion: majority rule. If the voters are v0, v1, v2, … vn-1 then the proposal is accepted if more than n/2 are in favour.

But what if there are infinitely many voters v0, v1, v2, … ? What does it mean for there to be a majority in favour? AC implies that there exists an infinitary voting scheme (this is not obvious) but supplies not a hint about how it could work.

One possibility is to pass the proposal if infinitely many voters voted Aye, but it’s possible that at the same time infinitely many voted Nay.

A vote is an infinite sequence (e.g.) Aye, Nay, Nay, Aye,… and a voting scheme is a function which assigns either Aye or Nay to each vote. A *majority* is a set of voters such that any motion for which they all vote Aye, passes.

Then we require the following properties to hold:

Given any set of voters, either it or its complement is a majority, but not both

If everyone switches their vote, the result switches

The result of adding any number of voters to a majority is a majority

The result of removing a single voter from a majority is still a majority.

The intersection of two majorities is a majority

There are a number of consequences of these properties we can immediately derive. Let’s call the complement of a majority a *minority*.

A motion passes iff the set of those who voted Aye is a majority and fails if it is a minority

The set of all voters is a majority and the empty set is a minority

Any finite set is a minority

Any cofinite set is a majority

If the vote is unanimous, the result follows the vote

The union of two minorities is a minority

Sounds simple? Well don’t try to invent a particular voting scheme because you won’t succeed. Any scheme that can be defined concretely will fail at least one of these properties. In fact it’s been shown that it’s consistent with the axioms of set theory and a weak form of the axiom of choice that there is no voting scheme.

Here is a simple argument against the existence of a voting scheme. Let E be the even numbers and O the odds. Either E or O is a majority, say E. But how is O smaller than E? They are isomorphic.

Nevertheless full AC says a voting scheme exists. The question is, exists in what sense?

**Non measurable set**

You’ve probably heard of the Banach-Tarski result. AC implies that it is possible to divide a unit sphere into five pieces then reassemble them into two unit spheres (using only translations and rotations, that should preserve volume). This is like proving that 1=1+1 except that four of the pieces can’t be assigned volumes: they are non measurable sets.

AC implies the existence of non measurable sets – e.g. solids that don’t have a volume – but don’t ask to see an example of one. Any set you can describe precisely will be measurable – for example, any Borel set. Analysis texts devote a lot of space to proving that the results of various operations, like countable union, preserve measurability.

These texts could be greatly simplified if they just assumed that all sets are measurable. Of course if they also assumed AC they’d be in trouble but you can do most of analysis with weaker forms of choice, like countable choice, that don’t imply the existence of non measurable sets.

**An indeterminate game**

You’re all familiar with finite discrete games like checkers and chess. Simplifying a bit, they have the following properties:

Players I and II alternate, I moving first, until the game ends

On each move the mover has a finite choice of moves

On each move the mover plays a natural number

Each player knows the sequence of moves up to the current position

If the last move produces a winning position, the mover wins

If the last move produces a losing position, the mover loses

If the last move produces neither, the game ties

It’s actually quite tricky to make this precise, and I’ll skip the details. The crucial concept is that of a *strategy*, which is simply a function which given the moves so far, returns the next move for the player concerned. A *winning* strategy is one that always eventually puts its user in a winning position.

Zermelo showed more than a century ago that if a game always ends, then either one player has a winning strategy, or both have a tying strategy.
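Zermelo’s argument for finite games is backward induction over the game tree. A toy sketch (the tree shape, the labels, and the `value` function are all made up for illustration):

```python
# Backward induction: the outcome of a position under best play is the
# best outcome the mover can force among the positions reachable in one
# move. Leaves are labelled with the result ("I", "II", or "tie").

def value(node, mover):
    if isinstance(node, str):              # leaf: outcome is decided
        return node
    other = "II" if mover == "I" else "I"
    outcomes = [value(child, other) for child in node]
    if mover in outcomes:                  # mover can force a win
        return mover
    if "tie" in outcomes:                  # failing that, force a tie
        return "tie"
    return other                           # otherwise the opponent wins

# Player I moves first; each inner list is the set of available moves.
game = [["II", ["I", "tie"]], ["II", "II"]]
print(value(game, "I"))   # here II has a winning strategy
```

Since every position gets one of the three labels, either some player has a winning strategy or both can force at least a tie.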

But we can also define infinite games. In fact it’s simpler. Given any subset G of the Baire space (set of sequences of natural numbers):

Players I and II alternate, I moving first.

On each move each player plays a natural number

If the resulting infinite sequence a0,b0,a1,b1,a2,b2, … is in G, II wins, otherwise I wins.

A strategy for II is a function which takes a0,b0,…an and yields bn; a strategy for I takes a0,b0,…an,bn and yields an+1. As before, a strategy is winning if following it always produces a win.

Since there are no ties we might expect that one of the players has a winning strategy (in which case the game is said to be *determinate*). Not so fast.

AC implies the existence of an indeterminate game. An example? Don’t look for one, for the usual reasons. Any game you can define will be determinate. In particular, if G is a Borel set, then G is determinate.

There are a number of weaker versions of AC that can be proved directly in ZF. Suppose, for example, that each choice set has exactly two reals. Then choose the smallest! More generally the same idea works if each choice set has a finite set of reals.

**A well ordering of the reals**

But what if a choice set has an infinite number of reals? Say, all those greater than 0? There is no smallest.

However we could use another ordering of the reals, in which any subset has a least element. This is called a well-ordering. Then any family of sets of reals would have a choice function.

The natural numbers are well ordered (by arithmetic ordering), so every family of nonempty sets of natural numbers has a choice function.

Is there a well-ordering of the reals? AC implies there is, but don’t try to find an example. No ordering you can define will be a well-ordering, otherwise you could prove AC from ZF, which has been shown to be impossible (if ZF is consistent, so is ZF+¬AC).

Yet another mathematical object that does not exist in any practical sense.

**An infinitesimal**

One important application of AC is to prove the *compactness* property of first order logic. This says that if every finite subset of a set of first order formulas is consistent (has an interpretation), the whole set has an interpretation (this is nontrivial because different finite subsets may have different interpretations).

One application is to prove the possible existence of *infinitesimals*. An infinitesimal is a number that is greater than 0 but less than 1/n for every natural number n.

For a long time calculus was based on infinitesimals. The derivative f’ of a function f was ‘defined’ as (f(x+𝛆)-f(x))/𝛆. Engineers still think in terms of infinitesimals dx and dy.

In the 1800s infinitesimals were declared inconsistent and abolished, replaced by the 𝛆-𝛅 formalism.

However in the 1960s Abraham Robinson pointed out that compactness implies the existence of an extension of the reals with infinitesimals. If we take all true first order properties of the reals (such as x+y=y+x or log(xy) = log(x) + log(y)) and add the formulas

0<𝛆

𝛆<1

𝛆<1/2

𝛆<1/3

𝛆<1/4

…

Then every finite subset is consistent. Hence, by compactness, the whole set is consistent. Robinson called the resulting structure the hyperreals and it’s a model of the first order theory of the reals, with infinitesimals.
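The finite-consistency step is completely concrete: any finite subset of these formulas is satisfied by a small enough ordinary real. A trivial sketch (`witness` is an illustrative name):

```python
# A finite subset of {0 < ε, ε < 1, ε < 1/2, ..., ε < 1/N} is satisfied
# by the ordinary real ε = 1/(2N): each formula only bounds ε below 1/n
# for some n ≤ N. Only the full infinite set forces ε outside the reals.
def witness(N):
    return 1 / (2 * N)

N = 1000
eps = witness(N)
print(0 < eps and all(eps < 1 / n for n in range(1, N + 1)))   # True
```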

The only snag is, what is 𝛆, the infinitesimal whose existence is guaranteed by compactness? Don’t ask, because there is no hope of defining it. If 𝛆 works so does 𝛆/2, or 𝛆^2, or √𝛆. There is no infinitesimal distinguished in the same sense that i and -i are distinguished in the complex numbers. So the hyperreals have infinitesimals but don’t try to choose a particular one.

**Degrees of Existence**

One thing is clear, existence is not a straightforward binary property. It’s a spectrum. On one end there’s the existence of integers like 42 and recursively defined functions like factorial. On the other end is the voting scheme, which seems like pure vapourware.

AC by itself implies only vapourware. However these zombie-like objects like the voting scheme are necessary for the smooth functioning of the mathematical universe. Without them the universe is chaotically irregular.

The universe with AC is like an all-conquering army with straight ranks but ranks filled in part with zombies. Their presence fills us with discomfort but without them we’re lost.

When the late Ed Ashcroft and I invented Lucid, we had no idea what we were getting in for.

**La dee dah**

At first we (or at least I) thought it would be pretty straightforward. The idea was to replace assignments like

```
I := 1;
...
while ...
I := I+1;
...
end
...
```

with equations like

```
first(I) = 1;
next(I) = I+1;
```

then, later,

`I = 1 fby I+1;`

The original motivation was that proving properties of programs would be easier if the program statements were equations.

As for implementing the language, the idea was that we could compile the equational form into machine code using conventional technology.

**Oh, shoot**

However the devil was in the dots (…). It dawned on us/me that since the semantics was based on infinite sequences (which we thought of as histories of computations), this implied that the computations went on forever.

That was fine as long as it was supposed to be a continuously operating program, for example to list the squares of the natural numbers. But what if it is supposed to eventually halt and produce a single result?

For example, the following imperative program calculates an approximation to the square root of 2, outputs it, and terminates.

```
a := 1;
err := 1;
while err > 0.0001
err := abs(2-a*a);
a := (a+2/a)/2;
end;
root2 := a;
write(root2);
```

The equations for *a* and *err* are obviously

```
a = 1 fby (a+2/a)/2;
err = abs(2-a*a);
```

but what about *root2*? What is its equation? And it’s not even a stream, it’s a single value.

**Extract the root**

Eventually we devised an operator *as_soon_as* that *extracts* a value from a stream. The operator *as_soon_as* (shortened to *asa*) takes two arguments and returns the value of its first argument that corresponds to the first time its second argument is true. Thus if *X* is *<x0,x1,x2,x3,…>* and *P* is *<f,f,f,t,…>* then *X asa P *is *<x3,x3,x3,x3,…>*.

The full Lucid program was

```
I = 1 fby I+1;
a = 1 fby (a+2/a)/2;
err = abs(2-a*a);
root2 = a asa err<0.0001;
```

(and the order of the equations doesn’t matter).

We were therefore forced to resort to lazy evaluation: you compute the value of an expression only if you need it. In particular, you don’t keep computing values of the sequence denoted by *a.*

**Compiling is out**

However compiling became very complicated. You had to analyze the program to figure out which values will be actually needed. We just waved our hands and said it could be done. (We also needed to analyze the program to figure out that variables like root2 are ‘really’ just constants and need be output only once. We only recently solved this problem.)

So we tried another strategy, namely compiling the program into a network of dataflow filters connected by pipelines. David May (yes, that David May, then a grad student) thought this was a great idea and began working on an implementation along these lines. Then one fatal Monday we met in the University of Warwick (UK) Arts Centre cafe for a cheese sandwich lunch. “It doesn’t work” he announced.

**Dataflow is out**

The problem was *if-then-else-fi* and other primitives that don’t need (and may discard) some inputs. Pipeline dataflow filters wait for data tokens to arrive on all input lines, consume one token from each line, then produce an output token.

This causes problems if there is an *if-then-else-fi* filter. If the input streams are P, X, and Y, the filter should wait for tokens *pn*, *xn*, and *yn* to show up, then if *pn* is true, send on *xn* and discard *yn*, otherwise send on *yn* and discard *xn*.

Sounds simple enough but there is a fatal issue: waiting for values that you don’t need and will discard.

What if the unneeded values show up late, or not at all? Then we will have delayed the computation for no reason. We can tweak the operation so that as soon as the needed token shows up, it is passed on. But we’re still stuck waiting for the unneeded value and there’s no way to skip it. If it never shows up (because of deadlock upstream) we’re in trouble because our own output deadlocks prematurely.

The problem is that the semantics of Lucid requires

*if t then x else y fi = x*

*if f then x else y fi = y*

If we let ⊥ denote a nonterminating (deadlocking) stream, we have

*if t then x else ⊥ fi = ⊥*

*if f then ⊥ else y fi = ⊥*

which violate the basic rules for *if-then-else-fi* given above.

Furthermore, if we adopt wait-and-discard we may waste computing resources (those spent computing unneeded values) and these resources could be significant.

**Kludges are out**

There are various kludges we could try, like sending ‘kill’ tokens upstream to cancel unneeded computations, but these run into trouble if there are cycles in the network upstream. All a giant headache.

For that reason all simple-minded dataflow models lack a three-input one-output conditional filter. Instead they typically have a one-input two-output filter that sends a token down one of the output lines. It’s unnatural to program with such a primitive and anyway the result is two lines with different and unpredictable data rates.

In other words, David May was perfectly justified in declaring that pipeline dataflow “doesn’t work” as an implementation of Lucid. Luckily, he had a solution.

**Demand results**

His solution (Tom Cargill and others independently came up with the same idea) was to systematically use demand-driven evaluation. The interpreter demands the value of *output*, which generates a demand for the value of *root2,* which in turn generates demands for the values of *a* and *err*.

We can demand the value of *root2*, because it’s just one number, but *a* and *err* are (potentially) infinite sequences that can’t be returned as the answer to a single demand. The clever idea is to allow us to demand specific indexed components of these streams, e.g. the value of *a* when time=2 or the value of *err* when time=3.

Indexed demands propagate, so that the demand for a component of one variable at a given timepoint generates demands for possibly different variables at possibly different timepoints.

The propagation rules are very simple:

A demand for *A+B* at time t generates demands for A at time t and B at time t,

and the result is the sum of the two results (other data operations work similarly)

A demand for *first A* at time t produces a demand for A at time 0, and the result is the answer.

A demand for *next A* at time t generates a demand for *A* at time t+1, and returns the answer as the result.

A demand for *A fby B* at time 0 returns the answer to a demand for *A* at time 0; while a demand for A* fby B *at time t+1 returns the answer to a demand for *B* at time t.

A demand for* X asa P* at time t generates demands for P at times 0, 1, 2, … until the answer is *true* (say at time r), and then returns the answer to the demand for *X* at time r.
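The rules above are enough to run the root 2 program. Here is a minimal sketch of an eductive evaluator in Python, with the propagation rules hard-coded per variable (illustrative only; a real interpreter works from the program text):

```python
# A demand-driven (eductive) evaluator for the root 2 program.
# demand(var, t) returns the value of variable var at time t.

def demand(var, t):
    if var == "I":                       # I = 1 fby I+1
        return 1 if t == 0 else demand("I", t - 1) + 1
    if var == "a":                       # a = 1 fby (a+2/a)/2
        if t == 0:
            return 1.0
        prev = demand("a", t - 1)        # fby at t+1 demands a at t
        return (prev + 2 / prev) / 2
    if var == "err":                     # err = abs(2-a*a)
        a = demand("a", t)
        return abs(2 - a * a)
    if var == "root2":                   # root2 = a asa err<0.0001
        r = 0                            # demand err at 0, 1, 2, ...
        while not demand("err", r) < 0.0001:
            r += 1
        return demand("a", r)            # ... then a at the first hit
    raise NameError(var)

print(demand("root2", 0))   # ≈ 1.4142157, an approximation to √2
```

Note how the *asa* rule demands *err* at successive times but never computes more of *a* than is actually needed.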

**Some observations**

First notice that all four primitives discard data … pipeline dataflow doesn’t implement any of them safely or efficiently.

Also, ‘time’ is just a formal parameter; it has no necessary connection to wall clock time. The time values do not necessarily increase as the computation proceeds. Thus we may demand the time 8 value of a variable, then the time 5 value, even of the same variable.

**Anachronistic programs**

In fact it is possible to write programs that recurse into the ‘future’, like the following that computes the factorial of 7:

```
first f
where
n = 7 fby n-1;
f = if n<1 then 1 else n * next f fi;
end
```

The variable f is defined in terms of its own future yet the demand-driven interpreter produces 5040, the right answer.
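The demand-propagation view explains why this works. Mimicking the demands directly in Python (illustrative names), a demand for f at time t generates a demand for f at time t+1, and the recursion into the ‘future’ bottoms out when n drops below 1:

```python
# n = 7 fby n-1 means n at time t is 7-t; f refers to its own future.
def n(t):
    return 7 - t

def f(t):
    return 1 if n(t) < 1 else n(t) * f(t + 1)

print(f(0))   # 5040, i.e. 7!
```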

**Minimum solution**

It can be shown that the demand driven interpreter is 100% faithful to the statements-as-equations semantics. Every set of equations has a unique minimum solution and that’s what the interpreter computes. Usually there is a unique solution, but “minimum” means having the ‘most’ ⊥’s – in other words, the least ‘defined’ solution. Another way of putting it is that no actual value appears out of nowhere.

For example, the equation *I = 1 fby next I* has I = <1,7,7,7,…> as a solution but where did 7 come from? The minimum solution is I = <1,⊥,⊥,⊥,…>, and this is what the interpreter produces. In other words, if you demand I at time 0 you get 1, but if you demand I at time t>0, the computation fails to terminate. As it should, if it’s faithful to the semantics.

**Eduction**

Ed Ashcroft loved words and wordplay. One day, browsing through a dictionary or thesaurus, he came across the word “eduction”. I remember the definition he found was something like “the act of drawing forth or eliciting … results … from the data …”. Perfect! This is what we should call it! And we did.

The question arose, is eduction dataflow? Not pipeline data-push dataflow, that’s for sure. We decided to stake a claim and defined eduction as “tagged demand-driven dataflow”. Eduction is briefly described (but not so named) in the *Lucid, the Dataflow Language* (1985) book.

The book does, on the other hand, explain the pipeline dataflow model even though it admits that it cannot serve as a general implementation scheme. There are two reasons for this.

First, pipeline dataflow serves as an excellent *heuristic* for writing and understanding programs. The eduction model is not usually a very good guide – everything seems to happen backwards.

Secondly, pipeline dataflow works fine for many unsophisticated programs that process data in a straightforward way. It’s much simpler than eduction and has much lower overhead. Program analysis could automatically identify programs that are eligible for a pipeline implementation.

Nevertheless, eduction is a general technique so we need to investigate what’s required to make it work.

**Implementing eduction**

The first thing we notice is that it doesn’t use storage (apart from cells on the invisible stack that implements recursive calls to the evaluation routine).

Not using storage is a bad idea, because it means that the interpreter recomputes demands that are repeated. This can get very expensive in time if not in space.

The solution is a cache, which we traditionally call the *warehouse*. Every time we calculate the value of variable V at time t, we record this fact. The warehouse is an associative store indexed by the pair (V,t). In modern languages like Python or Swift the warehouse can be implemented in a straightforward yet efficient manner using the built-in *dictionary* primitive.
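A sketch of the warehouse idea, using a Python dict keyed by (variable, time) as the paragraph suggests (illustrative code, not the PyLucid implementation):

```python
# The warehouse: every demand checks the cache first and records its
# result, so no (V, t) pair is ever computed twice.
warehouse = {}

def demand(var, t):
    key = (var, t)
    if key not in warehouse:
        warehouse[key] = compute(var, t)
    return warehouse[key]

def compute(var, t):
    if var == "I":               # I = 1 fby I+1
        return 1 if t == 0 else demand("I", t - 1) + 1
    raise NameError(var)

print(demand("I", 100))          # 101; fills warehouse entries 0..100
print(("I", 50) in warehouse)    # True: later demands are cache hits
```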

The second problem is that the warehouse can in theory fill up as the computation proceeds. This is less of an issue with modern computers – even consumer laptops – that have ridiculous amounts of storage. For example, the older MacBook Air I’m using can store over a *billion* numbers. The PyLucid interpreter stores everything and never runs into trouble with the modest programs that appear in this blog.

Nevertheless a completely general implementation needs to manage warehouse storage. Tony Faustini and I came up with an effective heuristic we called the retirement plan. Briefly, it sweeps the warehouse periodically and discards values that haven’t been used recently.

**Tags**

The next complication involves storing and fetching data. In the simple root 2 program, we generate a demand for *root2* at time 0 and store the value with tag *(root2,0)*. So far so good.

Now suppose we extend the program and when we evaluate it we get demands for *root2* at times 3 and 5. What do we do?

We can store the same approximation to √2 with tags *(root2,3)* and *(root2,5).* Now we’re wasting warehouse space by storing the same data just with different tags. Depending on the extended program, this could be very expensive in terms of space. We were lucky originally that only time 0 was demanded. But we can’t count on luck.

Now suppose we have a demand for *root2* at time 7. We look in the warehouse using tag *(root2,7)* and find nothing. As a result we recompute *root2*. This is wasteful of time.

The only way we can avoid wasting space or time is to find out that *root2* is constant – is insensitive to the time parameter. This requires static program analysis as described in the time sensitivity blog post.

**Extra dimensions**

One of the advantages of treating the time index as a formal parameter is that it suggests other dimensions can also be treated as formal parameters. In other words, eduction opens the door to multidimensional dataflow.

PyLucid has two dimensions, time (t) and space (s). As that post explains, it means we can write programs employing time-varying arrays. We can still use the pipeline heuristic, but we must imagine infinite arrays travelling down the pipes.

Eduction has no trouble handling multiple dimensions. In the simplest case, we just have slightly more elaborate demands, say (X,t=3,s=4). However with many dimensions, passing around coordinates is cumbersome. Instead we have ‘registers’ (special global variables) that hold the coordinates. Then to evaluate next X, for example, we increment the time register by one, demand the value of X, then decrement the time register by one.
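The register mechanism can be sketched as follows (illustrative names; a real implementation differs in detail):

```python
# Dimension 'registers' hold the current coordinates, so demands carry
# no explicit (t, s) arguments. 'next' bumps the time register, demands
# the variable, then puts the register back.
registers = {"t": 0, "s": 0}

def demand(var):
    if var == "I":               # I = 1 fby I+1 (varies in time only)
        if registers["t"] == 0:
            return 1
        registers["t"] -= 1      # fby looks one step into the past
        v = demand("I") + 1
        registers["t"] += 1
        return v
    raise NameError(var)

def next_(var):
    registers["t"] += 1
    try:
        return demand(var)
    finally:
        registers["t"] -= 1      # restore the register afterwards

registers["t"] = 3
print(demand("I"), next_("I"))   # 4 5
```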

**A multidimensional warehouse**

There is one catch, and that involves accessing the warehouse. Suppose we demand the value of X and get 27 as the result. With what keys do we store 27 in the warehouse?

We could attach the values of all the registers but that in general would result in wasting space on duplicate entries – the same problem as with time sensitivity, as described above, but much more serious.

We could keep track of the registers actually examined in computing the demanded value of *X*, and use them as the keys, but what about on the other end when we have to search for a demanded value of X? A priori we have no way of knowing what dimensions entered into producing the value we are looking for.

The only general solution is *dimensional analysis*, the process of discovering which dimensions might enter into the production of a given variable. Upper bounds are enough. For example, we may discover that dimensions s and t are enough to get a value for X, but that Y may need dimension h as well. Then when we search for a value of X, we use the current contents of registers s and t as keys. But for a value of Y, we also include the current contents of the h register.

Dimensional analysis was the main technical obstacle holding back the development of Lucid, and it is solved in Shennat’s dissertation.

**User defined functions**

So far we have talked about only 0-order programs – programs that have no functions other than built-ins like *next*. What does it mean to demand, say, the value of *fac(n+1)* where the user has written their own definition of *fac*? (David May’s implementation did not support user defined functions.)

This caused some head scratching till we got a hold of Calvin Ostrum’s interpreter, which did support user defined functions. Upon examining the code we discovered that he introduced an extra dimension, that we called the *place* dimension, that specified where in the function calling tree a demand was being made. Ali Yaghi, then a PhD student at Warwick, revised and extended Ostrum’s scheme and formalized it in terms of intensional logic. The result is what we call Yaghi Code.

The point of Yaghi code is that it magically reduces a first order program to a 0-order program, to which we can apply eduction. The only cost is two extra intensional operators, *call* and *actuals*, and an extra “place” dimension. We’ve already seen that extra dimensions are not a serious problem for eduction.
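To make the idea concrete, here is a hypothetical Python miniature of the scheme (my own toy, not Yaghi's actual formalization). The first order program `result = f(2) + f(10)` with `f(x) = x * x` becomes a 0-order program in which *call* pushes a call-site label onto the place stack and *actuals* pops it to select the right argument:

```python
# Toy Yaghi code: f's formal parameter becomes a variable `farg` defined by
# an `actuals` operator that dispatches on the top of the place dimension.

def eval_var(var, place):
    if var == "result":
        # call_0 and call_1 push their call-site label onto the place stack
        return eval_var("fres", (0,) + place) + eval_var("fres", (1,) + place)
    if var == "fres":                       # body of f: x * x
        x = eval_var("farg", place)
        return x * x
    if var == "farg":                       # actuals: pop the place stack
        site, rest = place[0], place[1:]
        return (2, 10)[site]                # the actual argument at that site

print(eval_var("result", ()))   # 2*2 + 10*10 = 104
```

The place stack plays exactly the role of an extra dimension: a demand for `fres` carries the coordinate telling `actuals` which call site it is serving.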

**Higher order functions**

For a long time Lucid was strictly first order and this worried us because we liked to call it a *functional* dataflow language. For a long time we couldn’t see how to extend it. Then P. Rondogiannis and I came up with a solution that in hindsight seems obvious: more dimensions!

The idea is that one place dimension reduces a first order program to a 0-order program. The same procedure reduces a second order program to a first order program; then adding another place dimension produces a 0-order program, which can be educed.

In general, some (but only some) nth order programs can be translated into 0-order programs that use n place dimensions and n families of *call/actuals* operators. This is not a general solution because only programs with certain function types can be translated. In particular, we cannot translate programs that employ partial application; in other words, programs with functions that return other functions as results.

A number of smart people have tried to fix this, without success. My hunch is that it can’t be done, though I don’t know why.

**Advantages**

At this point the reader might start wondering: what is the point of all this? Programmers often find the side-effect-free style of Lucid programming constraining, because they can’t just tell the computer what to do. Furthermore, implementing Lucid is quite a challenge because you can’t simply turn Lucid code into machine code.

In fact there are huge advantages to writing in Lucid and implementing the program with eduction. To begin with

**You can understand programs**.

The statements in a Lucid program really are mathematical equations. Inside a *where* clause the order is unimportant and the result of a where clause is derived from the (usually unique) solution of these equations. Lucid has evolved but the statements-as-equations principle has remained nonnegotiable. For example, we do not allow compound expressions on the left hand side because that can undermine the basic principle.

For a start we can apply the rules of algebra exactly because there are no side effects. If X = A+B and A and B are both small integers, we can conclude that X is also an integer. And if A and B both increase with time, we can conclude that so does X. Static analysis of Lucid programs is vastly simpler than that of imperative languages like Python.

**We can transform programs**

Also, we can safely apply the transformation rules of conventional algebra. For example, if *Y = P*Q+R* we can add an equation *V = P*Q* (V not already in use) and change the definition of Y to *Y = V+R*. The expression *X*X* can be replaced by *X**2*; no side effects. Or if *F(R,S)* is defined to be *R – 2*S*, then *F(G+H,G-H)* can be replaced by *(G+H) – 2*(G-H)* and then by *3*H-G*.

The PyLucid compiler proceeds by applying meaning-preserving transformations. In the end (after introducing Yaghi code) the entire program is reduced to a (large) unordered set of ‘atomic’ equations. By ‘atomic’ we mean that each equation consists of a variable equated to an expression which is either a data constant or a single operation applied to variables.

The atomic form is ready for eduction. But it is still a Lucid program, is human readable and can be saved in a simple text file. It is semantically equivalent to the original. The atomic form is also amenable to program analysis, for example determining dimensionalities.
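Here is a hedged sketch (plain Python, a guess at the flavor rather than the PyLucid compiler's real code) of flattening an expression into atomic equations:

```python
# Flatten an expression tree into "atomic" equations: each equation is a
# fresh variable equated to a single operation applied to variables.

import itertools

fresh = (f"v{i}" for i in itertools.count())

def atomize(expr, eqns):
    """expr is a variable name or a tuple (op, lhs, rhs); returns a variable
    whose equation in eqns yields the expression's value."""
    if isinstance(expr, str):
        return expr
    op, a, b = expr
    va, vb = atomize(a, eqns), atomize(b, eqns)
    v = next(fresh)
    eqns[v] = (op, va, vb)          # one operator applied to variables only
    return v

eqns = {}
root = atomize(("+", ("*", "P", "Q"), "R"), eqns)   # Y = P*Q + R
print(root, eqns)   # v1 {'v0': ('*', 'P', 'Q'), 'v1': ('+', 'v0', 'R')}
```

The result is exactly the transformation described above for *Y = P*Q+R*: an auxiliary variable for *P*Q*, then the sum, as an unordered set of equations.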

**Eduction can be distributed**

Once we have the atomic form of the program, we can store it on different machines and have a number of warehouses also on different machines. We could divide up the work according to variables, e.g. have one machine evaluating A, B, and C, and another X, Y, and Z.

Of course a demand for, say, B could generate a demand for, say, Z but this demand could be sent across the network. Program analysis could tell us how to split up the work so as to minimize network traffic.

There would be no problem duplicating warehouse entries because you are never going to have discrepancies – you can use whichever warehouse you want.

**It’s fault tolerant**

With eduction the program does not change – unlike systems based, say, on combinator reduction. If a value goes missing for whatever reason, it can be recomputed (although this may be expensive in terms of time). For this reason the warehouse strategy can be only a heuristic, like the retirement plan.

Fault tolerance is vital for a distributed implementation because it means communications don’t have to be 100% reliable.

**It could be very fast**

Eduction has plenty of provision for parallelism. There is no inherent contention between demanding a value of X and demanding a value of Y, unless (say) the demand for X generates a demand for Y (at the same coordinate). There are no races because there are no side effects.

The GLU project used Lucid as a coordination language to link pieces of legacy C code. They achieved modest speedups, typically an order of magnitude. This was back in the 90s when computers were pathetically weak (in terms of speed and storage) compared to today. Surely we can do much better.

Pascal had *while* loops and we managed to do iteration with equations. However in Pascal you can nest *while* loops and have iterations that run their course while enclosing loops are frozen. This was a problem for us.

To see the problem consider the following Pascal-like program which repeatedly inputs a number, computes an approximation to its square root using Newton’s method, and prints it out

```
N = input() ;
while N ne eod
  a = 1 ;
  err = abs(N-1) ;
  while err > 0.0001
    a = (a+N/a)/2 ;
    err = abs(N-a*a) ;
  end ;
  print(a) ;
  N = input() ;
end
```

We can naively rewrite this as Lucid giving

```
output
where
  N = input;
  output =
    a asa err < 0.0001
    where
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

All very good but it doesn’t work – its output is garbage. The problem is that N continues changing in sync with the approximation a. The Newton iteration is chasing a moving target and may not even terminate.

The difference between Lucid and Pascal is that with Pascal by default nothing changes whereas with Lucid everything changes by default.

Obviously, if Lucid wants to be a general purpose language it needs nesting. We needed some way to ‘freeze’ the current value of N while the inner loop is running. We came up with the “*is current*” declaration, an ad hoc solution.

The program becomes

```
output
where
  n = input;
  output =
    a asa err < 0.0001
    where
      N is current n;
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

There were many problems with this solution, starting with the fact that N is no longer defined by an equation. The *is current* statement was unpopular with Lusers (Lucid users) and tended to be dropped. GLU did not have nesting and instead used temporary extra ‘throw away’ dimensions.

Can we do better? I think so, and I’m going to outline a proposal. The idea is to use operations (I call them “hyperfilters”) that work on streams (I call them “hyperstreams”) that are functions of a whole sequence t0,t1,t2,… of time parameters, not just t0 as with ordinary streams. Here t0 is inner, local time, t1 is time in the enclosing loop, t2 time in the second outer loop, and so on.

I’m also going to correct original Lucid’s biggest mistake, which was to try to get away with only one type of *where* clause. The semantics were a mess. Instead we have a plain *where* clause which simply hides definitions, and *whereloop*, used for nesting. Then we’ll define *whereloop* in terms of a simple translation into conventional *where*.

I’m not satisfied with simply reviving the original simple ‘freezing’ form of nesting. I’m proposing a more general framework that allows, for example, several rounds of the inner loop to produce one value for the outer loop. This framework uses two general purpose multi (time) dimensional operators, *active* and *contemp* (orary).

Although these two hyperfilters act on infinite dimensional hyperstreams, we can understand them in terms of one- and two-dimensional streams extended pointwise to the other dimensions. A *whereloop* implicitly applies *active* to all its globals and *contemp* to its result.

The operator *active* takes a stream and duplicates it over t0.

In other words, if a is the argument stream, *active(a)* repeatedly starts from scratch in each invocation of the inner loop.

The operator *contemp*, on the other hand, takes a two-dimensional stream and samples it.

In other words, *contemp(w)* is the contemporary value of w – the value at the current enclosing time.

It’s easy to check that *contemp* and *active* are dual: *contemp(active(a)) = a*.

If we want simple freezing we use the operator *current*, defined as

current(<g0,g1,g2,…>) = <<g0,g0,g0,…>,<g1,g1,g1,…>,<g2,g2,g2,…>,…>
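Since we can’t write down infinite hyperstreams, here is a finite-prefix model in Python (lists of lists, outer index t1, inner index t0, matching the <<…>,<…>,…> notation; my own sketch, with the obvious caveats about truncation):

```python
# Finite-prefix models of the hyperfilters, streams as Python lists.

def active(a):
    """Duplicate an ordinary stream over t0: each invocation restarts it."""
    return [list(a) for _ in a]

def contemp(w):
    """Sample a hyperstream at the contemporary (diagonal) point."""
    return [w[i][i] for i in range(len(w))]

def current(g):
    """Freeze: value i of g becomes a constant inner stream."""
    return [[g[i]] * len(g) for i in range(len(g))]

n = [2, 9, 3]
assert contemp(active(n)) == n     # the duality contemp(active(a)) = a
print(current(n))                  # [[2, 2, 2], [9, 9, 9], [3, 3, 3]]
```

On these prefixes the duality *contemp(active(a)) = a* can be checked mechanically, and *current* produces exactly the frozen rows shown in the traces below.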

The square root program becomes

```
output
where
  n = input;
  output =
    a asa err < 0.0001
    whereloop
      N = current n;
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

After translating the whereloop we get

```
output
where
  n = input;
  output =
    contemp(a asa err < 0.0001)
    where
      N = current active(n);
      a = 1 fby (a+N/a)/2;
      err = abs(N-a*a);
    end;
end
```

Inside the inner where, n might look like <2,9,3,…>, *active n* will be

<<2,9,3,…>,<2,9,3,…>,<2,9,3,…>,…>

and *current active n* will be

<<2,2,2,…>,<9,9,9,…>,<3,3,3,…>,…>

and now a is no longer chasing a moving target. The hyperstream a might look like

<<1,1.6666,1.478,…,1.414…,1.414…,…>,<1,2.5,2.9,…,3,3,3,…>,<1,1.2…,1.6…,1.73…,1.73…,…>,…>

and *a asa err < 0.0001* will be

<<1.414…,1.414…,1.414…,…>,<3,3,3,…>,<1.73…,1.73…,1.73…,…>,…>

and *contemp(a asa err < 0.0001)* is

<1.414…,3,1.73…,…>

which is what we want.
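As a sanity check, we can replay the translated program on finite prefixes in Python (a hypothetical model: `newton_row` plays the role of one inner-loop invocation, `asa` of *asa*):

```python
# Finite replay of the square root program, streams as lists.

def newton_row(N, steps=30):
    """One invocation of the inner loop: the stream a = 1 fby (a+N/a)/2."""
    a, out = 1.0, []
    for _ in range(steps):
        out.append(a)
        a = (a + N / a) / 2
    return out

def asa(x, p):
    """Value of x as soon as p is true."""
    return x[p.index(True)]

n = [2, 9, 3]
result = [asa(newton_row(N), [abs(N - a * a) < 0.0001 for a in newton_row(N)])
          for N in n]
print([round(v, 3) for v in result])   # [1.414, 3.0, 1.732]
```

Each row of the hyperstream converges to the square root of one frozen input, and the diagonal sampling picks out exactly those limits.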

So much for the simplest form of nesting, where outer values are frozen during the inner computation. Can we do better? Yes.

Suppose we want to produce a single value that repeatedly combines several values from the outer loop (not possible in traditional Lucid). To be specific, suppose that m is a series of positive integers interrupted by 0’s. We want to produce the stream of sums of the numbers up to each 0. For example, if m is of the form

<3,2,4,0,6,0,3,8,5,0,…>

then the value of the loop should be of the form <9,6,16,…>.

Here is the program

```
n
where
  m = input;
  n =
    sum asa y eq 0
    whereloop
      sum = 0 fby sum+y;
      y = m Fby y after y eq 0;
    end;
end
```

It uses the (ordinary) filter *after* and the hyperfilter *Fby*. The former returns the part of its first argument after its second argument is true for the first time. For example, if P begins <f,f,f,t,…> and X begins <x0,x1,x2,x3,x4,…> then *X after P* is <x4,x5,x6,…>. The definition of *after* is

```
X after P = if first P then next X
            else next X after next P
            fi;
```

This is standard one dimensional Lucid. But *Fby* is two dimensional; it’s *fby* in the time dimension t1. If V is <<x0,x1,x2,…>,<y0,y1,y2,…>,<z0,z1,z2,…>,…> and W is <<a0,a1,a2,…>,<b0,b1,b2,…>,<c0,c1,c2,…>,…> then *V Fby W* is

<<x0,x1,x2,…>,<a0,a1,a2,…>,<b0,b1,b2,…>,<c0,c1,c2,…>,…>

(Incidentally there does not seem to be any way to define *Fby* in terms of simpler primitives.)
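On finite prefixes *Fby* is easy to model (again my own list-of-lists sketch, outer index t1):

```python
# Fby: fby in the enclosing time dimension t1 – first row of v, then all of w.

def Fby(v, w):
    return [v[0]] + w

V = [["x0", "x1"], ["y0", "y1"], ["z0", "z1"]]
W = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]
print(Fby(V, W))   # [['x0', 'x1'], ['a0', 'a1'], ['b0', 'b1'], ['c0', 'c1']]
```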

Once we translate the whereloop we get

```
n
where
  m = input;
  n =
    contemp(sum asa y eq 0)
    where
      sum = 0 fby sum+y;
      y = active(m) Fby y after y eq 0;
    end;
end
```

If m is as above then y is

<<3,2,4,0,6,0,3,8,5,0,…>,<6,0,3,8,5,0,…>,<3,8,5,0,…>,…>

*sum* is

<<0,3,5,9,9,15,15,…>,<0,6,6,9,17,…>,<0,3,11,16,16,…>,…>

*sum asa y eq 0* is

<<9,9,9,…>,<6,6,6,…>,<16,16,16,…>,…>

and *contemp(sum asa y eq 0)* is <9,6,16,…>, as required.
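We can replay this example too with a finite Python model (hypothetical, streams as lists): the rows of y, their running sums, and the contemporary values.

```python
# Finite replay of the summing loop.

def after(x, p):
    """Part of x strictly after p is first true."""
    return x[p.index(True) + 1:]

def asa(x, p):
    """Value of x as soon as p is true."""
    return x[p.index(True)]

m = [3, 2, 4, 0, 6, 0, 3, 8, 5, 0]

# y: row 0 is m; each later row is the previous row after its first 0
rows = [m]
for _ in range(2):
    prev = rows[-1]
    rows.append(after(prev, [v == 0 for v in prev]))

def running_sum(row):
    """The stream sum = 0 fby sum + y along one row."""
    s, out = 0, []
    for v in row:
        out.append(s)
        s += v
    return out

result = [asa(running_sum(r), [v == 0 for v in r]) for r in rows]
print(result)   # [9, 6, 16]
```

The rows reproduce the hyperstream for y shown above, and the contemporary values are exactly <9,6,16,…>.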

On the other end of the spectrum, we want a loop which produces a number of values for every step in the enclosing loop. Suppose that n is a stream of natural numbers, say <2,5,8,3,…> and we want to produce a stream m that enumerates the binary digits of the components of n separated by “;”, so that m will be of the form

<0,1,”;”,1,0,1,”;”,0,0,0,1,”;”,1,1,”;”,…>

The following program does the job

```
m
where
  n = input;
  m =
    (";" fby digit) Until (false fby k eq 0)
    whereloop
      k = current n fby k/2;
      digit = k mod 2;
    end;
end
```

The operator *Until* is defined recursively as

```
x Until p = if First first p then (Next x Until Next p)
            else First first x fby (next x Until next p)
            fi;
```

Here *First* and *Next* are the t1 versions of first and next.

Does this work? I believe so, but I leave it to you as a (nontrivial) exercise to check it. Let me know if it doesn’t work out.
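For what it’s worth, here is my own finite Python trace of what the program should produce. One caveat: tracing the definitions, *";" fby digit* puts the separator before each group rather than after it, so the stream begins with “;”.

```python
# Finite trace of the digit-enumeration loop.

def block(n):
    """One whereloop invocation: ';' then n's binary digits, LSB first
    (k = current n fby k/2, digit = k mod 2, stop once k reaches 0)."""
    out, k = [";"], n
    while k != 0:
        out.append(k % 2)
        k //= 2
    return out

m = [d for n in [2, 5, 8, 3] for d in block(n)]
print(m)   # [';', 0, 1, ';', 1, 0, 1, ';', 0, 0, 0, 1, ';', 1, 1]
```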

It all looks hunky-dory but there is a problem: global functions. I explained that *whereloop* is translated by applying *active* to all the global (individual) variables. But what about a function f that is defined outside the *whereloop* but is called inside? The hitch is that the definition of f may contain a global g that normally should be *active*-ated. If we do nothing we end up ‘smuggling’ g inside the *whereloop* with unpredictable consequences because we have not turned it into a two dimensional stream.

We can’t just ignore global function calls. What do we do with them? We can try Yaghi code, which seems to work, but I’ll leave the details for a later time. In the meantime, there’s no problem using globally defined functions that have no globals.

I’m going to set things right by releasing an up to date version of PyLucid (Python-based Lucid) and up to date instructions on how to use it to run PyLucid programs. The source can be found at pyflang.com and this blog post is the instructions. (The source also has a brief README text file.)

Go to pyflang.com and download the file pylucid.1.2.zip and unzip it. There will appear three objects: a directory *source*, a command *repl*, and a file *README.txt*. This is all you need.

The source directory contains about 30 .py files (python code). You don’t have to know anything about them if all you want to do is run PyLucid programs.

The README.txt is basically a condensed version of this post.

The repl command launches the PyLucid Read-Evaluate-Print-Loop. You repeatedly enter one-character commands (possibly with arguments) and that’s how you interact with the interpreter. No UNIX commands other than repl itself.

Actually, “evaluate” is a bit misleading because you don’t enter PyLucid code on the command line. Instead you manipulate a program stored in an invisible buffer. The program stored in the buffer is called the current “import”, and when you first launch the repl it will inform you that the current import is “etothex”. It’s a simple program that calculates e using the power series for e**x with x=1.

To view (the program in) the buffer, use the “b” command. To edit it, use “v” (which launches vi). To evaluate (run) the program, use “e”.

The e command expects the buffer to contain a *where* clause, which consists of a *subject* (the value of the clause) and an (unordered) set of equations defining variables and functions. The variables denote two-dimensional datasets and the functions denote transformations (filters) on such datasets.

The e command evaluates the subject in the context of the definitions (the *body*) of the where clause. Note that the right hand sides of the definitions can be arbitrarily complex and may, for example, contain nested where clauses.

The e command evaluates the program and displays the results. However in PyLucid the value of a program is in general two dimensional – varying in both space and time. The interpreter uses the horizontal dimension for space, and the vertical dimension for time. Both dimensions are a priori infinite so it is necessary to limit the display in both directions.

The simplest way to do this is by defining the *parameters* (distinguished variables) “rows” and “columns”. For example, if the body includes the definitions “rows=10” and “columns=3” the display will be 10 (vertically) by 3 (horizontally), showing the values for time going from 0 to 9 and space going from 0 to 2.

More sophisticated effects can be achieved using the end-of-data and end-of-space special values.

The remaining commands are straightforward and are documented by the h[elp] command. However it may not be obvious how to create a new program, import it, and then remove it.

Suppose you want to create a program called “bleen”. The command “v bleen” will create it and “i bleen” will import it. At the moment there’s no way to remove a program but that will soon be fixed.

Finally, I should mention that there are two features currently not supported because bugs showed up during testing. One is nested iteration, the other is variable binding operators. I’m working on fixing them but in the meantime I thought it was more important to get PyLucid out there. These features and others will be available in future releases Real Soon Now.

These future releases will have new features, not just bug fixes. There will be more example programs. And I will be upgrading I/O so that in particular you can specify input and output prompts.

*The problem he tackled was that of dimensional analysis of multidimensional Lucid programs. This means determining, for each variable in the program, the set of relevant dimensions, those whose coordinates are necessary for evaluating individual components.*

**Objective:** to design Dimensional Analysis (DA) algorithms for the multidimensional dialect PyLucid of Lucid, the equational dataflow language.

*Dataflow is hardly an unknown concept but most dataflow systems are stream based – there is only one dimension, time. Lucid, by contrast, allows multiple dimensions. Evaluation is demand-driven: demands are for values of variables at given coordinates. These demands generate demands for possibly different variables at possibly different coordinates. Values computed are cached, labeled by the coordinates needed.*

**Significance:** DA is indispensable for an efficient implementation of multidimensional Lucid and should aid the implementation of other dataflow systems, such as Google’s TensorFlow.

*DA is indispensable because to retrieve a value from the cache we need to know which coordinates will form the label to be searched for. Without DA the cache would fill with duplicate entries labeled by irrelevant dimensions.*

Dataflow is a form of computation in which components of multidimensional datasets (MDDs) travel on communication lines in a network of processing stations. Each processing station incrementally transforms its input MDDs to its output, another (possibly very different) MDD.

MDDs are very common in Health Information Systems and data science in general. An important concept is that of *relevant dimension.* A dimension is relevant if the coordinate of that dimension is required to extract a value. It is very important that in calculating with MDDs we avoid non-relevant dimensions, otherwise we duplicate entries (say, in a cache) and waste time and space.

Suppose, for example, that we are measuring rainfall in a region. Each individual measurement (say, of an hour’s worth of rain) is determined by location (one dimension), day (a second dimension), and time of day (a third dimension). All three dimensions are *a priori* relevant.

Now suppose we want the total rainfall for each day. In this MDD (call it N) the relevant dimensions are location and day, but time of day is no longer relevant and must be removed. Normally this is done manually.

**Research question:** can this process be automated?

We answer this question affirmatively by devising and testing algorithms that produce useful and reliable approximations (specifically, upper bounds) for the dimensionalities of the variables in a program. By *dimensionality* we mean the set of relevant dimensions. For example, if M is the MDD of raw rain measurements, its dimensionality is {location, day, hour}, and that of N is {location, day}. Note that the dimensionality is more than just the *rank*, which is simply the number of dimensions.

**Background**: There is extensive research on dataflow itself, which we summarize. However, an exhaustive literature search uncovered no relevant previous DA work other than that of the GLU (Granular LUcid) project in the 90s. Unfortunately the GLU project was funded privately and remains proprietary – not even the author has access to it.

*The GLU project was funded at SRI in Stanford by Mitsubishi. The GLU project did carry out some DA.*

**Methodology**: We proceeded incrementally, solving increasingly difficult instances of DA corresponding to increasingly sophisticated language features. We solved the case of one dimension (time), two dimensions (time and space), and multiple dimensions.

*These algorithms proceed by accumulating approximations. For example, to determine which variables are time sensitive, we start with those that obviously are: those defined by a fby expression. Then we add those that are defined in terms of a time sensitive variable by a data operation or a next operator. Any variable defined by first is definitively not time sensitive.*

*The accumulation process continues until it settles down and no new time sensitive variables are discovered. At that point we have to assume that all variables discovered are actually time sensitive but we can be sure that variables not added are time constant.*
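The accumulation pass can be sketched in a few lines of Python (a toy equation format and rules of my own invention, not the dissertation's actual code):

```python
# Each variable maps to (operator, operands).  Toy program:
#   i = 0 fby j;  j = i + 1;  k = first j;  p = next i
eqns = {
    "i": ("fby", ("c0", "j")),
    "j": ("+", ("i", "c1")),
    "k": ("first", ("j",)),
    "p": ("next", ("i",)),
    "c0": ("const", ()),
    "c1": ("const", ()),
}

# Start with the obviously time sensitive variables: those defined by fby.
sensitive = {v for v, (op, _) in eqns.items() if op == "fby"}

changed = True
while changed:                      # accumulate until it settles down
    changed = False
    for v, (op, args) in eqns.items():
        if v in sensitive or op in ("fby", "first", "const"):
            continue                # first yields a time constant
        if any(a in sensitive for a in args):
            sensitive.add(v)        # data ops and next propagate sensitivity
            changed = True

print(sorted(sensitive))   # ['i', 'j', 'p']
```

Here k is never added, so we can be sure it is time constant, while i, j, and p are (at worst) time sensitive – an upper bound, as described above.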

We also solved the difficult problem (which the GLU team never solved) of determining the dimensionality of programs that include user defined functions, including recursively defined functions. We do this by adapting the PyLucid interpreter (producing the DAM interpreter) to evaluate the entire program over the (finite) domain of dimensionalities.

*This is tricky because this evaluation normally will not terminate. So we instrument the evaluator by e.g. counting calls to the evaluation function. Then we cap this count, evaluate, increase the cap, re-evaluate, etc., until the values settle down. (In practice very small caps suffice.)*
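The cap-and-retry idea can be sketched like this (a toy of my own; the real DAM interpreter instruments the PyLucid evaluator itself):

```python
# Abstract evaluation over sets of dimensions with a recursion cap.

def dim_of(var, depth):
    """Toy abstract evaluator, cut off at `depth`."""
    if depth == 0:
        return frozenset()                 # cut off: optimistic bottom value
    if var == "x":                         # x defined recursively over time
        return frozenset({"t"}) | dim_of("x", depth - 1)
    if var == "y":                         # y combines x with a space-varying term
        return frozenset({"s"}) | dim_of("x", depth - 1)
    return frozenset()

def settle(var):
    """Evaluate, raise the cap, re-evaluate, until the answer stops changing."""
    cap, prev = 1, None
    while True:
        cur = dim_of(var, cap)
        if cur == prev:
            return cur
        prev, cap = cur, cap + 1

print(sorted(settle("y")))   # ['s', 't']
```

Even for the recursive definition the answers settle after a couple of rounds, mirroring the observation that in practice very small caps suffice.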

**Results**: Experimentally validated algorithms that produce useful upper bounds for the dimensionalities of the variables in multidimensional PyLucid programs, including those with user defined functions.

*Our results are purely experimental; we do not provide formal proofs. But the experiments were 100% successful.*

I like to take pictures (you’ll see some here). Some of them turn out good but I’m not in the same league as the real professionals.

I’m very curious about what makes a good picture and am amused by newbie mistakes. Like not getting close enough. Or having the sun at your back so your subjects are squinting.

Or taking all your pictures in landscape mode.

You can hardly blame the newbies, cameras are set up to be used in landscape mode. To take a portrait you need the awkward hand-over-the-camera maneuver pictured above. Practically a sign of a better-than-average photographer.

I love portrait mode. Recently my wife gave me an Aura Frames smart photo frame. It’s brilliant, starting with the images. (I have no investment in AF.) Your images are stored in the cloud, so you can have tons of them. They are presented as a slide show, with you controlling the interval.

One of their best ideas is to allow the frame to operate in portrait mode – just place it upright on the surface, as in the picture on the left. And that brought up the question, which mode to use?

Initially I set it up in landscape mode and loaded a bunch of my pictures into the cloud. This was disappointing since pictures that weren’t landscape were crudely cropped and people’s feet and heads disappeared into the edge. Same with portrait; this time arms disappeared.

Aura allows you to upload entire Mac Photos albums, so my next step was to create two albums, with pictures cropped properly to landscape and portrait modes, respectively.

There were good pictures in both albums, but I soon preferred the portrait album. There was something more impressive or active about them, whereas the landscape album was laid back.

For example, political posters (and posters for movies and concerts) are invariably in portrait mode. They are designed to be energizing.

So I decided to stick with portrait only for my frame and began going through my pictures and editing them to portraits (mainly copying and resizing). I even ransacked the landscape album and was able to convert most of them to decent portrait images.

**Portrait mode in history**

It has to be said that historically portrait mode has been dominant. The earliest writing was done on stone or clay tablets, and they were much taller than they were wide.

Throughout history important documents designed to impress have been produced in portrait mode. Hollywood aside, we don’t know about the ten commandments but we know for sure about the Gutenberg Bible, which, like almost all books, is made of portrait pages.

Famous declarations, like the French revolutionary Rights of Man and Citizen, were portraits.

Why did portrait mode dominate for so long? A partial explanation is that text is much easier to read in portrait, because the lines are shorter. With landscape you have to follow the long lines carefully and have trouble going back to find where the next line begins. The Gutenberg Bible is not only portrait, each page has two tall columns.

For that reason letters, reports, academic papers and in general Word documents are portrait.

**The Changing Landscape**

Why did landscape emerge as such a strong contender? The short answer is TV, in the early 1950’s.

Portrait is good for showing one or two people, but if you want to show a group you need landscape, because groups of humans spread sideways. TV shows are almost all about groups of humans. Also TV shows have a lot of literal landscapes (like the westerns that were popular in the 50’s).

So there was really no choice about which mode to use for television, and that had a knock-on effect for other media. The same can probably be said about cameras, which in the consumer market are mainly used to take pictures of groups of people (family).

It’s now hard to imagine, but the first personal computers (like the Xerox Parc Alto) had portrait screens. They were designed to produce, edit, and display documents, so portrait was the obvious choice.

This started changing when computers began being used for other purposes, like games and, later, video. For a while there were even ‘bimodal’ monitors that could be rotated between landscape and portrait but they soon disappeared. Nowadays we’re surrounded by rigid landscape screens, with Aura Frames being an outstanding exception.

**The Aesthetics of Landscape and Portrait**

As I said, I prefer portrait. I noticed that when I edited a picture into portrait, I got better images than when I edited them to landscape. In landscape, there tended to be uninteresting spaces on either side.

Hold on then, aren’t there wasted spaces in portrait images? On the whole, no. Top and bottom are equally important. Sometimes newbies waste the top, typically by pointing the camera up so that, for example the subjects’ faces appear in the middle of the image. (This is a very common error).

Sometimes the newbies point the camera down, so that the bottom of the image is empty foreground (sometimes they achieve both).

The key to successful portrait mode pictures is to have the interesting parts in the upper half but also have something to look at in the lower part so that it’s not totally barren (there are other patterns as well).

For example, in the image on the right, the important part, the flower bowl, is at the top, while the pillar beneath it, with the flowers at the base, is also attractive if not the main event.

Similarly, in my brother’s flowers picture the important part, the blossoms, are at the top but the vase and the drooping stems keep the bottom interesting.

I don’t know of any such formula to avoid boring parts of landscape photos.

**Inside Every Portrait Image …**

… is a landscape image trying to get out. After watching my portrait photo slide show over and over I realized that this pattern was very common (and violated in images that I felt for the most part were badly composed).

For example, this picture of Berkeley’s Sather Gate looks a bit off. The top half is mostly empty blue sky, of no interest. The interesting part is in the bottom half.

Eventually I became curious about the result of cropping the image, removing the bottom, less interesting part and keeping the top part, which should be in landscape mode. The result, it turned out, was in general a reasonable landscape (though not necessarily better) image.

Here are the landscape images lurking in the column and flowers pictured above.

What about Sather Gate? Extracting the top half would be pointless. The bottom half doesn’t work out either, because the top of the Gate gets clipped. We need to crop a bit short of the bottom.

The result is a much better image.

There is no wasted space in the image.

In this case the cropping improves the image.

The question arises, what will be the aspect ratio of the cropped landscape image, assuming we take exactly 1/2 of it?

That depends on the original aspect ratio. If it’s 4:3, like my Aura frame, the new ratio is 3:2, or 4.5:3, close enough. If we want the exact same ratio, a little high school algebra tells us the ratio should be √2 : 1 – cropping a portrait of ratio r in half produces a landscape of ratio 2/r, and r = 2/r gives r = √2. This is very close to 7:5, a ratio offered by most image editing software.

Note: this is NOT the golden ratio – the golden ratio is 1.618, whereas √2 is 1.414. Any number of articles and books will tell you the golden ratio is the key to artful images, but I’m suggesting something different. Which ratio is used in the Monna Lisa? Read on …

**Inside every landscape image …**

… is a portrait image struggling to get out.

As I said, I don’t view landscape images on my Aura but landscape images are omnipresent on the web. I was looking at the (all landscape) images in the New York Times. I was struck by the fact that almost all of them would benefit by a 1/2 crop to portrait.

In cropping portrait, you by default take the upper half. With landscape, the default is the middle half (with 1/4 removed on either side). But this may vary if the interesting part is not in the middle.

Here is the result of taking the image of the photographer at the beginning cropped down to the middle half.

I think it’s a better image; it really focuses on the camera and gives more prominence to the hands.

I could give dozens of examples of applying just the default rules but you get the idea.

**Testing on the Monna Lisa**.

No discussion of images is complete without talking about the Monna Lisa (correct Italian spelling), the most famous image in the world. How do my theories hold up? Quite well, as it turns out.

For a start, it’s in portrait mode. Can you think of a landscape image famous as the best landscape image in the world? I can’t.

And consider its aspect ratio. Recall the ideal ratio for cropping is 7:5. The Monna Lisa is 77cm x 53cm. Its aspect ratio is 1.45, while 7/5 is 1.40 and √2 is 1.41. This is NOT the golden ratio.

Notice how the centre of attention, Monna Lisa’s face is solidly in the upper half. But the bottom half is not empty. We have a hint of cleavage, the robe, and the hands. It follows the formula exactly.

So let’s extract the upper half. The result is on the right.

I sort of like it. Her eyes and smile stand out more. Yet you never see this image, only the full portrait.

**Inside every portrait …**

… is a smaller portrait struggling to get out. And inside every landscape is another landscape …

You’ve probably noticed an interesting corollary of my rules. Portraits can be cropped to landscapes, but they in turn can be cropped to portraits. In other words, two stages of cropping can reduce a portrait to a smaller portrait.

Let’s try that with the cropped Monna Lisa landscape. The result is on the left. I like it too, again because it focuses on the eyes and the smile even more. I feel she’s looking right at me.

Let’s try 1/2 cropping on the portrait version of the photographer. It gives us an interesting take.

In fact we can 1/2 crop again, and we get the second portrait on the left.

The image is starting to get square because the original aspect ratio was not close to √2. The aspect ratios of the crops oscillate.

**Take better pictures**

These considerations aren’t just of intellectual interest. They can help you take better pictures (or drawings, or paintings).

For portrait mode,

- decide what is the interesting part of the image
- get close enough so that the interesting part almost fills the frame from right to left
- point camera down so that the interesting part is right at the top
- make sure there is something interesting in the bottom half (foreground)

For landscape

- place the interesting part by default in the center
- make sure it fills the frame from top to bottom (get close)
- have an interesting background (e.g. a hedge or a sea view)

These rules aren’t logically complete – you can take good pictures not following them. Also, I’ve been told that the real pros don’t need rules, they just go by gut feel. But for the gutless rest of us, rules are better than nothing.
