The Intensional Spreadsheet [8000 views]

A while back Weichang Du and I designed a spreadsheet based on intensional logic, the logic of values that vary over a coordinate space.

3cbb8c11b8e378eb2a3f0da1c0bcb66c Spreadsheets are a natural fit for ‘intensifying’ because a sheet is already a two-dimensional intension, varying over the horizontal (A, B, C, …) and vertical (1, 2, 3, …) dimensions. But we can do better than just redo Excel with intensional semantics. Intensionality opens up some interesting possibilities; like user-defined operators, time varying sheets, or nested sheets.

This post is based on the IEEE Software article Du and I wrote, A Three Dimensional Spreadsheet, but with a different notation (to simplify the presentation).

The Fibonacci Numbers

Let’s start with a simple example – the Fibonacci numbers. Suppose we want to compute them and display them in row 1. In Excel it’s easy peasy, we put 0 in cell A1 and 1 in cell B1; then A1+B1 in cell C1, B1+C1 in cell D1, C1+D1 in cell E1, etc. The formulas in the cells are

A   B     C       D       E      F
-----------------------------------
0 | 1 | A1+B1 | B1+C1 | C1+D1 | ...

and the values displayed are

A   B   C   D   E    F
-----------------------------------
0 | 1 | 2 | 3 | 5 | ...

Notice that the formulas in the cells are all different; although if you know Excel, you know that they are all ‘copies’ of the formula in C2. Not literally copies, but functional copies. The value in cell E1 bears the same relationship to the two values on the left as C1 does to the two values on its left.

Confused? Never mind, it all works out more simply with the intensional spreadsheet.

The basic idea is that we define a sheet (which needs a name, say F) by putting expressions in cells. The difference is that they are in the language of two dimensional intensions. Let’s call the horizontal dimension h and the vertical dimension v. We need space versions of the traditional Lucid temporal operators first, next, and fby. (Plus prev).

Let’s call next.h right and prev.h left (for obvious reasons). Then we put the following formulas in the cells A1, B1, C1, …

A   B          C                      D                       E                
------------------------------------------------------------------------------
0 | 1 | left F + left left F | left F + left left F  | left F + left left F  | ...

The first thing to notice is that apart from the first two, the formulas are exactly the same. Copying in the intensional spreadsheet is just that.

Also evaluation is just that. To get (for example) the numerical value of the cell D1, we evaluate the formula stored in D1. This evaluation will entail evaluating the formulas in C1 and eventually in A1 and B1, the base cases of the recursion. Note that this is eduction (more below).

[It’s been pointed out that you can get something close to this with RC notation. R[-2]C is the cell two columns to the left on the same row. This notation is not as general; for example we can write left(F+left F).]

Absolute References

It should be clear that most spread sheets are easily formulated in the new notation. The only question is how to do what are known as ‘absolute references’, that are not altered by traditional copying. One approach is to use expressions like A2 as absolute coordinates; so that e.g. F.A2 is the contents of exactly the A2 cell of F (in the above, 1) and it gets copied as exactly F.A2.

As a special case, we have operators side and top that take us to the side and top of the sheet, respectively. So side F is F.A and top F is F.0.

Default Formulas

The fact that all but the first two formulas in the Fibonacci sheet are the same suggests some shortcuts. The user interface could allow us to select a set of cells and put a single formula into each of them.

Another idea is to allow us to give a default formula for an entire row (or column), with the understanding that putting formulas in individual cells overrides the default.

As another example we could define a sheet P by putting 1 in every cell in row 1, 1 in every cell in column A, and the default formula

left P + up P

in all the other cells of P. The result is to make every column a running total of the column to the left. The result is Pascal’s triangle (tilted to the left)

---------------------
| 1 | 1 | 1 | 1 | 1 ...
---------------------
| 1 | 2 | 3 | 4 | 5 
---------------------
| 1 | 3 | 6 |10 |15 
---------------------
| 1 | 4 |10 |20 |35 
---------------------
| 1 | 5 |15 |35 |70
---------------------

Global Definitions

Our spreadsheet design allows for definitions that apply globally. For example, if we add the global definition

F2 = left F + left left F

then we can put F2 in cells C1, D1, E1 etc. We can have global user defined functions like

left2(X) = left left X

and define F2 as

left F + left2(F)

Functions can of course be defined recursively; for example

leftsum(X,i) = if i eq 0 then 0 else left X + leftsum(left X,i-1) fi

defines e.g. leftsum(F,10) to be the sum of the 10 values to the left. If we want to sum all the values to the left we can add the global definition

leftsumall(X) = leftsum(X,index.h)

where index.d is the coordinate in the d dimension.

Spreadstreams

Since we already have dimensions h and v it’s no technical problem to allow the traditional Lucid time dimension t. The result is a three-dimensional spreadsheet which could be understood as a time varying two dimensional spreadsheet. For example, a traditional spreadsheet might display sales (S) of car models by salesman. With the time dimension we can do this for every month, starting with (say) January as month 0.

Then we can have extra cells (along the side and the bottom) which display totals for model and salesman (for the current month). But with the time dimension we can also display percentage change over the last month. For example of D10 displays Bob’s total month sales, we can put the formula

100*(S – prev S)/(prev S)

in D11 which will now display the percentage change in Bob’s sales.

Another example of a time varying spreadsheet is that for the relaxation method. It is used to give approximate solutions to the Poisson equation ∇² P = 0. It is an iterative process where each (interior) cell value is repeatedly replaced by the average of the values above and below and on either side; for example, the value of F8 is replace by the average of the values of F7, F9, E8 and G8.

We can program this by placing the formula

prev (left P + right P + up P + down P)/4

in all the interior cells.

Nested spreadsheets

The intensional approach allows another generalization: nested spreadsheets. By this we mean that any cell of the main sheet can (but not necessarily does) contain a whole sub-spreadsheet. And that any cell of the sub-spreadsheet might contain a sub-sub-spreadsheet, and so on.

Indexing makes this simple: any cell of the sub-spreadsheet is indexed by two letter-number combinations. For example, if F8 has a sub-sheet, its cells have coordinates like F8.C10 or F8.H7. In the same way a cell in a sub-sub-sheet might have an index F8.H7.K21.

Nested sheets would allow a very natural way of representing hierarchical data. The top row of the main sheet could hold sales per year and month (two dimensions) and each cell could hold data of sales per month/salesperson (two more dimensions).

We would need extra intensional operators which take us down one level or across to neighbouring sheets. We haven’t worked out the details.

Demand-driven evaluation

This all sounds extremely ambitious and even impractical; and would be if we tried to implement it with conventional techniques. For example, if the user can open sub-sheets on the fly, how do we know how many cells to allocate?

It is all made possible by a simple technique for evaluation of intensional systems, namely demand driven (lazy) evaluation; also called eduction.

Briefly, it proceeds as follows: to determine that value that should go in a cell c, with coordinates θ, we first determine the defining formula φ_θ that goes in c. That’s not necessarily straight forward since we may have to use θ take defaults into account. Once we have determined φ_θ, we can cache it labelled by θ.

The next step is to evaluate φ_θ at θ, giving value v_θ. This may require us to evaluate other φ’s at other θ’s, giving other v’s. When we’re done, we can cache v_θ labelled by θ.

We repeat the procedure for all the cells we’re interested in. When we’re done, we’ve produced an intension Δ that has at each coordinate θ value v_θ. Then Δ has the property that at each coordinate θ the value of the formula at φ_θ given the intension Δ is the value stored in the cell at coordinate θ. This is the denotational semantics of the spreadsheet.

Evaluation order

With a conventional spreadsheet a formula is evaluated as soon as it is entered. We can do the same, but there is another option: to wait till all the formulas are entered (including by defaults) then evaluate. The demand driven algorithm determines the order of evaluation. Not that it matters – no matter what order you ask for values to be computed, you’ll end up with the same results (that’s one thing the semantics tells us).

Of course if an entry or default is changed then all the current values are potentially incorrect. The easiest way to deal with this is to recompute them all. However it’s possible to do dependency analysis and maybe reduce the scope of the required recalculation. In so doing we must not forget the caches, as some of their entries will also be incorrect.

A confession

You’ve probably noticed that the explanations are a bit vague. This is because the intensional spreadsheet is vaporware – it is not currently implemented (though Du produced a proof-of-concept implementation of an early version).

The main reason for the vapor status is that implementing a usable spreadsheet is an immense task. For a start it means implementing a programming language, the language of formulas. Plus a graphical user environment. And then add charting, formatting and the like. For a spreadsheet to be successful, it has to compete with Excel, which is huge.

The open office experience shows that it is possible, but only as the result of hundreds of contributors in an open source software project.

We would be happy for any of you to take up these ideas any way you want. The intensional spreadsheet is in many ways simpler than a conventional one, and at the same time has more power and potential. Let’s hope it becomes more than just promising vapor.

2 Responses to The Intensional Spreadsheet [8000 views]

Derek Colley says:

April 20, 2020 at 6:55 pm

It strikes me that your model is somewhat similar to the way the relational algebra works. For example, when you speculate on how multiple dimensions can be represented in an Excel-like spreadsheet, you are referring (in relational parlance) to a theta-join relationship between one relation R and another S, linked on some common attribute key of R and S. There is a whole pile of theory and practice (including the use of intension) underpinning the proper specification of these tabular schemas, and a set of rules – axiomatic set theory – that date back to Cantor. Likewise, recurring (or recursive) functions are implemented both in theory and practice.

I love the theoretical direction your work is going in, but I suspect that if you proceed down this path you will end up with something very much like the relational model. Are there any significant differences that you see?

Loading...

Laurence says:

April 21, 2020 at 8:07 am

This is an interesting write up but I think you’re underplaying the similarities between what you’re describing and R1C1. I appreciate your edit which says:

> It’s been pointed out that you can get something close to this with R1C1 notation. R[-2]C is the cell two columns to the left on the same row. This notation is not as general; for example we can write left(F+left F).

But doing so isn’t useful because you are plotting backwards from a known absolute reference anyway – thus you will already know what the end cell you want is in absolute terms (ie A1 reference). I believe Excel does even allow you to mix A1-style absolute references with R1C1 formulas too.

Loading...