A Functional Approach to Dictionaries

Dictionaries are extremely useful data objects. A dictionary is an associative store, basically a set of key-value pairs. A dictionary takes a key and returns the associated value.

Python has dictionaries and the Python engineers have put a lot of effort into making them efficient in both space and time. The PyFL interpreter makes extensive use of dictionaries and they are responsible for major improvements in performance. (PyFL is my experimental functional language.)

The obvious conclusion is that PyFL, too, should have dictionaries. A PyFL dictionary could simply be a Python dictionary wrapped in a functional interface.

In fact this is not so simple as it seems. The Python dictionary interface is imperative: the basic primitive allows a given value to be assigned to a given key. The dictionary is altered in place.

However, in a functional language data objects are immutable. Instead, we have a function which takes a dictionary and a key and a value and returns a new dictionary that differs from the original (at most) in that the given value is associated with the given key.

We also need a primitive to return the value associated with a given key, and a predicate to check that a given key has a value associated with it.

The basic primitives

It all begins with an empty dictionary. In our case, with dict0("student","grade"). This dictionary is empty but it has student and grade as the component names of any future entries.

Adding entries is done with the update function. Thus

d1 = update(dict0,'Tom',83)

defines d1 to be (denote) a dictionary with one key value pair, with 83 associated with the string (denoted by) 'Tom'.

If in addition to we have

d2 = update(d1,'Dick',94)
d3 = update(d2,'Harry',75)
marks = update(d3,'Sally',91)

we can think of marks as denoting a marks table for a small class.

Individual marks are retrieved using the infix operator atkey. Thus marks atkey 'Dick' evaluates to 94 and marks atkey 'Sally' evaluates to 91.

We can check if a dictionary has a given key using the haskey infix operator. To check if 'Tom' is a key, we evaluate marks haskey 'Tom', which yields true. On the other hand, marks haskey 'John' gives false.

We don’t have to build up a dictionary by adding entries one at a time, We can use the extend operator, whose second argument is a whole list of entries,

We represent individual entries as structs, with in this case student and grade components. For example,

<< student:'Tom' grade: 83 >>

The second argument of extend is a list of such entries. We could have defined marks as

marks = dict0("student","grade") extend
          [%
             << student: 'Tom'   grade: 83 >>,
             << student: 'Dick'  grade: 91 >>,
             << student: 'Harry' grade: 75 >>,
             << student: 'Sally' grade: 94 >>
           %]

The dictionary produced has all the new entries. If the original dictionary had entries for the given keys, they are overwritten.

The operations map and filter

There are also two important operations, map and filter, which allow us to avoid inefficient one-entry-at-a-time computations.

Suppose, for example, that we need to increase each class mark by 5. We could laboriously extract each mark, add 5, then update the dictionary. The first step would be

tommark = marks atkey 'Tom'
marks1 = update(marks,'Tom',tommark+5)

and this would be repeated three more times, creating three more dictionaries in the process.

Instead, we can use the map operator. It takes a dictionary and a function and returns the dictionary which results if we apply the function to each of the original dictionary’s values. Thus

newmarks = marks map lambda (m) m+5 end

or equivalently

marks map upgrade
where
 upgrade(m) = m+5
end

Now suppose we want to look at only those who got A+ by having a mark at least 90. Once again we could laboriously extract individual entries. Instead we use the filter operator. It takes a dictionary and a predicate and returns the dictionary formed by taking those entries in the original dictionary which satisfy the predicate. Then

apmarks = marks filter lambda (e) e.grade >= 90 end

or equivalently

marks filter aplus
where
 aplus(e) = e.grade >= 90
end

(Here the variable e stands for an entry, i.e. a struct with component names student and grade.)

VBOs

There are many more calculations that map and filter can’t perform. For example, suppose we want to compute the average mark. The operations map and filter are of no use. We do know how to compute the average component of a list L, we use the avg vbo: avg x for x in L end. Our idea is to use the same approach for dictionaries.

We do this by allowing dictionaries to be ranges of VBO expressions. The following expression

avg e.grade for e in dict marks end

computes the average mark. Here the variable e ranges over the entries of marks.

If we want the average of the A+ students we just add a condition:

avg e.grade for e in dict marks:e.grade >= 90 end

This gives us the set of names of the A+ students:

collect e.student for e in dict marks: e.grade >= 90 end

and it evaluates to the set denoted by {'Dick','Sally'}.

To calculate the top mark and the names of those who earned it

those e.student : e.grade == topgrade for e in dict marks end
where
 topgrade = max e.grade for e in marks
end

This evaluates to the set denoted by the constant {'Sally'}.

Fessing up

Now a confession: most of this is vaporware, I’ve only implemented the basic primitives.

But the basic primitive don’t make for much of a blog post, so I faked the rest of the features.

I intend to implement them but not real soon (now). I’m currently working on a comprehensive PyFL manual (with Vaios Michalakis). When it’s done I’ll post it here.

Unknown's avatar

About Bill Wadge

I am a retired Professor in Computer Science at UVic.
This entry was posted in Uncategorized and tagged , , . Bookmark the permalink.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.