
Lambda Calculus

Lambda calculus is a model of computation introduced in the 1930s by Alonzo Church, who was trying to develop formal foundations for mathematics. Those attempts were unsuccessful, as several of his systems turned out to be inconsistent, but lambda calculus survived as a fragment of one of them. Using this calculus, Church settled Hilbert's famous Entscheidungsproblem by showing that there are questions that cannot be answered algorithmically. Later, lambda calculus became a formal tool for analyzing computability and functional programming languages.[1]

Lambda calculus is probably the simplest universal programming language. Many of the key concepts that make functional programming so powerful today were invented for, or are inspired by, lambda calculus. This includes, for example, currying, recursion, and closures. You will also see where all those parentheses in Racket come from!

Lambda calculus can be viewed as a minimal programming language in which computations can be described and as a mathematical object about which rigorous statements can be proved. It serves as a core language that allows one to specify and study programming language features by translating them into the core language. We will discuss only the untyped variant of lambda calculus. There is also a typed version used in the study of type systems.[2]

Even though the syntax and semantics of lambda calculus are extremely simple, lambda calculus is Turing complete. In other words, it is possible to simulate any Turing machine with a program in lambda calculus and vice versa.

Syntax

The syntax of lambda calculus has only three types of terms: a variable (denoted by lowercase letters x, y, z, …), the abstraction of a variable x from a term t defining a function (denoted λx.t), and the application of a term t1 to a term t2. Formally, we define lambda terms by the following grammar:

term → var | func | app
func → (λvar.term)
app → (term term)

The abstraction λx.t defines an anonymous function as in Racket. The variable x is its parameter, and t is its body. The application represents a function call.

Note that each abstraction and application introduces parentheses. To simplify the notation, we use several conventions that let us remove some of them.

  • We often omit the outermost parentheses.
  • Application is left-associative, e.g., e1e2e3e4 stands for (((e1e2)e3)e4).
  • The bodies of functions extend to the right as far as possible.

Using the above conventions, we can simplify the following term

(λx.((λy.((xy)x))z))

as follows:

λx.(λy.xyx)z

The remaining parentheses are the only ones we keep because they delimit the body of the anonymous function defined by λy.

The parentheses are necessary if we want to represent lambda terms as sequences of characters. Alternatively, we avoid parentheses if we understand lambda terms as abstract syntax trees. The above lambda term λx.(λy.xyx)z corresponds to the following tree (the application nodes are denoted by the symbol @):

Before focusing on semantics, we need to introduce the scopes of variables. An occurrence of a variable x is said to be bound if it occurs in the body t of an abstraction λx.t. The term t is called the scope of the binder λx. An occurrence of x is free if it is not bound. To find out whether an occurrence of x is bound or free, we can also track the path in the abstract syntax tree from the occurrence of x to the root. The occurrence is bound if we find the node λx on the path. If not, it is free. In the above term λx.(λy.xyx)z, all the occurrences are bound except for the only occurrence of z.[3] A lambda term is called closed (aka combinator) if all its variable occurrences are bound. Otherwise, it is called open.

Semantics

In lambda calculus, programs (i.e., lambda terms) consist of anonymous functions created by abstraction and function calls. The computation of such a program is the simplification process reducing the program to a lambda term that cannot be reduced anymore. This term is the value computed by the program.

A lambda term λx.t represents an anonymous function with an argument x and body t. The term λx.t corresponds to the following expression in Racket:

racket
(lambda (x) t)

We can call the function λx.t by applying it to another term e: (λx.t)e. A lambda term of the form (λx.t)e is called a redex because it is reducible by a substitution rule known as β-reduction. When we reduce a redex (λx.t)e, we substitute e for all free occurrences of x in t. The resulting term is denoted t[x:=e].

The β-reduction can be applied to any redex occurring in a lambda term. It represents a single step in the computation. We denote the relation that a term is β-reducible to another term by →β.[4] For example, (λx.t)e →β t[x:=e].

Let us see some examples. We denote the equality on lambda terms by ≡.

(λx.x)(λy.y) →β x[x:=(λy.y)] ≡ (λy.y)
(λx.xx)(λy.y) →β (xx)[x:=(λy.y)] ≡ (λy.y)(λy.y) →β y[y:=(λy.y)] ≡ (λy.y)
(λx.x(λx.x))y →β (x(λx.x))[x:=y] ≡ y(λx.x)

Note that we substitute only for the free occurrence of x in the last example.

The substitution rule has one more caveat regarding the variable names. Consider a redex (λx.t)e. A free occurrence of a variable in e may become bound after the β-reduction. For example,

(λx.(λy.xy))y →β (λy.xy)[x:=y] ≡ λy.yy

The above reduction step is not valid. The reason is that functions do not depend on the names of their arguments. Consider the following Racket program:

racket
(define y 5)
(define (cadd x) (lambda (y) (+ x y)))

If we now call

racket
(cadd y)

the resulting function cannot be

racket
(lambda (y) (+ y y))

which would be doubling its argument. Instead, it should be the function adding y = 5 to its argument. To see that, we can simply rename the argument y, for instance, to z:

racket
> (define (cadd x) (lambda (z) (+ x z)))
> (cadd y)    ; => (lambda (z) (+ y z)) where y = 5
(lambda (z) (+ 5 z))

Thus the correct reduction of the above lambda term should proceed as follows:

(λx.(λy.xy))y ≡ (λx.(λz.xz))y →β (λz.xz)[x:=y] ≡ λz.yz

Whenever we want to substitute a term e for a variable in a term t, we must first check whether any free variable occurrence in e would become bound. If so, we must rename the clashing bound variable in t to a new name. The renaming of function arguments is known as α-conversion. We will consider two lambda terms that are the same up to renaming a bound variable equivalent. For example,

λx.xz ≡ λy.yz

Formally, we can define the substitution by induction as follows:

x[x:=e] = e
y[x:=e] = y                              if y ≠ x
(t1 t2)[x:=e] = (t1[x:=e] t2[x:=e])
(λx.t)[x:=e] = (λx.t)
(λy.t)[x:=e] = (λy.t[x:=e])              if y ≠ x and y is not free in e
(λy.t)[x:=e] = (λz.t[y:=z][x:=e])        if y ≠ x and y is free in e; z is a fresh variable

The last case is the one where we need to do the α-conversion and rename the function argument to a fresh name λz.t[y:=z] before we proceed with the substitution [x:=e].
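The substitution rules above can be turned into a short program. The following is a sketch in Racket, under a term representation of our own choosing: a variable is a symbol, an abstraction λx.t is `(list 'lam x t)`, and an application is `(list 'app t1 t2)`.

```racket
(require racket/match)

;; Is x free in term t?
(define (free? x t)
  (match t
    [(? symbol? y) (eq? x y)]
    [(list 'lam y b) (and (not (eq? x y)) (free? x b))]
    [(list 'app t1 t2) (or (free? x t1) (free? x t2))]))

;; Capture-avoiding substitution t[x:=e].
(define (subst t x e)
  (match t
    [(? symbol? y) (if (eq? x y) e y)]
    [(list 'app t1 t2) (list 'app (subst t1 x e) (subst t2 x e))]
    [(list 'lam y b)
     (cond [(eq? x y) t]                                   ; x is shadowed: (λx.t)[x:=e] = (λx.t)
           [(not (free? y e)) (list 'lam y (subst b x e))] ; no capture possible
           [else (let ([z (gensym y)])                     ; α-conversion to a fresh z
                   (list 'lam z (subst (subst b y z) x e)))])]))
```

For instance, `(subst '(app x (lam x x)) 'x 'y)` substitutes only for the free occurrence of x, and `(subst '(lam y (app x y)) 'x 'y)` triggers the α-conversion in the last case of the definition.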

Church-Rosser Theorems

The computation in lambda calculus is performed by β-reduction. Given a lambda term, we reduce it as much as possible by repetitive applications of β-reduction. The resulting irreducible lambda term is the computed value. However, a lambda term can contain several redexes, for example:

(λx.x)((λy.y)z)

Here both the whole term and the inner subterm (λy.y)z are redexes.

This leads to natural questions: Does it matter which redex we reduce first? Can different reduction orders lead to different final values? Can some reduction orders terminate, whereas some do not? The answers to these questions are provided by the Church-Rosser theorems.

Before we state the Church-Rosser theorems, let us discuss a few examples showing what might happen. First of all, it may happen that the reduction process does not terminate no matter which reduction order is used! Consider the following term containing only a single redex:

(λx.xx)(λx.xx) →β (λx.xx)(λx.xx)

The above lambda term corresponds, in fact, to an infinite loop.

For some expressions, the reduction process terminates for some reduction orders and diverges for others. For example, the term

(λx.y)((λx.xx)(λx.xx))

contains two redexes. The reduction of the inner one does not terminate. If we instead reduce the outer redex, applying (λx.y) to the rest of the expression, the reduction stops immediately:

(λx.y)((λx.xx)(λx.xx)) →β y

A lambda term is said to be in normal form if it is irreducible, i.e., it contains no redexes. Reduction orders are also called evaluation strategies. Several of them were investigated. Let us introduce two of them:

  1. Normal order reduces the leftmost outermost redex first at each step.
  2. Applicative order reduces the leftmost innermost redex first at each step.

We say a redex is to the left of another redex if its lambda appears further left. The leftmost outermost redex is the leftmost redex not contained in any other redex. The leftmost innermost redex is the leftmost redex not containing any other redex. For example, consider the following lambda term:

(λy.y)((λz.zz)x)((λz.(λa.a)z)(λy.(λz.z)x))

It has five redexes, depicted in red in the following figure.

The leftmost outermost redex is (λy.y)((λz.zz)x). On the other hand, the leftmost innermost redex is (λz.zz)x.

Let's see how the above lambda term would be evaluated by the evaluation strategy following the normal order. The reduced redexes are underlined:

(λy.y)((λz.zz)x)((λz.(λa.a)z)(λy.(λz.z)x))
→β ((λz.zz)x)((λz.(λa.a)z)(λy.(λz.z)x))
→β xx((λz.(λa.a)z)(λy.(λz.z)x))
→β xx((λa.a)(λy.(λz.z)x))
→β xx(λy.(λz.z)x)
→β xx(λy.x)

Finally, we get to the Church-Rosser theorems. The first theorem states that no matter which reduction order we choose, we will always get the same normal form provided the reduction process terminates. The second theorem states that the normal order always terminates, provided a sequence of β-reduction steps leads to a normal form.

Church-Rosser Theorems:

  1. Normal forms are unique (independently of evaluation strategy). Consequently, a lambda term cannot be reduced to two different normal forms.
  2. Normal order always finds a normal form if it exists.

Programming in lambda calculus

Lambda calculus has no numbers, arithmetic, Booleans, etc. It has only anonymous functions and function calls. Thus it might look pretty useless. However, lambda calculus is Turing complete. So it should be no surprise that we can encode all the things like numbers and arithmetic. We will build a few basic encodings from scratch in the following sections. Besides numbers and arithmetic, the most interesting thing is how to represent recursive functions in lambda calculus. That is not straightforward, considering there are only anonymous (nameless) functions.

Although lambda calculus cannot internally introduce names for terms or functions, we can do it in our exposition to simplify the notation. We will denote specific lambda terms (usually combinators) with uppercase letters. For instance, the identity function is denoted by I:

I ≡ λx.x

We will also introduce one more convention for writing lambda terms. Each function defined by abstraction is unary. However, we can have functions of higher arity due to currying. For instance, a function of two arguments x,y is defined by λx.(λy.t). To simplify this notation, we often group all the arguments under a single λ. For example, we replace λx.(λy.t) with λxy.t.
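The grouping convention mirrors what currying looks like in Racket, where a "two-argument" function λxy.t is literally a unary function returning another unary function. A minimal sketch (the name curried-add is ours):

```racket
;; λxy.(+ x y), written out as the nested unary abstractions λx.(λy.(+ x y)):
(define curried-add
  (lambda (x)
    (lambda (y)
      (+ x y))))

((curried-add 2) 3)   ; apply one argument at a time => 5
```

Supplying the arguments one by one is exactly what application does in lambda calculus: (λxy.t)ab first reduces to (λy.t[x:=a])b and only then substitutes for y.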

Booleans

We start with Boolean values and operations. Since Boolean values often appear in conditional expressions, we model them so that they work directly as the if-then-else expression. The if-then-else expression consists of a condition (i.e., a term whose value is either true or false) and two expressions to be evaluated depending on the result of the condition. Thus we model Boolean values as binary projection functions:

T ≡ λxy.x
F ≡ λxy.y

Boolean values are functions of two arguments returning one of them. More precisely, T representing "true" returns the first argument, whereas F representing "false" returns the second. Consequently, if we have a lambda term c t1 t2, where c is a condition term that evaluates either to T or F, we get the behavior of the if-then-else expression:

Tab →β a
Fab →β b

Considering the encoding of Boolean values, it is easy to encode the basic Boolean operations: conjunction ∧, disjunction ∨, and negation ¬. Recall that the conjunction x ∧ y is false if x is false. If x is true, the result of x ∧ y is just y. This can be modeled as follows:

∧ ≡ λxy.xyF

Let us check that the encoding works correctly. For an arbitrary lambda term t, we have

∧Ft ≡ (λxy.xyF)Ft →β FtF →β F
∧Tt ≡ (λxy.xyF)Tt →β TtF →β t

Similarly, we can define disjunction, which is true if the first argument is true and equals the second argument otherwise.

∨ ≡ λxy.xTy

For an arbitrary lambda term t, we have

∨Tt ≡ (λxy.xTy)Tt →β TTt →β T
∨Ft ≡ (λxy.xTy)Ft →β FTt →β t

Finally, it is straightforward to encode negation:

¬ ≡ λx.xFT

¬T ≡ (λx.xFT)T →β TFT →β F
¬F ≡ (λx.xFT)F →β FFT →β T
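Since these encodings use nothing but lambdas, they can be tried out directly in Racket. A sketch with our own names T, F, AND, OR, NOT, plus a small helper decoding a Church Boolean into a Racket one:

```racket
(define T (lambda (x) (lambda (y) x)))   ; T ≡ λxy.x
(define F (lambda (x) (lambda (y) y)))   ; F ≡ λxy.y

(define AND (lambda (x) (lambda (y) ((x y) F))))   ; ∧ ≡ λxy.xyF
(define OR  (lambda (x) (lambda (y) ((x T) y))))   ; ∨ ≡ λxy.xTy
(define NOT (lambda (x) ((x F) T)))                ; ¬ ≡ λx.xFT

;; Decode by using the Church Boolean as an if-then-else on #t/#f:
(define (church-bool->racket b) ((b #t) #f))

(church-bool->racket ((AND T) F))   ; => #f
(church-bool->racket ((OR F) T))    ; => #t
(church-bool->racket (NOT F))       ; => #t
```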

Numbers and arithmetic

To encode numbers and arithmetic operations, we use the so-called Church numerals. They encode a natural number n as a binary function applying the first argument n-times to the second.

0 ≡ λsz.z
1 ≡ λsz.sz
2 ≡ λsz.s(sz)
3 ≡ λsz.s(s(sz))
⋮

Note that 0 ≡ F. It is usual in programming that a single value might have two different meanings depending on its context. For instance, the value 65 can represent the number 65 or the uppercase letter A.

Using the above encodings for numbers, one can easily define the successor function n ↦ n+1.

S ≡ λnxy.x(nxy)

The input number n in (nxy) just applies n-times x to y, i.e., this expression is equivalent to n. Next, we add one more application of x, i.e., x(nxy). Let us compute the successor of 1:

S1 ≡ (λnxy.x(nxy))(λsz.sz) →β λxy.x((λsz.sz)xy) →β λxy.x(xy) ≡ 2

Once we have the successor function S, we can define the addition because we can get the result of n+m by applying n-times S to m. Thus we can model addition by the term:

A ≡ λnm.nSm

For example, 2+3 is computed as follows:

A23 →β 2S3 ≡ (λsz.s(sz))S3 →β S(S3) →β 5
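The numerals, successor, and addition can likewise be tested in Racket. A sketch with our own names and a helper decoding a Church numeral by applying Racket's add1 n-times to 0:

```racket
(define zero (lambda (s) (lambda (z) z)))                            ; 0 ≡ λsz.z
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))   ; S ≡ λnsz.s(nsz)
(define add  (lambda (n) (lambda (m) ((n succ) m))))                 ; A ≡ λnm.nSm

;; Decode: apply add1 n-times to 0.
(define (church->number n) ((n add1) 0))

(define two   (succ (succ zero)))
(define three (succ two))

(church->number ((add two) three))   ; => 5
```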

Multiplication of two numbers n,m can be represented by the term

M ≡ λnmsz.n(ms)z

Since n is a numeral, applying it to (ms) and z results in a term applying n-times (ms) to z, i.e., (ms)((ms)(⋯((ms)z)⋯)). Analogously, each application of (ms) results in m-many applications of s. Altogether, we get n·m many applications of s to z. Let us see an example:

M23 ≡ (λnmsz.n(ms)z)23 →β λsz.2(3s)z →β λsz.(3s)((3s)z) →β λsz.(3s)(s(s(sz))) →β λsz.s(s(s(s(s(sz))))) ≡ 6

The multiplication term M can be further simplified to

M′ ≡ λnms.n(ms)

We can remove the variable z because M and M′ behave identically. To see that, consider an abstraction λx.(tx) for a term t in which x does not occur free. It defines a function of an argument x applying t to x. If we apply it to any expression e, we obtain (λx.(tx))e →β te. Thus we can directly replace λx.(tx) just with the term t. This simplification is known as η-reduction. Applying η-reduction to

M ≡ λnmsz.n(ms)z ≡ λnms.(λz.(n(ms))z),

we end up with M′.
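Both M and its η-reduced form M′ can be checked in Racket (the names mult and mult* are ours; church->number decodes a numeral as before):

```racket
(define zero (lambda (s) (lambda (z) z)))
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))
(define (church->number n) ((n add1) 0))

(define mult   ; M ≡ λnmsz.n(ms)z
  (lambda (n) (lambda (m) (lambda (s) (lambda (z) ((n (m s)) z))))))
(define mult*  ; M′ ≡ λnms.n(ms), the η-reduced form
  (lambda (n) (lambda (m) (lambda (s) (n (m s))))))

(define two   (succ (succ zero)))
(define three (succ two))

(church->number ((mult two) three))    ; => 6
(church->number ((mult* two) three))   ; => 6
```

The two definitions are indistinguishable by any caller, which is exactly what η-reduction claims.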

In the following section on recursive functions, we will need the predecessor function on natural numbers n ↦ n−1 (0 is mapped to 0). Unfortunately, encoding it is more complex than encoding the successor function S. We first need to introduce an encoding of pairs. Given two terms t, s, we define the pair consisting of t and s as follows:

⟨t,s⟩ ≡ λz.zts

Thus the pair ⟨t,s⟩ is a function of an argument z that is applied to the pair's components. Note that this is exactly how we encoded a 2D point as a function closure in Lecture 3.

racket
(define (point x y)
  (lambda (m) (m x y)))

Given a pair, we can access its components by applying it to the projection functions T and F.

⟨t,s⟩T ≡ (λz.zts)T →β Tts →β t
⟨t,s⟩F ≡ (λz.zts)F →β Fts →β s

To create the predecessor function, we need to find a term such that if we apply it n-times to another term, it gets evaluated to n−1. We first define a function mapping a pair ⟨n,m⟩ to ⟨n+1,n⟩.

Φ ≡ λpz.z(S(pT))(pT)

The function Φ takes a pair p, extracts its first component pT, computes its successor S(pT), and returns the pair consisting of the successor and the first component. The reason why Φ is important is that when we apply it n-times to the pair ⟨0,0⟩, we get the pair ⟨n,n−1⟩ for n ≥ 1.

Φ⟨0,0⟩ →β ⟨1,0⟩,  Φ⟨1,0⟩ →β ⟨2,1⟩,  Φ⟨2,1⟩ →β ⟨3,2⟩, …

Consequently, we can define the predecessor function as the function applying n-times Φ to 0,0 and extracting the second component:

P ≡ λn.nΦ⟨0,0⟩F
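The whole construction — pairs, Φ, and the predecessor P — can be replayed in Racket. A sketch under our own naming (pair is a two-argument Racket helper building the term λz.zts):

```racket
(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))
(define zero (lambda (s) (lambda (z) z)))
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))
(define (church->number n) ((n add1) 0))

(define (pair t s) (lambda (z) ((z t) s)))   ; ⟨t,s⟩ ≡ λz.zts

(define phi   ; Φ ≡ λpz.z(S(pT))(pT): maps ⟨n,m⟩ to ⟨n+1,n⟩
  (lambda (p) (lambda (z) ((z (succ (p T))) (p T)))))

(define pred  ; P ≡ λn.nΦ⟨0,0⟩F: apply Φ n-times, take the second component
  (lambda (n) (((n phi) (pair zero zero)) F)))

(church->number (pred (succ (succ zero))))   ; => 1
(church->number (pred zero))                 ; => 0
```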

Zero test

To implement a recursive function, we need a condition determining when to stop the recursion. In the example we will discuss in the next section, we test whether a given number is zero. To check whether a given number is zero, recall that 0 ≡ F is just the projection to the second argument. Consequently, 0tT →β T for each lambda term t. Thus we need to find a term t such that when applied to T at least once, it returns F. Such a term t is the constant function always returning F, i.e., λx.F. Altogether, we define

Z ≡ λn.n(λx.F)T

So we have

Z0 ≡ (λn.n(λx.F)T)0 →β 0(λx.F)T →β T

and for N>0

ZN ≡ (λn.n(λx.F)T)N →β N(λx.F)T →β (λx.F)(⋯((λx.F)T)⋯) →β F
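The zero test is a one-liner in Racket. A sketch (is-zero is our name for Z; church-bool->racket decodes a Church Boolean as before):

```racket
(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))
(define zero (lambda (s) (lambda (z) z)))
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))

(define is-zero   ; Z ≡ λn.n(λx.F)T
  (lambda (n) ((n (lambda (x) F)) T)))

(define (church-bool->racket b) ((b #t) #f))

(church-bool->racket (is-zero zero))          ; => #t
(church-bool->racket (is-zero (succ zero)))   ; => #f
```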

Recursive functions

Finally, we are getting to the most exciting construction of how to encode recursive functions if we have only anonymous functions in lambda calculus. Recursive functions can be defined through the so-called Y-combinator.

Y ≡ λy.(λx.y(xx))(λx.y(xx))

Note that it is an expanded version of the term (λx.xx)(λx.xx) representing the infinite loop as Y applies a further argument y to xx. Let us see what happens if we apply Y to any lambda term R.

YR →β (λx.R(xx))(λx.R(xx)) ≡ R′

We obtained a term, denoted R′, which is something like a recursive version of R. To see that, note that R′ can produce as many applications of R as we wish:

R′ ≡ (λx.R(xx))(λx.R(xx)) →β R((λx.R(xx))(λx.R(xx))) ≡ RR′ →β R(R((λx.R(xx))(λx.R(xx)))) ≡ R(RR′) →β ⋯

Whether this process stops or not depends on R. If R contains a leftmost outermost redex whose reduction discards R′, the computation following the normal order terminates. However, the applicative order need not terminate, as it can keep reducing R′ indefinitely.

To see an example, we implement a function computing, for a given natural number n, the sum ∑_{i=0}^{n} i. We can define the function recursively using the fact that ∑_{i=0}^{n} i = n + ∑_{i=0}^{n−1} i. In Racket, we would implement such a function as follows:

racket
(define (sum-to n)
  (if (= n 0)
      0
      (+ n (sum-to (- n 1)))))

We must apply the Y-combinator in lambda calculus instead. We define a function corresponding to the body of sum-to, but we replace the recursive call with a call of a function given as an argument.

R ≡ λrn.Zn0(nS(r(Pn)))

The function tests if n is zero by Zn. If it is zero, Zn evaluates to T, which consequently returns its first argument, i.e., 0. Otherwise, Zn evaluates to F, which returns nS(r(Pn)). This expression is nothing else than n + r(n−1).

Now it remains to turn R into its recursive version by the Y-combinator. We define

R′ ≡ YR

Let us see if it correctly sums up all the natural numbers up to 3. Recall that R′ →β RR′.

R′3 ≡ YR3 →β RR′3 →β Z30(3S(R′(P3))) →β F0(3S(R′(P3))) →β 3S(R′(P3))
→β 3S(R′2) →β 3S(RR′2) →β 3S(Z20(2S(R′(P2)))) →β 3S(2S(R′1)) →β 3S(2S(RR′1))
→β 3S(2S(1S(R′0))) →β 3S(2S(1S(RR′0))) →β 3S(2S(1S(Z00(0S(R′(P0))))))
→β 3S(2S(1S(T0(0S(R′(P0)))))) →β 3S(2S(1S0)) →β 6

Note that the expression in the last line contains several redexes. The one hidden in R′ can be reduced forever. On the other hand, if we reduce T0(0S(R′(P0))) →β 0, the term R′ disappears. This is why the normal order terminates in this computation, whereas the applicative one does not.
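The last observation matters in practice: Racket evaluates in applicative order, so the Y combinator as written above would loop forever there. One has to use its η-expanded strict variant, often called the Z combinator, which wraps the self-application in a lambda to delay it. A sketch of sum-to built this way (the names Z-comb and sum-to* are ours):

```racket
;; Strict (applicative-order) variant of the Y combinator. The η-expansion
;; (lambda (v) ((x x) v)) delays (x x), so evaluation does not diverge.
(define Z-comb
  (lambda (f)
    ((lambda (x) (f (lambda (v) ((x x) v))))
     (lambda (x) (f (lambda (v) ((x x) v)))))))

;; The body of sum-to with the recursive call replaced by the argument r,
;; exactly as in the term R above:
(define sum-to*
  (Z-comb
   (lambda (r)
     (lambda (n)
       (if (= n 0)
           0
           (+ n (r (- n 1))))))))

(sum-to* 3)   ; => 6
```

Note that no define refers to itself: all the recursion comes from the combinator, just as YR produces it in pure lambda calculus.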


  1. If you are interested in the history, check the entry on Alonzo Church in the Stanford Encyclopedia of Philosophy. ↩︎

  2. Benjamin C. Pierce: Types and programming languages. MIT Press 2002, ISBN 978-0-262-16209-8, pp. I-XXI, 1-623. ↩︎

  3. Note that the terminology on bound and free occurrences is completely analogous to the terminology used in first-order logic, where variables can be bound by quantifiers instead of λ. ↩︎

  4. We will abuse the notation and write t →β s even if several β-reductions are needed to get from t to s. ↩︎