
Lambda Calculus

Lambda calculus is a model of computation introduced in the 1930s by Alonzo Church, who was trying to develop formal foundations for mathematics. Those attempts were unsuccessful, as several of his systems turned out to be inconsistent, but lambda calculus survived as a fragment of one of them. Using this calculus, Church settled Hilbert's famous Entscheidungsproblem by showing that there are questions that cannot be answered algorithmically. Later, lambda calculus became a formal tool for analyzing computability and functional programming languages.[1]

Lambda calculus is probably the simplest universal programming language. Many of the key concepts that make functional programming so powerful today were invented for, or are inspired by, lambda calculus. This includes, for example, currying, recursion, and closures. You will also see where all those parentheses in Racket come from!

Lambda calculus can be viewed as a minimal programming language in which computations can be described and as a mathematical object about which rigorous statements can be proved. It serves as a core language that allows one to specify and study programming language features by translating them into the core language. We will discuss only the untyped variant of lambda calculus. There is also a typed version used in the study of type systems.[2]

Even though the syntax and semantics of lambda calculus are extremely simple, lambda calculus is Turing complete. In other words, it is possible to simulate any Turing machine with a program in lambda calculus and vice versa.

Syntax

The syntax of lambda calculus has only three types of terms: a variable (denoted by lowercase letters x, y, z, …), the abstraction of a variable x from a term t defining a function (denoted λx.t), and the application of a term t1 to a term t2. Formally, we define lambda terms by the following grammar:

term → var | func | app
func → (λvar.term)
app → (term term)

The abstraction λx.t defines an anonymous function as in Racket. The variable x is its parameter, and t is its body. The application represents a function call.

Note that each abstraction and application introduces parentheses. To simplify the notation, we use several conventions that let us remove some of them.

  • We often omit the outermost parentheses.
  • Application is left-associative, e.g., e1e2e3e4 stands for (((e1e2)e3)e4).
  • The bodies of functions extend to the right as far as possible.

Using the above conventions, we can simplify the following term

(λx.((λy.((xy)x))z))

as follows:

λx.(λy.xyx)z

The remaining parentheses are the only ones we keep because they delimit the body of the anonymous function defined by λy.

The parentheses are necessary if we want to represent lambda terms as sequences of characters. Alternatively, we avoid parentheses if we understand lambda terms as abstract syntax trees. The above lambda term λx.(λy.xyx)z corresponds to the following tree (the application nodes are denoted by the symbol @):

Before focusing on semantics, we need to introduce the scopes of variables. An occurrence of a variable x is said to be bound if it occurs in the body t of an abstraction λx.t. The term t is called the scope of the binder λx. An occurrence of x is free if it is not bound. To find out whether an occurrence of x is bound or free, we can also track the path in the abstract syntax tree from the occurrence of x to the root. The occurrence is bound if we find the node λx on the path. If not, it is free. In the above term λx.(λy.xyx)z, all the occurrences are bound except for the only occurrence of z.[3] A lambda term is called closed (aka combinator) if all its variable occurrences are bound. Otherwise, it is called open.

Semantics

In lambda calculus, programs (i.e., lambda terms) consist of anonymous functions created by abstraction and function calls. The computation of such a program is the simplification process reducing the program to a lambda term that cannot be reduced anymore. This term is the value computed by the program.

A lambda term λx.t represents an anonymous function with an argument x and body t. The term λx.t corresponds to the following expression in Racket:

racket
(lambda (x) t)

We can call the function λx.t by applying it to another term e: (λx.t)e. A lambda term of the form (λx.t)e is called a redex because it is reducible by a substitution rule known as β-reduction. When we reduce a redex (λx.t)e, we substitute e for all free occurrences of x in t. The resulting term is denoted t[x:=e].

The β-reduction can be applied to any redex occurring in a lambda term. It represents a single step in the computation. We denote the relation that a term is β-reducible to another term by →β.[4] For example, (λx.t)e →β t[x:=e].

Let us see some examples. We denote the equality on lambda terms by ≡.

(λx.x)(λy.y) →β x[x:=(λy.y)] ≡ (λy.y)
(λx.xx)(λy.y) →β (xx)[x:=(λy.y)] ≡ (λy.y)(λy.y) →β y[y:=(λy.y)] ≡ (λy.y)
(λx.x(λx.x))y →β (x(λx.x))[x:=y] ≡ y(λx.x)

Note that we substitute only for the free occurrence of x in the last example.

The substitution rule has one more caveat regarding the variable names. Consider a redex (λx.t)e. A free occurrence of a variable in e may become bound after the β-reduction. For example,

(λx.(λy.xy))y →β (λy.xy)[x:=y] ≡ λy.yy

The above reduction step is not valid. The reason is that functions do not depend on the names of their arguments. Consider the following Racket program:

racket
(define y 5)
(define (cadd x) (lambda (y) (+ x y)))

If we now call

racket
(cadd y)

the resulting function cannot be

racket
(lambda (y) (+ y y))

which would be doubling its argument. Instead, it should be the function adding y = 5 to its argument. To see that, we can simply rename the argument y, for instance, to z:

racket
> (define (cadd x) (lambda (z) (+ x z)))
> (cadd y)    ; => (lambda (z) (+ y z)) where y = 5
(lambda (z) (+ 5 z))

Thus the correct reduction of the above lambda term should proceed as follows:

(λx.(λy.xy))y ≡ (λx.(λz.xz))y →β (λz.xz)[x:=y] ≡ λz.yz

Whenever we want to substitute a term e for a variable in a term t, we must first check whether any free variable occurrence in e would become bound. If so, we must rename the clashing bound variable in t to a new name. The renaming of function arguments is known as α-conversion. We will consider two lambda terms that are the same up to renaming a bound variable equivalent. For example,

λx.xz ≡ λy.yz

Formally, we can define the substitution by induction as follows:

x[x:=e] = e
y[x:=e] = y                              if y ≠ x
(t1 t2)[x:=e] = (t1[x:=e] t2[x:=e])
(λx.t)[x:=e] = (λx.t)
(λy.t)[x:=e] = (λy.t[x:=e])              if y ≠ x and y is not free in e
(λy.t)[x:=e] = (λz.t[y:=z][x:=e])        if y ≠ x and y is free in e; z is a fresh variable

The last case is the one where we need to do the α-conversion and rename the function argument to a fresh name λz.t[y:=z] before we proceed with the substitution [x:=e].
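The substitution rules above can be turned into a short program. The following is a sketch in Racket, under a term representation of our own choosing: a variable is a symbol, an abstraction λx.t is `(list 'lam x t)`, and an application is `(list 'app t1 t2)`.

```racket
(require racket/match)

;; Is x free in term t?
(define (free? x t)
  (match t
    [(? symbol? y) (eq? x y)]
    [(list 'lam y b) (and (not (eq? x y)) (free? x b))]
    [(list 'app t1 t2) (or (free? x t1) (free? x t2))]))

;; Capture-avoiding substitution t[x:=e].
(define (subst t x e)
  (match t
    [(? symbol? y) (if (eq? x y) e y)]
    [(list 'app t1 t2) (list 'app (subst t1 x e) (subst t2 x e))]
    [(list 'lam y b)
     (cond [(eq? x y) t]                                   ; x is shadowed: (λx.t)[x:=e] = (λx.t)
           [(not (free? y e)) (list 'lam y (subst b x e))] ; no capture possible
           [else (let ([z (gensym y)])                     ; α-conversion to a fresh z
                   (list 'lam z (subst (subst b y z) x e)))])]))
```

For instance, `(subst '(app x (lam x x)) 'x 'y)` substitutes only for the free occurrence of x, and `(subst '(lam y (app x y)) 'x 'y)` triggers the α-conversion in the last case of the definition.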

Church-Rosser Theorems

The computation in lambda calculus is performed by β-reduction. Given a lambda term, we reduce it as much as possible by repetitive applications of β-reduction. The resulting irreducible lambda term is the computed value. However, a lambda term can contain several redexes, for example:

(λx.x)((λy.y)z)

Here both the whole term and the inner subterm (λy.y)z are redexes.

This leads to natural questions: Does it matter which redex we reduce first? Can different reduction orders lead to different final values? Can some reduction orders terminate, whereas some do not? The answers to these questions are provided by the Church-Rosser theorems.

Before we state the Church-Rosser theorems, let us discuss a few examples showing what might happen. First of all, it may happen that the reduction process does not terminate no matter which reduction order is used! Consider the following term containing only a single redex:

(λx.xx)(λx.xx) →β (λx.xx)(λx.xx)

The above lambda term corresponds, in fact, to an infinite loop.

For some expressions, the reduction process terminates for some reduction orders and diverges for others. For example, the term

(λx.y)((λx.xx)(λx.xx))

contains two redexes. The reduction of the inner one does not terminate. If we instead reduce the outer redex, applying (λx.y) to the rest of the expression, the reduction stops immediately:

(λx.y)((λx.xx)(λx.xx)) →β y

A lambda term is said to be in normal form if it is irreducible, i.e., it contains no redexes. Reduction orders are also called evaluation strategies. Several of them were investigated. Let us introduce two of them:

  1. Normal order reduces the leftmost outermost redex first at each step.
  2. Applicative order reduces the leftmost innermost redex first at each step.

We say a redex is to the left of another redex if its lambda appears further left. The leftmost outermost redex is the leftmost redex not contained in any other redex. The leftmost innermost redex is the leftmost redex not containing any other redex. For example, consider the following lambda term:

(λy.y)((λz.zz)x)((λz.(λa.a)z)(λy.(λz.z)x))

It has five redexes, depicted in red in the following figure.

The leftmost outermost redex is (λy.y)((λz.zz)x). On the other hand, the leftmost innermost redex is (λz.zz)x.

Let's see how the above lambda term would be evaluated by the evaluation strategy following the normal order. The reduced redexes are underlined:

(λy.y)((λz.zz)x)((λz.(λa.a)z)(λy.(λz.z)x))
→β ((λz.zz)x)((λz.(λa.a)z)(λy.(λz.z)x))
→β xx((λz.(λa.a)z)(λy.(λz.z)x))
→β xx((λa.a)(λy.(λz.z)x))
→β xx(λy.(λz.z)x)
→β xx(λy.x)

Finally, we get to the Church-Rosser theorems. The first theorem states that no matter which reduction order we choose, we will always get the same normal form provided the reduction process terminates. The second theorem states that the normal order always terminates, provided a sequence of β-reduction steps leads to a normal form.

Church-Rosser Theorems:

  1. Normal forms are unique (independently of evaluation strategy). Consequently, a lambda term cannot be reduced to two different normal forms.
  2. Normal order always finds a normal form if it exists.

Programming in lambda calculus

Lambda calculus has no numbers, arithmetic, Booleans, etc. It has only anonymous functions and function calls. Thus it might look pretty useless. However, lambda calculus is Turing complete. So it should be no surprise that we can encode all the things like numbers and arithmetic. We will build a few basic encodings from scratch in the following sections. Besides numbers and arithmetic, the most interesting thing is how to represent recursive functions in lambda calculus. That is not straightforward, considering there are only anonymous (nameless) functions.

Although lambda calculus cannot internally introduce names for terms or functions, we can do it in our exposition to simplify the notation. We will denote specific lambda terms (usually combinators) with uppercase letters. For instance, the identity function is denoted by I:

I ≡ λx.x

We will also introduce one more convention for writing lambda terms. Each function defined by abstraction is unary. However, we can have functions of higher arity due to currying. For instance, a function of two arguments x,y is defined by λx.(λy.t). To simplify this notation, we often group all the arguments under a single λ. For example, we replace λx.(λy.t) with λxy.t.
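The grouping convention mirrors what currying looks like in Racket, where a "two-argument" function λxy.t is literally a unary function returning another unary function. A minimal sketch (the name curried-add is ours):

```racket
;; λxy.(+ x y), written out as the nested unary abstractions λx.(λy.(+ x y)):
(define curried-add
  (lambda (x)
    (lambda (y)
      (+ x y))))

((curried-add 2) 3)   ; apply one argument at a time => 5
```

Supplying the arguments one by one is exactly what application does in lambda calculus: (λxy.t)ab first reduces to (λy.t[x:=a])b and only then substitutes for y.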

Booleans

We start with Boolean values and operations. Since Boolean values often appear in conditional expressions, we model them so that they work directly as the if-then-else expression. The if-then-else expression consists of a condition (i.e., a term whose value is either true or false) and two expressions to be evaluated depending on the result of the condition. Thus we model Boolean values as binary projection functions:

T ≡ λxy.x
F ≡ λxy.y

Boolean values are functions of two arguments returning one of them. More precisely, T representing "true" returns the first argument, whereas F representing "false" returns the second. Consequently, if we have a lambda term c t1 t2, where c is a condition term that evaluates either to T or F, we get the behavior of the if-then-else expression:

Tab →β a
Fab →β b

Considering the encoding of Boolean values, it is easy to encode the basic Boolean operations: conjunction ∧, disjunction ∨, and negation ¬. Recall that the conjunction x ∧ y is false if x is false. If x is true, the result of x ∧ y is just y. This can be modeled as follows:

∧ ≡ λxy.xyF

Let us check that the encoding works correctly. For an arbitrary lambda term t, we have

∧Ft ≡ (λxy.xyF)Ft →β FtF →β F
∧Tt ≡ (λxy.xyF)Tt →β TtF →β t

Similarly, we can define disjunction, which is true if the first argument is true and equals the second argument otherwise.

∨ ≡ λxy.xTy

For an arbitrary lambda term t, we have

∨Tt ≡ (λxy.xTy)Tt →β TTt →β T
∨Ft ≡ (λxy.xTy)Ft →β FTt →β t

Finally, it is straightforward to encode negation:

¬ ≡ λx.xFT

¬T ≡ (λx.xFT)T →β TFT →β F
¬F ≡ (λx.xFT)F →β FFT →β T
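Since these encodings use nothing but lambdas, they can be tried out directly in Racket. A sketch with our own names T, F, AND, OR, NOT, plus a small helper decoding a Church Boolean into a Racket one:

```racket
(define T (lambda (x) (lambda (y) x)))   ; T ≡ λxy.x
(define F (lambda (x) (lambda (y) y)))   ; F ≡ λxy.y

(define AND (lambda (x) (lambda (y) ((x y) F))))   ; ∧ ≡ λxy.xyF
(define OR  (lambda (x) (lambda (y) ((x T) y))))   ; ∨ ≡ λxy.xTy
(define NOT (lambda (x) ((x F) T)))                ; ¬ ≡ λx.xFT

;; Decode by using the Church Boolean as an if-then-else on #t/#f:
(define (church-bool->racket b) ((b #t) #f))

(church-bool->racket ((AND T) F))   ; => #f
(church-bool->racket ((OR F) T))    ; => #t
(church-bool->racket (NOT F))       ; => #t
```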

Numbers and arithmetic

To encode numbers and arithmetic operations, we use the so-called Church numerals. They encode a natural number n as a binary function applying the first argument n-times to the second.

0 ≡ λsz.z
1 ≡ λsz.sz
2 ≡ λsz.s(sz)
3 ≡ λsz.s(s(sz))
⋮

Note that 0 ≡ F. It is usual in programming that a single value might have two different meanings depending on its context. For instance, the value 65 can represent the number 65 or the uppercase letter A.

Using the above encodings for numbers, one can easily define the successor function n ↦ n+1.

S ≡ λnxy.x(nxy)

The input number n in (nxy) just applies n-times x to y, i.e., this expression is equivalent to n. Next, we add one more application of x, i.e., x(nxy). Let us compute the successor of 1:

S1 ≡ (λnxy.x(nxy))(λsz.sz) →β λxy.x((λsz.sz)xy) →β λxy.x(xy) ≡ 2

Once we have the successor function S, we can define the addition because we can get the result of n+m by applying n-times S to m. Thus we can model addition by the term:

A ≡ λnm.nSm

For example, 2+3 is computed as follows:

A23 →β 2S3 ≡ (λsz.s(sz))S3 →β S(S3) →β 5
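The numerals, successor, and addition can likewise be tested in Racket. A sketch with our own names and a helper decoding a Church numeral by applying Racket's add1 n-times to 0:

```racket
(define zero (lambda (s) (lambda (z) z)))                            ; 0 ≡ λsz.z
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))   ; S ≡ λnsz.s(nsz)
(define add  (lambda (n) (lambda (m) ((n succ) m))))                 ; A ≡ λnm.nSm

;; Decode: apply add1 n-times to 0.
(define (church->number n) ((n add1) 0))

(define two   (succ (succ zero)))
(define three (succ two))

(church->number ((add two) three))   ; => 5
```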

Multiplication of two numbers n,m can be represented by the term

M ≡ λnmsz.n(ms)z

Since n is a numeral, applying it to (ms) and z results in a term applying n-times (ms) to z, i.e., (ms)((ms)(⋯((ms)z)⋯)). Analogously, each application of (ms) results in m-many applications of s. Altogether, we get n·m many applications of s to z. Let us see an example:

M23 ≡ (λnmsz.n(ms)z)23 →β λsz.2(3s)z →β λsz.(3s)((3s)z) →β λsz.(3s)(s(s(sz))) →β λsz.s(s(s(s(s(sz))))) ≡ 6

The multiplication term M can be further simplified to

M′ ≡ λnms.n(ms)

We can remove the variable z because M and M′ behave identically. To see that, consider an abstraction λx.(tx) for a term t in which x does not occur free. It defines a function of an argument x applying t to x. If we apply it to any expression e, we obtain (λx.(tx))e →β te. Thus we can directly replace λx.(tx) just with the term t. This simplification is known as η-reduction. Applying η-reduction to

M ≡ λnmsz.n(ms)z ≡ λnms.(λz.(n(ms))z),

we end up with M′.
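Both M and its η-reduced form M′ can be checked in Racket (the names mult and mult* are ours; church->number decodes a numeral as before):

```racket
(define zero (lambda (s) (lambda (z) z)))
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))
(define (church->number n) ((n add1) 0))

(define mult   ; M ≡ λnmsz.n(ms)z
  (lambda (n) (lambda (m) (lambda (s) (lambda (z) ((n (m s)) z))))))
(define mult*  ; M′ ≡ λnms.n(ms), the η-reduced form
  (lambda (n) (lambda (m) (lambda (s) (n (m s))))))

(define two   (succ (succ zero)))
(define three (succ two))

(church->number ((mult two) three))    ; => 6
(church->number ((mult* two) three))   ; => 6
```

The two definitions are indistinguishable by any caller, which is exactly what η-reduction claims.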

In the following section on recursive functions, we will need the predecessor function on natural numbers n ↦ n−1 (0 is mapped to 0). Unfortunately, encoding it is more complex than encoding the successor function S. We first need to introduce an encoding of pairs. Given two terms t, s, we define the pair consisting of t and s as follows:

⟨t,s⟩ ≡ λz.zts

Thus the pair ⟨t,s⟩ is a function of an argument z that is applied to the pair's components. Note that this is exactly how we encoded a 2D point as a function closure in Lecture 3.

racket
(define (point x y)
  (lambda (m) (m x y)))

Given a pair, we can access its components by applying it to the projection functions T and F.

⟨t,s⟩T ≡ (λz.zts)T →β Tts →β t
⟨t,s⟩F ≡ (λz.zts)F →β Fts →β s

To create the predecessor function, we need to find a term such that if we apply it n-times to another term, it gets evaluated to n−1. We first define a function mapping a pair ⟨n,m⟩ to ⟨n+1,n⟩.

Φ ≡ λpz.z(S(pT))(pT)

The function Φ takes a pair p, extracts its first component pT, computes its successor S(pT), and returns the pair consisting of the successor and the first component. The reason why Φ is important is that when we apply it n-times to the pair ⟨0,0⟩, we get the pair ⟨n,n−1⟩ for n ≥ 1.

Φ⟨0,0⟩ →β ⟨1,0⟩,  Φ⟨1,0⟩ →β ⟨2,1⟩,  Φ⟨2,1⟩ →β ⟨3,2⟩, …

Consequently, we can define the predecessor function as the function applying n-times Φ to 0,0 and extracting the second component:

P ≡ λn.nΦ⟨0,0⟩F
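The whole construction — pairs, Φ, and the predecessor P — can be replayed in Racket. A sketch under our own naming (pair is a two-argument Racket helper building the term λz.zts):

```racket
(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))
(define zero (lambda (s) (lambda (z) z)))
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))
(define (church->number n) ((n add1) 0))

(define (pair t s) (lambda (z) ((z t) s)))   ; ⟨t,s⟩ ≡ λz.zts

(define phi   ; Φ ≡ λpz.z(S(pT))(pT): maps ⟨n,m⟩ to ⟨n+1,n⟩
  (lambda (p) (lambda (z) ((z (succ (p T))) (p T)))))

(define pred  ; P ≡ λn.nΦ⟨0,0⟩F: apply Φ n-times, take the second component
  (lambda (n) (((n phi) (pair zero zero)) F)))

(church->number (pred (succ (succ zero))))   ; => 1
(church->number (pred zero))                 ; => 0
```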

Zero test

To implement a recursive function, we need a condition determining when to stop the recursion. In the example we will discuss in the next section, we test whether a given number is zero. To check whether a given number is zero, recall that 0 ≡ F is just the projection to the second argument. Consequently, 0tT →β T for each lambda term t. Thus we need to find a term t such that when applied to T at least once, it returns F. Such a term t is the constant function always returning F, i.e., λx.F. Altogether, we define

Z ≡ λn.n(λx.F)T

So we have

Z0 ≡ (λn.n(λx.F)T)0 →β 0(λx.F)T →β T

and for N>0

ZN ≡ (λn.n(λx.F)T)N →β N(λx.F)T →β (λx.F)(⋯((λx.F)T)⋯) →β F
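The zero test is a one-liner in Racket. A sketch (is-zero is our name for Z; church-bool->racket decodes a Church Boolean as before):

```racket
(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))
(define zero (lambda (s) (lambda (z) z)))
(define succ (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))

(define is-zero   ; Z ≡ λn.n(λx.F)T
  (lambda (n) ((n (lambda (x) F)) T)))

(define (church-bool->racket b) ((b #t) #f))

(church-bool->racket (is-zero zero))          ; => #t
(church-bool->racket (is-zero (succ zero)))   ; => #f
```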

Recursive functions

Finally, we are getting to the most exciting construction of how to encode recursive functions if we have only anonymous functions in lambda calculus. Recursive functions can be defined through the so-called Y-combinator.

Y ≡ λy.(λx.y(xx))(λx.y(xx))

Note that it is an expanded version of the term (λx.xx)(λx.xx) representing the infinite loop as Y applies a further argument y to xx. Let us see what happens if we apply Y to any lambda term R.

YR →β (λx.R(xx))(λx.R(xx)) ≡ R′

We obtained a term, denoted R′, which is something like a recursive version of R. To see that, note that R′ can produce as many applications of R as we wish:

R′ ≡ (λx.R(xx))(λx.R(xx)) →β R((λx.R(xx))(λx.R(xx))) ≡ RR′ →β R(R((λx.R(xx))(λx.R(xx)))) ≡ R(RR′) →β ⋯

Whether this process stops or not depends on R. If R contains a leftmost outermost redex whose reduction discards R′, the computation following the normal order terminates. However, the applicative order need not terminate, as it can keep reducing R′ indefinitely.

To see an example, we implement a function computing, for a given natural number n, the sum ∑_{i=0}^{n} i. We can define the function recursively using the fact that ∑_{i=0}^{n} i = n + ∑_{i=0}^{n−1} i. In Racket, we would implement such a function as follows:

racket
(define (sum-to n)
  (if (= n 0)
      0
      (+ n (sum-to (- n 1)))))

We must apply the Y-combinator in lambda calculus instead. We define a function corresponding to the body of sum-to, but we replace the recursive call with a call of a function given as an argument.

R ≡ λrn.Zn0(nS(r(Pn)))

The function tests if n is zero by Zn. If it is zero, Zn evaluates to T, which consequently returns its first argument, i.e., 0. Otherwise, Zn evaluates to F, which returns nS(r(Pn)). This expression is nothing else than n + r(n−1).

Now it remains to turn R into its recursive version by the Y-combinator. We define

R′ ≡ YR

Let us see if it correctly sums up all the natural numbers up to 3. Recall that R′ →β RR′.

R′3 ≡ YR3 →β RR′3 →β Z30(3S(R′(P3))) →β F0(3S(R′(P3))) →β 3S(R′(P3))
→β 3S(R′2) →β 3S(RR′2) →β 3S(Z20(2S(R′(P2)))) →β 3S(2S(R′1)) →β 3S(2S(RR′1))
→β 3S(2S(1S(R′0))) →β 3S(2S(1S(RR′0))) →β 3S(2S(1S(Z00(0S(R′(P0))))))
→β 3S(2S(1S(T0(0S(R′(P0)))))) →β 3S(2S(1S0)) →β 6

Note that the expression in the last line contains several redexes. The one hidden in R′ can be reduced forever. On the other hand, if we reduce T0(0S(R′(P0))) →β 0, the term R′ disappears. This is why the normal order terminates in this computation, whereas the applicative one does not.
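The last observation matters in practice: Racket evaluates in applicative order, so the Y combinator as written above would loop forever there. One has to use its η-expanded strict variant, often called the Z combinator, which wraps the self-application in a lambda to delay it. A sketch of sum-to built this way (the names Z-comb and sum-to* are ours):

```racket
;; Strict (applicative-order) variant of the Y combinator. The η-expansion
;; (lambda (v) ((x x) v)) delays (x x), so evaluation does not diverge.
(define Z-comb
  (lambda (f)
    ((lambda (x) (f (lambda (v) ((x x) v))))
     (lambda (x) (f (lambda (v) ((x x) v)))))))

;; The body of sum-to with the recursive call replaced by the argument r,
;; exactly as in the term R above:
(define sum-to*
  (Z-comb
   (lambda (r)
     (lambda (n)
       (if (= n 0)
           0
           (+ n (r (- n 1))))))))

(sum-to* 3)   ; => 6
```

Note that no define refers to itself: all the recursion comes from the combinator, just as YR produces it in pure lambda calculus.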


  1. If you are interested in the history, check the entry on Alonzo Church in the Stanford Encyclopedia of Philosophy. ↩︎

  2. Benjamin C. Pierce: Types and programming languages. MIT Press 2002, ISBN 978-0-262-16209-8, pp. I-XXI, 1-623. ↩︎

  3. Note that the terminology on bound and free occurrences is completely analogous to the terminology used in first-order logic, where variables can be bound by quantifiers instead of λ. ↩︎

  4. We will abuse the notation and write t →β s even if several β-reductions are needed to get from t to s. ↩︎