Lambda Calculus
Lambda calculus is a model of computation introduced in the 1930s by Alonzo Church, who was trying to develop formal foundations for mathematics. His attempts were unsuccessful, as several of his systems turned out to be inconsistent. Nevertheless, he invented lambda calculus as a fragment of his formal system. Using this calculus, he gave a negative answer to Hilbert's famous Entscheidungsproblem by showing that there are questions that cannot be decided algorithmically. Later, lambda calculus became a formal tool for analyzing computability and functional programming languages.[1]
Lambda calculus is probably the simplest universal programming language. Many of the key concepts that make functional programming so powerful today were invented for, or are inspired by, lambda calculus. These include, for example, currying, recursion, and closures. You will also see where all those parentheses in Racket come from!
Lambda calculus can be viewed as a minimal programming language in which computations can be described and as a mathematical object about which rigorous statements can be proved. It serves as a core language that allows one to specify and study programming language features by translating them into the core language. We will discuss only the untyped variant of lambda calculus. There is also a typed version used in the study of type systems.[2]
Even though the syntax and semantics of lambda calculus are extremely simple, lambda calculus is Turing complete. In other words, it is possible to simulate any Turing machine with a program in lambda calculus and vice versa.
Syntax
The syntax of lambda calculus has only three types of terms: a variable (denoted by lowercase letters $x, y, z, \ldots$), the abstraction of a variable $x$ from a term $t$ defining a function (denoted $(\lambda x.t)$), and the application of a term $t_1$ to a term $t_2$ (denoted $(t_1 t_2)$). Formally, we define lambda terms by the following grammar:
$$t ::= x \mid (\lambda x.t) \mid (t\,t)$$
The abstraction $(\lambda x.t)$ defines an anonymous function, just as lambda does in Racket. The variable $x$ is its parameter, and $t$ is its body. The application $(t_1 t_2)$ represents a function call.
Note that each abstraction and each application introduces a pair of parentheses. To simplify the notation, we use several conventions to remove some of the parentheses.
- We often omit the outermost parentheses.
- Application is left-associative, e.g. $t_1 t_2 t_3 t_4$ is $((t_1 t_2) t_3) t_4$.
- The body of a function extends to the right as far as possible.
Using the above conventions, we can simplify the following term
as follows:
The red parentheses are the only ones that remain because they determine the body of the anonymous function defined by .
The parentheses are necessary if we want to represent lambda terms as sequences of characters. Alternatively, we can avoid parentheses if we understand lambda terms as abstract syntax trees. The above lambda term corresponds to the following tree (the application nodes are denoted by the symbol @):
Before focusing on semantics, we need to introduce the scopes of variables. An occurrence of a variable $x$ is said to be bound if it occurs in the body $t$ of an abstraction $\lambda x.t$. The term $t$ is called the scope of the binder $\lambda x$. An occurrence of $x$ is free if it is not bound. To find out whether an occurrence of $x$ is bound or free, we can also track the path in the abstract syntax tree from the occurrence of $x$ to the root. The occurrence is bound if we find the node $\lambda x$ on the path. If not, it is free. In the above term, all the variable occurrences are bound except for a single free occurrence.[3] A lambda term is called closed (aka combinator) if all its variable occurrences are bound. Otherwise, it is called open.
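The bound/free distinction can be sketched in Racket as a function that collects the free variables of a term. The representation of terms as nested lists tagged with `lam` and `app`, as well as the names `free-vars` and `closed?`, are assumptions made for this illustration only.

```racket
#lang racket

;; Hypothetical term representation (an assumption of this sketch):
;;   variable     -> a symbol such as 'x
;;   abstraction  -> (list 'lam x body)
;;   application  -> (list 'app t1 t2)

;; free-vars : term -> list of the variables that occur free in the term
(define (free-vars t)
  (match t
    [(? symbol? x) (list x)]
    ;; the binder removes its own variable from the free variables of the body
    [(list 'lam x body) (remove* (list x) (free-vars body))]
    [(list 'app t1 t2)
     (remove-duplicates (append (free-vars t1) (free-vars t2)))]))

;; closed? : term -> boolean, true exactly for combinators
(define (closed? t)
  (null? (free-vars t)))
```

For instance, `(free-vars '(lam x (app x y)))` yields a list containing only `y`, so that term is open.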
Semantics
In lambda calculus, programs (i.e., lambda terms) consist of anonymous functions created by abstraction and function calls. The computation of such a program is the simplification process reducing the program to a lambda term that cannot be reduced anymore. This term is the value computed by the program.
A lambda term $\lambda x.t$ represents an anonymous function with an argument $x$ and body $t$. The term corresponds to the following expression in Racket:
(lambda (x) t)
We can call the function by applying it to another term $s$: $(\lambda x.t)s$. A lambda term of the form $(\lambda x.t)s$ is called a redex because it is reducible by a substitution rule known as $\beta$-reduction. When we reduce a redex $(\lambda x.t)s$, we substitute $s$ for all free occurrences of $x$ in $t$. The resulting term is denoted $t[x:=s]$.
The $\beta$-reduction can be applied to any redex occurring in a lambda term. It represents a single step in the computation. We denote the relation that a term $t$ is $\beta$-reducible to another term $t'$ by $t \to_\beta t'$.[4] For example,
Let us see some examples. We denote the equality on lambda terms by $\equiv$.
Note that in the last example we substitute only for the free occurrence of the variable.
The substitution rule has one more caveat regarding the variable names. Consider a redex $(\lambda x.t)s$. A free occurrence of a variable in $s$ may become bound after the $\beta$-reduction. For example,
The above reduction step is not valid. The reason is that functions do not depend on the names of their arguments. Consider the following Racket program:
(define y 5)
(define (cadd x) (lambda (y) (+ x y)))
If we now call
(cadd y)
the resulting function cannot be
(lambda (y) (+ y y))
which would be doubling its argument. Instead, it should be the function adding $5$ to its argument. To see that, we can simply rename the argument $y$, for instance, to $z$:
(define (cadd x) (lambda (z) (+ x z)))
(cadd y) ; behaves as (lambda (z) (+ y z)), i.e., (lambda (z) (+ 5 z))
Thus the correct reduction of the above lambda term should proceed as follows:
Whenever we want to substitute a term $s$ for a variable $x$ in a term $t$, we must first check whether any free occurrence of a variable in $s$ would become bound. If so, we must rename the clashing bound variable in $t$ to a new name. The renaming of function arguments is known as $\alpha$-conversion. We consider two lambda terms that are the same up to renaming of bound variables to be equivalent. For example,
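Formally, we can define the substitution $t[x:=s]$ by induction on the structure of $t$ as follows, where $FV(s)$ denotes the set of variables occurring free in $s$:
$$\begin{aligned}
x[x:=s] &= s \\
y[x:=s] &= y && \text{if } y \neq x \\
(t_1 t_2)[x:=s] &= (t_1[x:=s])(t_2[x:=s]) \\
(\lambda x.t)[x:=s] &= \lambda x.t \\
(\lambda y.t)[x:=s] &= \lambda y.(t[x:=s]) && \text{if } y \neq x \text{ and } y \notin FV(s) \\
(\lambda y.t)[x:=s] &= \lambda y'.(t[y:=y'][x:=s]) && \text{if } y \neq x,\ y \in FV(s), \text{ and } y' \text{ is fresh}
\end{aligned}$$
The last case is the one where we need to do the $\alpha$-conversion and rename the function argument $y$ to a fresh name $y'$ before we proceed with the substitution $[x:=s]$.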
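Capture-avoiding substitution can be sketched in Racket as follows. This is only an illustration under assumptions: terms are nested lists tagged with `lam` and `app`, and the helper names `free-vars`, `fresh`, and `subst` are hypothetical. Fresh names are generated by appending a number to the clashing variable, which is one of many possible choices.

```racket
#lang racket

;; Hypothetical representation: symbols, (list 'lam x body), (list 'app t1 t2).

;; free-vars : term -> list of variables occurring free (duplicates allowed)
(define (free-vars t)
  (match t
    [(? symbol? x) (list x)]
    [(list 'lam x body) (remove* (list x) (free-vars body))]
    [(list 'app t1 t2) (append (free-vars t1) (free-vars t2))]))

;; fresh : symbol, list of symbols -> a numbered variant of x not in avoid
(define (fresh x avoid)
  (let loop ([i 1])
    (define candidate (string->symbol (format "~a~a" x i)))
    (if (memq candidate avoid) (loop (add1 i)) candidate)))

;; subst : term, symbol, term -> term; substitutes s for the free x in t
(define (subst t x s)
  (match t
    [(? symbol? y) (if (eq? y x) s y)]
    [(list 'app t1 t2) (list 'app (subst t1 x s) (subst t2 x s))]
    [(list 'lam y body)
     (cond
       ;; x is rebound here, so the body has no free x to replace
       [(eq? y x) t]
       ;; y is free in s and would be captured: alpha-convert y first
       [(memq y (free-vars s))
        (define y2 (fresh y (append (free-vars s) (free-vars body) (list x))))
        (list 'lam y2 (subst (subst body y y2) x s))]
       [else (list 'lam y (subst body x s))])]))
```

With these definitions, `(subst '(lam y (app x y)) 'x 'y)` renames the bound `y` before substituting and returns `'(lam y1 (app y y1))` rather than the incorrect `'(lam y (app y y))`.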
Church-Rosser Theorems
The computation in lambda calculus is performed by $\beta$-reduction. Given a lambda term, we reduce it as much as possible by repeated applications of $\beta$-reduction. The resulting irreducible lambda term is the computed value. However, a lambda term can contain several redexes, for example:
This leads to natural questions: Does it matter which redex we reduce first? Can different reduction orders lead to different final values? Can some reduction orders terminate, whereas some do not? The answers to these questions are provided by the Church-Rosser theorems.
Before we state the Church-Rosser theorems, let us discuss a few examples showing what might happen. First of all, it may happen that the reduction process does not terminate no matter which reduction order is used! Consider the following term containing only a single redex:
$$(\lambda x.xx)(\lambda x.xx) \to_\beta (\lambda x.xx)(\lambda x.xx) \to_\beta \cdots$$
The above lambda term corresponds, in fact, to an infinite loop.
For some expressions, the reduction process terminates for some reduction orders and diverges for others. For example, the term
contains two redexes. The reduction of the inner one does not terminate. If we instead reduce the outer redex, which applies a function to the rest of the expression, the reduction stops immediately:
A lambda term is said to be in normal form if it is irreducible, i.e., it contains no redexes. Reduction orders are also called evaluation strategies. Several of them were investigated. Let us introduce two of them:
- Normal order reduces the leftmost outermost redex first at each step.
- Applicative order reduces the leftmost innermost redex first at each step.
We say a redex is to the left of another redex if its lambda appears further left. The leftmost outermost redex is the leftmost redex not contained in any other redex. The leftmost innermost redex is the leftmost redex not containing any other redex. For example, consider the following lambda term:
It has five redexes depicted in the following figure by the red color.
The leftmost outermost redex is . On the other hand, the leftmost innermost redex is .
Let's see how the above lambda term would be evaluated by the evaluation strategy following the normal order. The reduced redexes are denoted by the line:
Finally, we get to the Church-Rosser theorems. The first theorem states that no matter which reduction order we choose, we will always get the same normal form provided the reduction process terminates. The second theorem states that the normal order always terminates, provided some sequence of $\beta$-reduction steps leads to a normal form.
Church-Rosser Theorems:
- Normal forms are unique (independently of evaluation strategy). Consequently, a lambda term cannot be reduced to two different normal forms.
- Normal order always finds a normal form if it exists.
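The difference between the two strategies can be observed in Racket itself, which evaluates arguments before calling a function. This resembles applicative order (strictly speaking it is call-by-value, which does not reduce under a lambda). The following sketch, with the hypothetical names `omega` and `result`, delays the diverging argument in a parameterless function so that it is never evaluated, mimicking what normal order does when the argument is discarded.

```racket
#lang racket

;; omega : calling this function starts the infinite loop (lambda (x) (x x)) applied to itself
(define (omega)
  ((lambda (x) (x x)) (lambda (x) (x x))))

;; A constant function that discards its argument.
(define const-done (lambda (x) 'done))

;; (const-done (omega)) would never return, because Racket evaluates (omega) first.
;; Wrapping the call in a lambda delays it; the delayed loop is discarded unevaluated.
(define result (const-done (lambda () (omega))))
```

Here `result` is the symbol `done`; the loop is never started because the delayed argument is thrown away.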
Programming in lambda calculus
Lambda calculus has no numbers, arithmetic, Booleans, etc. It has only anonymous functions and function calls. Thus it might look pretty useless. However, lambda calculus is Turing complete. So it should be no surprise that we can encode all the things like numbers and arithmetic. We will build a few basic encodings from scratch in the following sections. Besides numbers and arithmetic, the most interesting thing is how to represent recursive functions in lambda calculus. That is not straightforward, considering there are only anonymous (nameless) functions.
Although lambda calculus cannot internally introduce names for terms or functions, we can do it in our exposition to simplify the notation. We will denote specific lambda terms (usually combinators) with uppercase letters. For instance, the identity function is denoted by $I$:
$$I \equiv \lambda x.x$$
We will also introduce one more convention for writing lambda terms. Each function defined by abstraction is unary. However, we can have functions of higher arity due to currying. For instance, a function of two arguments is defined by $\lambda x.(\lambda y.t)$. To simplify this notation, we often group all the arguments under a single $\lambda$. For example, we replace $\lambda x.(\lambda y.t)$ with $\lambda xy.t$.
Booleans
We start with Boolean values and operations. Since Boolean values often appear in conditional expressions, we model them so that they work directly as the if-then-else expression. The if-then-else expression consists of a condition (i.e., a term whose value is either true or false) and two expressions to be evaluated depending on the result of the condition. Thus we model Boolean values as binary projection functions:
$$T \equiv \lambda xy.x \qquad F \equiv \lambda xy.y$$
Boolean values are functions of two arguments returning one of them. More precisely, $T$ representing "true" returns the first argument, whereas $F$ representing "false" returns the second. Consequently, if we have a lambda term $c\,t\,e$, where $c$ is a condition term that is evaluated either to $T$ or $F$, we get the behavior of the if-then-else expression:
$$T\,t\,e \to_\beta t \qquad F\,t\,e \to_\beta e$$
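Considering the encoding of Boolean values, it is easy to encode the basic Boolean operations conjunction $\land$, disjunction $\lor$, and negation $\neg$. Recall that the conjunction $x \land y$ of two arguments is false if $x$ is false. If $x$ is true, the result of $x \land y$ is just $y$. This can be modeled as follows:
$$\land \equiv \lambda xy.xyF$$
Let us check that the encoding works correctly. For an arbitrary lambda term $t$, we have
$$\land\,T\,t \to_\beta T\,t\,F \to_\beta t \qquad \land\,F\,t \to_\beta F\,t\,F \to_\beta F$$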
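Similarly, we can define disjunction, which is true if the first argument is true and has the value of the second argument otherwise.
$$\lor \equiv \lambda xy.xTy$$
For an arbitrary lambda term $t$, we have
$$\lor\,T\,t \to_\beta T\,T\,t \to_\beta T \qquad \lor\,F\,t \to_\beta F\,T\,t \to_\beta t$$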
Finally, it is straightforward to encode negation:
$$\neg \equiv \lambda x.xFT$$
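The Boolean encoding can be tried out directly in Racket by writing every function in curried form. The following is a minimal sketch; the names `AND`, `OR`, `NOT`, and the decoding helper `church->bool` are chosen for this illustration, and the helper is not part of lambda calculus.

```racket
#lang racket

;; Church Booleans: curried projections to the first and second argument
(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))

;; Boolean operations built only from abstraction and application
(define AND (lambda (x) (lambda (y) ((x y) F))))
(define OR  (lambda (x) (lambda (y) ((x T) y))))
(define NOT (lambda (x) ((x F) T)))

;; church->bool : decodes a Church Boolean into a Racket boolean (illustration helper)
(define (church->bool b)
  ((b #t) #f))
```

For example, `(church->bool ((AND T) F))` gives `#f`, and `(church->bool (NOT F))` gives `#t`.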
Numbers and arithmetic
To encode numbers and arithmetic operations, we use the so-called Church numerals. They encode a natural number $n$ as a binary function applying the first argument $n$-times to the second.
$$0 \equiv \lambda sz.z \qquad 1 \equiv \lambda sz.sz \qquad 2 \equiv \lambda sz.s(sz) \qquad \ldots \qquad n \equiv \lambda sz.s^n z$$
Note that $0 \equiv F$. It is usual in programming that a single value might have two different meanings depending on its context. For instance, the same stored value can represent a number or an uppercase letter.
Using the above encodings for numbers, one can easily define the successor function $S$.
$$S \equiv \lambda nsz.s(nsz)$$
The input number $n$ in $S$ just applies $s$ $n$-times to $z$, i.e., the expression $nsz$ is equivalent to $s^n z$. Next, we add one more application of $s$, i.e., $s(nsz)$. Let us compute the successor of a numeral as an example:
Once we have the successor function $S$, we can define addition because we can get the result of $m + n$ by applying $S$ $m$-times to $n$. Thus we can model addition by the term:
$$A \equiv \lambda mn.mSn$$
For example, is computed as follows:
Multiplication of two numbers can be represented by the term
$$M \equiv \lambda mnsz.m(ns)z$$
Since $m$ is a numeral, applying it to $ns$ and $z$ results in a term applying $ns$ $m$-times to $z$, i.e., $(ns)^m z$. Analogously, each application of $ns$ results in $n$-many applications of $s$. Altogether, we get $m \cdot n$ many applications of $s$ to $z$. Let us see an example:
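The multiplication term can be further simplified to
$$M \equiv \lambda mns.m(ns)$$
We can remove the variable $z$ because $\lambda z.m(ns)z$ and $m(ns)$ behave identically. To see that, consider an abstraction $\lambda x.tx$ for a term $t$ in which $x$ does not occur free. It defines a function of an argument $x$ applying $t$ to $x$. If we apply it to any expression $e$, we obtain $te$. Thus we can directly replace $\lambda x.tx$ just with the term $t$. This simplification is known as $\eta$-reduction. Applying $\eta$-reduction to $\lambda mnsz.m(ns)z$, we end up with $\lambda mns.m(ns)$.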
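Church numerals and the arithmetic terms can be sketched in Racket with curried functions. The names `ZERO`, `SUCC`, `ADD`, `MUL`, and the decoding helper `church->nat` are chosen for this illustration; the helper simply applies Racket's `add1` the encoded number of times to `0`.

```racket
#lang racket

;; Church numerals: n applies its first argument n times to the second
(define ZERO (lambda (s) (lambda (z) z)))

;; successor: one more application of s on top of n's applications
(define SUCC (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))

;; addition: apply the successor m times to n
(define ADD (lambda (m) (lambda (n) ((m SUCC) n))))

;; multiplication in its eta-reduced form: apply (n s) m times
(define MUL (lambda (m) (lambda (n) (lambda (s) (m (n s))))))

;; church->nat : decodes a numeral into a Racket integer (illustration helper)
(define (church->nat n)
  ((n add1) 0))

(define ONE (SUCC ZERO))
(define TWO (SUCC ONE))
(define THREE (SUCC TWO))
```

For example, `(church->nat ((ADD TWO) THREE))` gives `5` and `(church->nat ((MUL TWO) THREE))` gives `6`.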
In the following section on recursive functions, we will need the predecessor function $P$ on natural numbers (mapping $n+1$ to $n$ and $0$ to $0$). Unfortunately, encoding it is more complex than encoding the successor function $S$. We first need to introduce an encoding of pairs. Given two terms $a, b$, we define the pair $\langle a, b \rangle$ consisting of $a$ and $b$ as follows:
$$\langle a, b \rangle \equiv \lambda z.zab$$
Thus the pair is a function of an argument $z$ that is applied to the pair's components. Note that this is exactly how we encoded a 2D point as a function closure in Lecture 3.
(define (point x y)
(lambda (m) (m x y)))
Given a pair, we can access its components by applying it to the projection functions $T$ and $F$.
$$\langle a, b \rangle T \to_\beta a \qquad \langle a, b \rangle F \to_\beta b$$
To create the predecessor function, we need to find a term such that if we apply it $n$-times to another term, it gets evaluated to $n-1$. We first define a function $\Phi$ mapping a pair $\langle m, n \rangle$ to $\langle m+1, m \rangle$.
$$\Phi \equiv \lambda pz.z(S(pT))(pT)$$
The function $\Phi$ takes a pair $p$, extracts its first component $pT$, computes its successor $S(pT)$, and returns a pair consisting of the successor and the first component. The reason why $\Phi$ is important is that when we apply it $n$-times to the pair $\langle 0, 0 \rangle$, we get the pair $\langle n, n-1 \rangle$.
Consequently, we can define the predecessor function $P$ as the function applying $\Phi$ $n$-times to $\langle 0, 0 \rangle$ and extracting the second component:
$$P \equiv \lambda n.n\,\Phi\,\langle 0, 0 \rangle\,F$$
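The pair encoding and the predecessor construction can be sketched in Racket as follows. The names `PAIR`, `PHI`, `PRED`, and the helper `church->nat` are chosen for this illustration; the block repeats the Boolean and numeral definitions it needs so that it runs on its own.

```racket
#lang racket

;; projections and numerals needed by the construction
(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))
(define ZERO (lambda (s) (lambda (z) z)))
(define SUCC (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))

;; a pair is a function that hands both components to its argument
(define PAIR (lambda (a) (lambda (b) (lambda (z) ((z a) b)))))

;; PHI maps a pair (m, n) to the pair (m + 1, m)
(define PHI (lambda (p) ((PAIR (SUCC (p T))) (p T))))

;; predecessor: apply PHI n times to (0, 0) and take the second component
(define PRED (lambda (n) (((n PHI) ((PAIR ZERO) ZERO)) F)))

;; church->nat : decodes a numeral into a Racket integer (illustration helper)
(define (church->nat n)
  ((n add1) 0))

(define THREE (SUCC (SUCC (SUCC ZERO))))
```

For example, `(church->nat (PRED THREE))` gives `2`, while `(church->nat (PRED ZERO))` gives `0`.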
Zero test
To implement a recursive function, we need a condition telling us when to stop the recursion. In the example we will discuss in the next section, we test whether a given number is zero. To check whether a given number is zero, recall that $0$ is just the projection to the second argument. Consequently, $0\,t\,T \to_\beta T$ for each lambda term $t$. Thus we need to find a term $t$ such that when applied to $T$ at least once, it returns $F$. Such a term is the constant function always returning $F$, i.e., $\lambda x.F$. Altogether, we define
$$Z \equiv \lambda n.n(\lambda x.F)T$$
So we have
$$Z\,0 \to_\beta 0\,(\lambda x.F)\,T \to_\beta T$$
and for $n > 0$
$$Z\,n \to_\beta n\,(\lambda x.F)\,T \to_\beta (\lambda x.F)^n\,T \to_\beta F$$
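The zero test can be sketched in Racket in the same curried style. The name `ZERO?` and the decoding helper `church->bool` are chosen for this illustration.

```racket
#lang racket

(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))
(define ZERO (lambda (s) (lambda (z) z)))
(define SUCC (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))

;; zero test: n applies the constant-F function n times to T,
;; so the result is T only when the function is never applied
(define ZERO? (lambda (n) ((n (lambda (x) F)) T)))

;; church->bool : decodes a Church Boolean into a Racket boolean (illustration helper)
(define (church->bool b)
  ((b #t) #f))
```

For example, `(church->bool (ZERO? ZERO))` gives `#t`, and `(church->bool (ZERO? (SUCC ZERO)))` gives `#f`.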
Recursive functions
Finally, we are getting to the most exciting construction: how to encode recursive functions if we have only anonymous functions in lambda calculus. Recursive functions can be defined through the so-called $Y$-combinator.
$$Y \equiv \lambda y.(\lambda x.y(xx))(\lambda x.y(xx))$$
Note that it is an expanded version of the term $(\lambda x.xx)(\lambda x.xx)$ representing the infinite loop, as $Y$ applies a further argument $y$ to $xx$. Let us see what happens if we apply $Y$ to any lambda term $R$.
$$YR \to_\beta (\lambda x.R(xx))(\lambda x.R(xx)) \equiv R'$$
We obtained a term denoted $R'$, which is something like a recursive version of $R$. To see that, note that $R'$ can produce as many applications of $R$ as we wish:
$$R' \to_\beta R(R') \to_\beta R(R(R')) \to_\beta \cdots$$
Whether this process stops or not depends on $R$. If it contains a leftmost outermost redex whose reduction discards $R'$, the computation following the normal order terminates. However, the applicative order need not terminate as it can indefinitely reduce $R'$.
To see an example, we implement a function computing for a given natural number $n$ the sum $\sum_{i=0}^{n} i$. We can define the function recursively using the fact $\sum_{i=0}^{n} i = n + \sum_{i=0}^{n-1} i$. In Racket, we would implement such a function as follows:
(define (sum-to n)
(if (= n 0)
0
(+ n (sum-to (- n 1)))))
We must apply the $Y$-combinator in lambda calculus instead. We define a function $R$ corresponding to the body of sum-to, but we replace the recursive call with a call of a function $r$ given as an argument.
$$R \equiv \lambda rn.Z\,n\,0\,(n\,S\,(r\,(P\,n)))$$
The function tests if $n$ is zero by $Z\,n$. If it is zero, $Z\,n$ evaluates to $T$, which consequently returns its first argument, i.e., $0$. Otherwise, $Z\,n$ evaluates to $F$, which returns $n\,S\,(r\,(P\,n))$. This expression is nothing else than $n + r(n-1)$.
Now it remains to turn $R$ into its recursive version $R'$ by the $Y$-combinator. We define
$$R' \equiv YR$$
Let us see if it correctly sums up the natural numbers up to a given number. Recall that $R' \to_\beta R(R')$.
Note that the expression in the last line contains several redexes. The one hidden in $R'$ can be reduced forever. On the other hand, if we reduce the zero test first, the subterm containing $R'$ disappears. This is why the normal order terminates in this computation, whereas the applicative one does not.
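Because Racket evaluates arguments before calls, the combinator described above would loop forever if transcribed literally. The sketch below therefore swaps in two standard workarounds, both of which are assumptions of this illustration rather than part of the text: a call-by-value variant of the fixed-point combinator (here named `FIX`, often called the Z combinator), and branches wrapped in parameterless functions so that only the selected branch is evaluated. All names are chosen for this example.

```racket
#lang racket

(define T (lambda (x) (lambda (y) x)))
(define F (lambda (x) (lambda (y) y)))
(define ZERO (lambda (s) (lambda (z) z)))
(define SUCC (lambda (n) (lambda (s) (lambda (z) (s ((n s) z))))))
(define PAIR (lambda (a) (lambda (b) (lambda (z) ((z a) b)))))
(define PHI (lambda (p) ((PAIR (SUCC (p T))) (p T))))
(define PRED (lambda (n) (((n PHI) ((PAIR ZERO) ZERO)) F)))
(define ZERO? (lambda (n) ((n (lambda (x) F)) T)))

;; call-by-value fixed-point combinator: the self-application (x x)
;; is wrapped in (lambda (v) ...) so it is only unfolded on demand
(define FIX
  (lambda (f)
    ((lambda (x) (f (lambda (v) ((x x) v))))
     (lambda (x) (f (lambda (v) ((x x) v)))))))

;; body of sum-to with the recursive call replaced by the argument r;
;; both branches are delayed and only the chosen one is run
(define R
  (lambda (r)
    (lambda (n)
      (define chosen
        (((ZERO? n)
          (lambda () ZERO))
         (lambda () ((n SUCC) (r (PRED n))))))
      (chosen))))

(define SUM-TO (FIX R))

;; church->nat : decodes a numeral into a Racket integer (illustration helper)
(define (church->nat n)
  ((n add1) 0))

(define THREE (SUCC (SUCC (SUCC ZERO))))
```

For example, `(church->nat (SUM-TO THREE))` gives `6`. The delayed branches play the role that normal-order reduction plays in the calculus: the recursive call is discarded when the zero test succeeds.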
[1] If you are interested in the history, check the entry on Alonzo Church in the Stanford Encyclopedia of Philosophy.
[2] Benjamin C. Pierce: Types and Programming Languages. MIT Press 2002, ISBN 978-0-262-16209-8, pp. I-XXI, 1-623.
[3] Note that the terminology on bound and free occurrences is completely analogous to the terminology used in first-order logic, where variables can be bound by quantifiers instead of $\lambda$.
[4] We will abuse the notation and write $t \to_\beta t'$ even if several $\beta$-reductions are needed to get from $t$ to $t'$.