Lambda expressions is a hot topic these days. Lambda expressions play a prominent role in Microsoft's LINQ/DLINQ technology (at the time of writing in preview). Since this technology from MS is language independent, it might also turn up in Delphi in the future, as indicated in a blog posting by Danny Thorpe (until recently the Delphi compiler architect). Borland already has a product supporting lambda expressions; Borland's ECO product, including its predecessor Bold, has supported OCL (Object Constraint Language, a part of UML) for many years. OCL is a functional/declarative language supporting lambda expressions.
(Many "modern" programming language features like lambda expressions and garbage collection are actually quite old ideas, implemented in one of the oldest programming languages around, LISP.)
A lambda expression can be seen simply as a way of defining a
function.
In Delphi, you can define a function which adds 1 to every number
you pass to it like this:
Function Successor(const inputNumber: Integer): Integer;
Begin
Result:=inputNumber + 1;
End;
Many are familiar with this mathematical notation for the same
function:
Successor(x) = x + 1
If that looks a bit unfamiliar to you, try 'f(x)=x+1' instead. A
mathematician wouldn't use such a long name as 'Successor'. When a
mathematician is to name a function, (s)he names the function 'f', unless
that name already has been taken earlier in the same theorem, lecture or
article or whatever is the relevant context. If 'f' is already taken, (s)he
considers the next one, 'g' (or adds some more molecules to the 'f', turning it into 'f' with a hat or bar above it, or a mark to one of the sides.) . When running out of available letters, a
mathematician prefers to switch to another alphabet or another font, before
considering using longer names! Obviously, non-mathematicians suspect
that the common usage of single-letter (and typically case sensitive) names in
mathematics is an attempt at making maths look more cryptic than
it really is (also notice all the Greek letters!). ;-)
History
In their famous work 'Principia Mathematicia', Bertrand
Russel and Alfred N. Whitehead used an even more compact notation
for defining functions, making our Successor function look like this:
^
x + 1
(The ^ (circumflex) is supposed to be on top of the x, this
expression is to be read as "x hat plus one".)
So using this notation, we can define a function without even
bothering to give it a name! However, this notation never became popular.
Obviously, mathematicians have always been interested in
calculations. But only relatively recently the process of calculations
itself came under serious study by mathematicians. The mathematician Alonzo
Church wanted to define a foundation for mathematics, and needed a way to
define what a computable function (computation/algorithm) is. In the 1930's,
he (with input from some of his students) invented the Lambda
Calculus for this purpose. (Alan Turing invented the
computationally "equivalent", but rather different Turing machine around the
same time. It was actually a student of Church, Kleene (the inventor of regular
expressions) that coined the term "Turing Machine" for
denoting Turing's famous automata construct))
According to [Barendregt1997], Church wanted to use the "hat
notation" used in Principia Mathematica to define our Successor function
like this:
^
x | x + 1
(The "hat" is supposed to be above the first x)
However, according to [Barendregt1997], the typesetter Church used to publish the first book about
the lambda calculus [Church1941] wasn't able to fulfill Church's wish of
having the "hat" on top of the x (not very hard to believe!), so instead the
typesetter wrote it as:
^x | x + 1
Then (again according to [Barendregt1997]), another typesetter
misunderstood this notation, and changed it into this notation:
λ x | x + 1
So according to [Barendregt1997], the reason the Greek letter
'λ' (lambda) was used in Church's book (and used forever after) is due
to a combination of a technical limitation and plain old misunderstanding!
This is a nice story, but it is probably a myth. According to the article "Types in Logic and Mathematics before 1940":
"J. P. Seldin informed us that he had asked Church about it in 1982, and that Church had answered that there was no particular reason for choosing λ, that some letter was needed and that λ happened to have been chosen."
Please note that I'm "cheating" a bit, I'm using '|' to
separate the parameter from the body of the expression instead of the dot
notation that Church originally used (and which is still in common use), because the original
(hundred years old) dot-notation looks unnecessary confusing to today's
OO programmers)
In C# 3, it seems that the syntax will be this:
(x) => x + 1
which, when there's only a single parameter, can be
abbreviated to:
x => x + 1
The essence for us is that a lambda expression is a very compact
way of defining a function.
Higher order functions
The term 'lambda expression' implies that we're working with
something that is an expression, as opposed to a statement. An expression
results in some kind of value when evaluated. Values are the kind of
things we store in variables, in fields in classes, send as parameters etc.
Notice that in Delphi, events are actually properties, what distinguish
them from "ordinary" properties is that events have method references as
values.
In Delphi's TList class, there's a powerful method called Sort,
which takes a single parameter. This parameter is a function, which tells
the Sort method which of any two given items are the "largest" (or whether
they are equally "large"). This makes the Sort method very
flexible/powerful, it doesn't care about what kind of items are in the list,
still it is able to sort them by calling the provided compare function when
it needs to compare any two of the items in the list.
So a function taking other functions as parameters can be very powerful.
(A function taking functions as parameters is usually called a 'higher order
function')
Imagine that you want to call a higher order function with a very simple
function as an argument. Let's say you have a list of
objects representing programming languages called
ProgrammingLanguageList of type TProgrammingLanguageList, and that it
contains objects of the type TProgrammingLanguage and assume this latter
class has a Name and a Version property. Assume that the
ProgrammingLanguages list has a higher order method called Find, which takes
one parameter, a boolean function which tells the list whether a given
item is an item it should "find". (This Find method is called 'Where'
in LINQ and 'Select' in OCL.)
That is, we have something like this:
TProgrammingLanugage = class
Property Name: String ...;
Property Version: Integer ...;
End;
TRecognizerFunction = function(const p: TProgrammingLanugage): Boolean;
TProgrammingLanuageList = class(some kind of list class)
Function Find(const i_recognizerFunction: TRecognizerFunction): TProgrammingLanuageList;
End;
Assume you want to get the TProgrammingLanguage objects in this
list having Name='Delphi'.
You can use this to find the Delphi languages in this list:
l_delphiLanguages:=ProgrammingLanguages.Find(MyFunctionWhichRecognizesDelphi);
and define MyFunctionWhichRecognizesDelphi as:
Function MyFunctionWhichRecognizesDelphi(const p: TProgrammingLanguage): Boolean;
Begin
if p.Name='Delphi' then
Result:=True
else
Result:=False;
End;
or, equivalently, in a more compact notation:
Function MyFunctionWhichRecognizesDelphi(const p: TProgrammingLanguage): Boolean;
Begin
Result:= (p.Name='Delphi');
End;
Having to write this MyFunctionWhichRecognizesDelphi is quite
boring just to find a given item. If you could define this function "inline"
(lambda expression), you wouldn't need the "standalone" function, instead
you could write (using the C# syntax for the lambda expression):
l_delphiLanguages:=ProgrammingLanguages.Find(p => p.Name='Delphi');
This is much more elegant.
But there's more!
Imagine that the above expression is within a larger procedure, and
that you only want to find a given version of Delphi, and that the version
you want to find is stored in a local variable or a parameter.
Procedure ....
Var l_myDelphiVersion: Integer;
l_specificDelphiVersion: TProgrammingLanguageList;
Begin
l_myDelphiVersion:=...somehow decide which Delphi version to find...;
l_specificDelphiVersion:=ProgrammingLanguages.Find(p =>(( p.Name='Delphi') and (p.Version=l_myDelphiVersion)));
End;
This is nice and easy.
But how would you translate the above into a version not having the
lambda expression? Remember that the Find function expects a function of a
single parameter, how would it get at l_myDelphiVersion?
One way would be to assign l_myDelphiVersion to a global variable
g_myDelphiVersion just before the Find call, and let
MyFunctionWhichRecognizesDelphi look at this global variable in
addition to its provided p parameter. However, that's obviously ugly, and
wouldn't work in a multithreaded situation.
With the lambda-expression, the compiler does that magic for us (in a
safe way). This is called 'Closure', taking a "snapshot" of the local
variables/environment needed within the lambda-expression and providing them
to the anonymous function when it gets called. A Delphi programmer has seen something similar, in that a method pointer needs to remember which object the method belongs to.
So lambda expressions makes it very simple to call higher order
functions, making such powerful functions much more common than they are
today. Probably the main reason they haven't popped up earlier in C# (and
similar languages) is that they aren't very practical/useful without
generics in statically typed languages
(In non-OO languages supporting higher order functions and
closures, one can simulate "object-based" programming by exploiting
closures to store the state of "objects" and pass an "object" around
by passing the higher order function (which "remembers" its "object state"
via the closure))
An OO programmer is actually very used to higher order functions. A
method/routine taking an object as a parameter generalizes higher order
functions because an object can have methods. Such an object can easily
store the necessary "extra parameters" (the closure) needed. (In fact, this
is what the C# 2.0 compiler does behind the scenes for anonymous methods
(lambdas), creating an anonymous class having this anonymous method, and
having fields to store the "extra parameters".)
But writing such classes are very boring/cumbersome, so boring
that Borland and others assume we wouldn't bother calling "higher
order" functions much, so they didn't provide us with those "higher
order" functions (using objects) in the first place. But with support
for lambda expressions, the compiler writes those boring classes for
us and with generics, library authors can write typesafe methods like 'Find' in a generic way.
Example with multiple lambda expressions
Assume that 'customers' is a list/collection of
customers, here's a C# 3 statement using 3 lambda expressions:
NameOfOsloCustomersSortedByName = customers.Where(c => c.City == "Oslo").OrderBy(c => c.name).Select(c => c.Name)
Notice the similarity to the following SQL query:
select c.Name from customers c where c.City = 'Oslo' order by c.Name
In the 'Object Constraint Language' (OCL), used in Borland's ECO
product, this expression would be:
customers->select(c | c.City = 'Oslo')->orderBy(c | c.Name)->collect(c | c.Name)
Imagine how boring it would be to explicitly create 3 standalone
functions (or more likely 3 standalone classes) in order to do this without
the lambda expressions:
customers.Where(MyOsloRecognizerFunction).OrderBy(MyFunctionWhichReturnsTheNameOfAGivenCustomer).Select(MyCityNameSelectorFunction)
And this is the simple case, when the "small" functions doesn't need any "extra" parameters, if a "manual snapshot/closure" of the relevant variables needs to be taken, there will be object creation and destruction and try-finallys inside that expression making it quite unreadable.
In addition, with the lambda-expressions, the compiler has more
headroom for optimization (because what you are trying to do is more
transparent/"obvious" to the compiler, the compiler doesn't need to analyze a lot of code in order to figure out the "essence" of what you are doing), including the possibility of sending the whole (or parts of or
slightly modified version of) the expression directly to a database engine.
For the latter, see Microsoft's DLINQ project.
Digression
I mentioned in the beginning of this article
that mathematicians tend to use names like 'f' and 'g' for function
names. When the function is so important that it deserves a "global"
name this "rule" is broken. Our innocent Successor function is
very important to number theorists. It is usually called 'S', and
sometimes even the "extraordinary" long 'Succ'! (which it also is called in
Pascal). Given the number 0, and the Successor function, we can build all
the integers. (As an example, 3 is "implemented/built" by S(S(S(0)))). And
given all the integers, we can easily build all the rational numbers. And
given all the rational numbers, we can build all the real numbers (somewhat
complicated though). So from two "objects", the symbol '0' and the Successor function, we can build/construct all the
numbers used in mathematics.
The famous mathematician Gvdel showed the world how to encode
statements into integers. For us programmers, encoding characters as integers and text as sequences of integers (bytes) is the most natural thing
in the world. When the Delphi compiler generates an executable for us, we
get a file. A file is a sequence of bytes (0's and 1's). A sequence of 0's
and 1's is an integer. Which means that any program we build is simply
one *huge* integer. "Real programmers" write their programs using
only 0 and the Successor function! ;-) (The slight drawback is that only the most tiny assembler-coded 'Hello World' program will use less digits (when using unary encoding) than the numbers of atoms in the universe.)
References
"Principia Mathematica", Bertrand Russell and Alfred North
Whitehead, 1910-1913.
"The Calculi of Lambda Conversion", Alonzo Church, 1941
"The impact of the lambda calculus in logic and computer science".
Henk Barendregt, 1997.
(Can be found via
http://citeseer.ist.psu.edu/ )
"Types in Logic and Mathematics before 1940", Kamareddine, Laan and Nederpelt. (Available via citeseer)
Microsoft's LINQ Project: http://msdn.microsoft.com/netframework/future/linq/
Danny Thorpe's Blog: http://blogs.borland.com/dcc
About the Author
Jarle Stabell lives in Oslo, Norway. He wrote his first computer programs on a Sinclair ZX-81. He doesn't know the lambada, but enjoys practicing Irish set and step dancing as well as Argentine tango when not sitting in front of a computer.