[fpc-pascal] Feature Announcement: Function References and Anonymous Functions

Thu May 26 21:47:06 CEST 2022

Dear Free Pascal Community,

The Free Pascal Developer team is pleased to finally announce the 
addition of a long awaited feature, though to be precise it's two 
different, but very much related features: Function References and 
Anonymous Functions. These two features can be used independently of 
each other, but they unfold their greatest power when used together.

These features are based on the work by Blaise.ru, so thank you very 
much and I hope you're doing well considering the current situation.

In the following we'll highlight both features separately and then we'll 
take a look at using them together.

= Function References =

Function References (also applicable names are Procedure References and 
Routine References, in the following only Function References will be 
used) are types that can take a function (or procedure or routine), 
method, function variable (or procedure variable or routine variable), 
method variable, nested function (or nested procedure or nested routine) 
or an anonymous function (or anonymous procedure or anonymous routine) 
as a value. The function reference can then be used to call the provided 
function just like other similar routine pointer types. In contrast to 
these other types nearly all function-like constructs can be assigned to 
it (the only exceptions are nested function variables (or nested 
procedure variables or nested routine variables), more about that later 
on) and then used or stored.

Function references are enabled with the modeswitch FUNCTIONREFERENCES 
(the following examples will assume that this modeswitch is provided).

A function reference is declared as follows:

REFERENCE TO FUNCTION|PROCEDURE [(argumentlist)][: resulttype;] 
[directives;]

Examples:

=== code begin ===

type
   TProcLongInt = reference to procedure(aArg: LongInt); stdcall;
   TFuncTObject = reference to function(aArg: TObject): TObject;

=== code end ===

Like other function pointer types function references can also be 
declared as generic:

=== code begin ===

type
   generic TGenericProc<T> = reference to procedure(aArg: T);

=== code end ===

Note that once function references are enabled you can't use the 
identifier "REFERENCE" as a type name in an alias or variable 
declaration without using "&":

=== code begin ===

type
   someref = reference; // will fail
   someref = &reference; // correct fix

var
   somevar: reference; // will fail
   somevar: &reference; // correct fix

=== code end ===

A function reference variable can then be called like any other function 
pointer type:

=== code begin ===

var
   p: TProcLongInt;
begin
   p := @SomeLongIntProc;
   p(42);
end.

=== code end ===
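
For reference, here is a minimal sketch of a complete program around the 
snippet above. It assumes mode ObjFPC, and SomeLongIntProc is a 
hypothetical routine that only exists for this illustration. The type is 
declared without "stdcall" here so that it matches the calling 
convention of the plain procedure:

```pascal
program FuncRefCall;

{$mode objfpc}
{$modeswitch functionreferences}

type
  // Declared without stdcall so that it matches the plain procedure below.
  TProcLongInt = reference to procedure(aArg: LongInt);

// Hypothetical target routine; any procedure with a matching signature
// could be used instead.
procedure SomeLongIntProc(aArg: LongInt);
begin
  Writeln('Got ', aArg);
end;

var
  p: TProcLongInt;
begin
  p := @SomeLongIntProc;  // mode ObjFPC requires the "@" operator here
  p(42);                  // calls SomeLongIntProc through the reference
end.
```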

If a function reference has no parameters then you need to use "()" 
nevertheless in the FPC/ObjFPC modes like for other function pointer types:

=== code begin ===

type
   TProc = reference to procedure;
var
   p: TProc;
begin
   p := @SomeProcedure;
   p(); // required
   p; // this can be used e.g. in mode Delphi
end.

=== code end ===

Like other function pointer types they can also be declared anonymously 
as part of a variable or field declaration (but not as part of a 
parameter declaration):

=== code begin ===

var
   f: reference to function: LongInt;

type
   TTest = class
     f: reference to procedure;
   end;

=== code end ===

They get their great power from a point that is for once *not* 
considered an implementation detail: function references are in fact 
internally declared as reference counted interfaces with a single 
Invoke() method of the provided signature. So the above examples are in 
fact declared like this:

=== code begin ===

type
   TProcLongInt = interface(IInterface)
     procedure Invoke(aArg: LongInt); stdcall; overload;
   end;

   TFuncTObject = interface(IInterface)
     function Invoke(aArg: TObject): TObject; overload;
   end;

   generic TGenericProc<T> = interface(IInterface)
     procedure Invoke(aArg: T); overload;
   end;

=== code end ===

This has a few implications:
- in the RTTI this will appear like a normal interface
- it reacts to the $M directive like a normal interface
- it is a managed type
- it has *no* valid GUID
- it can be implemented by a class
- it can be inherited from

Especially the last two points are important.

That the interface can be implemented means that much more functionality 
and state can be added to a function reference:

=== code begin ===

type
   TFunc = reference to function: LongInt;

   TSomeImpl = class(TInterfacedObject, TFunc)
     f: LongInt;
     function Invoke: LongInt;
   end;

function TSomeImpl.Invoke: LongInt;
begin
   Result := f;
end;

var
   t: TSomeImpl;
   f: TFunc;
begin
   t := TSomeImpl.Create;
   f := t;
   Writeln(f()); // will write 0
   t.f := 42;
   Writeln(f()); // will write 42
   f := Nil; // the usual warnings about mixing classes and interfaces apply!
end.

=== code end ===

As function references don't have valid GUIDs you can't however use 
QueryInterface() or the as-operator to retrieve it. Using the low level 
interface related functions of TObject however will work.

An interface that inherits from a function reference is still considered 
invokable by the compiler, so it can still be used like an ordinary 
function reference could, but you can also add additional methods 
including overloads for Invoke itself:

=== code begin ===

type
   TTest = reference to procedure(aArg: TObject);

   TTestEx = interface(TTest)
     function Invoke: TObject; overload;
   end;

var
   f: TTestEx;
   o: TObject;
begin
   f := TSomeImplEx.Create; // some class implementing TTestEx (declaration omitted)
   o := f();
   f(o);
end.

=== code end ===
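
The snippet above leaves out the implementing class. As a rough sketch 
of how such a class might look, the following program declares a 
hypothetical TSomeImplEx that implements both Invoke overloads. The 
class and the bodies of its methods are assumptions made purely for 
illustration:

```pascal
program InvokeOverload;

{$mode objfpc}
{$modeswitch functionreferences}

type
  TTest = reference to procedure(aArg: TObject);

  TTestEx = interface(TTest)
    function Invoke: TObject; overload;
  end;

  // Hypothetical implementation; it has to provide the Invoke inherited
  // from TTest as well as the overload added by TTestEx.
  TSomeImplEx = class(TInterfacedObject, TTestEx)
    procedure Invoke(aArg: TObject); overload;
    function Invoke: TObject; overload;
  end;

procedure TSomeImplEx.Invoke(aArg: TObject);
begin
  Writeln('procedure Invoke called, argument assigned: ', Assigned(aArg));
end;

function TSomeImplEx.Invoke: TObject;
begin
  Writeln('function Invoke called');
  Result := Nil;
end;

var
  f: TTestEx;
  o: TObject;
begin
  f := TSomeImplEx.Create;
  o := f();   // resolves to the parameterless function overload
  f(o);       // resolves to the procedure overload
end.
```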

This is for example described by Stefan Glienke on his blog ( 
https://delphisorcery.blogspot.com/2015/06/anonymous-method-overloading.html 
). His linked example won't work as-is however due to missing 
functionality in Rtti.TValue.

As mentioned initially you can assign a nested function to a function 
reference, but not a nested function variable. There is no real 
technical reason for this, but it's instead a design choice based on how 
function references are assumed to behave: they are assumed to be valid 
beyond their scope (this will become clearer when combined with 
anonymous functions in the third part), so they can for example be 
returned from a function or stored in some class instance and can still 
be considered valid. However a nested function variable is no longer 
usable once the function frame it was retrieved from has been left (for 
a nested function the compiler can safely convert it in a way that this 
is no problem, but for a nested function variable it simply can't).
One could argue that the same is true for method pointers and method 
variables as they aren't callable anymore once their class instance is 
freed, however these are much more common in the Object Pascal world 
while nested function variables are very seldom used, thus the dangers 
of the former are much more apparent than the dangers of the latter.
For this reason assigning nested function variables to function 
references is prohibited.
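
To illustrate the allowed case (a nested function itself, as opposed to 
a nested function variable), here is a small sketch in which a function 
reference to a nested function outlives the frame of its parent. 
MakeCounter and Next are hypothetical names, and the sketch simply 
follows the rules described above:

```pascal
program NestedToRef;

{$mode objfpc}
{$modeswitch functionreferences}

type
  TFunc = reference to function: LongInt;

// Hypothetical factory that returns a reference to its nested function.
function MakeCounter: TFunc;
var
  count: LongInt;

  // Nested function that uses a local variable of its parent.
  function Next: LongInt;
  begin
    Inc(count);
    Result := count;
  end;

begin
  count := 0;
  // Assigning the nested function itself is allowed; assigning a
  // nested function *variable* at this point would be rejected.
  Result := @Next;
end;

var
  f: TFunc;
begin
  f := MakeCounter();
  // MakeCounter has already returned, yet the reference is still callable.
  Writeln(f());
  Writeln(f());
end.
```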

= Anonymous Functions =

Anonymous Functions (or Anonymous Procedures or Anonymous Routines, in 
the following simply Anonymous Functions) are routines that have no name 
associated with them and are declared in the middle of a code block (for 
example on the right side of an expression or as a parameter for a 
function call). However they can just as well be called directly like a 
nested function (or nested procedure or nested routine) would.

Anonymous functions are enabled with the modeswitch ANONYMOUSFUNCTIONS 
(the following examples will assume that this modeswitch is provided).

An anonymous function is declared as follows:

FUNCTION|PROCEDURE [(argumentlist)][[resultname]: resulttype;] [directives;]
[[VAR|TYPE|CONST section]|[nested routine]]*
BEGIN
[STATEMENTS]
END

As can be seen an anonymous function looks like a regular function (or 
procedure or routine) with the most important differences being that it 
does not have a name and that it isn't terminated by a semicolon 
(because it's essentially an expression). Because it doesn't have a 
name that could be used to assign the result in modes that don't have 
the implicit RESULT variable, it's allowed to explicitly name the 
result variable (even in modes that do have the RESULT variable), like 
is the case with operator overloads.

It's possible to directly call an anonymous function in which case it 
essentially behaves like a nested function.

Like nested functions anonymous functions have access to the symbols 
(variables, functions, etc.) of the surrounding scope including Self if 
the surrounding scope is a method. Accessing such a symbol is named 
“capturing” and is one of the core concepts of anonymous functions.

Their main use however is when assigning them to one of the various 
function pointer types: function variables, method variables, nested 
function variables and function references. However not every anonymous 
function is assignable to every function pointer type as it depends on 
which symbols (if any) are captured from the surrounding scope. Unlike 
for non-anonymous function or method identifiers this assignment is 
however *always* done without the "@"-operator, because aside from 
calling one can't do much else with anonymous functions.
An anonymous function that captures no symbols at all (except for global 
symbols or static symbols) is assignable to all four function pointer 
types. If the anonymous function captures Self then it is no longer 
assignable to function variables, but still to the other three. And if 
it captures any local symbol then it's only assignable to nested 
function variables or function references.
In case of function variables, method variables and nested function 
variables anonymous functions behave just like their non-anonymous 
counterparts. The differences appear when they're used with function 
references which will be highlighted in the next part.

But first some examples:

=== code begin ===

type
   TFunc = function: LongInt;

var
   p: TProcedure;
   f: TFunc;
   n: TNotifyEvent;
begin
   procedure(const aArg: String)
   begin
     Writeln(aArg);
   end('Hello World');

   p := procedure
        begin
          Writeln('Foobar');
        end;
   p();

   n := procedure(aSender: TObject);
        begin
          Writeln(HexStr(Pointer(aSender)));
        end;
   n(Nil);

   f := function MyRes : LongInt;
        begin
          MyRes := 42;
        end;
   Writeln(f());
end.

=== code end ===
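
The assignability rules described above (no capture, capture of Self, 
capture of a local symbol) could be sketched roughly as follows. TDemo 
and the three procedures are hypothetical and only exist for this 
illustration. The nested function variable additionally assumes the 
modeswitch NESTEDPROCVARS:

```pascal
program CaptureRules;

{$mode objfpc}
{$modeswitch anonymousfunctions}
{$modeswitch nestedprocvars}

type
  TPlainProc  = procedure;
  TMethodProc = procedure of object;
  TNestedProc = procedure is nested;

  // Hypothetical class, used only to have a Self that can be captured.
  TDemo = class
    Value: LongInt;
    procedure CaptureSelf;
  end;

procedure TDemo.CaptureSelf;
var
  m: TMethodProc;
begin
  // Captures Self (through the field Value), so a plain function
  // variable would no longer be accepted here.
  m := procedure
       begin
         Writeln('Value = ', Value);
       end;
  m();
end;

procedure CaptureNothing;
var
  p: TPlainProc;
begin
  // Captures nothing, so even a plain function variable is accepted.
  p := procedure
       begin
         Writeln('no captures');
       end;
  p();
end;

procedure CaptureLocal;
var
  local: LongInt;
  n: TNestedProc;
begin
  local := 7;
  // Captures a local variable, so only nested function variables and
  // function references are accepted.
  n := procedure
       begin
         Writeln('local = ', local);
       end;
  n();
end;

var
  d: TDemo;
begin
  CaptureNothing;
  d := TDemo.Create;
  try
    d.Value := 42;
    d.CaptureSelf;
  finally
    d.Free;
  end;
  CaptureLocal;
end.
```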

= Anonymous Function References =

As mentioned above the greatest power of the two new features comes when 
the two are combined: like a nested function an anonymous function can 
access symbols from the surrounding scope, however unlike for nested 
functions an anonymous function that has been assigned to a function 
reference can *leave* the scope where it has been declared in and it 
will then take the captured symbols with it.
For this purpose any variable or parameter that is captured by an 
anonymous function will become part of the implicitly created object 
instance (which shall be considered opaque) that will be assigned to the 
function reference instead of belonging to the original function. The 
original function will then reference these symbols using the object 
instance instead of its stack frame. This has the implication that 
changes to the symbols will be reflected in all anonymous functions that 
capture that symbol.

For example:

=== code begin ===

type
   TProc = reference to procedure;

procedure Test;
var
   i: LongInt;
   p: TProc;
begin
   i := 42;
   p := procedure
        begin
          Writeln(i);
        end;

   p(); // will print 42

   i := 21;

   p(); // will print 21
end;

=== code end ===

Changes will thus also be persistent across calls and different 
anonymous functions as long as they capture the same symbols:

=== code begin ===

type
   TProc = reference to procedure;

procedure Test;
var
   i: LongInt;
   p1, p2: TProc;
begin
   i := 42;
   p1 := procedure
         begin
           Writeln(i);
           i := i * 2;
         end;

   p1(); // will print 42

   p2 := procedure
         begin
           Writeln(i);
         end;

   p1(); // will print 84
   p2(); // will print 168
end;

=== code end ===

The lifetime of managed types captured by anonymous function references 
will be handled accordingly (they will stay alive as long as at least 
one anonymous function that has captured them is alive as well), however 
special care needs to be taken regarding manual memory management:

=== code begin ===

type
   TProc = reference to procedure;

function Test: TProc;
var
   o: TObject;
begin
   o := TObject.Create;
   Result := procedure
             begin
               Writeln(o.ClassName);
             end;
   o.Free;
end;

=== code end ===

Calling the function reference returned by Test will essentially result 
in use-after-free. And not freeing “o” at all will result in a memory leak.
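
One possible way around this, sketched here under the assumption that 
only some data of the object is needed later on, is to copy that data 
into a managed type before freeing the object and to capture only the 
copy. The variable clsName is a hypothetical name introduced for this 
example:

```pascal
program CaptureManaged;

{$mode objfpc}{$H+}
{$modeswitch functionreferences}
{$modeswitch anonymousfunctions}

type
  TProc = reference to procedure;

function Test: TProc;
var
  o: TObject;
  clsName: String;  // managed type, kept alive by the function reference
begin
  o := TObject.Create;
  try
    clsName := o.ClassName;  // copy what is needed while "o" is valid
  finally
    o.Free;
  end;
  Result := procedure
            begin
              // Only the string is captured, "o" is never touched here.
              Writeln(clsName);
            end;
end;

var
  p: TProc;
begin
  p := Test();
  p();
end.
```

Whether this fits depends on the use case. If the object itself has to 
stay alive, its ownership needs to be handled some other way.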

= Compatibility =

The two features are by and large compatible with Delphi's Anonymous 
Methods. However FPC allows the assignment of anonymous functions to 
various function pointer types while Delphi restricts them to function 
references.
Also FPC handles the assignment of function and method variables to 
function references slightly differently. Take the following code:

=== code begin ===

procedure Foo;
begin
   Writeln('Foo');
end;

procedure Bar;
begin
   Writeln('Bar');
end;

procedure Test;
var
   p: reference to procedure;
   p2: procedure;
begin
   p2 := Foo;
   p := p2;
   p();
   p2 := Bar;
   p();
end;

=== code end ===

Delphi essentially generates the following:

=== code begin ===

procedure Test;
var
   p: reference to procedure;
   p2: procedure;
begin
   p2 := Foo;
   p := procedure
        begin
          p2();
        end;
   p();
   p2 := Bar;
   p();
end;

=== code end ===

This will result in the following output:

=== output begin ===

Foo
Bar

=== output end ===

However FPC will generate the following:

=== code begin ===

procedure Test;
var
   p: reference to procedure;
   p2, tmp: procedure;
begin
   p2 := Foo;
   tmp := p2;
   p := procedure
        begin
          tmp();
        end;
   p();
   p2 := Bar;
   p();
end;

=== code end ===

This will result in the following output:

=== output begin ===

Foo
Foo

=== output end ===

This is more consistent with assignments of other function pointer types 
to function pointer types.

The Function References feature is available on all platforms which have 
the Classes feature available (so essentially everything except AVR) and 
Anonymous Functions themselves are available on all platforms (excluding 
the assignment to function references on platforms where these are 
missing). Yes, this includes platforms like DOS where directives like 
“far” and “near” are handled accordingly (which means that these need to 
be compatible as well when assigning).

As these two features are rather complicated there might still be a huge 
bundle of bugs lurking around so I ask you to test them to your heart's 
content and report any bugs you find to the issue tracker on GitLab so 
that we can fix as many of them as possible before the next major 
version (which is not yet planned, so don't worry ;) ).

Further RTL enhancements like the declaration of TProc<> or the addition 
of a TThread.Queue() that takes a function reference will come in the 
near future now that the basics on the compiler side are done. Maybe we 
can now also tackle ports of libraries like Spring4D and 
OmniThreadLibrary. There's also the idea to introduce a syntax to 
control whether symbols are captured by-reference (as currently) or 
by-value.

Enjoy!

Regards,
Sven