[fpc-devel] Nested functions in numlib

Werner Pamler werner.pamler at freenet.de
Tue Apr 4 17:33:41 CEST 2017


On 04.04.2017 at 03:28, Marco van de Voort wrote:
> Did you test performance? Repeated access to parent frame in tight loops
> might be suboptimal. Could maybe be helped with some pointer work?

Right, I should have done that before asking...

Here are the results of a test that runs the original roof1r routine (A), 
the modified routine using a nested function (B), and another modified 
routine that takes a non-nested function but internally calls the version 
with the nested function (C). In each case, several functions are passed 
to the root finder, which is called 5 million times, each time with a 
different (but reproducible) parameter:

f(x)           (A) original   (B) nested        (C) global calling nested
x                   0.656 s    0.703 s   (7%)    0.735 s (12%)
x^2                 6.296 s    6.313 s   (0%)    6.546 s  (4%)
exp(x)              6.734 s    6.703 s   (0%)    6.890 s  (2%)
arcsin(x)           5.718 s    5.718 s   (0%)    5.937 s  (4%)
erf(x)              6.391 s    6.422 s   (0%)    6.673 s  (4%)
gammaLn(x)         15.260 s   15.142 s  (-1%)   15.426 s  (1%)

Times are totals for 5000000 runs each. The original version reported 
the check value y = 0.00000000 for every function.
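The percentages in parentheses agree with the relative difference to the 
original routine (A), rounded to whole percent. For the cheapest function, 
f(x) = x, which shows the largest differences, the arithmetic works out as 
follows. Here t_X is the total time of variant X and N = 5000000 is the 
number of root-finder calls, assuming the reported times are totals over 
all N calls:

```latex
\begin{aligned}
\Delta_X &= \frac{t_X - t_A}{t_A} \times 100\,\% \\
\Delta_B &= \frac{0.703 - 0.656}{0.656} \times 100\,\% \approx 7.2\,\% \\
\Delta_C &= \frac{0.735 - 0.656}{0.656} \times 100\,\% \approx 12.0\,\% \\
\frac{t_C - t_A}{N} &= \frac{0.079\ \text{s}}{5 \times 10^{6}} \approx 1.6 \times 10^{-8}\ \text{s} \approx 16\ \text{ns}
\end{aligned}
```

So in absolute terms the extra cost of (C) for f(x) = x is on the order 
of 16 ns per root-finder call.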

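I would interpret these results as showing no dramatic slowdown for 
variant C: it is 1 to 4% slower than the original, except for the 
trivial f(x) = x (12%). Variant B (nested function) runs at roughly the 
same speed as the original procedure, within about 1%, again except for 
f(x) = x (7%).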
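To make the difference between variants (B) and (C) concrete, here is a 
minimal Free Pascal sketch, assuming the nestedprocvars mode switch. It is 
not numlib code: Bisect, BisectGlobal, TNestedFunc, TGlobalFunc and 
QuarterShift are hypothetical names, and a plain bisection stands in for 
whatever algorithm roof1r actually uses.

```pascal
program nestedroot;

{$mode objfpc}{$H+}
{$modeswitch nestedprocvars}  // enables "is nested" procedural types

type
  // Hypothetical function types; NOT numlib's actual declarations.
  TNestedFunc = function(x: Double): Double is nested;
  TGlobalFunc = function(x: Double): Double;

// Hypothetical bisection root finder (stand-in for roof1r) that accepts
// a nested function, i.e. one that may access its parent's local variables.
function Bisect(f: TNestedFunc; a, b, tol: Double): Double;
var
  m: Double;
begin
  while b - a > tol do
  begin
    m := 0.5 * (a + b);
    if f(a) * f(m) <= 0 then
      b := m   // sign change lies in [a, m]
    else
      a := m;  // sign change lies in [m, b]
  end;
  Result := 0.5 * (a + b);
end;

// Like variant (B): the caller passes a nested function that reads the
// parameter c from the enclosing stack frame on every evaluation.
function SolveShifted(c: Double): Double;
  function g(x: Double): Double;
  begin
    Result := x - c;
  end;
begin
  Result := Bisect(@g, -10, 10, 1e-12);
end;

// Like variant (C): an old-style entry point taking a plain global function,
// implemented by wrapping it in a nested adapter and calling the nested version.
function BisectGlobal(f: TGlobalFunc; a, b, tol: Double): Double;
  function Adapter(x: Double): Double;
  begin
    Result := f(x);  // one extra indirect call per evaluation
  end;
begin
  Result := Bisect(@Adapter, a, b, tol);
end;

// Hypothetical global test function with a root at x = 0.25.
function QuarterShift(x: Double): Double;
begin
  Result := x - 0.25;
end;

begin
  WriteLn(SolveShifted(0.5):0:6);                           // 0.500000
  WriteLn(BisectGlobal(@QuarterShift, -10, 10, 1e-12):0:6); // 0.250000
end.
```

In this sketch the nested function g reaches c through its parent's frame 
on each call, which is the kind of access Marco asked about, while the 
adapter in BisectGlobal adds one more indirect call per evaluation. That 
extra layer is one plausible, unverified explanation for the small 
additional cost measured for (C).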


