[fpc-pascal] Re: interested in building a library for functions?

Marco van de Voort marcov at stack.nl
Sat Feb 26 16:27:11 CET 2011


In our previous episode, Angel Montesinos said:
> > Have a look at FPC package symbolic. It sounds like roughly the same kind of
> > soup.  (parsing, differentiating, fast eval)
> 
> Yes, roughly. Because if I have read it correctly, the package 
> evaluates expressions ultimately through calls to system functions 
> like sin, arctan, *, etc. That is very different from building, on the 
> fly, the minimal object code for evaluating the function, and much slower.

I know. I've played with going a step in between (the OS-independent
evaluator, my current approach, and your approach, code generation): namely,
a CPU-dependent code generator.

I never really started it, but one of the things I noticed is that the CPU
instructions for certain trigonometric functions don't always accept the
full domain. How have you dealt with that in your code generation? (Or am I
wrong? It is over seven years since I looked at this.)
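
For the record, what I mean: if memory serves, the x87 FSIN/FCOS
instructions only accept |x| < 2^63 and merely set the C2 flag for
anything outside that range, so generated code needs a guard plus some
argument reduction. An untested sketch of the kind of wrapper I had in
mind (GuardedSin is just a made-up name):

{$mode objfpc}
const
  TwoPi = 2.0 * Pi;
  X87TrigLimit = 9.0E18;  { just under 2^63; FSIN/FCOS reject |x| >= 2^63 }

function GuardedSin(x: Double): Double;
begin
  { the guard the generated code would need around a raw FSIN:
    reduce the argument into range first when it is too large.
    Note: this naive reduction loses precision for huge x. }
  if Abs(x) >= X87TrigLimit then
    x := x - TwoPi * Int(x / TwoPi);
  Result := Sin(x);  { Sin stands in here for the emitted FSIN }
end;

The naive reduction loses precision for large x, which is exactly the
kind of border condition that worries me.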
 
> Also, for derivatives, Symbolic builds first the expression that is 
> the derivative of the initial expression. When the derivative is 
> called, it evaluates the new function.
> 
> Automatic differentiation does not compute the derivative function, 
> but at each step of the evaluation of the RPN it evaluates the 
> function and the first (or the function, the first and the second) 
> derivative of that step. For instance, if the RPN step is  "cosine", 
> the Symbolic approach is to calculate -sine first (and, for the second 
> derivative, to then calculate the cosine _again_, etc.).

> But in my 
> approach, instead of computing the cosine, the sine/cosine pair is 
> computed using the x87 FSINCOS instruction of AMD and Intel CPUs. The 
> two values are used directly for computing the first and second 
> derivatives: thus at most _one_ call to an expensive FPU instruction.

I don't fully understand this. In my approach I calculate d(cos(x))/dx =
-sin(x) and then d(-sin(x))/dx = -cos(x), and only then use one
floating-point instruction to calculate the cosine.

Or do you mean for stuff like Taylor series, where each term is a further
differentiation?
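
If I understand the sincos trick after all, it is forward-mode automatic
differentiation: every quantity is carried as an (f, f', f'') triple, and
a cosine step updates all three from a single sine/cosine pair. An
untested sketch of my reading, using SinCos from FPC's Math unit (the
type and names are mine, not from your library):

{$mode objfpc}
uses Math;

type
  TJet2 = record        { value, first and second derivative at a point }
    f, d1, d2: Double;
  end;

{ cosine applied to a jet: needs sin(u) and cos(u) exactly once,
  which Math.SinCos (FSINCOS on x86) delivers in one call }
function CosJet(const u: TJet2): TJet2;
var
  s, c: Double;
begin
  SinCos(u.f, s, c);
  Result.f  := c;
  Result.d1 := -s * u.d1;                  { chain rule }
  Result.d2 := -c * Sqr(u.d1) - s * u.d2;  { second-order chain rule }
end;

Seeding a variable as (x, 1, 0) would then yield the value and both
derivatives in one RPN pass, which explains the single expensive FPU
call per step.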
 
> In my approach, if a call of an RPN step costs  n  clock cycles, then 
> the call of the first derivative of that function costs  c*n 
> clock cycles, where  c  depends only on the function called (that is, 
> whether it is  *  or arctan, etc.). The same holds (with another, 
> greater coefficient) for the second derivative. Suppose that c = 3 for the 
> operator  *  and one has to compute the value of  x*y*z*w,  where 
> these symbols denote variables or expressions on which the formula 
> depends. Then Symbolic will compute

The idea with Symbolic was mainly to draw graphs quickly. Since the
evaluation is done thousands of times but the symbolic manipulation only
once, the manipulation was never really optimized.

The whole project grew out of needing something to draw graphs of computed
values quickly (and derivatives were a requirement). I invested a bit more
time because I wanted to play a bit with syntax tree manipulation, just as
an exercise.

The evaluation speed was fine for my purposes then, considering the chosen
technique (an OS-independent RPN interpreter). But I'd be interested to see
how all this pans out.

Actually, the problems I have with Symbolic are the lack of custom functions
and boolean logic, rather than speed. I have some half-forgotten CVS tree
with a half-finished attempt at adding those.

Does your engine do boolean logic?

> x'*y*z*w + x*y'*z*w + x*y*z'*w + x*y*z*w',
> 
> that is roughly four times the computation of the function. Automatic 
> differentiation will take only three (= c) times more (this should be 
> taken only as a sloppy explanation). Thus, the more complex the 
> formula, the better the relative performance.
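
Put concretely (my paraphrase, not your code): carrying (value,
derivative) pairs through the RPN turns each  *  into the product rule,
i.e. a constant factor of roughly three multiplications:

{$mode objfpc}
type
  TDual = record
    v, dv: Double;    { value and first derivative }
  end;

{ one RPN '*' step on dual numbers: the product rule,
  three multiplications and one addition (your c = 3) }
function MulDual(const a, b: TDual): TDual;
begin
  Result.v  := a.v * b.v;
  Result.dv := a.dv * b.v + a.v * b.dv;
end;

Folding MulDual over x, y, z, w gives x*y*z*w and its derivative in a
single pass, instead of re-evaluating the product once per term of the
four-term sum above.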

I planned to eliminate that duplication using a common-subexpression tactic
on the final result. But I never got around to it (and only understood later
that it is relatively hard). Maybe the differentiation routine should indeed
be able to do preparatory work for this.
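
For the record, the tactic I had in mind was hash-consing: route every
node construction through a lookup table keyed on (operator, children),
so identical subtrees become one shared node that gets evaluated once. A
rough, untested sketch with made-up types (nothing like Symbolic's
actual ones):

{$mode objfpc}
uses SysUtils, Classes;

type
  TNode = class
    Op: string;        { '*', '+', 'x', ... }
    L, R: TNode;       { children; nil for leaves }
  end;

var
  Pool: TStringList;   { signature -> shared node; create sorted at startup }

function MkNode(const AOp: string; AL, AR: TNode): TNode;
var
  Sig: string;
  i: Integer;
begin
  { key each node on its operator and the identity of its (already
    shared) children, so identical subtrees collapse into one object }
  Sig := Format('%s(%p,%p)', [AOp, Pointer(AL), Pointer(AR)]);
  i := Pool.IndexOf(Sig);
  if i >= 0 then
    Exit(TNode(Pool.Objects[i]));
  Result := TNode.Create;
  Result.Op := AOp;
  Result.L := AL;
  Result.R := AR;
  Pool.AddObject(Sig, Result);
end;

Building x'*y*z*w + x*y'*z*w + ... through MkNode would then share
repeated subtrees such as z*w automatically, given consistent
associativity.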

> On the other hand, porting my library to other architectures will be 
> possible if they have an FPU based on a stack of perhaps more than six 
> registers, a collection of instructions that move numbers between them 
> and between them and RAM, and a minimal set of instructions for each 
> of the basic mathematical functions of Pascal:  +, -, *, /, sqr, sqrt, 
> sin, cos, tan, arctan, abs, exp, log. But I have no idea about those 
> architectures.

A hardware FPU stack is AFAIK an x86 oddity; most other FPUs are register
based. But there is no one true solution. I'd take an evaluator as the
baseline (for unknown architectures) and then plug in better solutions for
specific CPUs. Be careful with border conditions, though.

The evaluator can then also be used in case, e.g., the code generation has
a problem, even on a known architecture.
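
To make that concrete, the shape I have in mind is roughly the
following; every name here is made up:

{$mode objfpc}
type
  TRPNProgram = array of Byte;   { stand-in for the parsed RPN form }
  TEvalFunc   = function(const Vars: array of Double): Double;

{ portable baseline: walks the RPN using Pascal's own sin, cos, ... }
function Interpret(const Vars: array of Double): Double;
begin
  Result := 0.0;   { stub; the real loop dispatches on each RPN opcode }
end;

function BuildEvaluator(const P: TRPNProgram): TEvalFunc;
begin
  Result := nil;
  { a CPU-specific code generator would go here; leave Result nil on
    an unsupported opcode, no executable memory, unknown arch, ... }
  if Result = nil then
    Result := @Interpret;        { always-correct fallback }
end;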



