[fpc-devel] Registers allocation technique

Lag Programming lagprogramming at aim.com
Wed Oct 1 00:16:40 CEST 2014


   Thank you Sven. 
   Unfortunately I can't find a way to compress the information in few lines without losing important information. I've already done that mistake in the first mail, now I'll try to fully explain why I'm interested in the way these registers are allocated. Even if the mail is long I've tried to write it as readable as possible. I'll try to explain, step by step, with the help of 3 versions of a common function. I've chosen the "pos" function.
 

HERE IS THE ORIGINAL VERSION. YOU CAN SKIP THE PASCAL CODE.
{Elapsed time for this function in tests:6830ms.
# Var $result located in register r14
# Var i located in register rbx
# Var MaxLen located in register r13
# Var pc located in register r12
Temps allocated between rsp+16 and rsp+48}
Function PosORIGINAL(Const Substr : AnsiString; Const Source : AnsiString) : SizeInt;
var
  i,MaxLen : SizeInt;
  pc : pchar;
begin
  RESULT:=0;
  if Length(SubStr)>0 then
   begin
     MaxLen:=Length(source)-Length(SubStr);
     i:=0;
     pc:=@source[1];
     while (i<=MaxLen) do
      begin
        inc(i);
        if (SubStr[1]=pc^) and
           (CompareByte(Substr[1],pc^,Length(SubStr))=0) then
         begin
           RESULT:=i;
           exit;
         end;
        inc(pc);
      end;
   end;
end;


HERE COMES THE FIRST MODIFIED VERSION OF THE ORIGINAL FUNCTION. YOU CAN SKIP THE PASCAL CODE.
{Compared with PosORIGINAL, variable "MaxLen" has been removed, the "while" loop has been replaced with a "for" loop and "if Length(SubStr)>0 then" is replaced by "if Length(SubStr)<>0 then".
Elapsed time for this function in tests:6000ms. That's ~800ms faster than original function.
# Var $result located in register r12
# Var pc located in register r14
# Var I located in register rbx
Temps allocated between rsp+16 and rsp+48}
{Function PosFASTER(Const Substr:AnsiString; Const Source:AnsiString):SizeInt;
var pc:pchar;
    I:SIZEINT;
begin
if Length(SubStr)<>0 then
   begin
   pc:=@source[1];
   for I:=1 to (Length(source)-Length(SubStr)+1) do
       if (SubStr[1]=pc^)and(CompareByte(Substr[1],pc^,Length(SubStr))=0) then exit(I) else inc(pc);
   end;
RESULT:=0;
end;}


HERE COMES THE VERSION I'D LIKE TO RUN FASTEST. YOU CAN SKIP THE PASCAL CODE.
{Compared with function PosFASTER, variable "I:SIZEINT" has been removed and replaced with "result".
Elapsed time for this function in tests:6600ms. WHICH IS NOT WHAT I WANTED!!!
# Temps allocated between rsp+16 and rsp+40
# Var $result located in register r13
# Var pc located in register rbx}
Function PosWISHTORUNFASTEST(Const Substr:AnsiString; Const Source:AnsiString):SizeInt;
var pc:pchar;
begin
if Length(SubStr)<>0 then
   begin
   pc:=@source[1];
   for RESULT:=1 to (Length(source)-Length(SubStr)+1) do
       if (SubStr[1]=pc^)and(CompareByte(Substr[1],pc^,Length(SubStr))=0) then exit else inc(pc);
   end;
RESULT:=0;
end;


   CONCLUSION
   The second version(PosFASTER) runs noticeable faster(6000ms.) compared to the original version(6800ms). I wish the third version(PosWISHTORUNFASTEST) to run even faster(less than 6000ms in my tests). Why? Because by removing the variable "I:SIZEINT;" I consider I make the job easier to the compiler. Instead of running faster, the third version(PosWISHTORUNFASTEST) runs significantly slower(6600ms) than the second version(PosFASTER)(6000ms). There are two possible reasons for this slowdown:
   1)I hit the "if...then...else" anomaly again, but I strongly doubt this is the reason of the slowdown.
   2)Using r13("result") is slower than rbx("I"). If this is the case then I'd like a solution for these situations. In my mind I have three approaches:
   a) When function result is used in a "for" loop, try to assign a "better" register.
   b) If the first approach is not good, then give the programmer the chance to act as a child: I would like one of these registers for a variable, or I don't want a register in the r12-r15 range for the function result. I think C has a directive "register" for variables, I don't know much about it.
   c) If none of the above is ok, then please explain how can I identify the functions that can use the result in "for" loops without penalties regarding speed. For example I've noticed that if a function doesn't contain a call to another function or procedure than, a int64 result is returned in rax, not r12, which should be ok for me, see "Function Pos(c : Char; Const s : AnsiString) : SizeInt;".

   I'm not much interested in "while" loops as I'm interested in "for" ones. I hope I've explained better this time.
   Thank you for patience.


 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20140930/30dda848/attachment.html>


More information about the fpc-devel mailing list