[fpc-devel] Patch to speed up Uppercase/Lowercase functions

Daniël Mantione daniel at deadlock.et.tudelft.nl
Sun Jun 12 08:46:57 CEST 2005



On Sat, 11 Jun 2005, L505 wrote:

> http://dennishomepage.gugs-cats.dk/LowerCaseChallenge.htm
>
> LowerCaseShaPas2_c
> This one is in Pascal, using GOTO and LABEL, and it is consistently fast
> with both -Og and -OG.
> But no one wants to maintain a GOTO and a LABEL...
>
> [LowerCaseShaPas2_c] was slightly slower than [lowercase 6] (second fastest)
> in -OG mode.
> [LowerCaseShaPas2_c] was slightly faster than [lowercase 9] (still second
> fastest) in -Og mode, so it seemed more consistent across compiler
> options.
>
> So maybe the [lowercase 6] result should be submitted to Fastcode for testing?
>
> Also, if no one wants to use the assembly functions and GOTO/LABEL functions in
> the RTL due to code bloat/maintenance concerns, we could always offer an optional
> unit where people could call the fast functions only if they really needed them.
> Just like Fastcode does, outside the VCL.

Hmmm... They managed to process 4 bytes in parallel. I can figure out how
it works; it is interesting and should be fast.
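The challenge page's code is not reproduced here, so the following is only a sketch of the general 4-bytes-at-once (SWAR) idea, not the LowerCaseShaPas2_c code itself; LowerDWord and LowerSwar are hypothetical names, and it assumes ASCII-only case mapping. The top bit of each byte serves as a per-byte flag: adding $3F sets it for bytes >= 'A', adding $25 sets it for bytes > 'Z', and not(w) and $80808080 keeps it only for bytes below $80, so non-ASCII bytes are left alone.

```pascal
program SwarLowerSketch;
{$mode objfpc}{$H+}

const
  HighBits = cardinal($80808080);  // top bit of each byte
  Low7     = cardinal($7F7F7F7F);  // low 7 bits of each byte
  AddGE    = cardinal($3F3F3F3F);  // byte + $3F >= $80  <=>  byte >= 'A'
  AddGT    = cardinal($25252525);  // byte + $25 >= $80  <=>  byte >  'Z'

// Lowercases every ASCII 'A'..'Z' byte of w at once; all other bytes,
// including bytes >= $80, are returned unchanged.
function LowerDWord(w: cardinal): cardinal;
var
  isAscii, heptets, geA, gtZ, isUpper: cardinal;
begin
  isAscii := not w and HighBits;   // $80 in each byte below $80
  heptets := w and Low7;           // top bits cleared, so the sums never carry
  geA := heptets + AddGE;          // top bit set where byte >= 'A'
  gtZ := heptets + AddGT;          // top bit set where byte > 'Z'
  isUpper := isAscii and geA and not gtZ;  // only top bits survive
  Result := w or (isUpper shr 2);  // $80 shr 2 = $20, the ASCII case bit
end;

// Applies LowerDWord to whole 4-byte chunks, then finishes the 0..3
// trailing bytes one at a time. Move copies the bytes, so the sketch
// needs no pointer casts or alignment handling.
function LowerSwar(const s: string): string;
var
  i, n: integer;
  w: cardinal;
begin
  Result := s;
  UniqueString(Result);
  n := Length(Result);
  i := 1;
  while i + 3 <= n do
  begin
    Move(Result[i], w, SizeOf(w));
    w := LowerDWord(w);
    Move(w, Result[i], SizeOf(w));
    Inc(i, SizeOf(w));
  end;
  while i <= n do
  begin
    if Result[i] in ['A'..'Z'] then
      Result[i] := Chr(Ord(Result[i]) + 32);
    Inc(i);
  end;
end;

begin
  writeln(LowerSwar('FastCode RTL'));  // fastcode rtl
end.
```

The Move calls are there for clarity only; a tuned version would read the words directly through an aligned pointer.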

Replace the loop with:

  repeat
    if exitcondition then
      break;
    { ...rest of the loop body... }
  until false;

... this generates exactly the same code, but follows good programming
practice more closely.
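A hypothetical byte-by-byte routine written in that repeat/break shape, assuming a #0-terminated buffer; LowerBufInPlace and LowerViaBuf are illustrative names, not RTL functions:

```pascal
program RepeatBreakSketch;
{$mode objfpc}{$H+}

// Lowercases a #0-terminated buffer in place, one byte at a time,
// using the repeat/if-break/until-false loop shape instead of GOTO.
procedure LowerBufInPlace(p: PChar);
begin
  repeat
    if p^ = #0 then
      break;                       // the exit condition
    if p^ in ['A'..'Z'] then
      p^ := Chr(Ord(p^) + 32);
    Inc(p);
  until false;
end;

function LowerViaBuf(const s: string): string;
begin
  Result := s;
  UniqueString(Result);            // writing through a PChar bypasses copy-on-write
  if Result <> '' then
    LowerBufInPlace(PChar(Result));
end;

begin
  writeln(LowerViaBuf('Free PASCAL'));  // free pascal
end.
```

Because the exit test is on #0, this sketch stops at an embedded #0 character; a length-based loop would not.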

Next, make it 64-bit safe, i.e. change cardinal(p) to ptruint(p): cardinal
is always 32 bits wide, while ptruint is as wide as a pointer. There are
also potential speed improvements, e.g.: c2:=not(c1) and $80808080.
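A small sketch of where the cast matters, assuming the routine first walks byte by byte to a 4-byte boundary before switching to whole-cardinal loads; BytesToAlign and AlignUp are hypothetical helpers:

```pascal
program PtrUIntAlignSketch;
{$mode objfpc}{$H+}

const
  WordSize = SizeOf(cardinal);  // the 32-bit chunk size of the loop
  Mask     = WordSize - 1;

// Bytes to handle one at a time before p reaches a WordSize boundary.
// ptruint is as wide as a pointer on every target. cardinal(p) is always
// 32 bits: the low bits tested here would survive, but an address rebuilt
// from it, e.g. pointer(cardinal(p) + 4), loses the upper half on 64-bit.
function BytesToAlign(p: pointer): ptruint;
begin
  Result := (ptruint(WordSize) - (ptruint(p) and Mask)) and Mask;
end;

// Rebuilds a pointer from ptruint arithmetic, which is safe on 64-bit.
function AlignUp(p: pointer): pointer;
begin
  Result := pointer(ptruint(p) + BytesToAlign(p));
end;

var
  buf: array[0..15] of byte;
  i: integer;
begin
  for i := 0 to 3 do
    // Each line should end in TRUE: AlignUp lands on a 4-byte boundary.
    writeln(i, ': ', (ptruint(AlignUp(@buf[i])) and Mask) = 0);
end.
```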

If this is done it can be included in the sysutils unit.

Keep in mind that this code assumes a 32-bit word size: even though it
will run on 64-bit, it won't be optimal there (but it is still faster than
processing byte by byte, I guess).
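For 64-bit targets the same arithmetic can presumably be widened to 8 bytes per step. This is a hypothetical sketch (LowerQWord and LowerEight are illustrative names); on a 32-bit CPU the QWord operations are done with pairs of 32-bit registers, so this variant would likely only pay off on 64-bit targets. The $80 mask is written as not(w or Low7), which equals not(w) and $8080808080808080 but keeps every constant within Int64 range.

```pascal
program Swar64Sketch;
{$mode objfpc}{$H+}

const
  Low7  = QWord($7F7F7F7F7F7F7F7F);  // low 7 bits of each byte
  AddGE = QWord($3F3F3F3F3F3F3F3F);  // byte + $3F >= $80  <=>  byte >= 'A'
  AddGT = QWord($2525252525252525);  // byte + $25 >= $80  <=>  byte >  'Z'

// 8-byte analogue of the 4-byte trick: lowercases every ASCII 'A'..'Z'
// byte of w and leaves all other bytes, including bytes >= $80, alone.
function LowerQWord(w: QWord): QWord;
var
  heptets, isAscii, isUpper: QWord;
begin
  isAscii := not (w or Low7);      // $80 in each byte below $80
  heptets := w and Low7;           // no carries between bytes in the sums below
  isUpper := isAscii and (heptets + AddGE) and not (heptets + AddGT);
  Result := w or (isUpper shr 2);  // $80 shr 2 = $20, the ASCII case bit
end;

// Lowercases an 8-character string with a single LowerQWord call; any
// other length is returned unchanged, since this is only a sketch.
function LowerEight(const s: string): string;
var
  w: QWord;
begin
  Result := s;
  if Length(Result) <> SizeOf(w) then
    exit;
  UniqueString(Result);
  Move(Result[1], w, SizeOf(w));
  w := LowerQWord(w);
  Move(w, Result[1], SizeOf(w));
end;

begin
  writeln(LowerEight('FREE PAS'));  // free pas
end.
```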

If you want to submit assembler routines (I think the LowerCaseSha2 trick
is a good basis for one), keep the following guidelines in mind:

* It should be worthwhile to use assembler, i.e. if you don't gain more
  than 10% in speed, it isn't worth it; it's better to wait until the
  compiler generates better code.
* Don't use CPU-specific optimizations: the code has to run on all kinds
  of machines, so Pentium 4- or Athlon-specific optimizations aren't a
  good idea.

Do it this way:

* At the top of the implementation section, add {$i i386.inc}.
* In i386.inc, add the assembler version and a {$define have_lowercase}.
* Put the Pascal implementation between {$ifndef have_lowercase} and
  {$endif}.
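A single-file sketch of that layout, assuming FPC's predefined CPUI386 symbol is what selects the i386 code. The i386 part is inlined here instead of living in a separate i386.inc, and its body is only a stand-in (it calls the RTL's LowerCase) where the hand-written assembler would go; LowerCaseSketch is a hypothetical name.

```pascal
program IncludeSchemeSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

// --- Part that would live in i386.inc ---------------------------------
// In sysutils this block would be a separate file, pulled in at the top
// of the implementation section with the $i (include) directive.
{$ifdef CPUI386}
{$define have_lowercase}
function LowerCaseSketch(const s: string): string;
begin
  // Stand-in for the hand-written assembler routine.
  Result := SysUtils.LowerCase(s);
end;
{$endif}

// --- Generic Pascal implementation ------------------------------------
// Compiled only when no CPU-specific file has defined have_lowercase.
{$ifndef have_lowercase}
function LowerCaseSketch(const s: string): string;
var
  i: integer;
begin
  Result := s;
  for i := 1 to Length(Result) do
    if Result[i] in ['A'..'Z'] then
      Result[i] := Chr(Ord(Result[i]) + 32);
end;
{$endif}

begin
  {$ifdef have_lowercase}
  writeln('using the CPU-specific version');
  {$else}
  writeln('using the generic Pascal version');
  {$endif}
  writeln(LowerCaseSketch('Mixed CASE'));  // mixed case
end.
```

Only one of the two LowerCaseSketch bodies is ever compiled, so callers never need to know which CPU they are on.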

This way, the impact on maintainability and portability stays rather low.

Daniël




