[fpc-devel] Fast ascii upper/lowercase

Daniël Mantione daniel at deadlock.et.tudelft.nl
Sat Jun 11 20:38:23 CEST 2005



Op Sat, 11 Jun 2005, schreef Christian Iversen:

>
> With all the discussion about speed of lowercase/uppercase recently, I thought
> I'd chip in. I don't have the time to actually implement and test this, but I
> hope somebody else will.
>
> This semi-pseudocode should be a bit faster than what is currently done:
>
> function lowercase(const value: string): pchar;
> var
>   X,L,C: Integer;
>   B: Char;
> begin
>   C := length-of(value);
>   set-length(result, C);
>   for X := 0 to C/4-1 do
>   begin
>     L = PIntegers(value)[X];
>     // Char 1/4
>     B := char(L and $FF);
>     if (B in ['A'..'Z']) then
>       result[x shl 2] := B or $20 else
>       result[x shl 2] := B;
>     // Char 2/4
>     B := char((L shr 8) and $FF);
>     if (B in ['A'..'Z']) then
>       result[(x shl 2)+1] := B or $20 else
>       result[(x shl 2)+1] := B;
>     ..etc..
>   end;
>   ..fix remainging 0-3 chars..
> end;
>
> Basically, a simple loop unroll, with the twist that the fetch is collapsed to
> every 4th iteration. This, combined with the dynamic scheduling of modern
> CPUs, should increase the speed considerably. Since each iteration is very
> collapsable in terms of data-dependencies, this could work out to be quite
> fast.
>
> I could be wrong, of course :-)
>
> Does somebody want to try this?

I expect it would be hard to get any speed advantage using this approach,
since the overhead of expressions like char((L shr 8) and $FF) is quite
severe.

The real penalty is the expression in ['A'..'Z']. This expression needs
two branches which both cannot be predicted by the processer. They depend
on the input, the processors branch caches don't help here. You get a
pipeline stall in 50% of the cases. This is the reason why a branch-less
solution like I posted is so much faster.

Of course, working with dwords can be a huge advantage, but only if you
can process data in parallel or if you are memory bound. At least, this is
what I expect and theory and practise don't allways agree.

Daniël





More information about the fpc-devel mailing list