[fpc-devel] Fast ascii upper/lowercase
Daniël Mantione
daniel at deadlock.et.tudelft.nl
Sat Jun 11 20:38:23 CEST 2005
Op Sat, 11 Jun 2005, schreef Christian Iversen:
>
> With all the discussion about speed of lowercase/uppercase recently, I thought
> I'd chip in. I don't have the time to actually implement and test this, but I
> hope somebody else will.
>
> This semi-pseudocode should be a bit faster than what is currently done:
>
> function lowercase(const value: string): pchar;
> var
> X,L,C: Integer;
> B: Char;
> begin
> C := length-of(value);
> set-length(result, C);
> for X := 0 to C/4-1 do
> begin
> L = PIntegers(value)[X];
> // Char 1/4
> B := char(L and $FF);
> if (B in ['A'..'Z']) then
> result[x shl 2] := B or $20 else
> result[x shl 2] := B;
> // Char 2/4
> B := char((L shr 8) and $FF);
> if (B in ['A'..'Z']) then
> result[(x shl 2)+1] := B or $20 else
> result[(x shl 2)+1] := B;
> ..etc..
> end;
> ..fix remainging 0-3 chars..
> end;
>
> Basically, a simple loop unroll, with the twist that the fetch is collapsed to
> every 4th iteration. This, combined with the dynamic scheduling of modern
> CPUs, should increase the speed considerably. Since each iteration is very
> collapsable in terms of data-dependencies, this could work out to be quite
> fast.
>
> I could be wrong, of course :-)
>
> Does somebody want to try this?
I expect it would be hard to get any speed advantage using this approach,
since the overhead of expressions like char((L shr 8) and $FF) is quite
severe.
The real penalty is the expression in ['A'..'Z']. This expression needs
two branches which both cannot be predicted by the processer. They depend
on the input, the processors branch caches don't help here. You get a
pipeline stall in 50% of the cases. This is the reason why a branch-less
solution like I posted is so much faster.
Of course, working with dwords can be a huge advantage, but only if you
can process data in parallel or if you are memory bound. At least, this is
what I expect and theory and practise don't allways agree.
Daniël
More information about the fpc-devel
mailing list