[fpc-devel] Re: Faster Implementation for IntToStr

Mon Sep 3 18:09:05 CEST 2012

----- Original Message -----
> From: "Graeme Geldenhuys" <graeme at geldenhuys.co.uk>
> To: fpc-devel at lists.freepascal.org
> Sent: Monday, 3 September 2012 12:42:11
> Subject: Re: [fpc-devel] Re: Faster Implementation for IntToStr
> 
> On 03/09/12 10:19, Daniël Mantione wrote:
> >
> > Certainly, but the code used in that asm implementation is quite out of
> > reach for compilers.
> 
> Certainly, but there is always scope for improvements, right? E.g. Delphi
> is well known for generating more optimised ASM than FPC. I'm by no means
> saying that it is easy, I'm just saying that in the long run it is
> easier to maintain Object Pascal code [in the case of IntToStr and
> similar functions] than ASM code. Not to forget the fact that such
> Object Pascal implementations are much more portable.

True, but Delphi uses some hand-coded asm optimisation as well. Its RTL has "pure Pascal" implementations of functions alongside x86 and 64-bit asm implementations.

That is for the RTL. As for the compiler itself, I've heard it was written in C(++), which is usually a better fit to asm. :-)

Let's not forget, by the way, that a lot of much-used functionality was rewritten in asm in the FastCode project and later included by CodeGear, and that Delphi-compiled programs got faster from those versions on (Delphi 2007, I think).

On the other hand, hand-optimised ASM code may have been the fastest in its era, but with each new processor other approaches, opcodes or instruction sequences may be faster. Does anybody remember the introduction of U/V pipes, L1 cache optimisation of data access or even code access (jmps etc.), or the general stalls when a register is used for output and then for input within x instructions? A compiler can choose to optimise those kinds of things in a more generic way, whereas code written in ASM is set in stone: it stays that way and gets no compiler optimisation. Either way, both have their uses and use cases.

I've even seen implementations of much-used functionality where, on startup (x86 and Delphi), a cpuid query is done and, depending on the capabilities found, a set of functionality is "registered". The callers then go through inline wrapper functions that use the underlying record of function pointers. All that is needed then is to optimise for certain combinations of cpuid flags and architectures to make it "as fast as it gets" on every PC of that architecture.

Needless to say, it was not as maintainable as Pascal versions, but they did get a speed increase (even over the Delphi compiler's output) on the time-critical and heavily used functionality, even with the indirect call in place.
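
To make the idea concrete, here is a minimal sketch of that startup-registration pattern. It is written in C rather than Pascal, and every name in it (rtl_dispatch, rtl_init, rtl_uint_to_str, CPU_FEAT_TUNED) is made up for illustration; it is not the code I saw. The "tuned" routine is just a stand-in for a hand-optimised variant, and the feature mask is passed in by hand here, whereas a real program would fill it from a cpuid query at startup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bit; a real program would derive it from cpuid. */
#define CPU_FEAT_TUNED 0x1u

/* Record of function pointers, filled in once at startup. */
typedef struct {
    size_t (*uint_to_str)(uint32_t value, char *buf);
} rtl_dispatch;

/* Copy the digits in [first, end) to buf and terminate the string. */
static size_t emit(const char *first, const char *end, char *buf) {
    size_t n = (size_t)(end - first);
    memcpy(buf, first, n);
    buf[n] = '\0';
    return n;
}

/* Portable fallback: one digit per division. */
static size_t uint_to_str_generic(uint32_t value, char *buf) {
    char tmp[10];                      /* a uint32_t has at most 10 digits */
    char *p = tmp + sizeof tmp;
    do {
        *--p = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    return emit(p, tmp + sizeof tmp, buf);
}

/* Stand-in for a tuned variant: two digits per division. */
static size_t uint_to_str_tuned(uint32_t value, char *buf) {
    char tmp[10];
    char *p = tmp + sizeof tmp;
    while (value >= 100) {
        uint32_t r = value % 100;
        value /= 100;
        *--p = (char)('0' + r % 10);
        *--p = (char)('0' + r / 10);
    }
    *--p = (char)('0' + value % 10);
    if (value >= 10)
        *--p = (char)('0' + value / 10);
    return emit(p, tmp + sizeof tmp, buf);
}

/* The active table; defaults to the portable routines. */
static rtl_dispatch rtl_active = { uint_to_str_generic };

/* "Register" the best routines for the detected feature mask. */
static void rtl_init(unsigned features) {
    rtl_active.uint_to_str = (features & CPU_FEAT_TUNED)
                                 ? uint_to_str_tuned
                                 : uint_to_str_generic;
}

/* Inline wrapper: callers pay one indirect call and nothing else.
   buf must have room for 11 bytes (10 digits plus terminator). */
static inline size_t rtl_uint_to_str(uint32_t value, char *buf) {
    assert(buf != NULL);
    return rtl_active.uint_to_str(value, buf);
}
```

A program would call rtl_init once at startup and use rtl_uint_to_str everywhere else. Adding support for a new processor then means adding one routine and one branch in rtl_init, which is also where the maintenance burden comes from: every variant has to be kept correct separately.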

kind regards,
Dimitri Smits