[fpc-devel] Fwd: Re: An optimization suggestion for FPC

J. Gareth Moreton gareth at moreton-family.com
Sun Jun 28 14:18:46 CEST 2020


Thanks Jonas.  I'll see what I can put together.  A record with a single 
field is a bit of a special case, but one I'll keep in mind.  More than 
anything, I'll have to study the disassembly to see what's happening, and 
whether things are faster with primitive types simply because they're 
register variables (which are always faster than stack variables, even 
when the stack is in L1 cache) or because of something else.

Gareth aka. Kit

On 28/06/2020 12:54, Jonas Maebe wrote:
> [accidentally only sent to Gareth initially]
>
> On 28/06/2020 12:31, J. Gareth Moreton wrote:
>> So someone reached out to me directly again asking for an FPC
>> optimisation.  Now I want to see whether this can be optimised without
>> breaking something or being annoyingly specific.
> The general optimisation that would handle this is promoting individual
> record members into standalone variables when possible. FPC currently
> has no support at all for this.
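As a rough illustration of that general optimisation (often called scalar replacement of aggregates), here is a minimal sketch in C rather than Pascal. The record type and both functions are hypothetical and only show the source-level effect, not how FPC would implement it. The second function is what the first amounts to after each record member has been promoted to its own local variable, which the register allocator can then handle independently.

```c
#include <stdint.h>

/* Hypothetical two-field record, standing in for a Pascal record. */
typedef struct {
    int32_t count;
    int32_t total;
} stats_t;

/* As written: every update goes through the record, which (without
   member promotion) typically stays in memory in the stack frame. */
int32_t sum_as_record(const int32_t *values, int n) {
    stats_t s = {0, 0};
    for (int i = 0; i < n; i++) {
        s.count += 1;
        s.total += values[i];
    }
    return s.total + s.count;
}

/* After promoting each member to a standalone local variable: both
   locals are now independent candidates for register allocation. */
int32_t sum_promoted(const int32_t *values, int n) {
    int32_t count = 0;
    int32_t total = 0;
    for (int i = 0; i < n; i++) {
        count += 1;
        total += values[i];
    }
    return total + count;
}
```

Both functions compute the same result; the difference is only in what the compiler is able to keep in registers.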
>
> An optimisation that's a bit less general (although orthogonal in some
> cases, namely when you don't need to access individual members), is
> keeping records as a whole in a register. FPC already has support for
> this, see tstoreddef.is_intregable and tabstractvarsym.setregable.
>
> It does not get triggered here on x86-64 because of another method
> involved: the {$if} at the end of tabstractvarsym.is_regvar. On all
> architectures except PowerPC and PowerPC64, that code prevents records
> from being kept in registers if they are written to.
>
> The reason for this is that the other supported architectures lack
> instructions to efficiently extract and insert bitfields from/into
> integer registers (although perhaps some of the newer x86-64 processors
> include them as part of an extension; and I think AArch64 and certain
> MIPS subarchs could also support it efficiently). This means that to
> perform an operation on a field of a record kept in a register, you have
> to do the following in the general case:
> 1) extract the field. On generic x86, that would be a move to a
> temporary register, then possibly a shift, and then possibly an "and".
> 2) perform the operation
> 3) possibly shift the value back to the correct position, clear the
> field in the original register (mask its position with 0), and then
> "or" the result in to insert it again
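A minimal C sketch of those three steps, assuming a hypothetical record of four 16-bit fields packed into one 64-bit integer register. The layout, names and field width are illustrative assumptions, not FPC's actual code generation.

```c
#include <stdint.h>

/* Hypothetical layout: four 16-bit fields in one 64-bit register;
   field i (0..3) occupies bits [16*i, 16*i + 15]. */
#define FIELD_BITS 16u
#define FIELD_MASK 0xFFFFu

/* Step 1: extract the field (copy to a temporary, shift, and). */
static uint64_t extract_field(uint64_t reg, unsigned index) {
    uint64_t tmp = reg;              /* move to a temporary register */
    tmp >>= index * FIELD_BITS;      /* shift the field down to bit 0 */
    return tmp & FIELD_MASK;         /* mask off the other fields */
}

/* Step 3: insert the new value (clear the old field, shift, or). */
static uint64_t insert_field(uint64_t reg, unsigned index, uint64_t value) {
    unsigned shift = index * FIELD_BITS;
    reg &= ~((uint64_t)FIELD_MASK << shift);       /* clear old field */
    return reg | ((value & FIELD_MASK) << shift);  /* or in new value */
}

/* Adding to one field of a register-held record needs all three steps. */
uint64_t add_to_field(uint64_t reg, unsigned index, uint64_t addend) {
    uint64_t field = extract_field(reg, index);    /* 1) extract */
    field += addend;                               /* 2) operate */
    return insert_field(reg, index, field);        /* 3) insert */
}
```

With the record kept in memory instead, the same update is a single load, add and store on the field's address, with no shifting or masking at all.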
>
> In this case, just loading a value from memory (probably L1 cache, since
> register variables are only used locally within a single routine),
> performing the operation, and storing it back, is quite likely to be
> faster, and definitely results in much smaller code.
>
> However, as you've undoubtedly realised, in this case none of that
> shifting/masking would come into play, since the record only contains a
> single field. So you could definitely add an exception for that case for
> all architectures. We even have the perfect helper method for that in
> the meantime: tabstractrecordsymtable.has_single_field()
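To illustrate why the single-field case avoids that cost, here is a hedged C analogy (hypothetical type and function; _Static_assert requires C11). A struct with one field normally has exactly the size of that field, so a register holding the record holds the field at bit 0 with nothing around it to shift or mask.

```c
#include <stdint.h>

/* Hypothetical single-field record. */
typedef struct {
    uint32_t value;
} wrapper_t;

/* Checked at compile time: the record's storage is just the field. */
_Static_assert(sizeof(wrapper_t) == sizeof(uint32_t),
               "single-field record has no padding");

/* Updating the only field needs no extract or insert step, so keeping
   the whole record in a register costs nothing extra. */
wrapper_t wrapper_add(wrapper_t w, uint32_t addend) {
    w.value += addend;
    return w;
}
```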
>
>
> Jonas
>
> PS: that person also asked the same question on the forum
> (https://forum.lazarus.freepascal.org/index.php?topic=50364)
>
> PS2: the case Benito mentions is a different thing again. Managed
> records can never be kept *only* in a register, because they need
> initialisation and finalisation, which requires them to be in memory.
> Caching individual fields of those locally in a register (while the
> record itself remains in memory) would definitely require the general
> optimisation I mentioned in the first paragraph.
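A minimal sketch of that kind of field caching, again as a C analogy with hypothetical names: the record itself stays in memory (as a managed record must), while one field is loaded into a local variable before a loop and written back once after it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a managed record that must live in memory. */
typedef struct {
    const char *name;   /* stands in for a managed field, e.g. a string */
    int64_t counter;
} managed_rec_t;

/* Without caching: every iteration loads and stores r->counter. */
void count_direct(managed_rec_t *r, const int32_t *values, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (values[i] > 0)
            r->counter += 1;
}

/* With the field cached in a local (a register candidate): one load
   before the loop and one store after it; the record stays in memory. */
void count_cached(managed_rec_t *r, const int32_t *values, size_t n) {
    int64_t counter = r->counter;   /* load once */
    for (size_t i = 0; i < n; i++)
        if (values[i] > 0)
            counter += 1;
    r->counter = counter;           /* write back once */
}
```

This is only valid when nothing else reads or writes the field through the record while it is cached, which is part of what makes the general optimisation harder to implement.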
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>
