[fpc-devel] Question about memory alignment (again!)

Thu Aug 18 10:06:09 CEST 2022

Thanks for the additional research.  I may be reinventing the wheel a 
bit with some of these discoveries.

Still, it feels good to beat gcc occasionally!

Gareth aka. Kit

On 18/08/2022 08:48, Stefan Glienke via fpc-devel wrote:
> Interestingly this is what clang also does:
>
> https://godbolt.org/z/Y4v14f9s3
>
>
>> On 17/08/2022 02:21 CEST J. Gareth Moreton via fpc-devel <fpc-devel at lists.freepascal.org> wrote:
>>
>> Hi everyone,
>>
>> Recently I've made some optimisations centred around the SHR instruction
>> on x86, and there was one pair of instructions that caught my attention:
>>
>> movl (%rbx),%eax
>> shrl $24,%eax
>>
>> Since x86 is little-endian, is it permissible to optimise this to a
>> single zero-extending byte load?
>>
>> movzbl 3(%rbx),%eax
>>
>> (The same approach would also turn "movl; sarl $24" into a single
>> "movsbl" instruction.)
>>
>> Logically the result is the same and it removes an instruction and a
>> pipeline stall, but will there be a performance hit from reading a
>> single byte at an address that isn't 32-bit aligned like that?
>>
>> I did make a similar optimisation once before with QWords using the
>> implicit zero-extension of the 32-bit MOV instruction - that is:
>>
>> movq (%rbx),%rax
>> shrq $32,%rax
>>
>> To:
>>
>> movl 4(%rbx),%eax
>>
>> This one is a little nicer, though, because the read is still on a
>> 32-bit boundary, and so it was permissible.
>>
>> Gareth aka. Kit
>>
>> _______________________________________________
>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>
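
For anyone who wants to check the rewrites from the quoted message outside the compiler, here is a minimal C sketch. It only models the instruction pairs and their proposed replacements, and it assumes a little-endian host such as x86. All function names are hypothetical and are not taken from FPC or any other compiler. It says nothing about performance; it only shows that the results agree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models: movl (%rbx),%eax ; shrl $24,%eax */
static uint32_t top_byte_via_shift(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);          /* 32-bit load, no alignment assumed */
    return v >> 24;
}

/* Models: movzbl 3(%rbx),%eax (zero-extended byte load) */
static uint32_t top_byte_via_load(const unsigned char *p)
{
    return p[3];
}

/* Models: movl (%rbx),%eax ; sarl $24,%eax */
static int32_t top_byte_via_sar(const unsigned char *p)
{
    int32_t r = (int32_t)top_byte_via_shift(p);   /* 0..255 */
    return r >= 128 ? r - 256 : r;    /* sign-extend bit 7 portably */
}

/* Models: movsbl 3(%rbx),%eax (sign-extended byte load) */
static int32_t top_byte_via_signed_load(const unsigned char *p)
{
    int8_t b;
    memcpy(&b, p + 3, 1);
    return b;
}

/* Models: movq (%rbx),%rax ; shrq $32,%rax */
static uint64_t high_dword_via_shift(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v >> 32;
}

/* Models: movl 4(%rbx),%eax (32-bit load, implicitly zero-extended) */
static uint64_t high_dword_via_load(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p + 4, sizeof v);
    return v;
}

/* Returns 1 when every rewritten form matches its original on buf.
   Only expected to hold on a little-endian host. */
int rewrites_agree(const unsigned char buf[8])
{
    return top_byte_via_shift(buf) == top_byte_via_load(buf)
        && top_byte_via_sar(buf) == top_byte_via_signed_load(buf)
        && high_dword_via_shift(buf) == high_dword_via_load(buf);
}

/* Small self-check on one sample buffer whose byte 3 has its top bit set. */
void self_check(void)
{
    const unsigned char sample[8] =
        { 0x11, 0x22, 0x33, 0x84, 0x55, 0x66, 0x77, 0x88 };
    assert(rewrites_agree(sample));
}
```

Calling self_check() from a test program exercises all three pairs. On a big-endian machine the byte offsets would differ, so the check would be expected to fail there.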