[fpc-devel] SSE/AVX instruction encodings

Fri Oct 2 12:14:04 CEST 2020

In the meantime, I've uploaded the patch to the bug report after 
confirming that all tests on x86_64-win64 have passed with no 
regressions: https://bugs.freepascal.org/view.php?id=37785

Other platforms and AVX-512-specific code still need testing though.

Gareth aka. Kit

On 02/10/2020 07:59, J. Gareth Moreton via fpc-devel wrote:
> Hi Torsten,
>
> The reason why it's not compiling correctly with -a is because the 
> operand size is being set to S_XMM, not S_YMM (because it's going by 
> the size of the source operand), so when writing the .s files, it adds 
> an 'x' suffix to the end of the opcode.
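
To illustrate the suffix behaviour described above, here is a minimal sketch. It assumes a hypothetical size tag (TDemoSize) and helper (SuffixedMnemonic) that are not part of the compiler; the 'y' suffix for a 256-bit size is likewise an assumption, modelled on the usual AT&T-style convention. It only shows how a wrongly recorded size changes the mnemonic that gets written.

```pascal
program SuffixDemo;

{$mode objfpc}

type
  { Hypothetical size tag for illustration; not the compiler's real opsize type }
  TDemoSize = (D_NO, D_XMM, D_YMM);

{ Returns the mnemonic with a size suffix appended, if the size has one }
function SuffixedMnemonic(const Base: string; Size: TDemoSize): string;
begin
  case Size of
    D_XMM: Result := Base + 'x';
    D_YMM: Result := Base + 'y';
  else
    Result := Base;
  end;
end;

begin
  { Size recorded as YMM }
  WriteLn(SuffixedMnemonic('vcvtpd2ps', D_YMM));
  { Size recorded as XMM: the 'x' form is written instead }
  WriteLn(SuffixedMnemonic('vcvtpd2ps', D_XMM));
end.
```

The point of the sketch is only that the suffix follows whatever size was recorded, so the fix has to happen where the size is chosen, not where the text is written.
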
>
> I know there's a high risk of it breaking existing code, but there are 
> a lot of exceptional cases in the function check, many of which are 
> SSE/AVX instructions that deal with operands of different sizes.
>
> Gareth aka. Kit
>
> On 01/10/2020 23:09, avx512--- via fpc-devel wrote:
>> Hi Gareth,
>>
>> in my opinion it is not a good idea to introduce a new function to 
>> calculate the operand size.
>>
>> The risk of breaking existing code (fpc and user code) is very high.
>>
>> I introduced the memrefinfo system for SSE and AVX opcodes to 
>> protect the existing user code. The basis of this concept is the 
>> opcode definition in x86ins.dat.
>>
>> The definition for opcode VCVTPD2PS in trunk is:
>>
>> ; VCVTPD2PS xmmreg_mz,mem256 must come first - map MemRefSize 256bits correct
>> ;                                              map all other MemrefSize (without broadcast MemRef) to xmmreg, xmmrm
>> [VCVTPD2PS,vcvtpd2psM]
>> (Ch_Wop2, Ch_Rop1)
>> xmmreg_mz,mem256 \350\352\361\362\364\370\1\x5A\110        AVX,SANDYBRIDGE,TFV
>> xmmreg_mz,ymmreg \350\352\361\362\364\370\1\x5A\110        AVX,SANDYBRIDGE
>> xmmreg_mz,xmmrm \350\352\361\362\370\1\x5A\110            AVX,SANDYBRIDGE,TFV
>>
>> // AVX512
>> xmmreg_mz,bmem64 \350\352\361\370\1\x5A\110                AVX512,BCST2,TFV
>> xmmreg_mz,bmem64 \350\352\361\364\370\1\x5A\110            AVX512,BCST4,TFV
>> ymmreg_mz,mem512 \350\351\352\361\370\1\x5A\110            AVX512,TFV
>> ymmreg_mz,bmem64 \350\351\352\361\370\1\x5A\110            AVX512,BCST8,TFV
>> ymmreg_mz,zmmreg_er \350\351\352\361\370\1\x5A\110            AVX512
>>
>>
>> In trunk this compiles correctly without the compiler option -a; 
>> with -a the result is not correct. I have checked this.
>>
>> Torsten
>>
>>
>>
>> -----Original Message-----
>> Subject: Re: [fpc-devel] SSE/AVX instruction encodings
>> Date: 2020-10-01T18:04:26+0200
>> From: "J. Gareth Moreton via fpc-devel" <fpc-devel at lists.freepascal.org>
>> To: "fpc-devel at lists.freepascal.org" <fpc-devel at lists.freepascal.org>
>>
>> Hi Torsten,
>>
>> I've done that already actually, although only to grab the value of the
>> ExistsSSEAVX field.  I'm currently testing a new nested function in
>> Tx86Instruction.SetInstructionOpsize:
>>
>>     function CheckSSEAVX: Boolean;
>>       begin
>>         Result := False;
>>
>>         if not MemRefInfo(opcode).ExistsSSEAVX then
>>           Exit;
>>
>>         { This check also covers MMX instructions that move data to and
>>           from 32-bit and 64-bit registers or memory, since such
>>           instructions are replicated in SSE2 for use with XMM registers }
>>         if tx86operand(operands[1]).opsize in [S_B,S_W,S_L,S_Q] then
>>           begin
>>             opsize := S_NO;
>>             Exit(True);
>>           end;
>>
>>         if (tx86operand(operands[1]).opsize <> S_NO) and
>>            (operands[1].opr.typ = OPR_REFERENCE) then
>>           begin
>>             { Memory sizes of 64 bits and under are handled above }
>>             opsize:=tx86operand(operands[1]).opsize;
>>             Exit(True);
>>           end;
>>
>>         { If the source operand is larger than the destination (e.g.
>>           "VCVTTPD2DQ XMM0, YMM1" in Intel notation), use the source
>>           operand }
>>         if ((tx86operand(operands[1]).opsize = S_YMM) and
>>             (tx86operand(operands[2]).opsize = S_XMM)) or
>>            ((tx86operand(operands[1]).opsize = S_ZMM) and
>>             (tx86operand(operands[2]).opsize = S_XMM)) or
>>            ((tx86operand(operands[1]).opsize = S_ZMM) and
>>             (tx86operand(operands[2]).opsize = S_YMM)) then
>>           begin
>>             opsize:=tx86operand(operands[1]).opsize;
>>             Exit(True);
>>           end;
>>
>>         { If none of the conditions are met, this function returns False
>>           and the opsize is set to the last operand's opsize }
>>       end;
>>
>> I've also commented out the individual checks for MOVD, MOVQ, VMOVQ etc
>> to see how it handles itself and to simplify the code. "make all" at
>> least works successfully and it fixes the bug listed in #37785, but it
>> will need extensive testing, lest I break someone's assembly language.
>>
>> Note that the reason why I've done "(tx86operand(operands[1]).opsize =
>> S_YMM) and (tx86operand(operands[2]).opsize = S_XMM)" etc. and not
>> something like "(tx86operand(operands[1]).opsize >= S_YMM) and
>> (tx86operand(operands[1]).opsize > tx86operand(operands[2]).opsize)" is
>> for future safety: the opsize type doesn't list its items in size
>> order (and some entries, like S_BL, don't have a distinct size because
>> they represent size conversions), and explicit pairs prevent an
>> unintended side-effect if a new entry is added after S_ZMM in the future.
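
The reasoning above about ordinal comparisons can be sketched as follows. The enumeration here (TDemoSize) and both function names are hypothetical and deliberately much smaller than the compiler's real opsize type; D_BL stands in for a size-conversion entry and D_FUTURE for a member added later. This is an illustration of the hazard, not the patch itself.

```pascal
program OpSizeOrder;

{$mode objfpc}

type
  { Hypothetical enumeration for illustration only; members are not in
    size order, and D_FUTURE models an entry added after D_ZMM }
  TDemoSize = (D_NO, D_XMM, D_BL, D_YMM, D_ZMM, D_FUTURE);

{ Ordinal comparison: depends entirely on declaration order }
function SourceWiderByOrdinal(Src, Dst: TDemoSize): Boolean;
begin
  Result := (Src >= D_YMM) and (Src > Dst);
end;

{ Explicit pairs: only the three known widening cases match }
function SourceWiderByPairs(Src, Dst: TDemoSize): Boolean;
begin
  Result := ((Src = D_YMM) and (Dst = D_XMM)) or
            ((Src = D_ZMM) and (Dst = D_XMM)) or
            ((Src = D_ZMM) and (Dst = D_YMM));
end;

begin
  { Known case: both approaches agree }
  WriteLn(SourceWiderByOrdinal(D_YMM, D_XMM), ' ',
          SourceWiderByPairs(D_YMM, D_XMM));
  { Size-conversion entry: only the ordinal test matches }
  WriteLn(SourceWiderByOrdinal(D_YMM, D_BL), ' ',
          SourceWiderByPairs(D_YMM, D_BL));
  { Entry added after D_ZMM: only the ordinal test matches }
  WriteLn(SourceWiderByOrdinal(D_FUTURE, D_ZMM), ' ',
          SourceWiderByPairs(D_FUTURE, D_ZMM));
end.
```

In the second and third calls the ordinal version matches purely because of where the members happen to be declared, which is the side-effect the explicit pairs avoid.
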
>>
>> One thing that makes it difficult is that I don't have a processor that
>> supports the AVX-512 instruction set; at least, I don't think mine does
>> (Intel Core i7-10750H).
>>
>> Gareth aka. Kit
>>
>> P.S. If anyone can see a way to break the above code (before I submit a
>> patch), please tell me!
>>
>>
>> On 01/10/2020 15:52, avx512--- via fpc-devel wrote:
>>> Hi,
>>>
>>> look at the function "MemRefInfo(aAsmop: TAsmOp)" in 
>>> "compiler/x86/aasmcpu.pas".
>>>
>>>
>>> Torsten
>>>
>>>
>>>
>>> -----Original Message-----
>>> Subject: [fpc-devel] SSE/AVX instruction encodings
>>> Date: 2020-10-01T13:57:05+0200
>>> From: "J. Gareth Moreton via fpc-devel" <fpc-devel at lists.freepascal.org>
>>> To: "FPC developers' list" <fpc-devel at lists.freepascal.org>
>>>
>>> Hi everyone,
>>>
>>> I've decided to take on https://bugs.freepascal.org/view.php?id=37785 -
>>> I've noticed that the compiler isn't too good at working out the sizes
>>> of SSE and AVX instructions.  If you look at
>>> Tx86Instruction.SetInstructionOpsize in compiler/x86/rax86.pas, it
>>> checks for individual problematic instructions rather than any logical
>>> flags.  I feel this isn't viable in the long-term (i.e. I really don't
>>> want to continually add exceptional instructions) and has the code
>>> smell of something being fundamentally wrong or incomplete with how
>>> instruction sizes and encodings are determined.
>>>
>>> I'm looking to see if there's a way I can detect the correct size
>>> logically given the flags.  I figure I'll need to learn a few things
>>> about AVX512 as well so I don't mess anything up (I've noticed a few
>>> AVX512 flags to indicate if scalars rather than vectors are being used,
>>> and I'm wondering if they can be incorporated into the older SSE and AVX
>>> instructions in x86ins.dat).
>>>
>>> Long story short, I'm going to experiment a bit to see if I can develop
>>> an algorithm that works and is correct.
>>>
>>> Gareth aka. Kit
>>>
>>>
