[fpc-devel] SSE/AVX instruction encodings
J. Gareth Moreton
gareth at moreton-family.com
Fri Oct 2 12:14:04 CEST 2020
In the meantime, I've uploaded the patch to the bug report after
confirming that all tests on x86_64-win64 have passed with no
regressions: https://bugs.freepascal.org/view.php?id=37785
Other platforms and AVX-512-specific code still need testing though.
Gareth aka. Kit
On 02/10/2020 07:59, J. Gareth Moreton via fpc-devel wrote:
> Hi Torsten,
>
> The reason it's not compiling correctly with -a is that the
> operand size is being set to S_XMM, not S_YMM (because it's going by
> the size of the source operand), so when writing the .s files, it adds
> an 'x' suffix to the end of the opcode.
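
To illustrate the effect described above, here is a minimal sketch, assuming a hypothetical helper that picks the mnemonic suffix from the recorded operand size. TSketchVecSize and WithSizeSuffix are invented for illustration and are not the compiler's actual assembler-writer code; the 'y' suffix for the 256-bit form is likewise an assumption made by analogy with the 'x' case.

```pascal
program SuffixSketch;

{$mode objfpc}

type
  { Hypothetical two-value size type; not the compiler's actual opsize type }
  TSketchVecSize = (vsXMM, vsYMM);

{ Hypothetical helper: appends a size suffix to an AT&T-style mnemonic }
function WithSizeSuffix(const Mnemonic: string; Size: TSketchVecSize): string;
begin
  Result := Mnemonic;
  case Size of
    vsXMM: Result := Mnemonic + 'x';
    vsYMM: Result := Mnemonic + 'y';
  end;
end;

begin
  { Size recorded from a 256-bit source operand }
  WriteLn(WithSizeSuffix('vcvtpd2ps', vsYMM));
  { Size recorded as XMM instead: the suffix changes to 'x' }
  WriteLn(WithSizeSuffix('vcvtpd2ps', vsXMM));
end.
```

The only point of the sketch is that the suffix follows whatever size was recorded, so a wrongly recorded S_XMM would show up in the .s output even if the binary encoding is chosen correctly elsewhere.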
>
> I know there's a high risk of it breaking existing code, but there are
> a lot of exceptional cases in the function check, many of which are
> SSE/AVX instructions that deal with operands of different sizes.
>
> Gareth aka. Kit
>
> On 01/10/2020 23:09, avx512--- via fpc-devel wrote:
>> Hi Gareth,
>>
>> In my opinion it is not a good idea to introduce a new function to
>> calculate the operand size.
>>
>> The risk of breaking existing code (FPC and user code) is very high.
>>
>> I introduced the memrefinfo system for SSE and AVX opcodes to
>> protect existing user code. The basis of this concept is the
>> opcode definition in x86ins.dat.
>>
>> The definition for opcode VCVTPD2PS in trunk is:
>>
>> ; VCVTPD2PS xmmreg_mz,mem256 must come first - map MemRefSize 256bits correct
>> ; map all other MemrefSize (without broadcast MemRef) to xmmreg, xmmrm
>> [VCVTPD2PS,vcvtpd2psM]
>> (Ch_Wop2, Ch_Rop1)
>> xmmreg_mz,mem256 \350\352\361\362\364\370\1\x5A\110 AVX,SANDYBRIDGE,TFV
>> xmmreg_mz,ymmreg \350\352\361\362\364\370\1\x5A\110 AVX,SANDYBRIDGE
>> xmmreg_mz,xmmrm \350\352\361\362\370\1\x5A\110 AVX,SANDYBRIDGE,TFV
>>
>> // AVX512
>> xmmreg_mz,bmem64 \350\352\361\370\1\x5A\110 AVX512,BCST2,TFV
>> xmmreg_mz,bmem64 \350\352\361\364\370\1\x5A\110 AVX512,BCST4,TFV
>> ymmreg_mz,mem512 \350\351\352\361\370\1\x5A\110 AVX512,TFV
>> ymmreg_mz,bmem64 \350\351\352\361\370\1\x5A\110 AVX512,BCST8,TFV
>> ymmreg_mz,zmmreg_er \350\351\352\361\370\1\x5A\110 AVX512
>>
>>
>> In trunk this compiles correctly without the compiler option -a, but
>> not with -a. I am checking this.
>>
>> Torsten
>>
>>
>>
>> -----Original Message-----
>> Subject: Re: [fpc-devel] SSE/AVX instruction encodings
>> Date: 2020-10-01T18:04:26+0200
>> From: "J. Gareth Moreton via fpc-devel" <fpc-devel at lists.freepascal.org>
>> To: "fpc-devel at lists.freepascal.org" <fpc-devel at lists.freepascal.org>
>>
>> Hi Torsten,
>>
>> I've done that already actually, although only to grab the value of the
>> ExistsSSEAVX field. I'm currently testing a new nested function in
>> Tx86Instruction.SetInstructionOpsize:
>>
>> function CheckSSEAVX: Boolean;
>> begin
>>   Result := False;
>>
>>   if not MemRefInfo(opcode).ExistsSSEAVX then
>>     Exit;
>>
>>   { This check also covers MMX instructions that move data to and from
>>     32-bit and 64-bit registers or memory, since such instructions are
>>     replicated in SSE2 for use with XMM registers }
>>   if tx86operand(operands[1]).opsize in [S_B,S_W,S_L,S_Q] then
>>     begin
>>       opsize := S_NO;
>>       Exit(True);
>>     end;
>>
>>   if (tx86operand(operands[1]).opsize <> S_NO) and
>>     (operands[1].opr.typ = OPR_REFERENCE) then
>>     begin
>>       { Memory sizes of 64 bits and under are handled above }
>>       opsize := tx86operand(operands[1]).opsize;
>>       Exit(True);
>>     end;
>>
>>   { If the source operand is larger than the destination (e.g.
>>     "VCVTTPD2DQ XMM0, YMM1" in Intel notation), use the source operand }
>>   if ((tx86operand(operands[1]).opsize = S_YMM) and
>>       (tx86operand(operands[2]).opsize = S_XMM)) or
>>      ((tx86operand(operands[1]).opsize = S_ZMM) and
>>       (tx86operand(operands[2]).opsize = S_XMM)) or
>>      ((tx86operand(operands[1]).opsize = S_ZMM) and
>>       (tx86operand(operands[2]).opsize = S_YMM)) then
>>     begin
>>       opsize := tx86operand(operands[1]).opsize;
>>       Exit(True);
>>     end;
>>
>>   { If none of the conditions are met, this function returns False and the
>>     opsize is set to the last operand's opsize }
>> end;
>>
>> I've also commented out the individual checks for MOVD, MOVQ, VMOVQ etc
>> to see how it handles itself and to simplify the code. "make all" at
>> least works successfully and it fixes the bug listed in #37785, but it
>> will need extensive testing, lest I break someone's assembly language.
>>
>> Note that the reason why I've done "(tx86operand(operands[1]).opsize =
>> S_YMM) and (tx86operand(operands[2]).opsize = S_XMM)" etc. and not
>> something like "(tx86operand(operands[1]).opsize >= S_YMM) and
>> (tx86operand(operands[1]).opsize > tx86operand(operands[2]).opsize)" is
>> for future safety, since the opsize field doesn't have items in size
>> order (plus some entries, like S_BL, don't have a distinct size because
>> it's a size conversion) and it's to prevent an unintended side-effect if
>> a new entry is added after S_ZMM in the future.
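
The ordering concern above can be sketched as follows. This is a minimal illustration, assuming a hypothetical enumeration TSketchOpSize whose member order is made up for the example and is not copied from the FPC sources; S_FUTURE stands in for an entry that might be added after S_ZMM later.

```pascal
program OpsizeOrderSketch;

{$mode objfpc}

type
  { Hypothetical stand-in for the compiler's opsize enumeration; the order
    shown here is illustrative only }
  TSketchOpSize = (S_NO, S_B, S_W, S_L, S_Q, S_BL, S_XMM, S_YMM, S_ZMM,
                   S_FUTURE);

{ Explicit pair checks, in the style used by CheckSSEAVX }
function SourceIsWiderExplicit(Src, Dst: TSketchOpSize): Boolean;
begin
  Result := ((Src = S_YMM) and (Dst = S_XMM)) or
            ((Src = S_ZMM) and (Dst = S_XMM)) or
            ((Src = S_ZMM) and (Dst = S_YMM));
end;

{ Ordinal comparison: shorter, but it depends on declaration order }
function SourceIsWiderOrdinal(Src, Dst: TSketchOpSize): Boolean;
begin
  Result := (Src >= S_YMM) and (Src > Dst);
end;

begin
  { Both versions agree on the intended YMM-to-XMM case }
  WriteLn(SourceIsWiderExplicit(S_YMM, S_XMM));
  WriteLn(SourceIsWiderOrdinal(S_YMM, S_XMM));
  { An entry declared after S_ZMM is matched only by the ordinal version }
  WriteLn(SourceIsWiderExplicit(S_FUTURE, S_ZMM));
  WriteLn(SourceIsWiderOrdinal(S_FUTURE, S_ZMM));
end.
```

The two functions agree on the first pair and disagree on the second, which is the unintended side effect the explicit comparisons are meant to rule out.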
>>
>> One thing that makes it difficult is that I don't have a processor that
>> supports the AVX-512 instruction set, at least I don't think it does
>> (Intel Core i7-10750H).
>>
>> Gareth aka. Kit
>>
>> P.S. If anyone can see a way to break the above code (before I submit a
>> patch), please tell me!
>>
>>
>> On 01/10/2020 15:52, avx512--- via fpc-devel wrote:
>>> Hi,
>>>
>>> Look at the function "MemRefInfo(aAsmop: TAsmOp)" in
>>> "compiler/x86/aasmcpu.pas".
>>>
>>>
>>> Torsten
>>>
>>>
>>>
>>> -----Original Message-----
>>> Subject: [fpc-devel] SSE/AVX instruction encodings
>>> Date: 2020-10-01T13:57:05+0200
>>> From: "J. Gareth Moreton via fpc-devel" <fpc-devel at lists.freepascal.org>
>>> To: "FPC developers' list" <fpc-devel at lists.freepascal.org>
>>>
>>> Hi everyone,
>>>
>>> I've decided to take on https://bugs.freepascal.org/view.php?id=37785 -
>>> I've noticed that the compiler isn't too good at working out the sizes
>>> of SSE and AVX instructions. If you look at
>>> Tx86Instruction.SetInstructionOpsize in compiler/x86/rax86.pas, it
>>> checks for individual problematic instructions rather than any logical
>>> flags. I feel this isn't viable in the long term (i.e. I really don't
>>> want to continually add exceptional instructions) and has the code smell
>>> of something being fundamentally wrong or incomplete with how
>>> instruction sizes and encodings are determined.
>>>
>>> I'm looking to see if there's a way I can detect the correct size
>>> logically given the flags. I figure I'll need to learn a few things
>>> about AVX512 as well so I don't mess anything up (I've noticed a few
>>> AVX512 flags to indicate if scalars rather than vectors are being used,
>>> and am wondering if they can be incorporated into the older SSE and AVX
>>> instructions in x86ins.dat).
>>>
>>> Long story short, I'm going to experiment a bit to see if I can develop
>>> an algorithm that works and is correct.
>>>
>>> Gareth aka. Kit
>>>
>>>