<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>That can't happen, or won't benefit much, until the compiler
      supports super-natural alignments. So there's a deeper level of
      support needed first.<br>
      <br>
      And personally I don't think that's the right long-term direction.
      It takes a long time to develop and maintain this stuff and you
      never know what the market will look like in 10 years.<br>
      ARM has SVE, and RISC-V has the upcoming vector extension which
      will move far away from the traditional SIMD stuff.<br>
      <br>
      Compiler support for block vectorization has rarely paid off
      really well given the amount of work that needs to go into it. So
      maybe it's better to wait for the next iteration :)<br>
    </p>
    <div class="moz-cite-prefix">On 4/6/19 11:13 PM, Ben Grasset wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAL4d7FgXf48q-j9UpcaN3_xG3XkTBisw_UiORFAt_AHtmp04bg@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">On Wed, Mar 27, 2019 at 11:32 AM J. Gareth
            Moreton &lt;<a href="mailto:gareth@moreton-family.com"
              moz-do-not-send="true">gareth@moreton-family.com</a>&gt;
            wrote:<br>
          </div>
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
              0.8ex;border-left:1px solid
              rgb(204,204,204);padding-left:1ex">
              <div>So with the false start that was pure inline
                assembly, I'd like to talk about how to move forward with
                FPC, or at least with x86_64.</div>
            </blockquote>
            <div><br>
            </div>
            <div>It occurred to me today, aren't you the person who
              fixed the -Sv compiler flag so that it actually works? I'd
              say expansion on that functionality would be more widely
              useful than just about anything else I can think of with
              regard to optimization (because it's so easy to use, and
              yet so powerful).</div>
            <div><br>
            </div>
            <div>Maybe start with making it fully use AVX instructions
              for the operations? IIRC, currently, even if you use the
              AVX or AVX2 compiler flags, it will always generate stuff
              like this:</div>
            <div><br>
            </div>
            <div>
              <div>vmovups<span style="white-space:pre">  </span>(%rdx),%xmm0</div>
              <div>addps<span style="white-space:pre">    </span>(%r8),%xmm0</div>
              <div>vmovups<span style="white-space:pre">  </span>%xmm0,(%rax)</div>
            </div>
            <div><br>
            </div>
            <div>rather than using vaddps.</div>
            <div><br>
            </div>
            <div>From there you could make it support arrays larger than
              4 elements, etc.</div>
          </div>
        </div>
      </div>
    </blockquote>
  </body>
</html>