<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 1/13/22 10:58, Ben Grasset via
fpc-devel wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAL4d7FhucuEDFNkhQQVaQrf1AhC-X3+3dSer0sg-ZD80-YBCCw@mail.gmail.com">
<div dir="ltr">
<div dir="ltr">
<p>On Thu, Jan 13, 2022 at 1:58 AM Nikolay Nikolov via
fpc-devel &lt;<a
href="mailto:fpc-devel@lists.freepascal.org"
moz-do-not-send="true" class="moz-txt-link-freetext">fpc-devel@lists.freepascal.org</a>&gt;
wrote:</p>
</div>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> I haven't tested in Windows, but it would be very
strange and suspicious if the results are very
different.</p>
</div>
</blockquote>
<div> </div>
<div>It would be neither of those things. The exception
handling on x64 Windows is the fastest provided by FPC, for
example (though the compiler AFAIK avoids doing anything
that would generate exception handling code within its own
codebase as much as possible).</div>
</div>
</div>
</blockquote>
<p>So, instead of giving actual benchmark data on Windows
performance, you speculate that faster exception handling
matters, and then you immediately debunk your own argument by
admitting that it probably doesn't matter for compilation speed.
Sure, using SSE2 is also faster, but it doesn't matter for
compilation speed at all, because all the performance-critical
parts are integer code, so it would be silly to offer that as an
argument as well. Sometimes 64-bit is faster (due to SSE2, AVX,
exception handling, having more registers), and sometimes 32-bit
is faster (pointers are half the size, which means less memory
use, lower memory bandwidth requirements and more data fitting in
the processor caches). Which one is faster must always be
determined by running a benchmark, not by theoretical
speculation. Rule number 1 of optimization is "never assume".</p>
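<p>To make "run a benchmark" concrete, here is a minimal sketch of a
timing harness in Python. The helper names <code>time_command</code>
and <code>summarize</code> are made up for this illustration, and the
two commands are placeholders that only start the Python interpreter;
substitute the two compiler invocations you actually want to compare.
It measures wall-clock time only, so treat it as a starting point
rather than a rigorous methodology.</p>
<pre>
```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return the fastest run and the median run of the collected samples."""
    return {"min": min(samples), "median": statistics.median(samples)}


if __name__ == "__main__":
    # Placeholder commands (assumption): replace them with the two compiler
    # invocations to compare, both compiling the same project.
    candidates = {
        "candidate A": [sys.executable, "-c", "pass"],
        "candidate B": [sys.executable, "-c", "sum(range(100000))"],
    }
    for name, argv in candidates.items():
        print(name, summarize(time_command(argv)))
```
</pre>
<p>Repeating each run and reporting both the minimum and the median
helps separate the real difference from noise caused by disk caching
and other processes on the machine.</p>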
<p>The fact that 32-bit x86 is sometimes faster is the reason why
things like x86-32<br>
</p>
<blockquote type="cite"
cite="mid:CAL4d7FhucuEDFNkhQQVaQrf1AhC-X3+3dSer0sg-ZD80-YBCCw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>On Thu, Jan 13, 2022 at 1:58 AM Nikolay Nikolov via
fpc-devel &lt;<a
href="mailto:fpc-devel@lists.freepascal.org"
moz-do-not-send="true" class="moz-txt-link-freetext">fpc-devel@lists.freepascal.org</a>&gt;
wrote: </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>A bug report with steps to reproduce would probably be
nice.</p>
</div>
</blockquote>
<div> </div>
<div>That limit is a fundamental hardware limitation, not a
bug.</div>
</div>
</div>
</blockquote>
We claim that the limit is virtually impossible to exceed in
practice, and we don't support a native win64 compiler. Therefore,
if something cannot be compiled with the 32-bit crosscompiler, that
is a bug.<br>
<blockquote type="cite"
cite="mid:CAL4d7FhucuEDFNkhQQVaQrf1AhC-X3+3dSer0sg-ZD80-YBCCw@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div> Not at all hard to imagine someone encountering it on
32-bit particularly if they're using Lazbuild for
multi-threaded compilation.</div>
</div>
</div>
</blockquote>
<p>Imagining something is one thing; whether it can actually occur
in practice is another.<br>
</p>
<p>Why would it matter whether you use lazbuild with multi-threaded
compilation? Doesn't lazbuild start separate compiler processes?
Every 32-bit process gets its own 4GB address space. That is, each
process has a different set of page tables, and thus a different
mapping of linear to physical memory addresses. This is a memory
protection mechanism that ensures one process cannot destroy the
memory of another. In 64-bit operating systems (as well as 32-bit
ones that use PAE, the Physical Address Extension), each such set
of page tables can map 32-bit linear memory pages to physical
memory beyond the 4GB limit (with PAE it's a 32-bit to 36-bit
mapping; in long mode the physical addresses are wider still).
Therefore multiple 32-bit processes running at the same time can
access more than 4GB in total, even though each process is limited
to 4GB.<br>
</p>
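<p>The per-process mapping described above can be illustrated with a
toy model in Python. This is only a sketch: real page tables are
multi-level structures walked by the MMU, and the process names, page
numbers and frame numbers below are invented for the example. It
shows two processes using the same 32-bit linear address while being
backed by different physical frames, one of them above the 4GB
mark.</p>
<pre>
```python
PAGE_SIZE = 4096            # 4 KiB pages
LINEAR_BITS = 32            # address width seen by a 32-bit process
PAE_PHYSICAL_BITS = 36      # physical address width in the PAE case

# One toy page table per process, mapping a linear page number to a
# physical frame number. All numbers are made up for illustration.
page_tables = {
    "compiler_1": {0x00400: 0x012345},
    "compiler_2": {0x00400: 0x123456},   # frame located above the 4 GiB mark
}


def translate(process, linear_address):
    """Translate a 32-bit linear address using the page table of `process`."""
    page, offset = divmod(linear_address, PAGE_SIZE)
    frame = page_tables[process][page]
    return frame * PAGE_SIZE + offset


per_process_limit = 2 ** LINEAR_BITS
pae_physical_limit = 2 ** PAE_PHYSICAL_BITS

if __name__ == "__main__":
    address = 0x00400010
    for process in page_tables:
        print(process, hex(address), "maps to", hex(translate(process, address)))
    print("per-process limit (GiB):", per_process_limit // 2 ** 30)
    print("PAE physical limit (GiB):", pae_physical_limit // 2 ** 30)
```
</pre>
<p>In this model each process can still address at most 4 GiB, but
with a 36-bit physical address space sixteen such address spaces
could be fully backed by distinct physical memory at once.</p>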
<p>Nikolay</p>
</body>
</html>