[fpc-devel] Re: Broken frac function in FPC3.1.1 / Windows x86_64

Thorsten Engler thorsten.engler at gmx.net
Sat Apr 28 15:33:09 CEST 2018


> -----Original Message-----
> From: fpc-devel <fpc-devel-bounces at lists.freepascal.org> On Behalf
> Of wkitty42 at windstream.net
> > Code address:
> > Frac1: 0000000000536430 (48)
> > Frac2: 0000000000536480 (0)
> > Frac3: 00000000005364D0 (80)
> > Frac4: 0000000000536520 (32)
> > Frac5: 0000000000536570 (112)
> > Frac6: 00000000005365C0 (64)
> > Frac7: 0000000000536610 (16)
> > Frac8: 0000000000536660 (96)
> why not 64? the pattern looks to be bad,good,bad,good,bad,good,bad,good
> but i'm very likely missing something...

Take a look at the pattern again; I've marked it with + and - this time:

Out of range (1e16+0.5):
-Frac1 1081226
+Frac2 892385
-Frac3 1078618
+Frac4 888650
-Frac5 1077251
-Frac6 1096165
-Frac7 1072769
+Frac8 882501

Only fraction (0.5):
-Frac1 990573
+Frac2 893984
-Frac3 989789
+Frac4 884970
-Frac5 975254
-Frac6 986453
-Frac7 983450
+Frac8 877745

Why not 64? No idea...

Some more funny points: if I set CODEALIGN to 1, so the 8 functions land at offsets all over the place, I get the following result:

Code address:
Frac1: 0000000000536422 (34)
Frac2: 0000000000536458 (88)
Frac3: 000000000053648E (14)
Frac4: 00000000005364C4 (68)
Frac5: 00000000005364FA (122)
Frac6: 0000000000536530 (48)
Frac7: 0000000000536566 (102)
Frac8: 000000000053659C (28)

1st run:
In range (1e15+0.5):
Frac1 1322656
Frac2 1326933
Frac3 1321368
Frac4 1311924
Frac5 1302156
Frac6 1311717
Frac7 1314821
Frac8 1304493

Out of range (1e16+0.5):
Frac1 796050
Frac2 914107
Frac3 1086412
Frac4 792253
Frac5 999116
Frac6 1091263
Frac7 792960
Frac8 996593

Only fraction (0.5):
Frac1 889606
Frac2 887059
Frac3 978125
Frac4 888643
Frac5 989546
Frac6 990639
Frac7 891451
Frac8 993132

The in-range numbers are pretty much all the same, and the only-fraction numbers are in the same ballpark for the "good" and "bad" cases. But for the out-of-range cases, functions 1, 4, and 7 (offsets 34, 68, and 102) are actually about 8-10% better than at offsets 0, 32, and 96.

The real shocker comes if I run it again after adding a single additional nop in a spacer function that I have above Frac1:

procedure XXX1;
asm
  .noframe // no stack frame; this procedure exists purely as a byte spacer
  nop
  nop // added this - shifts every following function down by one byte
end;

function Frac1(const X: Double): Double;
...

Code address:
Frac1: 0000000000536423 (35)
Frac2: 0000000000536459 (89)
Frac3: 000000000053648F (15)
Frac4: 00000000005364C5 (69)
Frac5: 00000000005364FB (123)
Frac6: 0000000000536531 (49)
Frac7: 0000000000536567 (103)
Frac8: 000000000053659D (29)

1st run:
In range (1e15+0.5):
Frac1 4541094
Frac2 4451186
Frac3 4446460
Frac4 4450625
Frac5 4323903
Frac6 4389160
Frac7 4422070
Frac8 1291440

Out of range (1e16+0.5):
Frac1 892037
Frac2 882626
Frac3 1082966
Frac4 785305
Frac5 887439
Frac6 991194
Frac7 789911
Frac8 987714

Only fraction (0.5):
Frac1 2245965
Frac2 2172882
Frac3 2260256
Frac4 2154653
Frac5 2158976
Frac6 2257709
Frac7 2147712
Frac8 979118

Total blowout. Times for the in-range case have ballooned by a factor of 3-4, and for only-fraction by more than a factor of 2.

Clearly, even if you don't care about the difference between aligning to offset 0 vs. 16, these results show that alignment to at least an even address is absolutely essential.

I find it interesting that in this case offset 29 (Frac8) performs pretty well for the in-range and only-fraction cases, but badly for out-of-range, while offset 103 (Frac7) performs very well for the out-of-range case, but just as badly as most of the others for in-range and only-fraction.

I've attached the source (I'm using Delphi 10.2.3, 64-bit, to compile it) in case anyone wants to try it out on different CPUs and with different alignments (change the {$CODEALIGN 1} and add nops to the XXX1 .. XXX8 procedures to fine-tune the alignment).
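
If you want to verify what offsets you actually ended up with, the "Code address:" listings above can be produced with something along these lines (a sketch only; ReportAlignment is a hypothetical helper, and the exact reporting code in the attached Project21.dpr may differ):

  // needs SysUtils for Format
  procedure ReportAlignment;
  begin
    // Print each function's address plus its offset within a
    // 128-byte block, matching the "Code address:" listings above.
    Writeln(Format('Frac1: %p (%d)', [@Frac1, NativeUInt(@Frac1) mod 128]));
    Writeln(Format('Frac2: %p (%d)', [@Frac2, NativeUInt(@Frac2) mod 128]));
    // ... and so on for Frac3 .. Frac8
  end;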

> also, not only highly dependent on the CPU but also on what other
> processes may be running and consuming some CPU time... i'm not even
> sure that booting linux to "single" mode would get you a system
> completely dedicated to one task like in the old DOS world...

There are two approaches you can use under Windows to get pretty reliable timings:

a)

  SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_TIME_CRITICAL);
...
  SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_NORMAL);

As long as the total number of active (not in a wait state) realtime threads is smaller than the number of cores, this does a very good job of keeping your process running continuously. You could use SetProcessAffinityMask on top of that to make sure it keeps running on one specific core. This greatly reduces the impact of any other background activity on the timing.

This is what I used in this case because it was very easy to do.

The timing results are very consistent between multiple runs (within 1-3%, I would say; definitely well below the difference between the different alignments).
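
Put together, a) looks roughly like this (a minimal sketch; BenchmarkLoop is a hypothetical stand-in for the code being timed, and it assumes core 0 exists and there are fewer active realtime threads than cores):

  uses Winapi.Windows;

  var
    Freq, T1, T2: Int64;
  begin
    // Pin to core 0 and raise the priority so the scheduler pre-empts
    // this thread as little as possible during the measurement.
    SetProcessAffinityMask(GetCurrentProcess, 1);
    SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_TIME_CRITICAL);
    try
      QueryPerformanceFrequency(Freq);
      QueryPerformanceCounter(T1);
      BenchmarkLoop; // hypothetical: whatever is being measured
      QueryPerformanceCounter(T2);
    finally
      SetThreadPriority(GetCurrentThread, THREAD_PRIORITY_NORMAL);
    end;
    Writeln('Elapsed us: ', (T2 - T1) * 1000000 div Freq);
  end.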

b)

Instead of the normal high-precision timer, use:

function GetThreadTimes(hThread: THandle;
  var lpCreationTime, lpExitTime, lpKernelTime, lpUserTime: TFileTime): BOOL; stdcall;

lpKernelTime and lpUserTime only count the time while the thread is executing, not any time that it spends in a wait state or waiting to be scheduled to run.

This can give you pretty exact timing values for how long your code actually executed, independent of how often the thread was suspended to allow some other thread to run.
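
For example (a sketch; ThreadUserMicroseconds is a hypothetical helper, and the FILETIME values are in 100 ns units):

  function ThreadUserMicroseconds: Int64;
  var
    CreationT, ExitT, KernelT, UserT: TFileTime;
  begin
    // Only time spent actually executing in user mode accumulates here;
    // wait states and time spent waiting to be scheduled do not count.
    GetThreadTimes(GetCurrentThread, CreationT, ExitT, KernelT, UserT);
    Result := ((Int64(UserT.dwHighDateTime) shl 32) or UserT.dwLowDateTime) div 10;
  end;

Call it before and after the code under test and take the difference.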

You do still have some variation, as every thread context switch ends up flushing the TLBs, which can significantly impact the performance of memory accesses. So a combination of a) and b) gives the best possible results while running on a pre-emptive multitasking OS.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Project21.dpr
Type: application/octet-stream
Size: 7812 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20180428/a520bf9b/attachment.obj>
