[fpc-pascal] FPC Graphics options?
Nikolay Nikolov
nickysn at gmail.com
Tue May 23 02:06:19 CEST 2017
I realized I should have posted this in fpc-other. So, please reply in
[fpc-other] and not here.
On 05/23/2017 03:03 AM, Nikolay Nikolov wrote:
>
>
> On 05/23/2017 01:20 AM, noreply at z505.com wrote:
>> On 2017-05-18 19:54, Ryan Joseph wrote:
>>>> On May 18, 2017, at 10:40 PM, Jon Foster
>>>> <jon-lists at jfpossibilities.com> wrote:
>>>>
>>>> 62.44 1.33 1.33 fpc_frac_real
>>>> 26.76 1.90 0.57 MATH_$$_FLOOR$EXTENDED$$LONGINT
>>>> 10.33 2.12 0.22 FPC_DIV_INT64
>>>
>>> Thanks for profiling this.
>>>
>>> Floor is there as I expected and 26% is pretty extreme but the others
>>> are floating point division? How does Java handle this so much better
>>> than FPC and what are the work arounds? Just curious. As it stands I
>>> can only reason that I need to avoid dividing floats in FPC like the
>>> plague.
>>>
>>
>> Isn't java just a wrapper around C?
> No. Java compilers generate code for a virtual machine, called the JVM
> (Java Virtual Machine). They do not generate code for x86 CPUs or any
> other real CPU. The JVM is like a fictional CPU that does not exist
> in a silicon implementation anywhere, but is implemented only in
> software.
>
> C compilers usually generate native code for real CPUs, just like FPC
> does.
>
> Why does it matter? The x86 instruction set architecture has gone
> through quite a long evolution, and many instruction set extensions
> were added along the way: 32-bit extensions (x86 originally started
> as 16-bit), the x87 FPU instructions (the FPU was a separate
> coprocessor in the beginning, but was integrated into the main CPU
> from the 486DX onwards), MMX, SSE, SSE2, the 64-bit extensions
> (x86_64), SSE3, AVX, etc.
>
> There are generally two ways to do floating point on the x86:
> - the x87 FPU - this is used by default by the FPC compiler on
> 32-bit (and 16-bit) x86
> - the SSE2 instruction set extension - this can replace the FPU and
> generally works faster on modern CPUs. This is used by default by the
> 64-bit FPC compiler. That's because all 64-bit x86 CPUs support this
> extension.
>
> There is one disadvantage to using SSE2 instead of the x87 FPU - the
> SSE2 instructions don't support the 80-bit extended precision float
> type. There's no support for it in any of the later x86 instruction
> set extensions either. If you need the 80-bit precision, the x87 FPU
> is the only way to go, even on x86_64.
>
> There's another disadvantage to using SSE2 by default on 32-bit x86:
> programs compiled for SSE2 will not run on older CPUs that don't
> support SSE2. There's simply no way around that. Therefore, we cannot
> use SSE2 by default without sacrificing backwards compatibility. The
> only exceptions are certain RTL routines, like Move() or FillChar(),
> which take advantage of the SSE2 extensions because they check the
> CPU capabilities at runtime and internally dispatch to one of several
> implementations for different CPU types, all of which are compiled
> and linked in. But you simply cannot take this approach for every FPU
> operation: if you do a CPU check on every floating point calculation,
> the overhead of all the checks will make your program slower than it
> would be if you had simply used the x87 FPU instructions.
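
To make the runtime-dispatch idea concrete, here is a minimal sketch in C (not Pascal, and not the actual FPC RTL code). All names (fill_bytes, fill_generic, fill_wide, cpu_has_sse2) are made up for illustration, and fill_wide is only a plain-C stand-in for a routine that would really use SSE2 instructions. The CPU check assumes the GCC/Clang builtin __builtin_cpu_supports on x86 and falls back to the generic path elsewhere.

```c
#include <stddef.h>

/* Baseline implementation: works on any CPU. */
static void fill_generic(unsigned char *dst, unsigned char value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
}

/* Hypothetical stand-in for an SSE2-tuned version. It handles 16 bytes
   per outer iteration in plain C; a real one would use SSE2 stores. */
static void fill_wide(unsigned char *dst, unsigned char value, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        for (size_t j = 0; j < 16; j++)
            dst[i + j] = value;
    for (; i < n; i++)          /* leftover tail bytes */
        dst[i] = value;
}

typedef void (*fill_fn)(unsigned char *, unsigned char, size_t);

/* Returns nonzero if the running CPU reports SSE2 support.
   Assumption: GCC or Clang on x86; otherwise report "no SSE2". */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Chosen implementation; NULL until the first call. */
static fill_fn fill_impl = NULL;

/* Public entry point: picks an implementation on first use, then
   forwards every call through the stored function pointer. */
void fill_bytes(unsigned char *dst, unsigned char value, size_t n)
{
    if (fill_impl == NULL)
        fill_impl = cpu_has_sse2() ? fill_wide : fill_generic;
    fill_impl(dst, value, n);
}
```

The extra pointer check and indirect call are paid once per buffer, so they are negligible for a routine that touches many bytes. Paying the same cost around every single floating point operation is what would make this approach a net loss.
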
>
> Virtual machines like the JVM don't have this problem and they can
> always take advantage of newer instruction set extensions, without
> sacrificing backward compatibility with older CPUs. Why? Because the
> JVM bytecode has nothing to do with any processor at all. When you run
> your program, the JVM bytecode is converted ("Just-In-Time" compiled)
> to native code for the CPU the user has. So, if the user is running
> your Java program on a CPU, that has SSE3, the JIT compiler will know
> it can use SSE2 and SSE3 instructions. If another person runs it on an
> older CPU, which doesn't have SSE2, the JIT compiler will compile it
> to use x87 FPU instructions. That sounds great, so you may ask whether
> there are any disadvantages to this approach. Of course there are:
> since the program is essentially recompiled every time the user
> runs it, starting a Java program takes a long time. There's also limited
> time that the JIT compiler can spend on optimization (otherwise
> programs will start even slower). There are ways to combat that, by
> using some sort of cache (.NET has the global assembly cache), but
> they are far from perfect either - these caches eat a lot of disk
> space and then either program installation or the first time it is run
> (when the JIT compiled assembly hasn't been added to the cache)
> becomes slow. In general native programs (FPC and C programs) feel a
> lot snappier to most users, because they start fast. But in the highly
> specific case of heavy floating point code (where SSE2 vs x87 FPU
> instruction sets matter), a native program (C or Pascal) compiled for
> the x87 FPU will be slower than the JVM, because the JVM will use SSE2
> and SSE3 on modern CPUs.
>
> Does this mean that it's always better to use the JVM? No. I mean, if
> it suits you, go ahead and use it, there's nothing wrong with it (even
> FPC supports it as a target: http://wiki.freepascal.org/FPC_JVM ), but
> there are a lot of options for using native code as well:
> - if SSE2 and SSE3 make a huge performance difference for your
> program, and you don't need to support old CPUs (e.g. your users are
> happy about it or your program would be too slow to be usable on these
> CPUs anyway, since you need a lot of CPU performance), then enable
> {$fputype sse3} and probably recompile the RTL with it, to take full
> advantage of it.
> - if SSE2 and SSE3 (or AVX or whatever new extension) make a huge
> performance difference, but old CPU support is still valuable for your
> users, then compile and provide two .exe files - one for old CPUs and
> one for new ones.
> - if SSE2 and SSE3 don't make a difference, then you're not writing
> floating point heavy code and you're happy with the default settings
> :) The compatibility with older CPUs is only a bonus in this case and
> isn't hurting your performance on new CPUs.
>
> And, of course, it is easy to give examples where a Java program
> would be a lot slower than an FPC program. I know comparing different
> IDEs is a bit of an apples-to-oranges comparison (because they may
> have different features and vastly different implementation details),
> but compare the speed of e.g. Lazarus to any IDE written in Java, even
> the fastest one. :)
>
> Anyhow, enough ranting, already. Just remember the golden rule of
> optimization: never assume.
>
> Always measure and try to understand why something is slow. In 99% of
> the cases it's not what people initially think.
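
As a rough illustration of "measure first", here is a small timing sketch in C. It is only an assumption about how one might check the "avoid dividing floats" theory from earlier in the thread: it times a loop that divides against one that multiplies by a precomputed reciprocal. The names sum_div, sum_mul and seconds_for are hypothetical, and clock() is a coarse timer, so treat the numbers as a hint and not as a benchmark.

```c
#include <time.h>

/* Candidate A: one floating point division per term. */
double sum_div(int n, double d)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += i / d;
    return acc;
}

/* Candidate B: multiply by a reciprocal computed once up front.
   Rounding can differ slightly from A unless d is a power of two. */
double sum_mul(int n, double d)
{
    double r = 1.0 / d;
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        acc += i * r;
    return acc;
}

/* Runs fn(n, d) once, stores its value in *result, and returns the
   elapsed CPU time in seconds as reported by clock(). */
double seconds_for(double (*fn)(int, double), int n, double d, double *result)
{
    clock_t start = clock();
    *result = fn(n, d);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for(sum_div, n, d, &r) and seconds_for(sum_mul, n, d, &r) with a large n, several times each, shows whether the division is really where the time goes on your CPU and compiler settings. Using the returned value r afterwards (for example by printing it) keeps an optimizing compiler from discarding the loop.
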
>
> Nikolay