[fpc-pascal] FPC Graphics options?

Tue May 23 02:06:19 CEST 2017

I realized I should have posted this in fpc-other. So, please reply in 
[fpc-other] and not here.

On 05/23/2017 03:03 AM, Nikolay Nikolov wrote:
>
>
> On 05/23/2017 01:20 AM, noreply at z505.com wrote:
>> On 2017-05-18 19:54, Ryan Joseph wrote:
>>>> On May 18, 2017, at 10:40 PM, Jon Foster 
>>>> <jon-lists at jfpossibilities.com> wrote:
>>>>
>>>> 62.44      1.33     1.33 fpc_frac_real
>>>> 26.76      1.90     0.57 MATH_$$_FLOOR$EXTENDED$$LONGINT
>>>> 10.33      2.12     0.22 FPC_DIV_INT64
>>>
>>> Thanks for profiling this.
>>>
>>> Floor is there as I expected and 26% is pretty extreme but the others
>>> are floating point division? How does Java handle this so much better
>>> than FPC and what are the work arounds? Just curious. As it stands I
>>> can only reason that I need to avoid dividing floats in FPC like the
>>> plague.
>>>
>>
>> Isn't java just a wrapper around C?
> No. Java compilers generate code for a virtual machine, called JVM 
> (Java Virtual Machine). They do not generate code for x86 CPUs or any 
> other real CPU. The JVM is like a fictional CPU, that does not exist 
> in a silicon implementation anywhere, but is implemented in software 
> only.
>
> C compilers usually generate native code for real CPUs, just like FPC 
> does.
>
> Why does it matter? The x86 instruction set architecture has gone
> through quite a long evolution and many instruction set extensions
> were added along the way: 32-bit extensions (x86 originally started
> as 16-bit), the x87 FPU instructions (this was a separate coprocessor
> in the beginning, but was integrated into the main CPU starting with
> the 486DX), MMX, SSE, SSE2, the 64-bit extensions (x86_64), SSE3,
> AVX, etc.
>
> There are generally two ways to do floating point on the x86:
>   - the x87 FPU - this is used by default by the FPC compiler on 
> 32-bit (and 16-bit) x86
>   - the SSE2 instruction set extension - this can replace the FPU and 
> generally works faster on modern CPUs. This is used by default by the 
> 64-bit FPC compiler. That's because all 64-bit x86 CPUs support this 
> extension.
>
> There is one disadvantage to using SSE2 instead of the x87 FPU - the 
> SSE2 instructions don't support the 80-bit extended precision float 
> type. There's no support for it in any of the later x86 instruction 
> set extensions either. If you need the 80-bit precision, the x87 FPU 
> is the only way to go, even on x86_64.
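
To make the precision point concrete, here is a minimal sketch that adds a value too small for Double to register near 1.0. It assumes a target where Extended really is the 80-bit x87 type; on targets where FPC maps Extended to Double (Win64, for example) both lines should report the same result. The program name is made up for illustration.

```pascal
program ExtendedSketch;
{$mode objfpc}

var
  D: Double;
  E: Extended;

begin
  { Storage size may include padding, so it is not always 10 bytes. }
  WriteLn('SizeOf(Double)   = ', SizeOf(D));
  WriteLn('SizeOf(Extended) = ', SizeOf(E));

  { 1e-18 is below Double's resolution near 1.0 but above that of an
    80-bit extended value, so the two sums can differ. }
  D := 1.0;
  D := D + 1e-18;
  E := 1.0;
  E := E + 1e-18;

  WriteLn('Double   sum > 1.0: ', D > 1.0);
  WriteLn('Extended sum > 1.0: ', E > 1.0);
end.
```

If the extra digits matter to a calculation, that is the case where the x87 FPU remains the only option.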
>
> There's another disadvantage to using SSE2 by default on 32-bit x86 - 
> programs compiled for SSE2 will not run on older CPUs, which don't
> support SSE2. There's simply no way around that. Therefore, we cannot 
> make use of SSE2 by default, without sacrificing backwards 
> compatibility. The only exception to that are certain RTL routines, 
> like Move() or FillChar() which take advantage of the SSE2 extensions, 
> because they check the CPU capabilities at runtime and internally 
> dispatch to several different implementations, for different CPU 
> types, which are all compiled and linked in. But you simply cannot
> take this approach for every FPU operation, because if you do a CPU
> check on every floating point calculation, the overhead of all the
> checks will make your program slower than it would be if you simply
> used the x87 FPU instructions, for example.
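
The dispatch idea described above can be sketched roughly as follows. This is not the RTL's actual code: CpuHasSSE2 is a hypothetical stand-in for a real CPUID check (hard-wired to False so the sketch runs anywhere), and FillSSE2 is only a placeholder for an optimised variant.

```pascal
program DispatchSketch;
{$mode objfpc}

type
  TFillProc = procedure(var Buf; Count: SizeInt; Value: Byte);

{ Hypothetical stand-in for a real CPUID test; always False here so the
  sketch runs on any CPU. }
function CpuHasSSE2: Boolean;
begin
  Result := False;
end;

{ Portable byte-by-byte fill. }
procedure FillPlain(var Buf; Count: SizeInt; Value: Byte);
var
  P: PByte;
  I: SizeInt;
begin
  P := @Buf;
  for I := 1 to Count do
  begin
    P^ := Value;
    Inc(P);
  end;
end;

{ Placeholder for an SSE2-optimised variant; it just reuses the plain loop. }
procedure FillSSE2(var Buf; Count: SizeInt; Value: Byte);
begin
  FillPlain(Buf, Count, Value);
end;

{ Pick an implementation based on the (assumed) capability check. }
function SelectFill: TFillProc;
begin
  if CpuHasSSE2 then
    Result := @FillSSE2
  else
    Result := @FillPlain;
end;

var
  FillImpl: TFillProc;
  Data: array[0..15] of Byte;

begin
  { One capability check for the whole run, not one per call. }
  FillImpl := SelectFill;

  FillImpl(Data, SizeOf(Data), $AA);
  WriteLn(Data[0], ' ', Data[15]);
end.
```

This pays off for routines that do a lot of work per call. For a single floating point addition, even the indirect call through FillImpl would likely cost more than the operation itself, which is why the same trick does not extend to every FPU instruction.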
>
> Virtual machines like the JVM don't have this problem and they can 
> always take advantage of newer instruction set extensions, without 
> sacrificing backward compatibility with older CPUs. Why? Because the 
> JVM bytecode has nothing to do with any processor at all. When you run 
> your program, the JVM bytecode is converted ("Just-In-Time" compiled) 
> to native code for the CPU the user has. So, if the user is running 
> your Java program on a CPU, that has SSE3, the JIT compiler will know 
> it can use SSE2 and SSE3 instructions. If another person runs it on an 
> older CPU, which doesn't have SSE2, the JIT compiler will compile it 
> to use x87 FPU instructions. This sounds so great that you're going to
> ask whether there are any disadvantages to this approach. And, of
> course, there are: since the program is essentially recompiled every
> time the user runs it, Java programs take a long time to start. The
> JIT compiler also has limited time to spend on optimization
> (otherwise programs would start even more slowly). There are ways to
> combat that by using some sort of cache (.NET has a native image
> cache, populated by NGen), but they are far from perfect either:
> these caches eat a lot of disk space, and then either program
> installation or the first run (when the JIT-compiled assembly hasn't
> yet been added to the cache) becomes slow.
> In general, native programs (FPC and C programs) feel a
> lot snappier to most users, because they start fast. But in the highly 
> specific case of heavy floating point code (where SSE2 vs x87 FPU 
> instruction sets matter), a native program (C or Pascal) compiled for 
> the x87 FPU will be slower than the JVM, because the JVM will use SSE2 
> and SSE3 on modern CPUs.
>
> Does this mean that it's always better to use the JVM? No. I mean, if 
> it suits you, go ahead and use it, there's nothing wrong with it (even 
> FPC supports it as a target: http://wiki.freepascal.org/FPC_JVM ), but 
> there are a lot of options for using native code as well:
> - if SSE2 and SSE3 make a huge performance difference for your 
> program, and you don't need to support old CPUs (e.g. your users are 
> happy about it or your program would be too slow to be usable on these 
> CPUs anyway, since you need a lot of CPU performance), then enable 
> {$fputype sse3} and probably recompile the RTL with it, to take full 
> advantage of it.
> - if SSE2 and SSE3 (or AVX or whatever new extension) make a huge 
> performance difference, but old CPU support is still valuable for your 
> users, then compile and provide two .exe files - one for old CPUs and 
> one for new ones.
> - if SSE2 and SSE3 don't make a difference, then you're not writing 
> floating point heavy code and you're happy with the default settings 
> :) The compatibility with older CPUs is only a bonus in this case and 
> isn't hurting your performance on new CPUs.
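
For the first option, a minimal sketch of what enabling the directive might look like in a program. The Average function and the sample data are made up for illustration; the directive is guarded so that it only applies when compiling for 32-bit x86.

```pascal
program FpuTypeSketch;
{$mode objfpc}

{ Only applied on 32-bit x86; the 64-bit compiler already defaults to
  SSE-based floating point. }
{$ifdef CPUI386}
  {$fputype sse3}
{$endif}

const
  Samples: array[0..3] of Double = (1.0, 2.0, 3.0, 4.0);

{ Plain floating point loop; with the directive active the compiler is
  free to emit SSE instructions for it instead of x87 ones. }
function Average(const A: array of Double): Double;
var
  I: Integer;
begin
  Result := 0.0;
  for I := Low(A) to High(A) do
    Result := Result + A[I];
  if Length(A) > 0 then
    Result := Result / Length(A);
end;

begin
  WriteLn(Average(Samples):0:2);
end.
```

As far as I know the same setting is available on the command line as -CfSSE3, which is the more convenient route for the second option: build the same source twice, once with and once without it, and ship both executables. Note that the directive only affects code compiled with it, so RTL routines stay on x87 unless the RTL is rebuilt as well.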
>
> And, of course, it is easy to give examples where a Java program
> would be a lot slower than an FPC program. I know comparing different
> IDEs is a bit of an apples-to-oranges comparison (because they may
> have different features and vastly different implementation details),
> but compare the speed of e.g. Lazarus to any IDE written in Java, even
> the fastest one. :)
>
> Anyhow, enough ranting, already. Just remember the golden rule of 
> optimization: never assume.
>
> Always measure and try to understand why something is slow. In 99% of 
> the cases it's not what people initially think.
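
As a starting point for measuring, a rough timing sketch for the two routines that dominated the profile quoted earlier (Frac and Floor). The iteration count and the helper names SumFrac and SumFloor are arbitrary choices for illustration, and GetTickCount64 only has millisecond resolution, so treat the numbers as a coarse indication and use a profiler for anything finer.

```pascal
program MeasureSketch;
{$mode objfpc}

uses
  SysUtils, Math;

const
  Iterations = 20000000;  { arbitrary; large enough to take measurable time }

{ Sums Frac(X) for X = 0.25, 0.75, 1.25, ... over N steps. }
function SumFrac(N: Integer): Double;
var
  I: Integer;
  X: Double;
begin
  Result := 0.0;
  X := 0.25;
  for I := 1 to N do
  begin
    Result := Result + Frac(X);
    X := X + 0.5;
  end;
end;

{ Sums Floor(X) over the same sequence of X values. }
function SumFloor(N: Integer): Int64;
var
  I: Integer;
  X: Double;
begin
  Result := 0;
  X := 0.25;
  for I := 1 to N do
  begin
    Result := Result + Floor(X);
    X := X + 0.5;
  end;
end;

var
  T0, T1, T2: QWord;
  FracTotal: Double;
  FloorTotal: Int64;

begin
  T0 := GetTickCount64;
  FracTotal := SumFrac(Iterations);
  T1 := GetTickCount64;
  FloorTotal := SumFloor(Iterations);
  T2 := GetTickCount64;

  { Printing the sums ensures the results are actually used. }
  WriteLn('Frac:  ', T1 - T0, ' ms (sum ', FracTotal:0:2, ')');
  WriteLn('Floor: ', T2 - T1, ' ms (sum ', FloorTotal, ')');
end.
```

Building this once with the default FPU settings and once with an SSE FPU type selected would show whether the instruction set is really what makes the difference on a given machine.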
>
> Nikolay