[fpc-devel] Vectorization
Adriaan van Os
fpc at microbizz.nl
Tue Dec 12 10:50:41 CET 2017
J. Gareth Moreton wrote:
> - There is no desire to include MOVUPS instructions because, while they will work for unaligned memory, are
> much slower than MOVAPS, but MOVAPS will cause a segmentation fault if the memory is not aligned.
Memory should be aligned when using vector code, and developers should know that. The objective of
vectorization is speed, not to be nice to developers who don't know what they are doing. So if
they get a crash, it's their fault. Perhaps the compiler could issue a warning when data used in
vector code is not aligned.
See e.g. Section 5.3 DATA ALIGNMENT of the Intel® 64 and IA-32 Architectures Optimization Reference
Manual (where movaps and palignr are used for SSE3 optimized code).
I suggest an FPC runtime function to get an aligned block of memory on the heap. I use
posix_memalign, but note that Mac OS X has a severe bug where posix_memalign with size 0 causes
memory corruption! For size 0, malloc can be used instead (or a nil pointer returned).
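
A minimal sketch in C of what such a helper could look like, assuming a POSIX system. The names aligned_block and is_aligned are made up for illustration and are not existing FPC RTL functions. Size 0 is handled here by returning a null pointer, so posix_memalign is never called with size 0.

```c
#define _POSIX_C_SOURCE 200112L  /* makes posix_memalign visible in strict modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a block of `size` bytes whose address is a
   multiple of `alignment`, or NULL on failure. `alignment` must be a power
   of two and a multiple of sizeof(void *), as posix_memalign requires. */
static void *aligned_block(size_t alignment, size_t size)
{
    void *p = NULL;

    /* Never pass size 0 to posix_memalign; return a null pointer instead. */
    if (size == 0)
        return NULL;

    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;

    return p;
}

/* Hypothetical check: non-zero if `p` lies on an `alignment`-byte boundary. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);  /* guards the modulus below */
    return ((uintptr_t)p % alignment) == 0;
}
```

A block obtained this way is released with free(), like any other malloc'ed block.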
Note that the aligned load/store forms of AVX-512 <https://software.intel.com/en-us/node/523777>
instructions on 512-bit operands require 64-byte alignment.
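
For data with static storage, the alignment can be requested in the declaration itself. A minimal C11 sketch, where the buffer name and size are made up for illustration:

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical buffer: 16 floats = 64 bytes, i.e. one 512-bit vector,
   placed on a 64-byte boundary by the compiler. */
static alignas(64) float vec_buf[16];

/* Returns non-zero if the buffer starts on a 64-byte boundary. */
static int vec_buf_is_aligned(void)
{
    return ((uintptr_t)vec_buf % 64) == 0;
}
```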
Regards,
Adriaan van Os