[fpc-devel] Vectorization

Adriaan van Os fpc at microbizz.nl
Tue Dec 12 10:50:41 CET 2017

J. Gareth Moreton wrote:

> - There is no desire to include MOVUPS instructions because, while they will work for unaligned memory, are 
> much slower than MOVAPS, but MOVAPS will cause a segmentation fault if the memory is not aligned.

Memory should be aligned when using vector code, and developers should know that. The objective of 
vectorization is speed, not to be nice to developers who don't know what they are doing. So if 
they get a crash, it's their fault. Maybe the compiler could issue a warning if data used in vector 
code is not aligned.

See e.g. Section 5.3 DATA ALIGNMENT of the Intel® 64 and IA-32 Architectures Optimization Reference 
Manual (where movaps and palignr are used for SSE3 optimized code).

I suggest an FPC runtime function to get an aligned block of memory on the heap. I use 
posix_memalign, but note that Mac OS X has a severe bug where posix_memalign with size 0 causes 
memory corruption! For size 0, malloc can be used instead (or a nil pointer returned).
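
To illustrate the idea in C, here is a minimal sketch of such a wrapper. The names aligned_block 
and is_aligned are hypothetical (they are not existing FPC or libc routines), and returning a 
null pointer for size 0 is just one of the two options mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a block of `size` bytes aligned to
   `alignment`, or NULL on failure. `alignment` must be a power of two
   and a multiple of sizeof(void *), as posix_memalign requires. */
static void *aligned_block(size_t alignment, size_t size)
{
    void *p = NULL;

    /* Never pass size 0 to posix_memalign; return a null pointer instead. */
    if (size == 0)
        return NULL;

    /* posix_memalign returns 0 on success and an error number otherwise. */
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;

    return p;
}

/* Hypothetical helper: nonzero if `p` is a multiple of `alignment`. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

A block obtained this way is released with free. For SSE code one would typically ask for 
16-byte alignment, e.g. aligned_block(16, n).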

Note that 64-bit AVX-512 <https://software.intel.com/en-us/node/523777> instructions require 
64-byte alignment.


Adriaan van Os
