Why doesn't this C vector loop auto-vectorise?


I am trying to optimise some code with the use of AVX intrinsics. A simple test case compiles, but GCC tells me the loop was not vectorised, for a number of reasons that I don't understand.

This is the full program, simple.c:

    #include <math.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <immintrin.h>

    int main(void)
    {
      __m256 * x = (__m256 *) calloc(1024,sizeof(__m256));


      for (int j=0;j<32;j++)
        x[j] = _mm256_set1_ps(1.);

      return(0);
    }

This is the command line:

    gcc simple.c -O1 -fopenmp -ffast-math -lm -mavx2 -ftree-vectorize -fopt-info-vec-missed

This is the output:

  • simple.c:11:3: note: not vectorized: unsupported data-type
  • simple.c:11:3: note: can't determine vectorization factor.
  • simple.c:6:5: note: not vectorized: not enough data-refs in basic block.
  • simple.c:11:3: note: not vectorized: not enough data-refs in basic block.
  • simple.c:6:5: note: not vectorized: not enough data-refs in basic block.
  • simple.c:6:5: note: not vectorized: not enough data-refs in basic block.

I have GCC version 5.4.

Can someone help me interpret these messages and understand what is going on?

You're already vectorizing manually with intrinsics, so there's nothing left for GCC to auto-vectorize. That leads to uninteresting warnings, which I assume come from trying to auto-vectorize the intrinsic itself or the loop-counter increments.
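For contrast, here is a minimal sketch of the kind of loop the auto-vectorizer is meant for: a plain scalar loop with no intrinsics. The function name `fill_ones` and its signature are hypothetical, chosen for illustration only and not taken from the question. It writes 1.0f to every element, which covers the same amount of data as the question's loop when `n` is 256 (32 vectors of 8 floats).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the question: a plain scalar loop.
   No intrinsics are used, so turning this into vector stores is
   left entirely to the compiler's auto-vectorizer. */
void fill_ones(float *x, size_t n)
{
    assert(x != NULL || n == 0);   /* guard against a null buffer */
    for (size_t i = 0; i < n; i++)
        x[i] = 1.0f;               /* one scalar store per iteration in the source */
}
```

Compiling something like this with optimization and vectorization enabled (for example `-O3 -mavx2 -fopt-info-vec`) should give a more informative vectorization report than the intrinsics version does, though the exact notes depend on the GCC version.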

I get good asm from GCC 5.3 (on the Godbolt compiler explorer) if I don't do anything silly like writing a function that will optimize away, or trying to compile with only -O1.

    #include <immintrin.h>

    void set_to_1(__m256 * x) {
      for (int j=0;j<32;j++)
        x[j] = _mm256_set1_ps(1.);
    }

        push    rbp
        lea     rax, [rdi+1024]
        vmovaps ymm0, YMMWORD PTR .LC0[rip]
        mov     rbp, rsp
        push    r10                      # gcc is weird with r10 in functions with ymm vectors
    .L2:                                 # the vector loop
        vmovaps YMMWORD PTR [rdi], ymm0
        add     rdi, 32
        cmp     rdi, rax
        jne     .L2
        vzeroupper
        pop     r10
        pop     rbp
        ret

    .LC0:
        .long   1065353216
        ... repeated several times because gcc failed to use a vbroadcastss load or generate the constant on the fly

I get the same asm with -O1, but using -O1 to keep things from being optimized away isn't a good way to see what GCC will really do.

