Description
add_stuff is a function that uses AVX2 SIMD intrinsics and is marked #[inline(always)].
add_stuff_helper is a function with target_feature set to enable AVX2 instructions.
If you run this in a release build, you get an incorrect result: the return values from the recursive calls do not add up. I believe this is because code generated around the recursion does not get the avx2 target_feature applied.
If you make add_stuff not recursive, this all works fine.
The code below obviously does not make sense to use by itself; for instance, it works if you put the target_feature on the add_stuff function directly. But this technique is useful for doing some nice SIMD metaprogramming with traits, and this bug makes that not work.
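To make the "metaprogramming with traits" idea concrete, here is a minimal sketch of the kind of pattern meant. The names Simd, Scalar, Avx2, kernel, kernel_avx2 and run are hypothetical and exist only for illustration. The generic body is #[inline(always)] so that it can be inlined into a thin wrapper that carries the target_feature. This version is not recursive, so it should not hit the problem described here.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Hypothetical trait abstracting over an instruction set.
trait Simd {
    type Vf32: Copy;
    unsafe fn set1(a: f32) -> Self::Vf32;
    unsafe fn add(a: Self::Vf32, b: Self::Vf32) -> Self::Vf32;
    unsafe fn first(a: Self::Vf32) -> f32;
}

// Scalar fallback: the "vector" is a single f32.
struct Scalar;
impl Simd for Scalar {
    type Vf32 = f32;
    #[inline(always)]
    unsafe fn set1(a: f32) -> f32 { a }
    #[inline(always)]
    unsafe fn add(a: f32, b: f32) -> f32 { a + b }
    #[inline(always)]
    unsafe fn first(a: f32) -> f32 { a }
}

// AVX2 implementation: thin #[inline(always)] wrappers over the intrinsics.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
struct Avx2;
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
impl Simd for Avx2 {
    type Vf32 = __m256;
    #[inline(always)]
    unsafe fn set1(a: f32) -> __m256 { _mm256_set1_ps(a) }
    #[inline(always)]
    unsafe fn add(a: __m256, b: __m256) -> __m256 { _mm256_add_ps(a, b) }
    #[inline(always)]
    unsafe fn first(a: __m256) -> f32 {
        // Store all eight lanes and return lane 0.
        let mut out = [0.0f32; 8];
        _mm256_storeu_ps(out.as_mut_ptr(), a);
        out[0]
    }
}

// Generic body, written once, meant to be inlined into a feature-enabled caller.
#[inline(always)]
unsafe fn kernel<S: Simd>(a: f32) -> f32 {
    let v = S::add(S::set1(a), S::set1(2.0));
    S::first(v)
}

// Wrapper that carries the target_feature; the generic body is inlined here.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn kernel_avx2(a: f32) -> f32 {
    kernel::<Avx2>(a)
}

// Runtime dispatch: use AVX2 when the CPU reports it, else the scalar path.
fn run(a: f32) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return unsafe { kernel_avx2(a) };
        }
    }
    unsafe { kernel::<Scalar>(a) }
}
```

Calling run(2.0) computes 2.0 + 2.0 on either path. The point of the pattern is that kernel is written once and instantiated per instruction set, which only works if the inlined code picks up the caller's target_feature.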
Is this a bug that can be fixed? Or is it an innate limitation of combining inlining, target_feature, and recursion? Is there a workaround?
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
use std::fmt::Debug;
#[inline(always)]
unsafe fn add_stuff(a: f32, count: i32) -> __m256 {
    let b = _mm256_set1_ps(2.0);
    let a2 = _mm256_set1_ps(a);
    if count < 4 {
        println!("count:{}", count);
        let r = _mm256_add_ps(_mm256_add_ps(a2, b), add_stuff(a, count + 1));
        println!("r:{:?}", r);
        r
    } else {
        _mm256_add_ps(a2, b)
    }
}

#[target_feature(enable = "avx2")]
unsafe fn add_stuff_helper() {
    let r = add_stuff(2.0, 1);
    println!("raw avx {:?}", r);
}

fn main() {
    unsafe {
        add_stuff_helper();
    }
}
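For comparison, here is a sketch of the variant mentioned above, with the target_feature placed directly on the recursive function and no #[inline(always)]. The names add_stuff_direct, lanes_avx2 and lanes_direct are hypothetical, and the runtime AVX2 check is an addition for safety that the original repro does not have. With a = 2.0 and count = 1, four levels each contribute 4.0, so every lane should be 16.0.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Same recursion as add_stuff, but the target_feature sits on the
// recursive function itself instead of on a separate helper.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add_stuff_direct(a: f32, count: i32) -> __m256 {
    let b = _mm256_set1_ps(2.0);
    let a2 = _mm256_set1_ps(a);
    if count < 4 {
        _mm256_add_ps(_mm256_add_ps(a2, b), add_stuff_direct(a, count + 1))
    } else {
        _mm256_add_ps(a2, b)
    }
}

// Hypothetical helper: run the recursion and copy the eight lanes out.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn lanes_avx2(a: f32, count: i32) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(out.as_mut_ptr(), add_stuff_direct(a, count));
    out
}

// Safe entry point: returns None when AVX2 is not available at runtime
// (or when not building for x86/x86_64).
fn lanes_direct(a: f32, count: i32) -> Option<[f32; 8]> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return Some(unsafe { lanes_avx2(a, count) });
        }
    }
    None
}
```

This sidesteps the problem but gives up the generic, inline-into-the-caller structure, which is why it is not a real answer for the trait-based use case.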
Environment:
This happens with rustc 1.27 stable through 1.31.0 nightly (at least).
All tested on Linux, on a CPU that supports AVX2 instructions (Intel Core i7-6700).