Commit e38d5ac

Move from #[inline(always)] to #[inline] (rust-lang#306)
* Move from #[inline(always)] to #[inline]

This commit blanket-changes all `#[inline(always)]` annotations to `#[inline]`. Fear not, though: this should not be a regression!

This change is made for correctness, to ensure that we don't hit stray LLVM errors. Most of the LLVM intrinsics and various LLVM functions we actually lower down to only work correctly if they are invoked from a function with an appropriate target-feature set. For example, if we were to invoke an AVX intrinsic out of the blue, we would get a [codegen error][avx-error]. This error comes about because the surrounding function doesn't enable the AVX feature.

Now, in general we don't have a lot of control over how this crate is consumed by downstream crates. It'd be pretty bad if every downstream mistake showed up as a scary, un-debuggable codegen error in LLVM!

On the other side of this issue, *we* as the invokers of these intrinsics are "doing the right thing": all functions in this crate are tagged appropriately with target features so that they codegen correctly. Indeed, we have plenty of tests asserting that we can codegen everything across multiple platforms!

The error comes about here precisely because of the `#[inline(always)]` attribute. Typically, LLVM *won't* inline functions across target-feature sets. For example, if a normal function calls a function that enables AVX2, then the callee, no matter how small, won't be inlined into the caller. This is done for correctness (register preservation and all that), but it is also how these codegen errors are prevented in practice.

We in stdsimd, however, are currently tagging all functions with "always inline this, no matter what". That apparently ends up bypassing the "is this even possible to inline?" logic. In turn, we start inlining things like AVX intrinsics into functions that can't actually call AVX intrinsics, creating codegen errors at compile time.

So, with all that motivation, this commit switches these functions to the normal inline hint, `#[inline]`, instead of `#[inline(always)]`.

Now, for the stdsimd crate it is absolutely critical that all functions are inlined to have good performance. Using `#[inline]`, however, shouldn't hamper that! The compiler recognizes the `#[inline]` attribute and makes sure that each of these functions is a *candidate* for inlining into any and all downstream codegen units (without `#[inline]`, LLVM often wouldn't even see the definition to inline). After that, though, we're relying on LLVM to inline these functions naturally, as opposed to forcing it to do so. Typically these intrinsics are one-liners and trivially inlinable, so I'd imagine that LLVM will go ahead and inline everything all over the place.

All in all, this change was brought about by rust-lang#253, which noticed various codegen errors. I originally thought it was due to ABI issues, but that turned out to be wrong! (Although that was also a bug, which has since been resolved.) In any case, after this change I was able to get the example in rust-lang#253 to execute in both release and debug mode.

Closes rust-lang#253

[avx-error]: https://play.rust-lang.org/?gist=50cb08f1e2242e22109a6d69318bd112&version=nightly

* Add #[inline(always)] on eflags intrinsics

Their ABI actually relies on it!

* Leave #[inline(always)] on portable types

They're causing test failures on ARM; let's investigate later.
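To illustrate the failure mode described above, here is a minimal sketch. It is not part of the commit: the function names and bodies are made up, and it assumes a current Rust toolchain on an x86_64 target. With `#[inline(always)]` the body of `uses_avx2` could be forced into `caller`, which doesn't enable AVX2, and LLVM would reject the result at codegen time; with plain `#[inline]`, LLVM's cross-target-feature check can simply decline to inline.

```rust
// Hypothetical sketch of the pattern: an intrinsic-style wrapper
// that enables AVX2, called from ordinary code.

// With #[inline(always)], this body could be forced into `caller`
// below even though `caller` doesn't enable AVX2, producing an LLVM
// codegen error. With #[inline], LLVM remains free to refuse to
// inline across the target-feature boundary.
#[inline]
#[target_feature(enable = "avx2")]
unsafe fn uses_avx2(x: u64) -> u64 {
    // In the real crate, AVX2 intrinsics would appear here.
    x + 1
}

fn caller(x: u64) -> u64 {
    // Only take the AVX2 path if the CPU actually supports it.
    if is_x86_feature_detected!("avx2") {
        unsafe { uses_avx2(x) }
    } else {
        x + 1
    }
}

fn main() {
    println!("{}", caller(41));
}
```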
1 parent cd74516 commit e38d5ac


41 files changed: +1046 −1046 lines

coresimd/src/aarch64/neon.rs

Lines changed: 4 additions & 4 deletions
@@ -8,31 +8,31 @@ use simd_llvm::simd_add;
 use v128::f64x2;
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(fadd))]
 pub unsafe fn vadd_f64(a: f64, b: f64) -> f64 {
     a + b
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(fadd))]
 pub unsafe fn vaddq_f64(a: f64x2, b: f64x2) -> f64x2 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddd_s64(a: i64, b: i64) -> i64 {
     a + b
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddd_u64(a: u64, b: u64) -> u64 {

coresimd/src/aarch64/v8.rs

Lines changed: 5 additions & 5 deletions
@@ -9,14 +9,14 @@
 use stdsimd_test::assert_instr;
 
 /// Reverse the order of the bytes.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(rev))]
 pub unsafe fn _rev_u64(x: u64) -> u64 {
     x.swap_bytes() as u64
 }
 
 /// Count Leading Zeros.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(clz))]
 pub unsafe fn _clz_u64(x: u64) -> u64 {
     x.leading_zeros() as u64
@@ -29,7 +29,7 @@ extern "C" {
 }
 
 /// Reverse the bit order.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(rbit))]
 pub unsafe fn _rbit_u64(x: u64) -> u64 {
     rbit_u64(x as i64) as u64
@@ -39,7 +39,7 @@ pub unsafe fn _rbit_u64(x: u64) -> u64 {
 ///
 /// When all bits of the operand are set it returns the size of the operand in
 /// bits.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(cls))]
 pub unsafe fn _cls_u32(x: u32) -> u32 {
     u32::leading_zeros((((((x as i32) >> 31) as u32) ^ x) << 1) | 1) as u32
@@ -49,7 +49,7 @@ pub unsafe fn _cls_u32(x: u32) -> u32 {
 ///
 /// When all bits of the operand are set it returns the size of the operand in
 /// bits.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(cls))]
 pub unsafe fn _cls_u64(x: u64) -> u64 {
     u64::leading_zeros((((((x as i64) >> 63) as u64) ^ x) << 1) | 1) as u64

coresimd/src/arm/neon.rs

Lines changed: 23 additions & 23 deletions
@@ -9,135 +9,135 @@ use v64::{f32x2, i16x4, i32x2, i8x8, u16x4, u32x2, u8x8};
 use v128::{f32x4, i16x8, i32x4, i64x2, i8x16, u16x8, u32x4, u64x2, u8x16};
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vadd_s8(a: i8x8, b: i8x8) -> i8x8 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_s8(a: i8x16, b: i8x16) -> i8x16 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vadd_s16(a: i16x4, b: i16x4) -> i16x4 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_s16(a: i16x8, b: i16x8) -> i16x8 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vadd_s32(a: i32x2, b: i32x2) -> i32x2 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_s32(a: i32x4, b: i32x4) -> i32x4 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_s64(a: i64x2, b: i64x2) -> i64x2 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vadd_u8(a: u8x8, b: u8x8) -> u8x8 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_u8(a: u8x16, b: u8x16) -> u8x16 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vadd_u16(a: u16x4, b: u16x4) -> u16x4 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_u16(a: u16x8, b: u16x8) -> u16x8 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vadd_u32(a: u32x2, b: u32x2) -> u32x2 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_u32(a: u32x4, b: u32x4) -> u32x4 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(add))]
 pub unsafe fn vaddq_u64(a: u64x2, b: u64x2) -> u64x2 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(fadd))]
 pub unsafe fn vadd_f32(a: f32x2, b: f32x2) -> f32x2 {
     simd_add(a, b)
 }
 
 /// Vector add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(fadd))]
 pub unsafe fn vaddq_f32(a: f32x4, b: f32x4) -> f32x4 {
     simd_add(a, b)
 }
 
 /// Vector long add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(saddl))]
 pub unsafe fn vaddl_s8(a: i8x8, b: i8x8) -> i16x8 {
@@ -147,7 +147,7 @@ pub unsafe fn vaddl_s8(a: i8x8, b: i8x8) -> i16x8 {
 }
 
 /// Vector long add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(saddl))]
 pub unsafe fn vaddl_s16(a: i16x4, b: i16x4) -> i32x4 {
@@ -157,7 +157,7 @@ pub unsafe fn vaddl_s16(a: i16x4, b: i16x4) -> i32x4 {
 }
 
 /// Vector long add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(saddl))]
 pub unsafe fn vaddl_s32(a: i32x2, b: i32x2) -> i64x2 {
@@ -167,7 +167,7 @@ pub unsafe fn vaddl_s32(a: i32x2, b: i32x2) -> i64x2 {
 }
 
 /// Vector long add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(uaddl))]
 pub unsafe fn vaddl_u8(a: u8x8, b: u8x8) -> u16x8 {
@@ -177,7 +177,7 @@ pub unsafe fn vaddl_u8(a: u8x8, b: u8x8) -> u16x8 {
 }
 
 /// Vector long add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(uaddl))]
 pub unsafe fn vaddl_u16(a: u16x4, b: u16x4) -> u32x4 {
@@ -187,7 +187,7 @@ pub unsafe fn vaddl_u16(a: u16x4, b: u16x4) -> u32x4 {
 }
 
 /// Vector long add.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(uaddl))]
 pub unsafe fn vaddl_u32(a: u32x2, b: u32x2) -> u64x2 {
@@ -205,7 +205,7 @@ extern "C" {
 }
 
 /// Reciprocal square-root estimate.
-#[inline(always)]
+#[inline]
 #[target_feature(enable = "neon")]
 #[cfg_attr(test, assert_instr(frsqrte))]
 pub unsafe fn vrsqrte_f32(a: f32x2) -> f32x2 {

coresimd/src/arm/v6.rs

Lines changed: 2 additions & 2 deletions
@@ -10,14 +10,14 @@
 use stdsimd_test::assert_instr;
 
 /// Reverse the order of the bytes.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(rev))]
 pub unsafe fn _rev_u16(x: u16) -> u16 {
     x.swap_bytes() as u16
 }
 
 /// Reverse the order of the bytes.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(rev))]
 pub unsafe fn _rev_u32(x: u32) -> u32 {
     x.swap_bytes() as u32

coresimd/src/arm/v7.rs

Lines changed: 4 additions & 4 deletions
@@ -13,28 +13,28 @@ pub use super::v6::*;
 use stdsimd_test::assert_instr;
 
 /// Count Leading Zeros.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(clz))]
 pub unsafe fn _clz_u8(x: u8) -> u8 {
     x.leading_zeros() as u8
 }
 
 /// Count Leading Zeros.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(clz))]
 pub unsafe fn _clz_u16(x: u16) -> u16 {
     x.leading_zeros() as u16
 }
 
 /// Count Leading Zeros.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(clz))]
 pub unsafe fn _clz_u32(x: u32) -> u32 {
     x.leading_zeros() as u32
 }
 
 /// Reverse the bit order.
-#[inline(always)]
+#[inline]
 #[cfg_attr(test, assert_instr(rbit))]
 #[cfg_attr(target_arch = "arm", target_feature(enable = "v7"))]
 #[cfg(dont_compile_me)] // FIXME need to add `v7` upstream in rustc
