-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Add more SIMD platform-intrinsics #117953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1492,6 +1492,198 @@ fn generic_simd_intrinsic<'ll, 'tcx>( | |
return Ok(v); | ||
} | ||
|
||
if name == sym::simd_masked_load { | ||
// simd_masked_load(mask: <N x i{M}>, pointer: *_ T, values: <N x T>) -> <N x T> | ||
// * N: number of elements in the input vectors | ||
// * T: type of the element to load | ||
// * M: any integer width is supported, will be truncated to i1 | ||
// Loads contiguous elements from memory behind `pointer`, but only for | ||
// those lanes whose `mask` bit is enabled. | ||
// The memory addresses corresponding to the “off” lanes are not accessed. | ||
|
||
// The element type of the "mask" argument must be a signed integer type of any width | ||
let mask_ty = in_ty; | ||
let (mask_len, mask_elem) = (in_len, in_elem); | ||
|
||
// The second argument must be a pointer matching the element type | ||
let pointer_ty = arg_tys[1]; | ||
|
||
// The last argument is a passthrough vector providing values for disabled lanes | ||
let values_ty = arg_tys[2]; | ||
let (values_len, values_elem) = require_simd!(values_ty, SimdThird); | ||
|
||
require_simd!(ret_ty, SimdReturn); | ||
|
||
// Of the same length: | ||
require!( | ||
values_len == mask_len, | ||
InvalidMonomorphization::ThirdArgumentLength { | ||
span, | ||
name, | ||
in_len: mask_len, | ||
in_ty: mask_ty, | ||
arg_ty: values_ty, | ||
out_len: values_len | ||
} | ||
); | ||
|
||
// The return type must match the last argument type | ||
require!( | ||
ret_ty == values_ty, | ||
InvalidMonomorphization::ExpectedReturnType { span, name, in_ty: values_ty, ret_ty } | ||
); | ||
|
||
require!( | ||
matches!( | ||
pointer_ty.kind(), | ||
ty::RawPtr(p) if p.ty == values_elem && p.ty.kind() == values_elem.kind() | ||
), | ||
InvalidMonomorphization::ExpectedElementType { | ||
span, | ||
name, | ||
expected_element: values_elem, | ||
second_arg: pointer_ty, | ||
in_elem: values_elem, | ||
in_ty: values_ty, | ||
mutability: ExpectedPointerMutability::Not, | ||
} | ||
); | ||
|
||
require!( | ||
matches!(mask_elem.kind(), ty::Int(_)), | ||
InvalidMonomorphization::ThirdArgElementType { | ||
span, | ||
name, | ||
expected_element: values_elem, | ||
third_arg: mask_ty, | ||
} | ||
); | ||
|
||
// Alignment of T, must be a constant integer value: | ||
let alignment_ty = bx.type_i32(); | ||
let alignment = bx.const_i32(bx.align_of(values_ty).bytes() as i32); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should be Am I understanding this correctly or does it not make a difference @workingjubilee? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having checked the Arm ARM, RISCV V spec, and Intel SDM, we should continue to require element alignment at least. We can expect any masked-load-supporting implementation to support masked loads and stores on element boundaries. However, they may reject accesses on byte boundaries (except where the element is u8, of course), or they may just implement the load/store inefficiently if unaligned, which is honestly about as bad. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are definitely reasons to do unaligned loads (deserialization, packet framing, etc) that benefit from vectorization, so I think we should at least make writing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same, element alignment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can adjust them to take a const parameter for alignment, if we want, I guess? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can only guess the intent, but I am fairly sure about what If it is intended to allow entirely unaligned pointers, it should be implemented as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Opened rust-lang/portable-simd#382 for the load question. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will open a PR to fix the alignment passed down to LLVM IR. The optimizer converts the masked load to an umasked, aligned load because the alignment is too high There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Correct.
I certainly wasn't intending on being freewheeling here, or I would not have asked a zillion small questions myself. :^) The problem is slightly of the "remembering all the questions I should be asking" nature, especially when there has already been multiple rounds of review. I guess the main list of questions to keep in mind is:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Note that we do not ever treat uninit memory specially for memory operations in Rust. The UB on uninit derives entirely from the fact that uninit memory violates the validity invariant of most types. So the question that should be asked is: is this an untyped operation, working on raw abstract machine bytes, or is it a typed operation, and at which type? |
||
|
||
// Truncate the mask vector to a vector of i1s: | ||
let (mask, mask_ty) = { | ||
let i1 = bx.type_i1(); | ||
let i1xn = bx.type_vector(i1, mask_len); | ||
(bx.trunc(args[0].immediate(), i1xn), i1xn) | ||
}; | ||
|
||
let llvm_pointer = bx.type_ptr(); | ||
|
||
// Type of the vector of elements: | ||
let llvm_elem_vec_ty = llvm_vector_ty(bx, values_elem, values_len); | ||
let llvm_elem_vec_str = llvm_vector_str(bx, values_elem, values_len); | ||
|
||
let llvm_intrinsic = format!("llvm.masked.load.{llvm_elem_vec_str}.p0"); | ||
let fn_ty = bx | ||
.type_func(&[llvm_pointer, alignment_ty, mask_ty, llvm_elem_vec_ty], llvm_elem_vec_ty); | ||
let f = bx.declare_cfn(&llvm_intrinsic, llvm::UnnamedAddr::No, fn_ty); | ||
workingjubilee marked this conversation as resolved.
Show resolved
Hide resolved
|
||
let v = bx.call( | ||
fn_ty, | ||
None, | ||
None, | ||
f, | ||
&[args[1].immediate(), alignment, mask, args[2].immediate()], | ||
None, | ||
); | ||
return Ok(v); | ||
} | ||
|
||
if name == sym::simd_masked_store { | ||
// simd_masked_store(mask: <N x i{M}>, pointer: *mut T, values: <N x T>) -> () | ||
// * N: number of elements in the input vectors | ||
// * T: type of the element to load | ||
// * M: any integer width is supported, will be truncated to i1 | ||
// Stores contiguous elements to memory behind `pointer`, but only for | ||
// those lanes whose `mask` bit is enabled. | ||
// The memory addresses corresponding to the “off” lanes are not accessed. | ||
|
||
// The element type of the "mask" argument must be a signed integer type of any width | ||
let mask_ty = in_ty; | ||
let (mask_len, mask_elem) = (in_len, in_elem); | ||
|
||
// The second argument must be a pointer matching the element type | ||
let pointer_ty = arg_tys[1]; | ||
|
||
// The last argument specifies the values to store to memory | ||
let values_ty = arg_tys[2]; | ||
let (values_len, values_elem) = require_simd!(values_ty, SimdThird); | ||
|
||
// Of the same length: | ||
require!( | ||
values_len == mask_len, | ||
InvalidMonomorphization::ThirdArgumentLength { | ||
span, | ||
name, | ||
in_len: mask_len, | ||
in_ty: mask_ty, | ||
arg_ty: values_ty, | ||
out_len: values_len | ||
} | ||
); | ||
|
||
// The second argument must be a mutable pointer type matching the element type | ||
require!( | ||
matches!( | ||
pointer_ty.kind(), | ||
ty::RawPtr(p) if p.ty == values_elem && p.ty.kind() == values_elem.kind() && p.mutbl.is_mut() | ||
), | ||
InvalidMonomorphization::ExpectedElementType { | ||
span, | ||
name, | ||
expected_element: values_elem, | ||
second_arg: pointer_ty, | ||
in_elem: values_elem, | ||
in_ty: values_ty, | ||
mutability: ExpectedPointerMutability::Mut, | ||
} | ||
); | ||
|
||
require!( | ||
matches!(mask_elem.kind(), ty::Int(_)), | ||
InvalidMonomorphization::ThirdArgElementType { | ||
span, | ||
name, | ||
expected_element: values_elem, | ||
third_arg: mask_ty, | ||
} | ||
); | ||
|
||
// Alignment of T, must be a constant integer value: | ||
let alignment_ty = bx.type_i32(); | ||
let alignment = bx.const_i32(bx.align_of(values_elem).bytes() as i32); | ||
|
||
// Truncate the mask vector to a vector of i1s: | ||
let (mask, mask_ty) = { | ||
let i1 = bx.type_i1(); | ||
let i1xn = bx.type_vector(i1, in_len); | ||
(bx.trunc(args[0].immediate(), i1xn), i1xn) | ||
}; | ||
|
||
let ret_t = bx.type_void(); | ||
|
||
let llvm_pointer = bx.type_ptr(); | ||
|
||
// Type of the vector of elements: | ||
let llvm_elem_vec_ty = llvm_vector_ty(bx, values_elem, values_len); | ||
let llvm_elem_vec_str = llvm_vector_str(bx, values_elem, values_len); | ||
|
||
let llvm_intrinsic = format!("llvm.masked.store.{llvm_elem_vec_str}.p0"); | ||
let fn_ty = bx.type_func(&[llvm_elem_vec_ty, llvm_pointer, alignment_ty, mask_ty], ret_t); | ||
let f = bx.declare_cfn(&llvm_intrinsic, llvm::UnnamedAddr::No, fn_ty); | ||
let v = bx.call( | ||
fn_ty, | ||
None, | ||
None, | ||
f, | ||
&[args[2].immediate(), args[1].immediate(), alignment, mask], | ||
None, | ||
); | ||
return Ok(v); | ||
} | ||
|
||
if name == sym::simd_scatter { | ||
// simd_scatter(values: <N x T>, pointers: <N x *mut T>, | ||
// mask: <N x i{M}>) -> () | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
// compile-flags: -C no-prepopulate-passes | ||
|
||
#![crate_type = "lib"] | ||
|
||
#![feature(repr_simd, platform_intrinsics)] | ||
#![allow(non_camel_case_types)] | ||
|
||
#[repr(simd)] | ||
#[derive(Copy, Clone, PartialEq, Debug)] | ||
pub struct Vec2<T>(pub T, pub T); | ||
|
||
#[repr(simd)] | ||
#[derive(Copy, Clone, PartialEq, Debug)] | ||
pub struct Vec4<T>(pub T, pub T, pub T, pub T); | ||
|
||
extern "platform-intrinsic" { | ||
fn simd_masked_load<M, P, T>(mask: M, pointer: P, values: T) -> T; | ||
} | ||
|
||
// CHECK-LABEL: @load_f32x2 | ||
#[no_mangle] | ||
pub unsafe fn load_f32x2(mask: Vec2<i32>, pointer: *const f32, | ||
values: Vec2<f32>) -> Vec2<f32> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the meaning of "values" here? It seems strange that a load is told the values to load...? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are the fallback/passthrough values used for lanes that are masked off, so have the corresponding bits in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, the name |
||
// CHECK: call <2 x float> @llvm.masked.load.v2f32.p0(ptr {{.*}}, i32 {{.*}}, <2 x i1> {{.*}}, <2 x float> {{.*}}) | ||
simd_masked_load(mask, pointer, values) | ||
} | ||
|
||
// CHECK-LABEL: @load_pf32x4 | ||
#[no_mangle] | ||
pub unsafe fn load_pf32x4(mask: Vec4<i32>, pointer: *const *const f32, | ||
values: Vec4<*const f32>) -> Vec4<*const f32> { | ||
// CHECK: call <4 x ptr> @llvm.masked.load.v4p0.p0(ptr {{.*}}, i32 {{.*}}, <4 x i1> {{.*}}, <4 x ptr> {{.*}}) | ||
simd_masked_load(mask, pointer, values) | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
// compile-flags: -C no-prepopulate-passes | ||
|
||
#![crate_type = "lib"] | ||
|
||
#![feature(repr_simd, platform_intrinsics)] | ||
#![allow(non_camel_case_types)] | ||
|
||
#[repr(simd)] | ||
#[derive(Copy, Clone, PartialEq, Debug)] | ||
pub struct Vec2<T>(pub T, pub T); | ||
|
||
#[repr(simd)] | ||
#[derive(Copy, Clone, PartialEq, Debug)] | ||
pub struct Vec4<T>(pub T, pub T, pub T, pub T); | ||
|
||
extern "platform-intrinsic" { | ||
fn simd_masked_store<M, P, T>(mask: M, pointer: P, values: T) -> (); | ||
} | ||
|
||
// CHECK-LABEL: @store_f32x2 | ||
#[no_mangle] | ||
pub unsafe fn store_f32x2(mask: Vec2<i32>, pointer: *mut f32, values: Vec2<f32>) { | ||
// CHECK: call void @llvm.masked.store.v2f32.p0(<2 x float> {{.*}}, ptr {{.*}}, i32 {{.*}}, <2 x i1> {{.*}}) | ||
simd_masked_store(mask, pointer, values) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What are the alignment requirements on the pointer for the store to be valid? (And same question for the load.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The pointer should be aligned to the individual element, not the whole vector. So when loading There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if the mask is all-0, i.e., no load/store actually happens. Is the pointer still required to be aligned? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even with alignment checking, masked elements never generate an access for the load, thus no hardware implementation requires it. If you have a strong reasoning for arguing otherwise, we could make it UB regardless? Otherwise: no, it's a non-event, like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fwiw x86-64 for these doesn't ever generate an alignment check exception even if you turn alignment checking on, while the others are explicit that the logic is that predication prevents the access, therefore the alignment check. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes they do. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe everything that's been raised so far has been addressed. Thanks @RalfJung and apologies if my answers earlier were unhelpful. I wasn't aware that the alignment requirement is only relaxed through the from_array/to_array path, relying on the optimizer to elide those later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @farnoy can you link to the PR(s) that address this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Mostly I've been asking for documentation. #118853 addresses that, please double-check there that the docs for these new intrinsics match what you implemented now. :) |
||
} | ||
|
||
// CHECK-LABEL: @store_pf32x4 | ||
#[no_mangle] | ||
pub unsafe fn store_pf32x4(mask: Vec4<i32>, pointer: *mut *const f32, values: Vec4<*const f32>) { | ||
// CHECK: call void @llvm.masked.store.v4p0.p0(<4 x ptr> {{.*}}, ptr {{.*}}, i32 {{.*}}, <4 x i1> {{.*}}) | ||
simd_masked_store(mask, pointer, values) | ||
} |
Uh oh!
There was an error while loading. Please reload this page.