mm256_srli,slli_si256; mm256_bsrli,bslli_epi128 to const generics #1067
r? @Amanieu (rust-highfive has picked a reviewer for you, use r? to override)
At f16c.rs
The instruction definition here says that bits 3 to 7 are ignored by the CPU. I think to be safe we should only allow imm3; we can always relax it later if necessary.
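A minimal sketch of what "only allow imm3" could look like with const generics. The names `AssertImm3` and `rounding_imm` are hypothetical and are not the crate's actual macros or intrinsics; the point is only that an out-of-range immediate becomes a compile-time error rather than being silently masked.

```rust
// Hypothetical helper type: carries a compile-time check on the immediate.
struct AssertImm3<const IMM: i32>;

impl<const IMM: i32> AssertImm3<IMM> {
    // Evaluated when the constant is used for a concrete IMM;
    // fails the build if the value does not fit in 3 bits.
    const VALID: () = assert!(IMM >= 0 && IMM < 8, "immediate must fit in 3 bits");
}

// Hypothetical stand-in for an intrinsic that takes a rounding immediate.
fn rounding_imm<const IMM_ROUNDING: i32>() -> i32 {
    // Referencing the constant forces the check for this instantiation.
    let () = AssertImm3::<IMM_ROUNDING>::VALID;
    IMM_ROUNDING
}
```

With this shape, `rounding_imm::<7>()` compiles, while `rounding_imm::<8>()` is rejected at build time. Relaxing the bound later would not break existing callers.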
I'm still wondering about the fact that we "* 8" the immediates, which are supposed to be in bytes and <= 16 for the shifts.
It might be better to switch the implementation to use a shuffle like clang does and like we already do for
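To make the byte semantics concrete, here is a portable scalar model of a per-lane byte shift left, written as a byte selection (the way a shuffle-based implementation would pick bytes) instead of a bit shift by `IMM8 * 8`. This is an illustrative sketch, not the crate's code: `bslli_epi128_model` is a hypothetical name, and the vector is modeled as a `[u8; 32]` in little-endian byte order.

```rust
/// Hypothetical model: shift each 128-bit lane left by IMM8 bytes,
/// filling with zeros. Shift counts above 15 clear the lane.
fn bslli_epi128_model<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    // Only the low 8 bits of the immediate are used; clamp to 16.
    let shift = ((IMM8 as u32) & 0xff).min(16) as usize;
    let mut out = [0u8; 32];
    for lane in 0..2 {
        let base = lane * 16;
        // Output byte i comes from input byte i - shift of the same lane.
        for i in shift..16 {
            out[base + i] = a[base + i - shift];
        }
    }
    out
}
```

Note that bytes never cross the 128-bit lane boundary, and the immediate is used directly as a byte count with no multiplication.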
It seems mm256_slli_si256 = mm256_bslli_epi128?
Yes, see #1012.
@@ -4,7 +4,7 @@

 use crate::{
     core_arch::{simd::*, x86::*},
-    hint::unreachable_unchecked,
+    // hint::unreachable_unchecked,
Deleted commented code.
crates/core_arch/src/x86/avx2.rs
Outdated
     }
-    transmute(constify_imm8!(imm8 * 8, call))
+    let r = vpslldq(a, IMM8 * 8);
+    transmute(r)
You can just call _mm256_bslli_epi128 here.
crates/core_arch/src/x86/avx2.rs
Outdated
     }
-    transmute(constify_imm8!(imm8 * 8, call))
+    let r = vpsrldq(a, IMM8 * 8);
+    transmute(r)
You can just call _mm256_bsrli_epi128 here.
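The delegation suggested here can be sketched with a portable scalar model. Both function names below are hypothetical stand-ins operating on a `[u8; 32]` (little-endian byte order), not the real intrinsics; the sketch only shows that the `si256` variant needs no body of its own when it forwards the const generic to the `epi128` variant.

```rust
/// Hypothetical model: shift each 128-bit lane right by IMM8 bytes,
/// filling with zeros. Shift counts above 15 clear the lane.
fn bsrli_epi128_model<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    let shift = ((IMM8 as u32) & 0xff).min(16) as usize;
    let mut out = [0u8; 32];
    for lane in 0..2 {
        let base = lane * 16;
        // Output byte i comes from input byte i + shift of the same lane.
        for i in 0..(16 - shift) {
            out[base + i] = a[base + i + shift];
        }
    }
    out
}

/// Hypothetical model of the alias: forwards the immediate unchanged.
fn srli_si256_model<const IMM8: i32>(a: [u8; 32]) -> [u8; 32] {
    bsrli_epi128_model::<IMM8>(a)
}
```

Keeping a single implementation means any fix to the `epi128` model applies to the alias automatically.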
Thanks. I think my bsrli_epi128 and bslli_epi128 are having problems. I need to check them first.
f16c: _mm256_cvtps_ph; mm_cvtps_ph