Skip to content

mm256_srli,slli_si256; mm256_bsrli,bslli_epi128 to const generics #1067

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 10, 2021
Merged

mm256_srli,slli_si256; mm256_bsrli,bslli_epi128 to const generics #1067

merged 4 commits into from
Mar 10, 2021

Conversation

minybot
Copy link
Contributor

@minybot minybot commented Mar 9, 2021

f16c: _mm256_cvtps_ph; mm_cvtps_ph

@rust-highfive
Copy link

r? @Amanieu

(rust-highfive has picked a reviewer for you, use r? to override)

@minybot
Copy link
Contributor Author

minybot commented Mar 9, 2021

At f16c.rs
_mm256_cvtps_ph(a: __m256, imm_rounding: i32). The current imm_rounding is set to 0-7.
I checked the Clang, it accepts 0-255.
Any suggestion?

@Amanieu
Copy link
Member

Amanieu commented Mar 9, 2021

The instruction definition here says that bits 3 to 7 are ignored by the CPU. I think to be safe we should only allow imm3, we can always relax it later if necessary.

@minybot
Copy link
Contributor Author

minybot commented Mar 9, 2021

The instruction definition here says that bits 3 to 7 are ignored by the CPU. I think to be safe we should only allow imm3, we can always relax it later if necessary.
Ok. I will finish f16c.

@lqd
Copy link
Member

lqd commented Mar 9, 2021

I'm still wondering about the fact that we "* 8" the immediates that are supposed to be in bytes and <= 16 for the shifts

@Amanieu
Copy link
Member

Amanieu commented Mar 9, 2021

It might be better to switch the implementation to use a shuffle like clang does and like we already do for _mm_slli_si128.

@minybot
Copy link
Contributor Author

minybot commented Mar 9, 2021

It might be better to switch the implementation to use a shuffle like clang does and like we already do for _mm_slli_si128.
Ok. I will modify it to similar to _mm_slli_si128.

@minybot
Copy link
Contributor Author

minybot commented Mar 9, 2021

It might be better to switch the implementation to use a shuffle like clang does and like we already do for _mm_slli_si128.

It seems mm256_slli_si256 = mm256_bslli_epi128?

@Amanieu
Copy link
Member

Amanieu commented Mar 9, 2021

Yes, see #1012.

@@ -4,7 +4,7 @@

use crate::{
core_arch::{simd::*, x86::*},
hint::unreachable_unchecked,
// hint::unreachable_unchecked,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted commented code.

}
transmute(constify_imm8!(imm8 * 8, call))
let r = vpslldq(a, IMM8 * 8);
transmute(r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call _mm256_bslli_epi128 here.

}
transmute(constify_imm8!(imm8 * 8, call))
let r = vpsrldq(a, IMM8 * 8);
transmute(r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just call _mm256_bsrli_epi128 here.

@minybot
Copy link
Contributor Author

minybot commented Mar 9, 2021

Yes, see #1012.

Thanks. I think my bsrli_epi128 and bslli_epi128 having problems. I need to check them first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants