Skip to content

Remove requirement on neon feature for arm #893

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 7, 2020

Conversation

calebzulawski
Copy link
Member

As I noted in #889, the neon target feature shouldn't be required to use NEON on ARM (it should be possible to detect it at runtime).

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@Lokathor
Copy link
Contributor

Lokathor commented Sep 6, 2020

Ya know that's a good point, we don't gate avx intrinsics on avx being enabled.

(but I don't have a deep knowledge, so there may be an explanation)

@calebzulawski
Copy link
Member Author

The removed comment indicates a codegen issue, which I have no idea how to validate (I couldn't quite figure out what the tests do). I just removed the neon target feature from the CI tests, though.

@calebzulawski
Copy link
Member Author

calebzulawski commented Sep 6, 2020

It appears the vtbl/vtbx intrinsics are generating a few more instructions than allowed:
https://github.com/rust-lang/stdarch/pull/893/checks?check_run_id=1076991757#step:13:1456
With the neon target feature this didn't occur. Is this a rust issue or an LLVM issue?

@calebzulawski
Copy link
Member Author

I've been playing around with this on godbolt, comparing with clang's output.

C++ (optimized): https://godbolt.org/z/babdbh
C++ (unoptimized): https://godbolt.org/z/6KfoTb
Rust: https://rust.godbolt.org/z/dYM93Y

I'm not sure, but my guess is that rustc is generating slightly different IR and LLVM is failing to optimize it. Both produce sequences of getelementptr inbounds and load, so the optimizer can clearly handle this. However, clang simply rewrites the inputs to the stack with store, but rustc skips that and does a bitcast. Is is possible the cast is opaque to the optimizer?

@Amanieu
Copy link
Member

Amanieu commented Sep 6, 2020

Rust is passing the vector by reference instead of by value because the NEON target feature is not enabled globally.

Specifically this paragraph in the RFC:

The Rust ABI will currently be implemented such that all related packed-SIMD types are passed via memory instead of by-value. This means that regardless of the target features enabled for a function everything should agree on how packed SIMD arguments are passed across boundaries and whatnot.

@calebzulawski
Copy link
Member Author

calebzulawski commented Sep 6, 2020

Thanks, that completely clears that up! Testing a somewhat contrived example (https://rust.godbolt.org/z/f4acxb)
all of those moves do seem purely related to passing by reference and don't appear in actual usage.

My uneducated guess is that this is appearing here because the int8x8x4_t type requires a few more instructions to access via reference because it's so big, so intrinsics using it (vtbl/vtbx) exceed the 22 instruction threshold. I'm guessing aarch64 didn't have this problem because the larger registers allow fewer instructions for dereferencing.

If this is all expected, is it appropriate to expand the 22 instruction limit (or some other solution)?

@Amanieu
Copy link
Member

Amanieu commented Sep 6, 2020

Yes, enabling the neon feature globally will change then ABI and cause vectors to be passed by value.

Rather than increasing the instruction limit I think you can just restore -C target-feature=+neon in the CI flags. This should allow as_ptr and store to be inlined which avoids 2 function calls.

@calebzulawski
Copy link
Member Author

Ok, that seems reasonable, I removed that change.

@Amanieu Amanieu merged commit 8cb6688 into rust-lang:master Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants