
Cherry-pick different fix for AArch64 truncating FP stores #128

Merged


nikic commented Jan 25, 2022

Unfortunately, the fix I picked up in #127 seems to have some additional dependency that is not present on the 13.x branch. It works for truncating stores to i32 and i16, but not to i8. There's some issue related to 8-bit FPR subreg copies.

This reverts that commit, and instead applies the workaround fix that was committed before the "proper" fix landed. This one works for both i32 and i8 truncating stores, at least based on a reduced test case.

nikic and others added 2 commits January 25, 2022 11:32
…g stores.

Truncating stores with GPR bank sources shouldn't be mutated into using FPR bank
sources, since those aren't supported.

Ideally this should be a selection failure in the tablegen patterns, but for now
avoid generating them.
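The rule in the commit message above can be illustrated with a small toy model. This is a hypothetical sketch, not LLVM's `RegisterBankInfo` API: the names `Bank`, `ToyStore`, `isTruncating` and `chooseSourceBank` are invented for illustration, and `preferFPR` stands in for whatever heuristic would otherwise move the stored value onto the FPR bank.

```cpp
#include <cassert>  // for the usage checks shown in the note below

// Hypothetical toy model; these names are NOT part of LLVM.
enum class Bank { GPR, FPR };

struct ToyStore {
  unsigned valueBits;  // width of the stored value's register (e.g. 32 or 64)
  unsigned memBits;    // width actually written to memory (e.g. 8, 16, 32)
  Bank sourceBank;     // bank the stored value currently lives in
};

// A store is truncating when it writes fewer bits than the value holds.
bool isTruncating(const ToyStore &s) { return s.memBits < s.valueBits; }

// Pick the bank for the store's source operand. The heuristic may prefer
// FPR, but a truncating store whose source is on GPR stays on GPR, since
// the FPR form of such a store is assumed unsupported (as the commit says).
Bank chooseSourceBank(const ToyStore &s, bool preferFPR) {
  if (isTruncating(s) && s.sourceBank == Bank::GPR)
    return Bank::GPR;  // do not mutate into an FPR-sourced truncating store
  return preferFPR ? Bank::FPR : s.sourceBank;
}
```

For example, `chooseSourceBank(ToyStore{32, 8, Bank::GPR}, true)` stays on `Bank::GPR`, while a non-truncating 32-bit store with the same inputs may still be moved to `Bank::FPR`. As in the commit, this avoids generating the unsupported form rather than making it a selection failure.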
cuviper (Member) commented Jan 25, 2022

For the record, this new commit is upstream 67bf3ac, which was reverted in 2ed8053 in favor of the fix we tried in #127.

cuviper merged commit 221a195 into rust-lang:rustc/13.0-2021-09-30 on Jan 25, 2022
vext01 pushed a commit to vext01/llvm-project that referenced this pull request Apr 8, 2024
nikic pushed a commit to nikic/llvm-project that referenced this pull request Sep 14, 2024
This patch does five things:
1. Add support for optimizing the address mode of HVX load/store
instructions
2. Reduce the value of add instruction immediates by replacing them with
the difference from other addi instructions that share a common base.

For example, if we have the following sequence of instructions:

       r1 = add(r2,# 1024)
            ...
       r3 = add(r2,# 1152)
            ...
       r4 = add(r2,# 1280)

where register r2 has the same reaching definition, the instructions are
rewritten to the following sequence:

       r1 = add(r2,# 1024)
            ...
       r3 = add(r1,# 128)
            ...
       r4 = add(r1,# 256)
3. Fix a bug in the pass where addi instructions were modified based on
a predicated register definition, leading to incorrect output.

For example:
         INST-1: if (p0) r2 = add(r13,# 128)
         INST-2: r1 = add(r2,# 1024)
         INST-3: r3 = add(r2,# 1152)
         INST-4: r5 = add(r2,# 1280)

In the above case, since r2's definition is predicated, we must not
rewrite the uses of r2 in INST-3/INST-4 as add(r1,#128/256).

4. Fix a corner case

It looks like we never checked whether the offset register is actually
live (not clobbered) at the optimization site. Add a check that it is
live at the MBB entrance; the remaining conditions should already have
been verified.

5. Fix a bad codegen issue

For whatever reason, the transformation was done without checking
whether the value in the register actually reaches the user. This is
the second fix of this kind for this pass.

   Co-authored-by: Anirudh Sundar <[email protected]>
   Co-authored-by: Sergei Larin <[email protected]>
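The immediate-rebasing idea in item 2, together with the predicated-definition guard in item 3, can be sketched on a toy instruction list. This is a hypothetical illustration, not the Hexagon backend's code: `ToyAdd`, `rebaseImmediates`, and the `baseDefId`/`baseDefPredicated` fields are invented here, and the sketch assumes (without checking, unlike items 4 and 5) that the anchor's destination register is still live and unclobbered at each rewritten use.

```cpp
#include <cassert>   // for the usage checks shown in the note below
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy model of "rd = add(rs, #imm)"; NOT LLVM data structures.
struct ToyAdd {
  int dst;                 // destination register number
  int base;                // base register number
  std::int64_t imm;        // immediate operand
  int baseDefId;           // id of the reaching definition of `base`
  bool baseDefPredicated;  // true if that reaching definition is predicated
};

// Rebase later adds onto the first add that shares the same base register
// AND the same reaching definition, so their immediates shrink to a
// difference (e.g. add(r2,#1152) becomes add(r1,#128)).
// Assumption: the anchor's dst is not redefined before the rewritten uses.
void rebaseImmediates(std::vector<ToyAdd> &adds) {
  // (base register, reaching definition id) -> index of the anchor add
  std::map<std::pair<int, int>, std::size_t> anchors;
  for (std::size_t i = 0; i < adds.size(); ++i) {
    ToyAdd &a = adds[i];
    if (a.baseDefPredicated)
      continue;  // item 3: never rewrite uses of a predicated definition
    const auto key = std::make_pair(a.base, a.baseDefId);
    const auto it = anchors.find(key);
    if (it == anchors.end()) {
      anchors.emplace(key, i);  // first matching add becomes the anchor
      continue;
    }
    const ToyAdd &anchor = adds[it->second];  // anchors are never rewritten
    a.imm -= anchor.imm;  // e.g. 1152 - 1024 = 128
    a.base = anchor.dst;  // use the anchor's result instead of the old base
  }
}
```

Applied to the r1/r3/r4 sequence from item 2 (all reading the same definition of r2), the second and third adds become `add(r1,#128)` and `add(r1,#256)`; when r2's reaching definition is predicated, as in the INST-1 example, the list is left unchanged.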
nikic pushed a commit to nikic/llvm-project that referenced this pull request Feb 26, 2025
…-lang#128… (llvm#128662)

…471)"

Reland llvm#128471

The Passes library was not linked in earlier.
nikic pushed a commit to nikic/llvm-project that referenced this pull request Feb 26, 2025