Skip to content

[X86] X86LowerTileCopy: Find dead register to use to prevent save-reload of tile register #83628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 21, 2024

Conversation

AZero13
Copy link
Contributor

@AZero13 AZero13 commented Mar 1, 2024

No description provided.

@llvmbot
Copy link
Member

llvmbot commented Mar 1, 2024

@llvm/pr-subscribers-backend-x86

Author: AtariDreams (AtariDreams)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/83628.diff

1 Files Affected:

  • (modified) llvm/lib/Target/X86/X86LowerTileCopy.cpp (+51-15)
diff --git a/llvm/lib/Target/X86/X86LowerTileCopy.cpp b/llvm/lib/Target/X86/X86LowerTileCopy.cpp
index e7afc49240e547..3d82360942ddbf 100644
--- a/llvm/lib/Target/X86/X86LowerTileCopy.cpp
+++ b/llvm/lib/Target/X86/X86LowerTileCopy.cpp
@@ -20,6 +20,7 @@
 #include "X86InstrBuilder.h"
 #include "X86InstrInfo.h"
 #include "X86Subtarget.h"
+#include "llvm/CodeGen/LivePhysRegs.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunction.h"
@@ -73,6 +74,7 @@ bool X86LowerTileCopy::runOnMachineFunction(MachineFunction &MF) {
   const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
   const X86InstrInfo *TII = ST.getInstrInfo();
   bool Changed = false;
+  MachineRegisterInfo &MRI = MF.getRegInfo();
 
   for (MachineBasicBlock &MBB : MF) {
     for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
@@ -90,22 +92,54 @@ bool X86LowerTileCopy::runOnMachineFunction(MachineFunction &MF) {
       unsigned Size = TRI->getSpillSize(X86::TILERegClass);
       Align Alignment = TRI->getSpillAlign(X86::TILERegClass);
       int TileSS = MF.getFrameInfo().CreateSpillStackObject(Size, Alignment);
-      // Allocate stack slot for stride register
-      Size = TRI->getSpillSize(X86::GR64RegClass);
-      Alignment = TRI->getSpillAlign(X86::GR64RegClass);
-      int StrideSS = MF.getFrameInfo().CreateSpillStackObject(Size, Alignment);
+      int StrideSS = 0;
 
-      // TODO: Pick a killed regiter to avoid save/reload. There is problem
+      LivePhysRegs UsedRegs(*TRI);
+      UsedRegs.addLiveOuts(MBB);
+
+      const MCPhysReg *CSRegs = TRI->getCalleeSavedRegs(&MF);
+      for (unsigned i = 0; CSRegs[i]; ++i)
+        UsedRegs.addReg(CSRegs[i]);
+
+      auto InstUpToMI = MBB.end();
+      while (InstUpToMI != MI)
+        // The pre-decrement is on purpose here.
+        // We want to have the liveness right before I.
+        UsedRegs.stepBackward(*--InstUpToMI);
+
+      // Look for a temporary register to use.
+      BitVector GR64Regs =
+          TRI->getAllocatableSet(MF, TRI->getRegClass(X86::GR64RegClassID));
+      // Pick a killed regiter to avoid save/reload. There is problem
       // to get live interval in this stage.
       Register GR64Cand = X86::RAX;
 
+      // Find the first available-register that is available
+      bool found = false;
+      for (auto RegT : GR64Regs.set_bits()) {
+        if (UsedRegs.available(MRI, RegT)) {
+          GR64Cand = RegT;
+          break;
+        }
+      }
+
       const DebugLoc &DL = MI.getDebugLoc();
-      // mov %rax (%sp)
-      BuildMI(MBB, MI, DL, TII->get(X86::IMPLICIT_DEF), GR64Cand);
-      addFrameReference(BuildMI(MBB, MI, DL, TII->get(X86::MOV64mr)), StrideSS)
-          .addReg(GR64Cand);
-      // mov 64 %rax
-      BuildMI(MBB, MI, DL, TII->get(X86::MOV64ri), GR64Cand).addImm(64);
+      if (found) {
+        // mov 64 %reg
+        BuildMI(MBB, MI, DL, TII->get(X86::MOV64ri), GR64Cand).addImm(64);
+      } else {
+        // Allocate stack slot for stride register
+        Size = TRI->getSpillSize(X86::GR64RegClass);
+        Alignment = TRI->getSpillAlign(X86::GR64RegClass);
+        StrideSS = MF.getFrameInfo().CreateSpillStackObject(Size, Alignment);
+        // mov %rax (%sp)
+        BuildMI(MBB, MI, DL, TII->get(X86::IMPLICIT_DEF), GR64Cand);
+        addFrameReference(BuildMI(MBB, MI, DL, TII->get(X86::MOV64mr)),
+                          StrideSS)
+            .addReg(GR64Cand);
+        // mov 64 %rax
+        BuildMI(MBB, MI, DL, TII->get(X86::MOV64ri), GR64Cand).addImm(64);
+      }
       // tilestored %tmm, (%sp, %idx)
 #define GET_EGPR_IF_ENABLED(OPC) (ST.hasEGPR() ? OPC##_EVEX : OPC)
       unsigned Opc = GET_EGPR_IF_ENABLED(X86::TILESTORED);
@@ -120,10 +154,12 @@ bool X86LowerTileCopy::runOnMachineFunction(MachineFunction &MF) {
 #undef GET_EGPR_IF_ENABLED
       NewMI = addFrameReference(BuildMI(MBB, MI, DL, TII->get(Opc), DstReg),
                                 TileSS);
-      // restore %rax
-      // mov (%sp) %rax
-      addFrameReference(BuildMI(MBB, MI, DL, TII->get(X86::MOV64rm), GR64Cand),
-                        StrideSS);
+      if (!found) {
+        // restore %rax
+        // mov (%sp) %rax
+        addFrameReference(
+            BuildMI(MBB, MI, DL, TII->get(X86::MOV64rm), GR64Cand), StrideSS);
+      }
       MI.eraseFromParent();
       Changed = true;
     }

Copy link

github-actions bot commented Mar 1, 2024

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

@RKSimon RKSimon changed the title Find dead register to use to prevent save-reload [X86] X86LowerTileCopy - Find dead register to use to prevent save-reload Mar 2, 2024
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test coverage?

@AZero13 AZero13 force-pushed the Target branch 3 times, most recently from 10f95c2 to edef42d Compare March 14, 2024 14:59
@phoebewang
Copy link
Contributor

@AtariDreams Reverse ping~

I met a problem cased by #86973 which was confused by IMPLICIT_DEF in this pass, and checked this patch can fix it. I put a commit to add the missing found = true; and optimize a bit. Please take a look, thanks!

@AZero13 AZero13 changed the title [X86] X86LowerTileCopy - Find dead register to use to prevent save-reload [X86] X86LowerTileCopy: Find dead register to use to prevent save-reload of tile register Apr 20, 2024
@AZero13
Copy link
Contributor Author

AZero13 commented Apr 20, 2024

@AtariDreams Reverse ping~

I met a problem cased by #86973 which was confused by IMPLICIT_DEF in this pass, and checked this patch can fix it. I put a commit to add the missing found = true; and optimize a bit. Please take a look, thanks!

Thank you! I just made a few changes, and I think this is ready now!

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@AZero13
Copy link
Contributor Author

AZero13 commented Apr 21, 2024

LGTM.

I don't have commit perms, just an FYI.

@phoebewang phoebewang merged commit 34acbb3 into llvm:main Apr 21, 2024
3 of 4 checks passed
@AZero13 AZero13 deleted the Target branch April 21, 2024 03:22
nikic added a commit that referenced this pull request Apr 21, 2024
…save-reload of tile register (#83628)"

This reverts commit 34acbb3.

This change causes major compile-time regressions.
@nikic
Copy link
Contributor

nikic commented Apr 21, 2024

Copy link
Contributor

@phoebewang phoebewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nikic!

I didn't think UsedRegs.stepBackward(MI); is such expensive. I think we can early out with below code. How can we check it before commit?

bool Changed = false;

for (MachineBasicBlock &MBB : MF) {
for (MachineInstr &MI : llvm::make_early_inc_range(MBB)) {
LiveRegUnits UsedRegs(*TRI);
UsedRegs.addLiveOuts(MBB);
Copy link
Contributor

@phoebewang phoebewang Apr 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can early out the loop by checking tile registers, e.g.,

  BitVector TILERegs =
      TRI->getAllocatableSet(MF, TRI->getRegClass(X86::TILERegClassID));
  bool Changed = false;

  for (MachineBasicBlock &MBB : MF) {
    LiveRegUnits UsedRegs(*TRI);
    UsedRegs.addLiveOuts(MBB);
    // There won't be a tile copy if no tile register live out.
    if (!TILERegs.anyCommon(UsedRegs.getBitVector())
      continue;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add if (!ST.hasAMXTILE()) return false at line 75?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good idea.

@KanRobert KanRobert self-requested a review April 21, 2024 07:41
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
…save-reload of tile register (llvm#83628)"

This reverts commit 34acbb3.

This change causes major compile-time regressions.
phoebewang added a commit that referenced this pull request Apr 29, 2024
…save-reload of tile register (#83628)"

Fixes compile time regression in previous commit.
phoebewang added a commit to phoebewang/llvm-project that referenced this pull request May 6, 2024
We need to check if `GR64Cand` a valid register before using it.

Test is not needed since it's covered in llvm-test-suite.

Fixes llvm#90954
phoebewang added a commit that referenced this pull request May 15, 2024
We need to check if `GR64Cand` a valid register before using it.

Test is not needed since it's covered in llvm-test-suite.

Fixes #90954
phoebewang added a commit to phoebewang/llvm-project that referenced this pull request May 29, 2024
phoebewang added a commit that referenced this pull request Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants