-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point branches. #124028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
kmitropoulou
commented
Jan 23, 2025
- [NFC] Use GCNPat instead of Pat.
- [AMDGPU] Always emit SI_KILL_I1_PSEUDO for uniform floating point branches.
@llvm/pr-subscribers-backend-amdgpu Author: Konstantina Mitropoulou (kmitropoulou) Changes
Full diff: https://github.com/llvm/llvm-project/pull/124028.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index 7ad6720b8001af..6439149d801f6b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -2498,6 +2498,9 @@ def HasNotMADIntraFwdBug : Predicate<"!Subtarget->hasMADIntraFwdBug()">;
def HasSALUFloatInsts : Predicate<"Subtarget->hasSALUFloatInsts()">,
AssemblerPredicate<(all_of FeatureSALUFloatInsts)>;
+def NotHasSALUFloatInsts : Predicate<"!Subtarget->hasSALUFloatInsts()">,
+ AssemblerPredicate<(all_of (not FeatureSALUFloatInsts))>;
+
def HasPseudoScalarTrans : Predicate<"Subtarget->hasPseudoScalarTrans()">,
AssemblerPredicate<(all_of FeaturePseudoScalarTrans)>;
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 40a20fa9cb15ea..ae3d20bcfb3aaf 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -1053,39 +1053,40 @@ def : GCNPat<
(SI_ELSE $src, $target)
>;
-def : Pat <
+def : GCNPat <
(int_amdgcn_kill i1:$src),
(SI_KILL_I1_PSEUDO SCSrc_i1:$src, 0)
>;
-def : Pat <
+def : GCNPat <
(int_amdgcn_kill (i1 (not i1:$src))),
(SI_KILL_I1_PSEUDO SCSrc_i1:$src, -1)
>;
-def : Pat <
+let SubtargetPredicate = NotHasSALUFloatInsts in
+def : GCNPat <
(int_amdgcn_kill (i1 (setcc f32:$src, InlineImmFP32:$imm, cond:$cond))),
(SI_KILL_F32_COND_IMM_PSEUDO VSrc_b32:$src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
>;
-def : Pat <
+def : GCNPat <
(int_amdgcn_wqm_demote i1:$src),
(SI_DEMOTE_I1 SCSrc_i1:$src, 0)
>;
-def : Pat <
+def : GCNPat <
(int_amdgcn_wqm_demote (i1 (not i1:$src))),
(SI_DEMOTE_I1 SCSrc_i1:$src, -1)
>;
// TODO: we could add more variants for other types of conditionals
-def : Pat <
+def : GCNPat <
(i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
(COPY $src) // Return the SGPRs representing i1 src
>;
-def : Pat <
+def : GCNPat <
(i32 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
(COPY $src) // Return the SGPRs representing i1 src
>;
|
Patch looks OK but can you come up with a test case? Maybe use As a follow up, we should probably rename SI_KILL_I1_PSEUDO and SI_KILL_F32_COND_IMM_PSEUDO, because the important different is not whether the comparison is integer or floating point. It is whether the comparison instruction is SALU or VALU (i.e. whether the result is in scc or vcc). |
5eadc3d
to
6fd8c15
Compare
@@ -2498,6 +2498,9 @@ def HasNotMADIntraFwdBug : Predicate<"!Subtarget->hasMADIntraFwdBug()">; | |||
def HasSALUFloatInsts : Predicate<"Subtarget->hasSALUFloatInsts()">, | |||
AssemblerPredicate<(all_of FeatureSALUFloatInsts)>; | |||
|
|||
def NotHasSALUFloatInsts : Predicate<"!Subtarget->hasSALUFloatInsts()">, | |||
AssemblerPredicate<(all_of (not FeatureSALUFloatInsts))>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really not have none_of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really don't. Looks like there are plenty of places where it could be used in existing .td files.
6fd8c15
to
2ae284a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the test. It can be simplified as noted inline.
LGTM apart from that.
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9" | ||
target triple = "amdgcn--amdpal" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
; Function Attrs: nocallback nofree nounwind | ||
declare void @llvm.amdgcn.kill(i1) #0 | ||
|
||
; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) | ||
declare noundef i64 @llvm.amdgcn.s.getpc() #1 | ||
|
||
; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none) | ||
declare i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32>, i32, i32 immarg) #2 | ||
|
||
attributes #0 = { nocallback nofree nounwind } | ||
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) } | ||
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
; CHECK-NEXT: bb.2 (%ir-block.5): | ||
; CHECK-NEXT: S_ENDPGM 0 | ||
.entry: | ||
%0 = call i32 @llvm.amdgcn.s.buffer.load.i32(<4 x i32> zeroinitializer, i32 0, i32 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use named values instead of %0
etc. You can do this automatically by running the test through opt -passes=instnamer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
2ae284a
to
6954d3c
Compare
@@ -0,0 +1,42 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
; RUN: llc -mtriple=amdgcn -mcpu=gfx1200 -verify-machineinstrs -stop-after=amdgpu-isel < %s 2>&1 | FileCheck %s | |||
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with datalayout removed
6954d3c
to
26ef9ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks