-
Notifications
You must be signed in to change notification settings - Fork 13.6k
AMDGPU: Fix overly conservative immediate operand check #127563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMDGPU: Fix overly conservative immediate operand check #127563
Conversation
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes: The real legality check is performed later anyway, so this was unnecessarily blocking immediate folds in handled cases. This also stops folding s_fmac_f32 to s_fmamk_f32 in a few tests, but that seems better. Full diff: https://github.com/llvm/llvm-project/pull/127563.diff 10 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 84773349e0ca0..cbd858b9002ee 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -830,7 +830,8 @@ bool SIFoldOperandsImpl::tryToFoldACImm(
if (UseOpIdx >= Desc.getNumOperands())
return false;
- if (!AMDGPU::isSISrcInlinableOperand(Desc, UseOpIdx))
+ // Filter out unhandled pseudos.
+ if (!AMDGPU::isSISrcOperand(Desc, UseOpIdx))
return false;
uint8_t OpTy = Desc.operands()[UseOpIdx].OperandType;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
index 4be00fedb972e..89078f20f1d47 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
@@ -920,9 +920,7 @@ define amdgpu_ps i64 @s_andn2_v4i16(<4 x i16> inreg %src0, <4 x i16> inreg %src1
; GFX6-NEXT: s_lshl_b32 s3, s9, 16
; GFX6-NEXT: s_and_b32 s4, s8, 0xffff
; GFX6-NEXT: s_or_b32 s3, s3, s4
-; GFX6-NEXT: s_mov_b32 s4, -1
-; GFX6-NEXT: s_mov_b32 s5, s4
-; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], -1
; GFX6-NEXT: s_and_b64 s[0:1], s[0:1], s[2:3]
; GFX6-NEXT: ; return to shader part epilog
;
@@ -962,9 +960,7 @@ define amdgpu_ps i64 @s_andn2_v4i16_commute(<4 x i16> inreg %src0, <4 x i16> inr
; GFX6-NEXT: s_lshl_b32 s3, s9, 16
; GFX6-NEXT: s_and_b32 s4, s8, 0xffff
; GFX6-NEXT: s_or_b32 s3, s3, s4
-; GFX6-NEXT: s_mov_b32 s4, -1
-; GFX6-NEXT: s_mov_b32 s5, s4
-; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], -1
; GFX6-NEXT: s_and_b64 s[0:1], s[2:3], s[0:1]
; GFX6-NEXT: ; return to shader part epilog
;
@@ -1004,9 +1000,7 @@ define amdgpu_ps { i64, i64 } @s_andn2_v4i16_multi_use(<4 x i16> inreg %src0, <4
; GFX6-NEXT: s_lshl_b32 s3, s9, 16
; GFX6-NEXT: s_and_b32 s4, s8, 0xffff
; GFX6-NEXT: s_or_b32 s3, s3, s4
-; GFX6-NEXT: s_mov_b32 s4, -1
-; GFX6-NEXT: s_mov_b32 s5, s4
-; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], -1
; GFX6-NEXT: s_and_b64 s[0:1], s[0:1], s[2:3]
; GFX6-NEXT: ; return to shader part epilog
;
@@ -1060,9 +1054,7 @@ define amdgpu_ps { i64, i64 } @s_andn2_v4i16_multi_foldable_use(<4 x i16> inreg
; GFX6-NEXT: s_lshl_b32 s5, s13, 16
; GFX6-NEXT: s_and_b32 s6, s12, 0xffff
; GFX6-NEXT: s_or_b32 s5, s5, s6
-; GFX6-NEXT: s_mov_b32 s6, -1
-; GFX6-NEXT: s_mov_b32 s7, s6
-; GFX6-NEXT: s_xor_b64 s[4:5], s[4:5], s[6:7]
+; GFX6-NEXT: s_xor_b64 s[4:5], s[4:5], -1
; GFX6-NEXT: s_and_b64 s[0:1], s[0:1], s[4:5]
; GFX6-NEXT: s_and_b64 s[2:3], s[2:3], s[4:5]
; GFX6-NEXT: ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll
index e7119c89ac06c..065fadf3b5ef3 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll
@@ -919,9 +919,7 @@ define amdgpu_ps i64 @s_orn2_v4i16(<4 x i16> inreg %src0, <4 x i16> inreg %src1)
; GFX6-NEXT: s_lshl_b32 s3, s9, 16
; GFX6-NEXT: s_and_b32 s4, s8, 0xffff
; GFX6-NEXT: s_or_b32 s3, s3, s4
-; GFX6-NEXT: s_mov_b32 s4, -1
-; GFX6-NEXT: s_mov_b32 s5, s4
-; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], -1
; GFX6-NEXT: s_or_b64 s[0:1], s[0:1], s[2:3]
; GFX6-NEXT: ; return to shader part epilog
;
@@ -961,9 +959,7 @@ define amdgpu_ps i64 @s_orn2_v4i16_commute(<4 x i16> inreg %src0, <4 x i16> inre
; GFX6-NEXT: s_lshl_b32 s3, s9, 16
; GFX6-NEXT: s_and_b32 s4, s8, 0xffff
; GFX6-NEXT: s_or_b32 s3, s3, s4
-; GFX6-NEXT: s_mov_b32 s4, -1
-; GFX6-NEXT: s_mov_b32 s5, s4
-; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], -1
; GFX6-NEXT: s_or_b64 s[0:1], s[2:3], s[0:1]
; GFX6-NEXT: ; return to shader part epilog
;
@@ -1003,9 +999,7 @@ define amdgpu_ps { i64, i64 } @s_orn2_v4i16_multi_use(<4 x i16> inreg %src0, <4
; GFX6-NEXT: s_lshl_b32 s3, s9, 16
; GFX6-NEXT: s_and_b32 s4, s8, 0xffff
; GFX6-NEXT: s_or_b32 s3, s3, s4
-; GFX6-NEXT: s_mov_b32 s4, -1
-; GFX6-NEXT: s_mov_b32 s5, s4
-; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], s[4:5]
+; GFX6-NEXT: s_xor_b64 s[2:3], s[2:3], -1
; GFX6-NEXT: s_or_b64 s[0:1], s[0:1], s[2:3]
; GFX6-NEXT: ; return to shader part epilog
;
@@ -1059,9 +1053,7 @@ define amdgpu_ps { i64, i64 } @s_orn2_v4i16_multi_foldable_use(<4 x i16> inreg %
; GFX6-NEXT: s_lshl_b32 s5, s13, 16
; GFX6-NEXT: s_and_b32 s6, s12, 0xffff
; GFX6-NEXT: s_or_b32 s5, s5, s6
-; GFX6-NEXT: s_mov_b32 s6, -1
-; GFX6-NEXT: s_mov_b32 s7, s6
-; GFX6-NEXT: s_xor_b64 s[4:5], s[4:5], s[6:7]
+; GFX6-NEXT: s_xor_b64 s[4:5], s[4:5], -1
; GFX6-NEXT: s_or_b64 s[0:1], s[0:1], s[4:5]
; GFX6-NEXT: s_or_b64 s[2:3], s[2:3], s[4:5]
; GFX6-NEXT: ; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/xnor.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/xnor.ll
index ed85fb19d9051..43322b1e23412 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/xnor.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/xnor.ll
@@ -118,13 +118,11 @@ define amdgpu_ps i64 @scalar_xnor_v4i16_one_use(<4 x i16> inreg %a, <4 x i16> in
; GFX7-NEXT: s_xor_b64 s[2:3], s[2:3], s[6:7]
; GFX7-NEXT: s_lshl_b32 s1, s1, 16
; GFX7-NEXT: s_and_b32 s0, s0, 0xffff
-; GFX7-NEXT: s_mov_b32 s8, -1
; GFX7-NEXT: s_or_b32 s0, s1, s0
; GFX7-NEXT: s_lshl_b32 s1, s3, 16
; GFX7-NEXT: s_and_b32 s2, s2, 0xffff
-; GFX7-NEXT: s_mov_b32 s9, s8
; GFX7-NEXT: s_or_b32 s1, s1, s2
-; GFX7-NEXT: s_xor_b64 s[0:1], s[0:1], s[8:9]
+; GFX7-NEXT: s_xor_b64 s[0:1], s[0:1], -1
; GFX7-NEXT: ; return to shader part epilog
;
; GFX8-LABEL: scalar_xnor_v4i16_one_use:
diff --git a/llvm/test/CodeGen/AMDGPU/bug-cselect-b64.ll b/llvm/test/CodeGen/AMDGPU/bug-cselect-b64.ll
index f6fc69a6e3e47..ea93e3ac1e595 100644
--- a/llvm/test/CodeGen/AMDGPU/bug-cselect-b64.ll
+++ b/llvm/test/CodeGen/AMDGPU/bug-cselect-b64.ll
@@ -5,16 +5,14 @@ define amdgpu_cs <2 x i32> @f() {
; CHECK-LABEL: f:
; CHECK: ; %bb.0: ; %bb
; CHECK-NEXT: s_mov_b32 s4, 0
+; CHECK-NEXT: s_mov_b32 s1, 0
; CHECK-NEXT: s_mov_b32 s5, s4
; CHECK-NEXT: s_mov_b32 s6, s4
; CHECK-NEXT: s_mov_b32 s7, s4
-; CHECK-NEXT: s_mov_b32 s0, s4
; CHECK-NEXT: buffer_load_dwordx2 v[0:1], off, s[4:7], 0
-; CHECK-NEXT: s_mov_b32 s1, s4
; CHECK-NEXT: s_waitcnt vmcnt(0)
-; CHECK-NEXT: v_cmp_ne_u64_e32 vcc_lo, s[0:1], v[0:1]
+; CHECK-NEXT: v_cmp_ne_u64_e32 vcc_lo, 0, v[0:1]
; CHECK-NEXT: v_mov_b32_e32 v1, s4
-; CHECK-NEXT: s_mov_b32 s1, 0
; CHECK-NEXT: v_cndmask_b32_e64 v0, 0, 1, vcc_lo
; CHECK-NEXT: v_readfirstlane_b32 s0, v0
; CHECK-NEXT: buffer_store_dwordx2 v[0:1], off, s[4:7], 0
diff --git a/llvm/test/CodeGen/AMDGPU/constrained-shift.ll b/llvm/test/CodeGen/AMDGPU/constrained-shift.ll
index 4011c21af6904..661af021e8a84 100644
--- a/llvm/test/CodeGen/AMDGPU/constrained-shift.ll
+++ b/llvm/test/CodeGen/AMDGPU/constrained-shift.ll
@@ -192,10 +192,8 @@ define amdgpu_ps <4 x i32> @s_csh_v4i32(<4 x i32> inreg %a, <4 x i32> inreg %b)
;
; GISEL-LABEL: s_csh_v4i32:
; GISEL: ; %bb.0:
-; GISEL-NEXT: s_mov_b32 s8, 31
-; GISEL-NEXT: s_mov_b32 s9, s8
-; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], s[8:9]
-; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], s[8:9]
+; GISEL-NEXT: s_and_b64 s[4:5], s[4:5], 31
+; GISEL-NEXT: s_and_b64 s[6:7], s[6:7], 31
; GISEL-NEXT: s_lshl_b32 s8, s0, s4
; GISEL-NEXT: s_lshl_b32 s9, s1, s5
; GISEL-NEXT: s_lshl_b32 s10, s2, s6
diff --git a/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir b/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir
index 08693ec9db1d4..aeca4398f9a83 100644
--- a/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir
+++ b/llvm/test/CodeGen/AMDGPU/fold-operands-scalar-fmac.mir
@@ -13,7 +13,7 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
- ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAMK_F32 [[COPY]], 1056964608, [[COPY1]], implicit $mode
+ ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAC_F32 1056964608, [[COPY]], [[COPY1]], implicit $mode
; CHECK-NEXT: $sgpr0 = COPY %fma
%0:sreg_32 = COPY $sgpr0
%1:sreg_32 = COPY $sgpr1
@@ -33,7 +33,7 @@ body: |
; CHECK-NEXT: {{ $}}
; CHECK-NEXT: [[COPY:%[0-9]+]]:sreg_32 = COPY $sgpr0
; CHECK-NEXT: [[COPY1:%[0-9]+]]:sreg_32 = COPY $sgpr1
- ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAMK_F32 [[COPY]], 1056964608, [[COPY1]], implicit $mode
+ ; CHECK-NEXT: %fma:sreg_32 = nofpexcept S_FMAC_F32 [[COPY]], 1056964608, [[COPY1]], implicit $mode
; CHECK-NEXT: $sgpr0 = COPY %fma
%0:sreg_32 = COPY $sgpr0
%1:sreg_32 = COPY $sgpr1
diff --git a/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll b/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
index 492a30b67089c..bc49f70cbee11 100644
--- a/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-saddr-load.ll
@@ -742,10 +742,7 @@ define amdgpu_ps float @global_load_saddr_i8_offset_0x100000001(ptr addrspace(1)
;
; GFX12-SDAG-LABEL: global_load_saddr_i8_offset_0x100000001:
; GFX12-SDAG: ; %bb.0:
-; GFX12-SDAG-NEXT: s_mov_b32 s0, 1
-; GFX12-SDAG-NEXT: s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
-; GFX12-SDAG-NEXT: s_mov_b32 s1, s0
-; GFX12-SDAG-NEXT: s_add_nc_u64 s[0:1], s[2:3], s[0:1]
+; GFX12-SDAG-NEXT: s_add_nc_u64 s[0:1], s[2:3], 1
; GFX12-SDAG-NEXT: s_load_u8 s0, s[0:1], 0x0
; GFX12-SDAG-NEXT: s_wait_kmcnt 0x0
; GFX12-SDAG-NEXT: v_mov_b32_e32 v0, s0
diff --git a/llvm/test/CodeGen/AMDGPU/packed-fp32.ll b/llvm/test/CodeGen/AMDGPU/packed-fp32.ll
index b59f3c0d410f8..9b03a72fd826d 100644
--- a/llvm/test/CodeGen/AMDGPU/packed-fp32.ll
+++ b/llvm/test/CodeGen/AMDGPU/packed-fp32.ll
@@ -87,7 +87,7 @@ define amdgpu_kernel void @fadd_v2_v_v_splat(ptr addrspace(1) %a) {
; GCN-LABEL: {{^}}fadd_v2_v_lit_splat:
; GFX900-COUNT-2: v_add_f32_e32 v{{[0-9]+}}, 1.0, v{{[0-9]+}}
; PACKED-SDAG: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 1.0 op_sel_hi:[1,0]{{$}}
-; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], s[{{[0-9:]+}}]{{$}}
+; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 1.0{{$}}
define amdgpu_kernel void @fadd_v2_v_lit_splat(ptr addrspace(1) %a) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds <2 x float>, ptr addrspace(1) %a, i32 %id
@@ -308,7 +308,7 @@ define amdgpu_kernel void @fmul_v2_v_v_splat(ptr addrspace(1) %a) {
; GCN-LABEL: {{^}}fmul_v2_v_lit_splat:
; GFX900-COUNT-2: v_mul_f32_e32 v{{[0-9]+}}, 4.0, v{{[0-9]+}}
; PACKED-SDAG: v_pk_mul_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0 op_sel_hi:[1,0]{{$}}
-; PACKED-GISEL: v_pk_mul_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], s[{{[0-9:]+}}]{{$}}
+; PACKED-GISEL: v_pk_mul_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0{{$}}
define amdgpu_kernel void @fmul_v2_v_lit_splat(ptr addrspace(1) %a) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds <2 x float>, ptr addrspace(1) %a, i32 %id
@@ -432,7 +432,7 @@ define amdgpu_kernel void @fma_v2_v_v_splat(ptr addrspace(1) %a) {
; GCN-LABEL: {{^}}fma_v2_v_lit_splat:
; GFX900-COUNT-2: v_fma_f32 v{{[0-9]+}}, v{{[0-9]+}}, 4.0, 1.0
; PACKED-SDAG: v_pk_fma_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0, 1.0 op_sel_hi:[1,0,0]{{$}}
-; PACKED-GISEL: v_pk_fma_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], s[{{[0-9:]+}}], v[{{[0-9:]+}}]{{$}}
+; PACKED-GISEL: v_pk_fma_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 4.0, 1.0{{$}}
define amdgpu_kernel void @fma_v2_v_lit_splat(ptr addrspace(1) %a) {
%id = tail call i32 @llvm.amdgcn.workitem.id.x()
%gep = getelementptr inbounds <2 x float>, ptr addrspace(1) %a, i32 %id
@@ -556,8 +556,8 @@ bb:
; PACKED-SDAG: v_add_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, 0
; PACKED-SDAG: v_add_f32_e32 v{{[0-9]+}}, 0, v{{[0-9]+}}
-; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], s[{{[0-9:]+}}], v[{{[0-9:]+}}]{{$}}
-; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], s[{{[0-9:]+}}]{{$}}
+; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], s[{{[0-9:]+}}], 0{{$}}
+; PACKED-GISEL: v_pk_add_f32 v[{{[0-9:]+}}], v[{{[0-9:]+}}], 0{{$}}
define amdgpu_kernel void @fadd_fadd_fsub_0(<2 x float> %arg) {
bb:
%i12 = fadd <2 x float> zeroinitializer, %arg
diff --git a/llvm/test/CodeGen/AMDGPU/scalar-float-sop2.ll b/llvm/test/CodeGen/AMDGPU/scalar-float-sop2.ll
index 81d792183dc06..debbfce7dadcc 100644
--- a/llvm/test/CodeGen/AMDGPU/scalar-float-sop2.ll
+++ b/llvm/test/CodeGen/AMDGPU/scalar-float-sop2.ll
@@ -218,7 +218,7 @@ define amdgpu_ps float @_amdgpu_ps_main() {
; GFX1150-NEXT: s_mov_b32 s3, s0
; GFX1150-NEXT: s_buffer_load_b64 s[0:1], s[0:3], 0x0
; GFX1150-NEXT: s_waitcnt lgkmcnt(0)
-; GFX1150-NEXT: s_fmamk_f32 s0, s1, 0x40800000, s0
+; GFX1150-NEXT: s_fmac_f32 s0, s1, 4.0
; GFX1150-NEXT: s_delay_alu instid0(SALU_CYCLE_3)
; GFX1150-NEXT: v_mov_b32_e32 v0, s0
; GFX1150-NEXT: ; return to shader part epilog
@@ -232,7 +232,7 @@ define amdgpu_ps float @_amdgpu_ps_main() {
; GFX12-NEXT: s_mov_b32 s3, s0
; GFX12-NEXT: s_buffer_load_b64 s[0:1], s[0:3], 0x0
; GFX12-NEXT: s_wait_kmcnt 0x0
-; GFX12-NEXT: s_fmamk_f32 s0, s1, 0x40800000, s0
+; GFX12-NEXT: s_fmac_f32 s0, s1, 4.0
; GFX12-NEXT: s_delay_alu instid0(SALU_CYCLE_3)
; GFX12-NEXT: v_mov_b32_e32 v0, s0
; GFX12-NEXT: ; return to shader part epilog
|
b28280b
to
2f11ad0
Compare
3dd61c6
to
2f31f25
Compare
2f11ad0
to
1d3dce0
Compare
2f31f25
to
2903c24
Compare
1d3dce0
to
e9a741f
Compare
2903c24
to
5501596
Compare
It might be better to form s_fmamk_f32 pre-RA, since that gives RA the freedom to use different physical registers for dst and src2 if it wants to. Then we could shrink back to s_fmac_f32 post-RA if the physical registers are the same. |
I tried to do this a few times and it was always worse. The three address form is a stronger hint to RA than anything else |
5501596
to
f3accf6
Compare
e9a741f
to
7e136af
Compare
7e136af
to
0edaa87
Compare
The real legality check is performed later anyway, so this was unnecessarily blocking immediate folds in handled cases. This also stops folding s_fmac_f32 to s_fmamk_f32 in a few tests, but that seems better. The globalisel changes look suspicious, it may be mishandling constants for VOP3P instructions.
f3accf6
to
9c076c0
Compare
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/10457 Here is the relevant piece of the build log for the reference
|
The real legality check is performed later anyway, so this was unnecessarily blocking immediate folds in handled cases. This also stops folding s_fmac_f32 to s_fmamk_f32 in a few tests, but that seems better. The globalisel changes look suspicious, it may be mishandling constants for VOP3P instructions.
The real legality check is performed later anyway, so this was
unnecessarily blocking immediate folds in handled cases.
This also stops folding s_fmac_f32 to s_fmamk_f32 in a few tests,
but that seems better. The globalisel changes look suspicious,
it may be mishandling constants for VOP3P instructions.