DAG: Use phi to create vregs instead of the constant input #129464
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
For most targets, the register class comes from the type so this
This avoids an intermediate s_mov_b32 plus a copy in some cases. These
This only adjusts the constant input case. It may make sense to do
Patch is 160.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/129464.diff
34 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index ea28f7262de54..5687147151e9d 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -12022,7 +12022,7 @@ SelectionDAGBuilder::HandlePHINodesInSuccessorBlocks(const BasicBlock *LLVMBB) {
if (const auto *C = dyn_cast<Constant>(PHIOp)) {
unsigned &RegOut = ConstantsOut[C];
if (RegOut == 0) {
- RegOut = FuncInfo.CreateRegs(C);
+ RegOut = FuncInfo.CreateRegs(&PN);
// We need to zero/sign extend ConstantInt phi operands to match
// assumptions in FunctionLoweringInfo::ComputePHILiveOutRegInfo.
ISD::NodeType ExtendType = ISD::ANY_EXTEND;
diff --git a/llvm/test/CodeGen/AMDGPU/bb-prolog-spill-during-regalloc.ll b/llvm/test/CodeGen/AMDGPU/bb-prolog-spill-during-regalloc.ll
index 9988b2fa1eaf0..55a560c8d9b2f 100644
--- a/llvm/test/CodeGen/AMDGPU/bb-prolog-spill-during-regalloc.ll
+++ b/llvm/test/CodeGen/AMDGPU/bb-prolog-spill-during-regalloc.ll
@@ -8,13 +8,11 @@ define i32 @prolog_spill(i32 %arg0, i32 %arg1, i32 %arg2) {
; REGALLOC-NEXT: successors: %bb.3(0x40000000), %bb.1(0x40000000)
; REGALLOC-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2
; REGALLOC-NEXT: {{ $}}
- ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr2, %stack.5, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.5, addrspace 5)
- ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr1, %stack.4, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.4, addrspace 5)
+ ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr2, %stack.4, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.4, addrspace 5)
+ ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr1, %stack.3, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
; REGALLOC-NEXT: renamable $sgpr4 = S_MOV_B32 49
; REGALLOC-NEXT: renamable $sgpr4_sgpr5 = V_CMP_GT_I32_e64 killed $vgpr0, killed $sgpr4, implicit $exec
- ; REGALLOC-NEXT: renamable $sgpr6 = IMPLICIT_DEF
- ; REGALLOC-NEXT: renamable $vgpr0 = COPY killed renamable $sgpr6
- ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr0, %stack.3, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
+ ; REGALLOC-NEXT: renamable $vgpr0 = IMPLICIT_DEF
; REGALLOC-NEXT: renamable $sgpr6_sgpr7 = COPY $exec, implicit-def $exec
; REGALLOC-NEXT: renamable $sgpr4_sgpr5 = S_AND_B64 renamable $sgpr6_sgpr7, killed renamable $sgpr4_sgpr5, implicit-def dead $scc
; REGALLOC-NEXT: renamable $sgpr6_sgpr7 = S_XOR_B64 renamable $sgpr4_sgpr5, killed renamable $sgpr6_sgpr7, implicit-def dead $scc
@@ -33,8 +31,8 @@ define i32 @prolog_spill(i32 %arg0, i32 %arg1, i32 %arg2) {
; REGALLOC-NEXT: $sgpr4 = SI_RESTORE_S32_FROM_VGPR $vgpr63, 0, implicit-def $sgpr4_sgpr5
; REGALLOC-NEXT: $sgpr5 = SI_RESTORE_S32_FROM_VGPR $vgpr63, 1
; REGALLOC-NEXT: renamable $sgpr4_sgpr5 = S_OR_SAVEEXEC_B64 killed renamable $sgpr4_sgpr5, implicit-def $exec, implicit-def dead $scc, implicit $exec
- ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
- ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr0, %stack.6, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.6, addrspace 5)
+ ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.6, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.6, addrspace 5)
+ ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr0, %stack.5, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.5, addrspace 5)
; REGALLOC-NEXT: renamable $sgpr4_sgpr5 = S_AND_B64 $exec, killed renamable $sgpr4_sgpr5, implicit-def dead $scc
; REGALLOC-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR killed $sgpr4, 2, $vgpr63, implicit-def $sgpr4_sgpr5, implicit $sgpr4_sgpr5
; REGALLOC-NEXT: $vgpr63 = SI_SPILL_S32_TO_VGPR $sgpr5, 3, $vgpr63, implicit $sgpr4_sgpr5
@@ -46,19 +44,19 @@ define i32 @prolog_spill(i32 %arg0, i32 %arg1, i32 %arg2) {
; REGALLOC-NEXT: bb.2.bb.1:
; REGALLOC-NEXT: successors: %bb.4(0x80000000)
; REGALLOC-NEXT: {{ $}}
- ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)
+ ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.3, addrspace 5)
; REGALLOC-NEXT: renamable $sgpr4 = S_MOV_B32 10
; REGALLOC-NEXT: renamable $vgpr0 = V_ADD_U32_e64 $vgpr0, killed $sgpr4, 0, implicit $exec
- ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr0, %stack.6, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.6, addrspace 5)
+ ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr0, %stack.5, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.5, addrspace 5)
; REGALLOC-NEXT: S_BRANCH %bb.4
; REGALLOC-NEXT: {{ $}}
; REGALLOC-NEXT: bb.3.bb.2:
; REGALLOC-NEXT: successors: %bb.1(0x80000000)
; REGALLOC-NEXT: {{ $}}
- ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.5, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.5, addrspace 5)
+ ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.4, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.4, addrspace 5)
; REGALLOC-NEXT: renamable $sgpr4 = S_MOV_B32 20
; REGALLOC-NEXT: renamable $vgpr0 = V_ADD_U32_e64 $vgpr0, killed $sgpr4, 0, implicit $exec
- ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr0, %stack.3, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.3, addrspace 5)
+ ; REGALLOC-NEXT: SI_SPILL_V32_SAVE killed $vgpr0, %stack.6, $sgpr32, 0, implicit $exec :: (store (s32) into %stack.6, addrspace 5)
; REGALLOC-NEXT: S_BRANCH %bb.1
; REGALLOC-NEXT: {{ $}}
; REGALLOC-NEXT: bb.4.bb.3:
@@ -66,7 +64,7 @@ define i32 @prolog_spill(i32 %arg0, i32 %arg1, i32 %arg2) {
; REGALLOC-NEXT: $sgpr4 = SI_RESTORE_S32_FROM_VGPR $vgpr63, 2, implicit-def $sgpr4_sgpr5
; REGALLOC-NEXT: $sgpr5 = SI_RESTORE_S32_FROM_VGPR killed $vgpr63, 3
; REGALLOC-NEXT: $exec = S_OR_B64 $exec, killed renamable $sgpr4_sgpr5, implicit-def dead $scc
- ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.6, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.6, addrspace 5)
+ ; REGALLOC-NEXT: $vgpr0 = SI_SPILL_V32_RESTORE %stack.5, $sgpr32, 0, implicit $exec :: (load (s32) from %stack.5, addrspace 5)
; REGALLOC-NEXT: renamable $vgpr0 = V_LSHL_ADD_U32_e64 killed $vgpr0, 2, $vgpr0, implicit $exec
; REGALLOC-NEXT: SI_RETURN implicit killed $vgpr0
bb.0:
diff --git a/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-flat.ll b/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-flat.ll
index fdae1696a5a49..3305cac0d7ea6 100644
--- a/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-flat.ll
+++ b/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes-flat.ll
@@ -73,76 +73,76 @@ define void @test_sinkable_flat_small_offset_i32(ptr %out, ptr %in, i32 %cond) {
; GFX7-LABEL: test_sinkable_flat_small_offset_i32:
; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX7-NEXT: v_mov_b32_e32 v5, 0
; GFX7-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX7-NEXT: v_mov_b32_e32 v4, 0
; GFX7-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7-NEXT: s_cbranch_execz .LBB0_2
; GFX7-NEXT: ; %bb.1: ; %if
; GFX7-NEXT: v_add_i32_e32 v2, vcc, 28, v2
; GFX7-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX7-NEXT: flat_load_dword v4, v[2:3]
+; GFX7-NEXT: flat_load_dword v5, v[2:3]
; GFX7-NEXT: .LBB0_2: ; %endif
; GFX7-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX7-NEXT: v_add_i32_e32 v0, vcc, 0x3d08fc, v0
; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX7-NEXT: flat_store_dword v[0:1], v4
+; GFX7-NEXT: flat_store_dword v[0:1], v5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: test_sinkable_flat_small_offset_i32:
; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_mov_b32_e32 v5, 0
; GFX8-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX8-NEXT: v_mov_b32_e32 v4, 0
; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz .LBB0_2
; GFX8-NEXT: ; %bb.1: ; %if
; GFX8-NEXT: v_add_u32_e32 v2, vcc, 28, v2
; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_dword v4, v[2:3]
+; GFX8-NEXT: flat_load_dword v5, v[2:3]
; GFX8-NEXT: .LBB0_2: ; %endif
; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 0x3d08fc, v0
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX8-NEXT: flat_store_dword v[0:1], v4
+; GFX8-NEXT: flat_store_dword v[0:1], v5
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: test_sinkable_flat_small_offset_i32:
; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_mov_b32_e32 v5, 0
; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX9-NEXT: v_mov_b32_e32 v4, 0
; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz .LBB0_2
; GFX9-NEXT: ; %bb.1: ; %if
-; GFX9-NEXT: flat_load_dword v4, v[2:3] offset:28
+; GFX9-NEXT: flat_load_dword v5, v[2:3] offset:28
; GFX9-NEXT: .LBB0_2: ; %endif
; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x3d0000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX9-NEXT: flat_store_dword v[0:1], v4 offset:2300
+; GFX9-NEXT: flat_store_dword v[0:1], v5 offset:2300
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: test_sinkable_flat_small_offset_i32:
; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v4
-; GFX10-NEXT: v_mov_b32_e32 v4, 0
-; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT: v_mov_b32_e32 v5, 0
+; GFX10-NEXT: s_mov_b32 s4, exec_lo
+; GFX10-NEXT: v_cmpx_ne_u32_e32 0, v4
; GFX10-NEXT: s_cbranch_execz .LBB0_2
; GFX10-NEXT: ; %bb.1: ; %if
-; GFX10-NEXT: flat_load_dword v4, v[2:3] offset:28
+; GFX10-NEXT: flat_load_dword v5, v[2:3] offset:28
; GFX10-NEXT: .LBB0_2: ; %endif
; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
; GFX10-NEXT: v_add_co_u32 v0, vcc_lo, 0x3d0800, v0
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
; GFX10-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
-; GFX10-NEXT: flat_store_dword v[0:1], v4 offset:252
+; GFX10-NEXT: flat_store_dword v[0:1], v5 offset:252
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]
entry:
@@ -228,78 +228,78 @@ define void @test_sink_noop_addrspacecast_flat_to_global_i32(ptr %out, ptr %in,
; GFX7-LABEL: test_sink_noop_addrspacecast_flat_to_global_i32:
; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: s_mov_b32 s6, 0
+; GFX7-NEXT: v_mov_b32_e32 v5, 0
; GFX7-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX7-NEXT: v_mov_b32_e32 v4, 0
-; GFX7-NEXT: s_and_saveexec_b64 s[8:9], vcc
+; GFX7-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7-NEXT: s_cbranch_execz .LBB1_2
; GFX7-NEXT: ; %bb.1: ; %if
-; GFX7-NEXT: s_mov_b32 s7, 0xf000
-; GFX7-NEXT: s_mov_b32 s4, s6
-; GFX7-NEXT: s_mov_b32 s5, s6
-; GFX7-NEXT: buffer_load_dword v4, v[2:3], s[4:7], 0 addr64 offset:28
+; GFX7-NEXT: s_mov_b32 s10, 0
+; GFX7-NEXT: s_mov_b32 s11, 0xf000
+; GFX7-NEXT: s_mov_b32 s8, s10
+; GFX7-NEXT: s_mov_b32 s9, s10
+; GFX7-NEXT: buffer_load_dword v5, v[2:3], s[8:11], 0 addr64 offset:28
; GFX7-NEXT: .LBB1_2: ; %endif
-; GFX7-NEXT: s_or_b64 exec, exec, s[8:9]
+; GFX7-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX7-NEXT: v_add_i32_e32 v0, vcc, 0x3d08fc, v0
; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX7-NEXT: s_waitcnt vmcnt(0)
-; GFX7-NEXT: flat_store_dword v[0:1], v4
+; GFX7-NEXT: flat_store_dword v[0:1], v5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: test_sink_noop_addrspacecast_flat_to_global_i32:
; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_mov_b32_e32 v5, 0
; GFX8-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX8-NEXT: v_mov_b32_e32 v4, 0
; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz .LBB1_2
; GFX8-NEXT: ; %bb.1: ; %if
; GFX8-NEXT: v_add_u32_e32 v2, vcc, 28, v2
; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_dword v4, v[2:3]
+; GFX8-NEXT: flat_load_dword v5, v[2:3]
; GFX8-NEXT: .LBB1_2: ; %endif
; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 0x3d08fc, v0
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: flat_store_dword v[0:1], v4
+; GFX8-NEXT: flat_store_dword v[0:1], v5
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: test_sink_noop_addrspacecast_flat_to_global_i32:
; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_mov_b32_e32 v5, 0
; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX9-NEXT: v_mov_b32_e32 v4, 0
; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz .LBB1_2
; GFX9-NEXT: ; %bb.1: ; %if
-; GFX9-NEXT: global_load_dword v4, v[2:3], off offset:28
+; GFX9-NEXT: global_load_dword v5, v[2:3], off offset:28
; GFX9-NEXT: .LBB1_2: ; %endif
; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x3d0000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)
-; GFX9-NEXT: flat_store_dword v[0:1], v4 offset:2300
+; GFX9-NEXT: flat_store_dword v[0:1], v5 offset:2300
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: test_sink_noop_addrspacecast_flat_to_global_i32:
; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v4
-; GFX10-NEXT: v_mov_b32_e32 v4, 0
-; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT: v_mov_b32_e32 v5, 0
+; GFX10-NEXT: s_mov_b32 s4, exec_lo
+; GFX10-NEXT: v_cmpx_ne_u32_e32 0, v4
; GFX10-NEXT: s_cbranch_execz .LBB1_2
; GFX10-NEXT: ; %bb.1: ; %if
-; GFX10-NEXT: global_load_dword v4, v[2:3], off offset:28
+; GFX10-NEXT: global_load_dword v5, v[2:3], off offset:28
; GFX10-NEXT: .LBB1_2: ; %endif
; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
; GFX10-NEXT: v_add_co_u32 v0, vcc_lo, 0x3d0800, v0
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: flat_store_dword v[0:1], v4 offset:252
+; GFX10-NEXT: flat_store_dword v[0:1], v5 offset:252
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]
entry:
@@ -341,78 +341,78 @@ define void @test_sink_noop_addrspacecast_flat_to_constant_i32(ptr %out, ptr %in
; GFX7-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: s_mov_b32 s6, 0
+; GFX7-NEXT: v_mov_b32_e32 v5, 0
; GFX7-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX7-NEXT: v_mov_b32_e32 v4, 0
-; GFX7-NEXT: s_and_saveexec_b64 s[8:9], vcc
+; GFX7-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7-NEXT: s_cbranch_execz .LBB2_2
; GFX7-NEXT: ; %bb.1: ; %if
-; GFX7-NEXT: s_mov_b32 s7, 0xf000
-; GFX7-NEXT: s_mov_b32 s4, s6
-; GFX7-NEXT: s_mov_b32 s5, s6
-; GFX7-NEXT: buffer_load_dword v4, v[2:3], s[4:7], 0 addr64 offset:28
+; GFX7-NEXT: s_mov_b32 s10, 0
+; GFX7-NEXT: s_mov_b32 s11, 0xf000
+; GFX7-NEXT: s_mov_b32 s8, s10
+; GFX7-NEXT: s_mov_b32 s9, s10
+; GFX7-NEXT: buffer_load_dword v5, v[2:3], s[8:11], 0 addr64 offset:28
; GFX7-NEXT: .LBB2_2: ; %endif
-; GFX7-NEXT: s_or_b64 exec, exec, s[8:9]
+; GFX7-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX7-NEXT: v_add_i32_e32 v0, vcc, 0x3d08fc, v0
; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX7-NEXT: s_waitcnt vmcnt(0)
-; GFX7-NEXT: flat_store_dword v[0:1], v4
+; GFX7-NEXT: flat_store_dword v[0:1], v5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_mov_b32_e32 v5, 0
; GFX8-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX8-NEXT: v_mov_b32_e32 v4, 0
; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz .LBB2_2
; GFX8-NEXT: ; %bb.1: ; %if
; GFX8-NEXT: v_add_u32_e32 v2, vcc, 28, v2
; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_dword v4, v[2:3]
+; GFX8-NEXT: flat_load_dword v5, v[2:3]
; GFX8-NEXT: .LBB2_2: ; %endif
; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 0x3d08fc, v0
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: flat_store_dword v[0:1], v4
+; GFX8-NEXT: flat_store_dword v[0:1], v5
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_mov_b32_e32 v5, 0
; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX9-NEXT: v_mov_b32_e32 v4, 0
; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz .LBB2_2
; GFX9-NEXT: ; %bb.1: ; %if
-; GFX9-NEXT: global_load_dword v4, v[2:3], off offset:28
+; GFX9-NEXT: global_load_dword v5, v[2:3], off offset:28
; GFX9-NEXT: .LBB2_2: ; %endif
; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x3d0000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)
-; GFX9-NEXT: flat_store_dword v[0:1], v4 offset:2300
+; GFX9-NEXT: flat_store_dword v[0:1], v5 offset:2300
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v4
-; GFX10-NEXT: v_mov_b32_e32 v4, 0
-; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT: v_mov_b32_e32 v5, 0
+; GFX10-NEXT: s_mov_b32 s4, exec_lo
+; GFX10-NEXT: v_cmpx_ne_u32_e32 0, v4
; GFX10-NEXT: s_cbranch_execz .LBB2_2
; GFX10-NEXT: ; %bb.1: ; %if
-; GFX10-NEXT: global_load_dword v4, v[2:3], off offset:28
+; GFX10-NEXT: global_load_dword v5, v[2:3], off offset:28
; GFX10-NEXT: .LBB2_2: ; %endif
; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
; GFX10-NEXT: v_add_co_u32 v0, vcc_lo, 0x3d0800, v0
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: flat_store_dword v[0:1], v4 offset:252
+; GFX10-NEXT: flat_store_dword v[0:1], v5 offset:252
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]
entry:
@@ -570,10 +570,10 @@ define void @test_sink_flat_small_max_flat_offset(ptr %out, ptr %in) #1 {
; GFX10-LABEL: test_sink_flat_small_max_flat_offset:
; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_mbcnt_lo_u32_b32 v4, -1, 0
-; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v4
+; GFX10-NEXT: v_mbcnt_lo_u32_b32 v5, -1, 0
; GFX10-NEXT: v_mov_b32_e32 v4, 0
-; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT: s_mov_b32 s4, exec_lo
+; GFX10-NEXT: v_cmpx_ne_u32_e32 0, v5
; GFX10-NEXT: s_cbranch_execz .LBB3_2
; GFX10-NEXT: ; %bb.1: ; %if
; GFX10-NEXT: v_add_co_u32 v2, vcc_lo, 0x800, v2
@@ -693,10 +693,10 @@ define void @test_sink_flat_small_max_plus_1_flat_offset(ptr %out, ptr %in) #1 {
; GFX10-LABEL: test_sink_fl...
[truncated]
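The effect of the one-line change in HandlePHINodesInSuccessorBlocks can be sketched with a toy model. This is only an illustration under a stated assumption: that on AMDGPU the register bank chosen for a value depends on whether the value is divergent, and that a constant is never divergent. The summary above is cut off before it spells this out, so treat it as a reading of the test changes (for example the removed `$sgpr6 = IMPLICIT_DEF` plus `COPY` pair), not as the author's wording. `ToyValue`, `createRegsFor` and `copiesIntoPhi` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; none of these are LLVM types.
enum class RegClass { SGPR, VGPR };

struct ToyValue {
  std::string Name;
  bool IsDivergent; // assumption: a constant is always uniform (false)
};

// Hypothetical analogue of creating vregs for a value: the register bank
// follows the divergence of the value handed in.
RegClass createRegsFor(const ToyValue &V) {
  return V.IsDivergent ? RegClass::VGPR : RegClass::SGPR;
}

// Number of cross-bank copies needed to move the incoming register into
// the register that holds the phi result.
int copiesIntoPhi(RegClass Incoming, RegClass Phi) {
  return Incoming == Phi ? 0 : 1;
}

// Copies needed when the vreg is created from the constant operand
// (the removed line, CreateRegs(C)).
int copiesWhenCreatedFromConstant(const ToyValue &C, const ToyValue &PN) {
  return copiesIntoPhi(createRegsFor(C), createRegsFor(PN));
}

// Copies needed when the vreg is created from the phi itself
// (the added line, CreateRegs(&PN)).
int copiesWhenCreatedFromPhi(const ToyValue &PN) {
  return copiesIntoPhi(createRegsFor(PN), createRegsFor(PN));
}
```

In this model a uniform constant feeding a divergent phi costs one cross-bank copy when the register is created from the constant and none when it is created from the phi, which matches the direction of the test updates shown in the diff. For targets where the register class depends only on the type, both helpers return zero and the change makes no difference.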
; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v4
-; GFX10-NEXT: v_mov_b32_e32 v4, 0
-; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT: v_mov_b32_e32 v5, 0
+; GFX10-NEXT: s_mov_b32 s4, exec_lo
+; GFX10-NEXT: v_cmpx_ne_u32_e32 0, v4
; GFX10-NEXT: s_cbranch_execz .LBB1_2
; GFX10-NEXT: ; %bb.1: ; %if
-; GFX10-NEXT: global_load_dword v4, v[2:3], off offset:28
+; GFX10-NEXT: global_load_dword v5, v[2:3], off offset:28
; GFX10-NEXT: .LBB1_2: ; %endif
; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
; GFX10-NEXT: v_add_co_u32 v0, vcc_lo, 0x3d0800, v0
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: flat_store_dword v[0:1], v4 offset:252
+; GFX10-NEXT: flat_store_dword v[0:1], v5 offset:252
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]
entry:
@@ -341,78 +341,78 @@ define void @test_sink_noop_addrspacecast_flat_to_constant_i32(ptr %out, ptr %in
; GFX7-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX7: ; %bb.0: ; %entry
; GFX7-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX7-NEXT: s_mov_b32 s6, 0
+; GFX7-NEXT: v_mov_b32_e32 v5, 0
; GFX7-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX7-NEXT: v_mov_b32_e32 v4, 0
-; GFX7-NEXT: s_and_saveexec_b64 s[8:9], vcc
+; GFX7-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX7-NEXT: s_cbranch_execz .LBB2_2
; GFX7-NEXT: ; %bb.1: ; %if
-; GFX7-NEXT: s_mov_b32 s7, 0xf000
-; GFX7-NEXT: s_mov_b32 s4, s6
-; GFX7-NEXT: s_mov_b32 s5, s6
-; GFX7-NEXT: buffer_load_dword v4, v[2:3], s[4:7], 0 addr64 offset:28
+; GFX7-NEXT: s_mov_b32 s10, 0
+; GFX7-NEXT: s_mov_b32 s11, 0xf000
+; GFX7-NEXT: s_mov_b32 s8, s10
+; GFX7-NEXT: s_mov_b32 s9, s10
+; GFX7-NEXT: buffer_load_dword v5, v[2:3], s[8:11], 0 addr64 offset:28
; GFX7-NEXT: .LBB2_2: ; %endif
-; GFX7-NEXT: s_or_b64 exec, exec, s[8:9]
+; GFX7-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX7-NEXT: v_add_i32_e32 v0, vcc, 0x3d08fc, v0
; GFX7-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX7-NEXT: s_waitcnt vmcnt(0)
-; GFX7-NEXT: flat_store_dword v[0:1], v4
+; GFX7-NEXT: flat_store_dword v[0:1], v5
; GFX7-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX7-NEXT: s_setpc_b64 s[30:31]
;
; GFX8-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX8: ; %bb.0: ; %entry
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_mov_b32_e32 v5, 0
; GFX8-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX8-NEXT: v_mov_b32_e32 v4, 0
; GFX8-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX8-NEXT: s_cbranch_execz .LBB2_2
; GFX8-NEXT: ; %bb.1: ; %if
; GFX8-NEXT: v_add_u32_e32 v2, vcc, 28, v2
; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_dword v4, v[2:3]
+; GFX8-NEXT: flat_load_dword v5, v[2:3]
; GFX8-NEXT: .LBB2_2: ; %endif
; GFX8-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 0x3d08fc, v0
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: flat_store_dword v[0:1], v4
+; GFX8-NEXT: flat_store_dword v[0:1], v5
; GFX8-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
; GFX9-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-NEXT: v_mov_b32_e32 v5, 0
; GFX9-NEXT: v_cmp_ne_u32_e32 vcc, 0, v4
-; GFX9-NEXT: v_mov_b32_e32 v4, 0
; GFX9-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GFX9-NEXT: s_cbranch_execz .LBB2_2
; GFX9-NEXT: ; %bb.1: ; %if
-; GFX9-NEXT: global_load_dword v4, v[2:3], off offset:28
+; GFX9-NEXT: global_load_dword v5, v[2:3], off offset:28
; GFX9-NEXT: .LBB2_2: ; %endif
; GFX9-NEXT: s_or_b64 exec, exec, s[4:5]
; GFX9-NEXT: v_add_co_u32_e32 v0, vcc, 0x3d0000, v0
; GFX9-NEXT: v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
; GFX9-NEXT: s_waitcnt vmcnt(0)
-; GFX9-NEXT: flat_store_dword v[0:1], v4 offset:2300
+; GFX9-NEXT: flat_store_dword v[0:1], v5 offset:2300
; GFX9-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
; GFX9-NEXT: s_setpc_b64 s[30:31]
;
; GFX10-LABEL: test_sink_noop_addrspacecast_flat_to_constant_i32:
; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v4
-; GFX10-NEXT: v_mov_b32_e32 v4, 0
-; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT: v_mov_b32_e32 v5, 0
+; GFX10-NEXT: s_mov_b32 s4, exec_lo
+; GFX10-NEXT: v_cmpx_ne_u32_e32 0, v4
; GFX10-NEXT: s_cbranch_execz .LBB2_2
; GFX10-NEXT: ; %bb.1: ; %if
-; GFX10-NEXT: global_load_dword v4, v[2:3], off offset:28
+; GFX10-NEXT: global_load_dword v5, v[2:3], off offset:28
; GFX10-NEXT: .LBB2_2: ; %endif
; GFX10-NEXT: s_or_b32 exec_lo, exec_lo, s4
; GFX10-NEXT: v_add_co_u32 v0, vcc_lo, 0x3d0800, v0
; GFX10-NEXT: v_add_co_ci_u32_e32 v1, vcc_lo, 0, v1, vcc_lo
; GFX10-NEXT: s_waitcnt vmcnt(0)
-; GFX10-NEXT: flat_store_dword v[0:1], v4 offset:252
+; GFX10-NEXT: flat_store_dword v[0:1], v5 offset:252
; GFX10-NEXT: s_waitcnt lgkmcnt(0)
; GFX10-NEXT: s_setpc_b64 s[30:31]
entry:
@@ -570,10 +570,10 @@ define void @test_sink_flat_small_max_flat_offset(ptr %out, ptr %in) #1 {
; GFX10-LABEL: test_sink_flat_small_max_flat_offset:
; GFX10: ; %bb.0: ; %entry
; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT: v_mbcnt_lo_u32_b32 v4, -1, 0
-; GFX10-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v4
+; GFX10-NEXT: v_mbcnt_lo_u32_b32 v5, -1, 0
; GFX10-NEXT: v_mov_b32_e32 v4, 0
-; GFX10-NEXT: s_and_saveexec_b32 s4, vcc_lo
+; GFX10-NEXT: s_mov_b32 s4, exec_lo
+; GFX10-NEXT: v_cmpx_ne_u32_e32 0, v5
; GFX10-NEXT: s_cbranch_execz .LBB3_2
; GFX10-NEXT: ; %bb.1: ; %if
; GFX10-NEXT: v_add_co_u32 v2, vcc_lo, 0x800, v2
@@ -693,10 +693,10 @@ define void @test_sink_flat_small_max_plus_1_flat_offset(ptr %out, ptr %in) #1 {
; GFX10-LABEL: test_sink_fl...
[truncated]
For most targets, the register class comes from the type, so this makes no difference. For AMDGPU, the selected register class depends on the divergence of the value, and a constant phi input is never divergent. The heuristic that decides whether to treat the value as a scalar or vector constant based on its uses would then incorrectly conclude that this is a scalar use, when the phi is really a copy from S to V.
This avoids an intermediate s_mov_b32 plus a copy in some cases. These would often, but not always, fold out in MI passes.
This only adjusts the constant input case. It may make sense to do this for the non-constant case as well.
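The reasoning above can be sketched with a toy C++ model. This is not LLVM code: `ToyValue`, `RegClass`, `pickRegClass`, and the two `regClassForPhiInput*` helpers are hypothetical names invented for illustration, and the only behavior assumed is what the description states, namely that the register class follows the divergence of the value passed in and that a constant is never divergent.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; none of these are real LLVM types.
enum class RegClass { SGPR, VGPR };

struct ToyValue {
  std::string Name;
  bool IsConstant;
  bool IsDivergent; // assumed: always false for constants
};

// Hypothetical analogue of a divergence-driven register class choice:
// divergent values go to vector registers, uniform ones to scalar registers.
RegClass pickRegClass(const ToyValue &V) {
  return V.IsDivergent ? RegClass::VGPR : RegClass::SGPR;
}

// Before the change: the vreg for a constant phi input is created from the
// constant itself, so the phi's divergence is ignored.
RegClass regClassForPhiInputOld(const ToyValue &Incoming, const ToyValue &Phi) {
  (void)Phi;
  return pickRegClass(Incoming);
}

// After the change: the vreg is created from the phi, so a divergent phi
// gets a vector register even when the incoming value is a constant.
RegClass regClassForPhiInputNew(const ToyValue &Incoming, const ToyValue &Phi) {
  (void)Incoming;
  return pickRegClass(Phi);
}
```

In this model, a constant `0` feeding a divergent phi is assigned a scalar register by the old helper and a vector register by the new one, which mirrors the `s_mov_b32` plus copy that the change is meant to avoid. When the phi is uniform, both helpers agree.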
…lvm#129464)" breaks rocRAND build Reverse, IsConst>::operator*() const [with OptionsT = llvm::ilist_detail::node_options<llvm::MachineInstr, true, true, void, false, void>; bool IsReverse = false; bool IsConst = false; llvm::ilist_iterator<OptionsT, IsReverse, IsConst>::reference = llvm::MachineInstr&]: Assertion `!NodePtr->isKnownSentinel()' failed. This reverts commit 39bf765.
This commit looks to be the source of a miscompile. To quote the README
I might have a smaller example later, but I thought I'd flag this in the meantime.
Having not looked at the test at all, does #130776 help?
Doesn't appear to be a fix, sadly.