Commit 25506f4

[SDISel][Combine] Constant fold FP16_TO_FP (#94790)
In some cases, constants can survive early constant folding because they are hidden behind several layers of type changes. E.g., consider the following sequence (extracted from the ARM test that this commit changes):
```
t2: v1f16 = BUILD_VECTOR ConstantFP:f16<APFloat(0)>
t4: v1f16 = insert_vector_elt t2, ConstantFP:f16<APFloat(0)>, Constant:i32<0>
t5: f16 = bitcast t4
t6: f32 = fp_extend t5
```
Because the constant (APFloat(0)) is hidden behind a <1 x ty> type, all the constant folding that normally happens for scalar nodes when using `SelectionDAG::getNode` is blocked. As a result, the constant survives as an actual conversion instruction down to the select phase:
```
t11: f32 = fp16_to_fp Constant:i32<0>
```
With the change in this patch, we try constant folding one more time during DAG combine, which in the motivating example results in the much better sequence:
```
t7: ch = CopyToReg t0, Register:f32 %0, ConstantFP:f32<0.000000e+00>
```
Note: I'm sure we have this problem in a lot of other places. Generally speaking, I believe SDISel is not that good with <1 x ty> compared to pure scalar. However, I only changed what I could easily test.
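To make concrete what folding `fp16_to_fp Constant:i32<0>` amounts to, here is a minimal standalone sketch of the arithmetic. It is not LLVM code: `halfBitsToFloat` and `foldFp16ToFp` are hypothetical names, and the sketch assumes the node reads only the low 16 bits of its integer operand as an IEEE 754 binary16 value.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of LLVM): decode IEEE 754 binary16 bits.
float halfBitsToFloat(std::uint16_t bits) {
  const std::uint32_t sign = (bits >> 15) & 0x1u;
  const std::uint32_t exponent = (bits >> 10) & 0x1Fu;
  const std::uint32_t mantissa = bits & 0x3FFu;

  float magnitude;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    magnitude = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 0x1F) {
    // All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
    magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                              : std::numeric_limits<float>::quiet_NaN();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 25).
    magnitude = std::ldexp(static_cast<float>(mantissa + 1024),
                           static_cast<int>(exponent) - 25);
  }
  return sign ? -magnitude : magnitude;
}

// Hypothetical model of folding fp16_to_fp on an i32 constant.
// Assumption: only the low 16 bits of the operand are significant.
float foldFp16ToFp(std::uint32_t constant) {
  return halfBitsToFloat(static_cast<std::uint16_t>(constant & 0xFFFFu));
}
```

Under these assumptions, `foldFp16ToFp(0)` yields `0.0f`, which corresponds to the `ConstantFP:f32<0.000000e+00>` seen in the improved sequence.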
1 parent 9ddc014 commit 25506f4

4 files changed: 9 additions and 8 deletions

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Lines changed: 6 additions & 1 deletion
@@ -26586,7 +26586,12 @@ SDValue DAGCombiner::visitFP16_TO_FP(SDNode *N) {
     }
   }
 
-  return SDValue();
+  // Sometimes constants manage to survive very late in the pipeline, e.g.,
+  // because they are wrapped inside the <1 x f16> type. Try one last time to
+  // get rid of them.
+  SDValue Folded = DAG.FoldConstantArithmetic(N->getOpcode(), SDLoc(N),
+                                              N->getValueType(0), {N0});
+  return Folded;
 }
 
 SDValue DAGCombiner::visitFP_TO_BF16(SDNode *N) {
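
The hunk above replaces an unconditional "no change" return with the result of a fold attempt. This relies on the fold producing an empty value when the operand is not a constant, so the non-constant path should behave as before. A toy model of that control flow, using `std::optional` in place of `SDValue` (all names here are hypothetical and none of this is LLVM API):

```cpp
#include <cstdint>
#include <optional>

// Toy operand: carries bits only when the value is a known constant.
struct ToyOperand {
  std::optional<std::uint32_t> constantBits;
};

// Toy fold: succeeds only for constants. A real fold would also convert the
// half-precision bits to the result type; this sketch just keeps the low
// 16 bits so the control flow stays visible.
std::optional<std::uint32_t> toyFoldConstant(const ToyOperand &op) {
  if (!op.constantBits)
    return std::nullopt; // Not a constant: nothing to fold.
  return *op.constantBits & 0xFFFFu;
}

// Toy visit function: the fold is the last thing tried.
std::optional<std::uint32_t> toyVisit(const ToyOperand &op) {
  // ... earlier, more specific combines would be attempted here ...
  // An empty result plays the role of "leave the node unchanged".
  return toyFoldConstant(op);
}
```

In this model, `toyVisit(ToyOperand{})` returns an empty optional, mirroring the old `return SDValue();` path, while a constant operand yields a folded value.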

llvm/test/CodeGen/AMDGPU/clamp-modifier.ll

Lines changed: 1 addition & 2 deletions
@@ -1489,9 +1489,8 @@ define amdgpu_kernel void @v_no_clamp_add_src_v2f16_f16_src(ptr addrspace(1) %ou
 ; SI-NEXT: s_waitcnt lgkmcnt(0)
 ; SI-NEXT: s_mov_b64 s[4:5], s[2:3]
 ; SI-NEXT: buffer_load_ushort v1, v[1:2], s[4:7], 0 addr64
-; SI-NEXT: v_cvt_f32_f16_e64 v3, s6 clamp
+; SI-NEXT: v_cvt_f16_f32_e32 v3, 0
 ; SI-NEXT: s_mov_b64 s[2:3], s[6:7]
-; SI-NEXT: v_cvt_f16_f32_e32 v3, v3
 ; SI-NEXT: s_waitcnt vmcnt(0)
 ; SI-NEXT: v_cvt_f32_f16_e32 v1, v1
 ; SI-NEXT: v_add_f32_e32 v1, 1.0, v1

llvm/test/CodeGen/AMDGPU/select-phi-s16-fp.ll

Lines changed: 1 addition & 2 deletions
@@ -14,9 +14,8 @@ define void @phi_vec1half_to_f32_with_const_folding(ptr addrspace(1) %dst) #0 {
 ; CHECK: ; %bb.0: ; %entry
 ; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CHECK-NEXT: s_mov_b32 s4, 0
-; CHECK-NEXT: v_cvt_f32_f16_e64 v2, s4
 ; CHECK-NEXT: ; %bb.1: ; %bb
-; CHECK-NEXT: v_cvt_f16_f32_e64 v2, v2
+; CHECK-NEXT: v_cvt_f16_f32_e64 v2, s4
 ; CHECK-NEXT: s_mov_b32 s7, 0xf000
 ; CHECK-NEXT: s_mov_b32 s6, 0
 ; CHECK-NEXT: s_mov_b32 s4, s6

llvm/test/CodeGen/ARM/arm-half-promote.ll

Lines changed: 1 addition & 3 deletions
@@ -116,9 +116,7 @@ define fastcc { <8 x half>, <8 x half> } @f3() {
 
 define void @extract_insert(ptr %dst) optnone noinline {
 ; CHECK-LABEL: extract_insert:
-; CHECK: movs r1, #0
-; CHECK: vmov s0, r1
-; CHECK: vcvtb.f32.f16 s0, s0
+; CHECK: vmov.i32 d0, #0x0
 ; CHECK: vcvtb.f16.f32 s0, s0
 ; CHECK: vmov r1, s0
 ; CHECK: strh r1, [r0]
