Commit e6f9483

[SelectionDAG] Flags are dropped when creating a new FMUL (#66701)
While simplifying some vector operators in DAG combine, we may need to create new instructions for the simplified vectors. When that happens, all flags of the new instruction must be copied or adapted from the old instruction. If "contract" is dropped from an instruction such as FMUL, the backend may not generate an FMA instruction, which hurts performance.

Here is an example where the "contract" flag is dropped when the FMUL is created:

Replacing.2 t42: v2f32 = fmul contract t41, t38
With: t48: v2f32 = fmul t38, t38

Co-authored-by: Sirish Pande <[email protected]>
1 parent 6b4a1f2 commit e6f9483

2 files changed: +7 -7 lines changed

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Lines changed: 3 additions & 2 deletions
@@ -2990,8 +2990,9 @@ bool TargetLowering::SimplifyDemandedVectorElts(
       SDValue NewOp1 = SimplifyMultipleUseDemandedVectorElts(Op1, DemandedElts,
                                                              TLO.DAG, Depth + 1);
       if (NewOp0 || NewOp1) {
-        SDValue NewOp = TLO.DAG.getNode(
-            Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0, NewOp1 ? NewOp1 : Op1);
+        SDValue NewOp =
+            TLO.DAG.getNode(Opcode, SDLoc(Op), VT, NewOp0 ? NewOp0 : Op0,
+                            NewOp1 ? NewOp1 : Op1, Op->getFlags());
         return TLO.CombineTo(Op, NewOp);
       }
       return false;
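
The effect of passing Op->getFlags() can be illustrated with a toy model. This is only a sketch: ToyFlags, ToyNode, rebuildDroppingFlags, rebuildKeepingFlags and canFuseIntoFMA are hypothetical stand-ins invented for this illustration, not LLVM classes or functions. It models the case where a rebuilt node ends up without the old node's flags, so a later fusion check that depends on "contract" no longer succeeds.

```cpp
// Toy model only: every type and function here is a hypothetical stand-in,
// not part of LLVM.
#include <cassert> // used by the checks in the usage note
#include <string>

// Stand-in for a small subset of the per-node fast-math flags.
struct ToyFlags {
  bool AllowContract = false;
};

// Stand-in for a DAG node: an opcode name plus its flags.
struct ToyNode {
  std::string Opcode;
  ToyFlags Flags;
};

// Rebuilds the node without forwarding flags (the situation before the patch):
// the new node gets default flags, so "contract" is lost.
inline ToyNode rebuildDroppingFlags(const ToyNode &Old) {
  return ToyNode{Old.Opcode, ToyFlags{}};
}

// Rebuilds the node and forwards the old node's flags (the patched behavior).
inline ToyNode rebuildKeepingFlags(const ToyNode &Old) {
  return ToyNode{Old.Opcode, Old.Flags};
}

// A later fusion step is assumed to combine an fmul with an fadd only when
// the fmul still allows contraction.
inline bool canFuseIntoFMA(const ToyNode &Mul) {
  return Mul.Opcode == "fmul" && Mul.Flags.AllowContract;
}
```

With an "fmul" node whose AllowContract is set, canFuseIntoFMA(rebuildKeepingFlags(Mul)) holds while canFuseIntoFMA(rebuildDroppingFlags(Mul)) does not, which mirrors the fmul contract to plain fmul replacement quoted in the commit message.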

llvm/test/CodeGen/AMDGPU/fma.ll

Lines changed: 4 additions & 5 deletions
@@ -159,15 +159,14 @@ define float @fold_fmul_distributive(float %x, float %y) {
 define amdgpu_kernel void @vec_mul_scalar_add_fma(<2 x float> %a, <2 x float> %b, float %c1, ptr addrspace(1) %inptr) {
 ; GFX906-LABEL: vec_mul_scalar_add_fma:
 ; GFX906: ; %bb.0:
+; GFX906-NEXT: s_load_dword s8, s[0:1], 0x34
 ; GFX906-NEXT: s_load_dwordx4 s[4:7], s[0:1], 0x24
-; GFX906-NEXT: s_waitcnt lgkmcnt(0)
-; GFX906-NEXT: s_load_dword s5, s[0:1], 0x34
 ; GFX906-NEXT: s_load_dwordx2 s[2:3], s[0:1], 0x3c
 ; GFX906-NEXT: v_mov_b32_e32 v0, 0
-; GFX906-NEXT: v_mov_b32_e32 v1, s6
-; GFX906-NEXT: v_mul_f32_e32 v1, s4, v1
 ; GFX906-NEXT: s_waitcnt lgkmcnt(0)
-; GFX906-NEXT: v_add_f32_e32 v1, s5, v1
+; GFX906-NEXT: v_mov_b32_e32 v1, s8
+; GFX906-NEXT: v_mov_b32_e32 v2, s6
+; GFX906-NEXT: v_fmac_f32_e32 v1, s4, v2
 ; GFX906-NEXT: global_store_dword v0, v1, s[2:3] offset:4
 ; GFX906-NEXT: s_endpgm
   %gep = getelementptr float, ptr addrspace(1) %inptr, i32 1
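
The updated check lines replace a separate v_mul_f32 and v_add_f32 with a single v_fmac_f32. As background on why fusing needs explicit permission from a flag such as "contract", the host-side sketch below compares a multiply followed by an add (two roundings) with a fused multiply-add (one rounding). It assumes IEEE-754 single-precision floats and is not AMDGPU code; mulThenAdd and fusedMulAdd are hypothetical helper names chosen for this illustration.

```cpp
// Host-side illustration only; assumes IEEE-754 binary32 for float.
#include <cassert> // used by the checks in the usage note
#include <cmath>

// Multiply, round the product to float, then add: two rounding steps.
inline float mulThenAdd(float A, float B, float C) {
  // volatile forces the product to be stored as a rounded float, so the
  // compiler cannot contract the two operations into one fused instruction.
  volatile float Product = A * B;
  return Product + C;
}

// Fused multiply-add: the exact product is added to C and rounded once.
inline float fusedMulAdd(float A, float B, float C) {
  return std::fma(A, B, C);
}
```

For example, with A = B = 1.0f + std::ldexp(1.0f, -12) and C = -(1.0f + std::ldexp(1.0f, -11)), the exact product is 1 + 2^-11 + 2^-24. Rounding that product to float discards the 2^-24 term, so mulThenAdd returns 0.0f, while fusedMulAdd returns std::ldexp(1.0f, -24). Because the results can differ, fusing is a change in numerical behavior that has to be permitted, which is why losing the flag on the rebuilt FMUL matters.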
