
Commit 90fd859

[x86] use instruction-level fast-math-flags to drive MachineCombiner
The code changes here are hopefully straightforward:

1. Use MachineInstruction flags to decide if FP ops can be reassociated (use both "reassoc" and "nsz" to be consistent with IR transforms; we probably don't need "nsz", but that's a safer interpretation of the FMF).

2. Check that both nodes allow reassociation to change instructions. This is a stronger requirement than we've usually implemented in IR/DAG, but it is needed to solve the motivating bug (see below), and it seems unlikely to impede optimization at this late stage.

3. Intersect/propagate MachineIR flags to enable further reassociation in MachineCombiner.

We managed to make MachineCombiner flexible enough that no changes are needed to that pass itself. So this patch should only affect x86 (assuming no other targets have implemented the hooks using MachineIR flags yet).

The motivating example in PR43609 is another case of fast-math transforms interacting badly with special FP ops created during lowering:
https://bugs.llvm.org/show_bug.cgi?id=43609

The special fadd ops used for converting int to FP assume that they will not be altered, so those are created without FMF. However, the MachineCombiner pass was being enabled for FP ops using the global/function-level TargetOption for "UnsafeFPMath". We managed to run instruction/node-level FMF all the way down to MachineIR sometime in the last 1-2 years though, so we can do better now.

The test diffs require some explanation:

1. llvm/test/CodeGen/X86/fmf-flags.ll - no target option for unsafe math was specified here, so MachineCombiner kicks in where it did not previously; to make it behave consistently, we need to specify a CPU schedule model, so use the default model, and there are no code diffs.

2. llvm/test/CodeGen/X86/machine-combiner.ll - replace the target option for unsafe math with the equivalent IR-level flags, and there are no code diffs; we can't remove the NaN/nsz options because those are still used to drive x86 fmin/fmax codegen (special SDAG opcodes).

3. llvm/test/CodeGen/X86/pow.ll - similar to #1.

4. llvm/test/CodeGen/X86/sqrt-fastmath.ll - similar to #1, but MachineCombiner does some reassociation of the estimate-sequence ops; presumably these are perf wins based on latency/throughput (and we get some reduction of move instructions too); I'm not sure how it affects numerical accuracy, but the test reflects reality better now because we would expect MachineCombiner to be enabled if the IR was generated via something like "-ffast-math" with clang.

5. llvm/test/CodeGen/X86/vec_int_to_fp.ll - this is the test added to model PR43609; the fadds are not reassociated now, so we should get the expected results.

6. llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll - similar to #1.

7. llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll - similar to #1.

Differential Revision: https://reviews.llvm.org/D74851
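As a rough illustration of points 1 and 2, the check now keys off the flags carried by each individual MachineInstr rather than a global target option. The helper below is a minimal sketch only (the name canReassociateFP is illustrative, not part of this patch); the real x86 change is in the X86InstrInfo.cpp diff further down.

#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// An FP op may be reassociated only if it carries both the "reassoc" and
// "nsz" fast-math flags. Because hasReassociableSibling() applies the same
// test to the neighboring instruction, both nodes must opt in, so a
// flag-free fadd created during lowering (the PR43609 case) never
// participates.
static bool canReassociateFP(const MachineInstr &MI) {
  return MI.getFlag(MachineInstr::MIFlag::FmReassoc) &&
         MI.getFlag(MachineInstr::MIFlag::FmNsz);
}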
1 parent 2f090ce commit 90fd859

9 files changed (+210, -207 lines)

llvm/lib/CodeGen/TargetInstrInfo.cpp (6 additions & 3 deletions)

@@ -699,10 +699,13 @@ bool TargetInstrInfo::hasReassociableSibling(const MachineInstr &Inst,
     std::swap(MI1, MI2);

   // 1. The previous instruction must be the same type as Inst.
-  // 2. The previous instruction must have virtual register definitions for its
+  // 2. The previous instruction must also be associative/commutative (this can
+  //    be different even for instructions with the same opcode if traits like
+  //    fast-math-flags are included).
+  // 3. The previous instruction must have virtual register definitions for its
   //    operands in the same basic block as Inst.
-  // 3. The previous instruction's result must only be used by Inst.
-  return MI1->getOpcode() == AssocOpcode &&
+  // 4. The previous instruction's result must only be used by Inst.
+  return MI1->getOpcode() == AssocOpcode && isAssociativeAndCommutative(*MI1) &&
          hasReassociableOperands(*MI1, MBB) &&
          MRI.hasOneNonDBGUse(MI1->getOperand(0).getReg());
 }

llvm/lib/Target/X86/X86InstrInfo.cpp (16 additions & 1 deletion)

@@ -7657,7 +7657,8 @@ bool X86InstrInfo::isAssociativeAndCommutative(const MachineInstr &Inst) const {
   case X86::VMULSSrr:
   case X86::VMULSDZrr:
   case X86::VMULSSZrr:
-    return Inst.getParent()->getParent()->getTarget().Options.UnsafeFPMath;
+    return Inst.getFlag(MachineInstr::MIFlag::FmReassoc) &&
+           Inst.getFlag(MachineInstr::MIFlag::FmNsz);
   default:
     return false;
   }
@@ -7843,6 +7844,20 @@ void X86InstrInfo::setSpecialOperandAttr(MachineInstr &OldMI1,
                                          MachineInstr &OldMI2,
                                          MachineInstr &NewMI1,
                                          MachineInstr &NewMI2) const {
+  // Propagate FP flags from the original instructions.
+  // But clear poison-generating flags because those may not be valid now.
+  // TODO: There should be a helper function for copying only fast-math-flags.
+  uint16_t IntersectedFlags = OldMI1.getFlags() & OldMI2.getFlags();
+  NewMI1.setFlags(IntersectedFlags);
+  NewMI1.clearFlag(MachineInstr::MIFlag::NoSWrap);
+  NewMI1.clearFlag(MachineInstr::MIFlag::NoUWrap);
+  NewMI1.clearFlag(MachineInstr::MIFlag::IsExact);
+
+  NewMI2.setFlags(IntersectedFlags);
+  NewMI2.clearFlag(MachineInstr::MIFlag::NoSWrap);
+  NewMI2.clearFlag(MachineInstr::MIFlag::NoUWrap);
+  NewMI2.clearFlag(MachineInstr::MIFlag::IsExact);
+
   // Integer instructions may define an implicit EFLAGS dest register operand.
   MachineOperand *OldFlagDef1 = OldMI1.findRegisterDefOperand(X86::EFLAGS);
   MachineOperand *OldFlagDef2 = OldMI2.findRegisterDefOperand(X86::EFLAGS);
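The TODO above hints at a cleaner factoring. The sketch below shows what such a helper could look like; the function name intersectFastMathMIFlags is hypothetical, and the mask is an assumption about which MIFlag bits count as fast-math flags, not code from this patch.

#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Intersect the two instructions' flags, then keep only the FP fast-math
// bits, so poison-generating integer flags (nsw/nuw/exact) are dropped in
// one step instead of being cleared individually on each new instruction.
static uint16_t intersectFastMathMIFlags(const MachineInstr &MI1,
                                         const MachineInstr &MI2) {
  const uint16_t FMFBits =
      MachineInstr::FmNoNans | MachineInstr::FmNoInfs | MachineInstr::FmNsz |
      MachineInstr::FmArcp | MachineInstr::FmContract | MachineInstr::FmAfn |
      MachineInstr::FmReassoc;
  return (MI1.getFlags() & MI2.getFlags()) & FMFBits;
}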

llvm/test/CodeGen/X86/fmf-flags.ll (1 addition & 1 deletion)

@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-unknown | FileCheck %s -check-prefix=X64
+; RUN: llc < %s -mtriple=x86_64-- -mcpu=x86-64 | FileCheck %s -check-prefix=X64
 ; RUN: llc < %s -mtriple=i686-unknown | FileCheck %s -check-prefix=X86

 declare float @llvm.sqrt.f32(float %x);

llvm/test/CodeGen/X86/machine-combiner.ll (79 additions & 79 deletions)

@@ -1,13 +1,13 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=sse -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-verify-pattern-order=true < %s | FileCheck %s --check-prefix=SSE
-; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-verify-pattern-order=true < %s | FileCheck %s --check-prefixes=AVX,AVX1
-; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx512vl -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-verify-pattern-order=true < %s | FileCheck %s --check-prefixes=AVX,AVX512
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=sse -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-verify-pattern-order=true < %s | FileCheck %s --check-prefix=SSE
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-verify-pattern-order=true < %s | FileCheck %s --check-prefixes=AVX,AVX1
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx512vl -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-verify-pattern-order=true < %s | FileCheck %s --check-prefixes=AVX,AVX512

 ; Incremental updates of the instruction depths should be enough for this test
 ; case.
-; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=sse -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-inc-threshold=0 < %s | FileCheck %s --check-prefix=SSE
-; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-inc-threshold=0 < %s | FileCheck %s --check-prefixes=AVX,AVX1
-; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx512vl -enable-unsafe-fp-math -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -machine-combiner-inc-threshold=0 < %s | FileCheck %s --check-prefixes=AVX,AVX512
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -mattr=sse -machine-combiner-inc-threshold=0 < %s | FileCheck %s --check-prefix=SSE
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -mattr=avx -machine-combiner-inc-threshold=0 < %s | FileCheck %s --check-prefixes=AVX,AVX1
+; RUN: llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -enable-no-nans-fp-math -enable-no-signed-zeros-fp-math -mattr=avx512vl -machine-combiner-inc-threshold=0 < %s | FileCheck %s --check-prefixes=AVX,AVX512

 ; Verify that the first two adds are independent regardless of how the inputs are
 ; commuted. The destination registers are used as source registers for the third add.
@@ -26,9 +26,9 @@ define float @reassociate_adds1(float %x0, float %x1, float %x2, float %x3) {
 ; AVX-NEXT: vaddss %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fadd float %x0, %x1
-  %t1 = fadd float %t0, %x2
-  %t2 = fadd float %t1, %x3
+  %t0 = fadd reassoc nsz float %x0, %x1
+  %t1 = fadd reassoc nsz float %t0, %x2
+  %t2 = fadd reassoc nsz float %t1, %x3
   ret float %t2
 }

@@ -46,9 +46,9 @@ define float @reassociate_adds2(float %x0, float %x1, float %x2, float %x3) {
 ; AVX-NEXT: vaddss %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fadd float %x0, %x1
-  %t1 = fadd float %x2, %t0
-  %t2 = fadd float %t1, %x3
+  %t0 = fadd reassoc nsz float %x0, %x1
+  %t1 = fadd reassoc nsz float %x2, %t0
+  %t2 = fadd reassoc nsz float %t1, %x3
   ret float %t2
 }

@@ -66,9 +66,9 @@ define float @reassociate_adds3(float %x0, float %x1, float %x2, float %x3) {
 ; AVX-NEXT: vaddss %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fadd float %x0, %x1
-  %t1 = fadd float %t0, %x2
-  %t2 = fadd float %x3, %t1
+  %t0 = fadd reassoc nsz float %x0, %x1
+  %t1 = fadd reassoc nsz float %t0, %x2
+  %t2 = fadd reassoc nsz float %x3, %t1
   ret float %t2
 }

@@ -86,9 +86,9 @@ define float @reassociate_adds4(float %x0, float %x1, float %x2, float %x3) {
 ; AVX-NEXT: vaddss %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fadd float %x0, %x1
-  %t1 = fadd float %x2, %t0
-  %t2 = fadd float %x3, %t1
+  %t0 = fadd reassoc nsz float %x0, %x1
+  %t1 = fadd reassoc nsz float %x2, %t0
+  %t2 = fadd reassoc nsz float %x3, %t1
   ret float %t2
 }

@@ -117,13 +117,13 @@ define float @reassociate_adds5(float %x0, float %x1, float %x2, float %x3, floa
 ; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: vaddss %xmm7, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fadd float %x0, %x1
-  %t1 = fadd float %t0, %x2
-  %t2 = fadd float %t1, %x3
-  %t3 = fadd float %t2, %x4
-  %t4 = fadd float %t3, %x5
-  %t5 = fadd float %t4, %x6
-  %t6 = fadd float %t5, %x7
+  %t0 = fadd reassoc nsz float %x0, %x1
+  %t1 = fadd reassoc nsz float %t0, %x2
+  %t2 = fadd reassoc nsz float %t1, %x3
+  %t3 = fadd reassoc nsz float %t2, %x4
+  %t4 = fadd reassoc nsz float %t3, %x5
+  %t5 = fadd reassoc nsz float %t4, %x6
+  %t6 = fadd reassoc nsz float %t5, %x7
   ret float %t6
 }

@@ -146,9 +146,9 @@ define float @reassociate_adds6(float %x0, float %x1, float %x2, float %x3) {
 ; AVX-NEXT: vaddss %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vaddss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fdiv float %x0, %x1
-  %t1 = fadd float %x2, %t0
-  %t2 = fadd float %x3, %t1
+  %t0 = fdiv reassoc nsz float %x0, %x1
+  %t1 = fadd reassoc nsz float %x2, %t0
+  %t2 = fadd reassoc nsz float %x3, %t1
   ret float %t2
 }

@@ -168,9 +168,9 @@ define float @reassociate_muls1(float %x0, float %x1, float %x2, float %x3) {
 ; AVX-NEXT: vmulss %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vmulss %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fdiv float %x0, %x1
-  %t1 = fmul float %x2, %t0
-  %t2 = fmul float %x3, %t1
+  %t0 = fdiv reassoc nsz float %x0, %x1
+  %t1 = fmul reassoc nsz float %x2, %t0
+  %t2 = fmul reassoc nsz float %x3, %t1
   ret float %t2
 }

@@ -190,9 +190,9 @@ define double @reassociate_adds_double(double %x0, double %x1, double %x2, doubl
 ; AVX-NEXT: vaddsd %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vaddsd %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fdiv double %x0, %x1
-  %t1 = fadd double %x2, %t0
-  %t2 = fadd double %x3, %t1
+  %t0 = fdiv reassoc nsz double %x0, %x1
+  %t1 = fadd reassoc nsz double %x2, %t0
+  %t2 = fadd reassoc nsz double %x3, %t1
   ret double %t2
 }

@@ -212,9 +212,9 @@ define double @reassociate_muls_double(double %x0, double %x1, double %x2, doubl
 ; AVX-NEXT: vmulsd %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vmulsd %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fdiv double %x0, %x1
-  %t1 = fmul double %x2, %t0
-  %t2 = fmul double %x3, %t1
+  %t0 = fdiv reassoc nsz double %x0, %x1
+  %t1 = fmul reassoc nsz double %x2, %t0
+  %t2 = fmul reassoc nsz double %x3, %t1
   ret double %t2
 }

@@ -240,9 +240,9 @@ define <4 x float> @reassociate_adds_v4f32(<4 x float> %x0, <4 x float> %x1, <4
 ; AVX512-NEXT: vfmadd213ps {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2
 ; AVX512-NEXT: vaddps %xmm0, %xmm3, %xmm0
 ; AVX512-NEXT: retq
-  %t0 = fmul <4 x float> %x0, %x1
-  %t1 = fadd <4 x float> %x2, %t0
-  %t2 = fadd <4 x float> %x3, %t1
+  %t0 = fmul reassoc nsz <4 x float> %x0, %x1
+  %t1 = fadd reassoc nsz <4 x float> %x2, %t0
+  %t2 = fadd reassoc nsz <4 x float> %x3, %t1
   ret <4 x float> %t2
 }

@@ -268,9 +268,9 @@ define <2 x double> @reassociate_adds_v2f64(<2 x double> %x0, <2 x double> %x1,
 ; AVX512-NEXT: vfmadd213pd {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2
 ; AVX512-NEXT: vaddpd %xmm0, %xmm3, %xmm0
 ; AVX512-NEXT: retq
-  %t0 = fmul <2 x double> %x0, %x1
-  %t1 = fadd <2 x double> %x2, %t0
-  %t2 = fadd <2 x double> %x3, %t1
+  %t0 = fmul reassoc nsz <2 x double> %x0, %x1
+  %t1 = fadd reassoc nsz <2 x double> %x2, %t0
+  %t2 = fadd reassoc nsz <2 x double> %x3, %t1
   ret <2 x double> %t2
 }

@@ -290,9 +290,9 @@ define <4 x float> @reassociate_muls_v4f32(<4 x float> %x0, <4 x float> %x1, <4
 ; AVX-NEXT: vmulps %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vmulps %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fadd <4 x float> %x0, %x1
-  %t1 = fmul <4 x float> %x2, %t0
-  %t2 = fmul <4 x float> %x3, %t1
+  %t0 = fadd reassoc nsz <4 x float> %x0, %x1
+  %t1 = fmul reassoc nsz <4 x float> %x2, %t0
+  %t2 = fmul reassoc nsz <4 x float> %x3, %t1
   ret <4 x float> %t2
 }

@@ -312,9 +312,9 @@ define <2 x double> @reassociate_muls_v2f64(<2 x double> %x0, <2 x double> %x1,
 ; AVX-NEXT: vmulpd %xmm3, %xmm2, %xmm1
 ; AVX-NEXT: vmulpd %xmm1, %xmm0, %xmm0
 ; AVX-NEXT: retq
-  %t0 = fadd <2 x double> %x0, %x1
-  %t1 = fmul <2 x double> %x2, %t0
-  %t2 = fmul <2 x double> %x3, %t1
+  %t0 = fadd reassoc nsz <2 x double> %x0, %x1
+  %t1 = fmul reassoc nsz <2 x double> %x2, %t0
+  %t2 = fmul reassoc nsz <2 x double> %x3, %t1
   ret <2 x double> %t2
 }

@@ -343,9 +343,9 @@ define <8 x float> @reassociate_adds_v8f32(<8 x float> %x0, <8 x float> %x1, <8
 ; AVX512-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2
 ; AVX512-NEXT: vaddps %ymm0, %ymm3, %ymm0
 ; AVX512-NEXT: retq
-  %t0 = fmul <8 x float> %x0, %x1
-  %t1 = fadd <8 x float> %x2, %t0
-  %t2 = fadd <8 x float> %x3, %t1
+  %t0 = fmul reassoc nsz <8 x float> %x0, %x1
+  %t1 = fadd reassoc nsz <8 x float> %x2, %t0
+  %t2 = fadd reassoc nsz <8 x float> %x3, %t1
   ret <8 x float> %t2
 }

@@ -374,9 +374,9 @@ define <4 x double> @reassociate_adds_v4f64(<4 x double> %x0, <4 x double> %x1,
 ; AVX512-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2
 ; AVX512-NEXT: vaddpd %ymm0, %ymm3, %ymm0
 ; AVX512-NEXT: retq
-  %t0 = fmul <4 x double> %x0, %x1
-  %t1 = fadd <4 x double> %x2, %t0
-  %t2 = fadd <4 x double> %x3, %t1
+  %t0 = fmul reassoc nsz <4 x double> %x0, %x1
+  %t1 = fadd reassoc nsz <4 x double> %x2, %t0
+  %t2 = fadd reassoc nsz <4 x double> %x3, %t1
   ret <4 x double> %t2
 }

@@ -399,9 +399,9 @@ define <8 x float> @reassociate_muls_v8f32(<8 x float> %x0, <8 x float> %x1, <8
 ; AVX-NEXT: vmulps %ymm3, %ymm2, %ymm1
 ; AVX-NEXT: vmulps %ymm1, %ymm0, %ymm0
 ; AVX-NEXT: retq
-  %t0 = fadd <8 x float> %x0, %x1
-  %t1 = fmul <8 x float> %x2, %t0
-  %t2 = fmul <8 x float> %x3, %t1
+  %t0 = fadd reassoc nsz <8 x float> %x0, %x1
+  %t1 = fmul reassoc nsz <8 x float> %x2, %t0
+  %t2 = fmul reassoc nsz <8 x float> %x3, %t1
   ret <8 x float> %t2
 }

@@ -424,9 +424,9 @@ define <4 x double> @reassociate_muls_v4f64(<4 x double> %x0, <4 x double> %x1,
 ; AVX-NEXT: vmulpd %ymm3, %ymm2, %ymm1
 ; AVX-NEXT: vmulpd %ymm1, %ymm0, %ymm0
 ; AVX-NEXT: retq
-  %t0 = fadd <4 x double> %x0, %x1
-  %t1 = fmul <4 x double> %x2, %t0
-  %t2 = fmul <4 x double> %x3, %t1
+  %t0 = fadd reassoc nsz <4 x double> %x0, %x1
+  %t1 = fmul reassoc nsz <4 x double> %x2, %t0
+  %t2 = fmul reassoc nsz <4 x double> %x3, %t1
   ret <4 x double> %t2
 }

@@ -464,9 +464,9 @@ define <16 x float> @reassociate_adds_v16f32(<16 x float> %x0, <16 x float> %x1,
 ; AVX512-NEXT: vfmadd213ps {{.*#+}} zmm0 = (zmm1 * zmm0) + zmm2
 ; AVX512-NEXT: vaddps %zmm0, %zmm3, %zmm0
 ; AVX512-NEXT: retq
-  %t0 = fmul <16 x float> %x0, %x1
-  %t1 = fadd <16 x float> %x2, %t0
-  %t2 = fadd <16 x float> %x3, %t1
+  %t0 = fmul reassoc nsz <16 x float> %x0, %x1
+  %t1 = fadd reassoc nsz <16 x float> %x2, %t0
+  %t2 = fadd reassoc nsz <16 x float> %x3, %t1
   ret <16 x float> %t2
 }

@@ -504,9 +504,9 @@ define <8 x double> @reassociate_adds_v8f64(<8 x double> %x0, <8 x double> %x1,
 ; AVX512-NEXT: vfmadd213pd {{.*#+}} zmm0 = (zmm1 * zmm0) + zmm2
 ; AVX512-NEXT: vaddpd %zmm0, %zmm3, %zmm0
 ; AVX512-NEXT: retq
-  %t0 = fmul <8 x double> %x0, %x1
-  %t1 = fadd <8 x double> %x2, %t0
-  %t2 = fadd <8 x double> %x3, %t1
+  %t0 = fmul reassoc nsz <8 x double> %x0, %x1
+  %t1 = fadd reassoc nsz <8 x double> %x2, %t0
+  %t2 = fadd reassoc nsz <8 x double> %x3, %t1
   ret <8 x double> %t2
 }

@@ -545,9 +545,9 @@ define <16 x float> @reassociate_muls_v16f32(<16 x float> %x0, <16 x float> %x1,
 ; AVX512-NEXT: vmulps %zmm3, %zmm2, %zmm1
 ; AVX512-NEXT: vmulps %zmm1, %zmm0, %zmm0
 ; AVX512-NEXT: retq
-  %t0 = fadd <16 x float> %x0, %x1
-  %t1 = fmul <16 x float> %x2, %t0
-  %t2 = fmul <16 x float> %x3, %t1
+  %t0 = fadd reassoc nsz <16 x float> %x0, %x1
+  %t1 = fmul reassoc nsz <16 x float> %x2, %t0
+  %t2 = fmul reassoc nsz <16 x float> %x3, %t1
   ret <16 x float> %t2
 }

@@ -586,9 +586,9 @@ define <8 x double> @reassociate_muls_v8f64(<8 x double> %x0, <8 x double> %x1,
 ; AVX512-NEXT: vmulpd %zmm3, %zmm2, %zmm1
 ; AVX512-NEXT: vmulpd %zmm1, %zmm0, %zmm0
 ; AVX512-NEXT: retq
-  %t0 = fadd <8 x double> %x0, %x1
-  %t1 = fmul <8 x double> %x2, %t0
-  %t2 = fmul <8 x double> %x3, %t1
+  %t0 = fadd reassoc nsz <8 x double> %x0, %x1
+  %t1 = fmul reassoc nsz <8 x double> %x2, %t0
+  %t2 = fmul reassoc nsz <8 x double> %x3, %t1
   ret <8 x double> %t2
 }

@@ -1114,9 +1114,9 @@ define double @reassociate_adds_from_calls() {
   %x1 = call double @bar()
   %x2 = call double @bar()
   %x3 = call double @bar()
-  %t0 = fadd double %x0, %x1
-  %t1 = fadd double %t0, %x2
-  %t2 = fadd double %t1, %x3
+  %t0 = fadd reassoc nsz double %x0, %x1
+  %t1 = fadd reassoc nsz double %t0, %x2
+  %t2 = fadd reassoc nsz double %t1, %x3
   ret double %t2
 }

@@ -1165,9 +1165,9 @@ define double @already_reassociated() {
   %x1 = call double @bar()
   %x2 = call double @bar()
   %x3 = call double @bar()
-  %t0 = fadd double %x0, %x1
-  %t1 = fadd double %x2, %x3
-  %t2 = fadd double %t0, %t1
+  %t0 = fadd reassoc nsz double %x0, %x1
+  %t1 = fadd reassoc nsz double %x2, %x3
+  %t2 = fadd reassoc nsz double %t0, %t1
   ret double %t2
 }

llvm/test/CodeGen/X86/pow.ll (1 addition & 1 deletion)

@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=x86_64-- | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-- -mcpu=x86-64 | FileCheck %s

 declare float @llvm.pow.f32(float, float)
 declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)
