-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LICM] Support integer mul/add in hoistFPAssociation. #67736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms ChangesThis is a naive copy/paste of the FP code. We could do more integration to share code. Posting for discussion. Patch is 70.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67736.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index 4cb70cbdf093b36..d3f3cf84b4bcc1c 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -110,6 +110,9 @@ STATISTIC(NumAddSubHoisted, "Number of add/subtract expressions reassociated "
"and hoisted out of the loop");
STATISTIC(NumFPAssociationsHoisted, "Number of invariant FP expressions "
"reassociated and hoisted out of the loop");
+STATISTIC(NumIntAssociationsHoisted,
+ "Number of invariant int expressions "
+ "reassociated and hoisted out of the loop");
/// Memory promotion is enabled by default.
static cl::opt<bool>
@@ -135,6 +138,12 @@ cl::opt<unsigned> FPAssociationUpperLimit(
"Set upper limit for the number of transformations performed "
"during a single round of hoisting the reassociated expressions."));
+cl::opt<unsigned> IntAssociationUpperLimit(
+ "licm-max-num-int-reassociations", cl::init(5U), cl::Hidden,
+ cl::desc(
+ "Set upper limit for the number of transformations performed "
+ "during a single round of hoisting the reassociated expressions."));
+
// Experimental option to allow imprecision in LICM in pathological cases, in
// exchange for faster compile. This is to be removed if MemorySSA starts to
// address the same issue. LICM calls MemorySSAWalker's
@@ -2719,6 +2728,65 @@ static bool hoistFPAssociation(Instruction &I, Loop &L,
return true;
}
+static bool hoistIntAssociation(Instruction &I, Loop &L,
+ ICFLoopSafetyInfo &SafetyInfo,
+ MemorySSAUpdater &MSSAU, AssumptionCache *AC,
+ DominatorTree *DT) {
+ using namespace PatternMatch;
+ Value *VariantOp = nullptr, *InvariantOp = nullptr;
+
+ if (!match(&I, m_Mul(m_Value(VariantOp), m_Value(InvariantOp))))
+ return false;
+ if (L.isLoopInvariant(VariantOp))
+ std::swap(VariantOp, InvariantOp);
+ if (L.isLoopInvariant(VariantOp) || !L.isLoopInvariant(InvariantOp))
+ return false;
+ Value *Factor = InvariantOp;
+
+ // First, we need to make sure we should do the transformation.
+ SmallVector<Use *> Changes;
+ SmallVector<BinaryOperator *> Worklist;
+ if (BinaryOperator *VariantBinOp = dyn_cast<BinaryOperator>(VariantOp))
+ Worklist.push_back(VariantBinOp);
+ while (!Worklist.empty()) {
+ BinaryOperator *BO = Worklist.pop_back_val();
+ if (!BO->hasOneUse())
+ return false;
+ BinaryOperator *Op0, *Op1;
+ if (match(BO, m_Add(m_BinOp(Op0), m_BinOp(Op1)))) {
+ Worklist.push_back(Op0);
+ Worklist.push_back(Op1);
+ continue;
+ }
+ if (BO->getOpcode() != Instruction::Mul || L.isLoopInvariant(BO))
+ return false;
+ Use &U0 = BO->getOperandUse(0);
+ Use &U1 = BO->getOperandUse(1);
+ if (L.isLoopInvariant(U0))
+ Changes.push_back(&U0);
+ else if (L.isLoopInvariant(U1))
+ Changes.push_back(&U1);
+ else
+ return false;
+ if (Changes.size() > IntAssociationUpperLimit)
+ return false;
+ }
+ if (Changes.empty())
+ return false;
+
+ // We know we should do it so let's do the transformation.
+ auto *Preheader = L.getLoopPreheader();
+ assert(Preheader && "Loop is not in simplify form?");
+ IRBuilder<> Builder(Preheader->getTerminator());
+ for (auto *U : Changes) {
+ assert(L.isLoopInvariant(U->get()));
+ U->set(Builder.CreateMul(U->get(), Factor, "factor.op.mul"));
+ }
+ I.replaceAllUsesWith(VariantOp);
+ eraseInstruction(I, SafetyInfo, MSSAU);
+ return true;
+}
+
static bool hoistArithmetics(Instruction &I, Loop &L,
ICFLoopSafetyInfo &SafetyInfo,
MemorySSAUpdater &MSSAU, AssumptionCache *AC,
@@ -2752,6 +2820,12 @@ static bool hoistArithmetics(Instruction &I, Loop &L,
return true;
}
+ if (hoistIntAssociation(I, L, SafetyInfo, MSSAU, AC, DT)) {
+ ++NumHoisted;
+ ++NumIntAssociationsHoisted;
+ return true;
+ }
+
return false;
}
diff --git a/llvm/test/Transforms/LICM/expr-reassociate-int.ll b/llvm/test/Transforms/LICM/expr-reassociate-int.ll
new file mode 100644
index 000000000000000..556bcfa4dbfd0bc
--- /dev/null
+++ b/llvm/test/Transforms/LICM/expr-reassociate-int.ll
@@ -0,0 +1,1078 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -passes='reassociate' -S < %s | FileCheck %s --check-prefix=REASSOCIATE_ONLY
+; RUN: opt -passes='licm' -S < %s | FileCheck %s --check-prefix=LICM_ONLY
+; RUN: opt -passes='licm' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_ONLY_CONSTRAINED
+; RUN: opt -passes='reassociate,loop-mssa(licm)' -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE
+; RUN: opt -passes='reassociate,loop-mssa(licm)' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE_CONSTRAINED
+
+;
+; A simple loop, should not get modified:
+;
+; int j;
+; const uint64_t d1d = d1 * delta;
+;
+; for (j = 0; j <= i; j++)
+; cells[j] = d1d * cells[j + 1];
+;
+
+define void @innermost_loop_1d(i32 %i, i64 %d1, i64 %delta, ptr %cells) {
+; REASSOCIATE_ONLY-LABEL: define void @innermost_loop_1d
+; REASSOCIATE_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; REASSOCIATE_ONLY-NEXT: entry:
+; REASSOCIATE_ONLY-NEXT: [[FMUL_D1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; REASSOCIATE_ONLY-NEXT: br label [[FOR_COND:%.*]]
+; REASSOCIATE_ONLY: for.cond:
+; REASSOCIATE_ONLY-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; REASSOCIATE_ONLY-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; REASSOCIATE_ONLY-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; REASSOCIATE_ONLY: for.body:
+; REASSOCIATE_ONLY-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; REASSOCIATE_ONLY-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; REASSOCIATE_ONLY-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; REASSOCIATE_ONLY-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; REASSOCIATE_ONLY-NEXT: [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; REASSOCIATE_ONLY-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; REASSOCIATE_ONLY-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; REASSOCIATE_ONLY-NEXT: store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; REASSOCIATE_ONLY-NEXT: br label [[FOR_COND]]
+; REASSOCIATE_ONLY: for.end:
+; REASSOCIATE_ONLY-NEXT: ret void
+;
+; LICM_ONLY-LABEL: define void @innermost_loop_1d
+; LICM_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY-NEXT: entry:
+; LICM_ONLY-NEXT: [[FMUL_D1:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY-NEXT: br label [[FOR_COND:%.*]]
+; LICM_ONLY: for.cond:
+; LICM_ONLY-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY: for.body:
+; LICM_ONLY-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY-NEXT: [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_ONLY-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY-NEXT: store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY-NEXT: br label [[FOR_COND]]
+; LICM_ONLY: for.end:
+; LICM_ONLY-NEXT: ret void
+;
+; LICM_ONLY_CONSTRAINED-LABEL: define void @innermost_loop_1d
+; LICM_ONLY_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY_CONSTRAINED-NEXT: entry:
+; LICM_ONLY_CONSTRAINED-NEXT: [[FMUL_D1:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY_CONSTRAINED-NEXT: br label [[FOR_COND:%.*]]
+; LICM_ONLY_CONSTRAINED: for.cond:
+; LICM_ONLY_CONSTRAINED-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY_CONSTRAINED-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY_CONSTRAINED-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY_CONSTRAINED: for.body:
+; LICM_ONLY_CONSTRAINED-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY_CONSTRAINED-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY_CONSTRAINED-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT: [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_ONLY_CONSTRAINED-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY_CONSTRAINED-NEXT: store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT: br label [[FOR_COND]]
+; LICM_ONLY_CONSTRAINED: for.end:
+; LICM_ONLY_CONSTRAINED-NEXT: ret void
+;
+; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_1d
+; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AFTER_REASSOCIATE-NEXT: entry:
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_D1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND:%.*]]
+; LICM_AFTER_REASSOCIATE: for.cond:
+; LICM_AFTER_REASSOCIATE-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_AFTER_REASSOCIATE-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_AFTER_REASSOCIATE-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_AFTER_REASSOCIATE: for.body:
+; LICM_AFTER_REASSOCIATE-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_AFTER_REASSOCIATE-NEXT: store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND]]
+; LICM_AFTER_REASSOCIATE: for.end:
+; LICM_AFTER_REASSOCIATE-NEXT: ret void
+;
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-LABEL: define void @innermost_loop_1d
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: entry:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[FMUL_D1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: br label [[FOR_COND:%.*]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED: for.cond:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED: for.body:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[FMUL_1:%.*]] = mul i64 [[FMUL_D1]], [[CELL_1]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: br label [[FOR_COND]]
+; LICM_AFTER_REASSOCIATE_CONSTRAINED: for.end:
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-NEXT: ret void
+;
+entry:
+ %fmul.d1 = mul i64 %d1, %delta
+ br label %for.cond
+
+for.cond:
+ %j = phi i32 [ 0, %entry ], [ %add.j.1, %for.body ]
+ %cmp.not = icmp sgt i32 %j, %i
+ br i1 %cmp.not, label %for.end, label %for.body
+
+for.body:
+ %add.j.1 = add nuw nsw i32 %j, 1
+ %idxprom.j.1 = zext i32 %add.j.1 to i64
+ %arrayidx.j.1 = getelementptr inbounds i64, ptr %cells, i64 %idxprom.j.1
+ %cell.1 = load i64, ptr %arrayidx.j.1, align 8
+ %fmul.1 = mul i64 %fmul.d1, %cell.1
+ %idxprom.j = zext i32 %j to i64
+ %arrayidx.j = getelementptr inbounds i64, ptr %cells, i64 %idxprom.j
+ store i64 %fmul.1, ptr %arrayidx.j, align 8
+ br label %for.cond
+
+for.end:
+ ret void
+}
+
+;
+; A simple loop:
+;
+; int j;
+;
+; for (j = 0; j <= i; j++)
+; cells[j] = d1 * cells[j + 1] * delta;
+;
+; ...should be transformed by the LICM pass into this:
+;
+; int j;
+; const uint64_t d1d = d1 * delta;
+;
+; for (j = 0; j <= i; j++)
+; cells[j] = d1d * cells[j + 1];
+;
+
+define void @innermost_loop_1d_shouldhoist(i32 %i, i64 %d1, i64 %delta, ptr %cells) {
+; REASSOCIATE_ONLY-LABEL: define void @innermost_loop_1d_shouldhoist
+; REASSOCIATE_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; REASSOCIATE_ONLY-NEXT: entry:
+; REASSOCIATE_ONLY-NEXT: br label [[FOR_COND:%.*]]
+; REASSOCIATE_ONLY: for.cond:
+; REASSOCIATE_ONLY-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; REASSOCIATE_ONLY-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; REASSOCIATE_ONLY-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; REASSOCIATE_ONLY: for.body:
+; REASSOCIATE_ONLY-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; REASSOCIATE_ONLY-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; REASSOCIATE_ONLY-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; REASSOCIATE_ONLY-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; REASSOCIATE_ONLY-NEXT: [[FMUL_1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; REASSOCIATE_ONLY-NEXT: [[FMUL_2:%.*]] = mul i64 [[FMUL_1]], [[CELL_1]]
+; REASSOCIATE_ONLY-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; REASSOCIATE_ONLY-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; REASSOCIATE_ONLY-NEXT: store i64 [[FMUL_2]], ptr [[ARRAYIDX_J]], align 8
+; REASSOCIATE_ONLY-NEXT: br label [[FOR_COND]]
+; REASSOCIATE_ONLY: for.end:
+; REASSOCIATE_ONLY-NEXT: ret void
+;
+; LICM_ONLY-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_ONLY-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY-NEXT: entry:
+; LICM_ONLY-NEXT: [[FACTOR_OP_MUL:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY-NEXT: br label [[FOR_COND:%.*]]
+; LICM_ONLY: for.cond:
+; LICM_ONLY-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY: for.body:
+; LICM_ONLY-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY-NEXT: [[FMUL_1:%.*]] = mul i64 [[FACTOR_OP_MUL]], [[CELL_1]]
+; LICM_ONLY-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY-NEXT: store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY-NEXT: br label [[FOR_COND]]
+; LICM_ONLY: for.end:
+; LICM_ONLY-NEXT: ret void
+;
+; LICM_ONLY_CONSTRAINED-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_ONLY_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_ONLY_CONSTRAINED-NEXT: entry:
+; LICM_ONLY_CONSTRAINED-NEXT: [[FACTOR_OP_MUL:%.*]] = mul i64 [[D1]], [[DELTA]]
+; LICM_ONLY_CONSTRAINED-NEXT: br label [[FOR_COND:%.*]]
+; LICM_ONLY_CONSTRAINED: for.cond:
+; LICM_ONLY_CONSTRAINED-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_ONLY_CONSTRAINED-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_ONLY_CONSTRAINED-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_ONLY_CONSTRAINED: for.body:
+; LICM_ONLY_CONSTRAINED-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_ONLY_CONSTRAINED-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_ONLY_CONSTRAINED-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT: [[FMUL_1:%.*]] = mul i64 [[FACTOR_OP_MUL]], [[CELL_1]]
+; LICM_ONLY_CONSTRAINED-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_ONLY_CONSTRAINED-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_ONLY_CONSTRAINED-NEXT: store i64 [[FMUL_1]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY_CONSTRAINED-NEXT: br label [[FOR_COND]]
+; LICM_ONLY_CONSTRAINED: for.end:
+; LICM_ONLY_CONSTRAINED-NEXT: ret void
+;
+; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AFTER_REASSOCIATE-NEXT: entry:
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = mul i64 [[DELTA]], [[D1]]
+; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND:%.*]]
+; LICM_AFTER_REASSOCIATE: for.cond:
+; LICM_AFTER_REASSOCIATE-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
+; LICM_AFTER_REASSOCIATE-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
+; LICM_AFTER_REASSOCIATE-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; LICM_AFTER_REASSOCIATE: for.body:
+; LICM_AFTER_REASSOCIATE-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
+; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_1:%.*]] = load i64, ptr [[ARRAYIDX_J_1]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = mul i64 [[FMUL_1]], [[CELL_1]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
+; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds i64, ptr [[CELLS]], i64 [[IDXPROM_J]]
+; LICM_AFTER_REASSOCIATE-NEXT: store i64 [[FMUL_2]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND]]
+; LICM_AFTER_REASSOCIATE: for.end:
+; LICM_AFTER_REASSOCIATE-NEXT: ret void
+;
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-LABEL: define void @innermost_loop_1d_shouldhoist
+; LICM_AFTER_REASSOCIATE_CONSTRAINED-SAME: (i32 [[I:%.*]], i64 [[D1:%.*]], i64 [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; LICM_AF...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, this makes exactly as much since to me as the FP variant does.
llvm/lib/Transforms/Scalar/LICM.cpp
Outdated
@@ -2719,6 +2728,65 @@ static bool hoistFPAssociation(Instruction &I, Loop &L, | |||
return true; | |||
} | |||
|
|||
static bool hoistIntAssociation(Instruction &I, Loop &L, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment as much on the existing code as the new code. We're going through the following steps:
0: ((A1 * B1) + (A2 * B2) + ...) * C where
1: ((A1 * C * B1) + (A2 * C * B2) + ...)
2: ((A1 * C) * B1 + (A2 * C) * B2 + ...)
Isn't the step going from 0 to 1 distribution, not associativity? If so, some comments need clarified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, if I'm right here about there being a distribution step here, isn't the prior FP code unsound? At least according to wikipedia (https://en.wikipedia.org/wiki/Distributive_property), floating point multiplication does not distribute over addition, and I don't know hasAllowReassoc gives enough freedom. (Since this isn't re-association).
Note that 2s complement integer math doesn't have this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be explicitly mentioned in LangRef, but we widely assume that reassoc also permits distribution.
llvm/lib/Transforms/Scalar/LICM.cpp
Outdated
} | ||
if (BO->getOpcode() != Instruction::Mul || L.isLoopInvariant(BO)) | ||
return false; | ||
Use &U0 = BO->getOperandUse(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, we should match the entire mul reduce tree, and perform the transform if any of the operands are loop invariant. Again, this is really a generalization on the prior code as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; RUN: opt -passes='licm' -S < %s | FileCheck %s --check-prefix=LICM_ONLY | ||
; RUN: opt -passes='licm' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_ONLY_CONSTRAINED | ||
; RUN: opt -passes='reassociate,loop-mssa(licm)' -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE | ||
; RUN: opt -passes='reassociate,loop-mssa(licm)' -licm-max-num-int-reassociations=1 -S < %s | FileCheck %s --check-prefix=LICM_AFTER_REASSOCIATE_CONSTRAINED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the tests LICM only.
This is a naive copy/paste of the FP code. We could do more integration to share code. Posting for discussion.
e6b713f
to
e4fa2d1
Compare
cl::opt<unsigned> IntAssociationUpperLimit( | ||
"licm-max-num-int-reassociations", cl::init(5U), cl::Hidden, | ||
cl::desc( | ||
"Set upper limit for the number of transformations performed " | ||
"during a single round of hoisting the reassociated expressions.")); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the type relevant to the limit? I'm not against the change just wondering if this is a compile time reduction flag whether it's worth just renaming the current one.
…aryOperator::getOperand.
Looks Like this might cause crashes on Mac: http://45.33.8.238/macm1/78525/step_11.txt Please take a look and revert for now if it takes a while to fix. |
I also saw the same crashes on 2 mac bots: |
This reverts commit 7ff5dfb. Causing crashes on Mac build bots.
…)" With a fix for build bot failure. I was accessing the type of a deleted Instruction. Original message: The reassociation this is trying to repair can happen for integer types too. This patch adds support for integer mul/add to hoistFPAssociation. The function has been renamed to hoistMulAddAssociation. I've used separate statistics and limits for integer to allow tuning flexibility.
I wrote an issue #91957 about a miscompile that starts happening with this patch. I have no idea if this patch is really to blame though or if it just uncovers something already existing. |
The reassociation this is trying to repair can happen for integer types too.
This patch adds support for integer mul/add to hoistFPAssociation. The function has been renamed to hoistMulAddAssociation. I've used separate statistics and limits for integer to allow tuning flexibility.