-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[flang] Insert stacksave/stackrestore when alloca are present in loops #95173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-flang-fir-hlfir Author: Valentin Clement (バレンタイン クレメン) (clementval) ChangesSome passes in the flang pipeline are creating This behavior can be seen in https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90 This patch insert a call to LLVM stacksave/stackrestore in the body of the loop to reclaim the alloca in its scope. note: this would require another solution for unstructured loop that are not lowered to fir.do_loop. Full diff: https://github.com/llvm/llvm-project/pull/95173.diff 2 Files Affected:
diff --git a/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp b/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
index a233e7fbdcd1e..e75803d9571cb 100644
--- a/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
+++ b/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
@@ -6,6 +6,8 @@
//
//===----------------------------------------------------------------------===//
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/LowLevelIntrinsics.h"
#include "flang/Optimizer/Dialect/FIRDialect.h"
#include "flang/Optimizer/Dialect/FIROps.h"
#include "flang/Optimizer/Dialect/FIROpsSupport.h"
@@ -51,6 +53,7 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
matchAndRewrite(DoLoopOp loop,
mlir::PatternRewriter &rewriter) const override {
auto loc = loop.getLoc();
+ bool hasAllocas = !loop.getBody()->getOps<fir::AllocaOp>().empty();
mlir::arith::IntegerOverflowFlags flags{};
if (setNSW)
flags = bitEnumSet(flags, mlir::arith::IntegerOverflowFlags::nsw);
@@ -72,6 +75,22 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
rewriter.splitBlock(conditionalBlock, conditionalBlock->begin());
auto *lastBlock = &loop.getRegion().back();
+ // Insert stacksave/stackrestore if there is fir.alloca operation in the
+ // loop.
+ if (hasAllocas) {
+ auto mod = loop.getOperation()->getParentOfType<mlir::ModuleOp>();
+ fir::FirOpBuilder builder(rewriter, mod);
+ builder.setInsertionPointToStart(firstBlock);
+ mlir::func::FuncOp stackSave = fir::factory::getLlvmStackSave(builder);
+ mlir::func::FuncOp stackRestore =
+ fir::factory::getLlvmStackRestore(builder);
+ mlir::Value stackPtr =
+ builder.create<fir::CallOp>(loc, stackSave).getResult(0);
+ auto *terminator = lastBlock->getTerminator();
+ builder.setInsertionPoint(terminator);
+ builder.create<fir::CallOp>(loc, stackRestore, stackPtr);
+ }
+
// Move the blocks from the DoLoopOp between initBlock and endBlock
rewriter.inlineRegionBefore(loop.getRegion(), endBlock);
diff --git a/flang/test/Fir/loop01.fir b/flang/test/Fir/loop01.fir
index c1cbb522c378c..55de3bd67b32b 100644
--- a/flang/test/Fir/loop01.fir
+++ b/flang/test/Fir/loop01.fir
@@ -542,3 +542,22 @@ func.func @y5(%lo : index, %up : index) -> index {
// NSW: fir.call @f3(%[[VAL_7]]) : (i16) -> ()
// NSW: return %[[VAL_5]] : index
// NSW: }
+
+// -----
+
+func.func @alloca_in_loop(%lb : index, %ub : index, %step : index, %b : i1, %addr : !fir.ref<index>) {
+ fir.do_loop %iv = %lb to %ub step %step unordered {
+ %0 = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+ }
+ return
+}
+
+// CHECK-LABEL: func.func @alloca_in_loop
+// CHECK: ^bb1
+// CHECK: ^bb2
+// CHECK: %[[STACKPTR:.*]] = fir.call @llvm.stacksave.p0() : () -> !fir.ref<i8>
+// CHECK: fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+// CHECK: fir.call @llvm.stackrestore.p0(%[[STACKPTR]]) : (!fir.ref<i8>) -> ()
+// CHECK: cf.br ^bb1
+// CHECK: ^bb3:
+// CHECK: return
|
auto mod = loop.getOperation()->getParentOfType<mlir::ModuleOp>(); | ||
fir::FirOpBuilder builder(rewriter, mod); | ||
builder.setInsertionPointToStart(firstBlock); | ||
mlir::func::FuncOp stackSave = fir::factory::getLlvmStackSave(builder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to require making the pass a module pass. This is quite unfortunate.
We can try to use LLVM dialect stacksave/stackrestore
intrinsics operations, but they do not look too reliable to me, e.g. they do not have any side effects that prevent moving them around fir.alloca
. At the same time, fir.call
seems to have the same problem.
Would that be possible to insert stacksave/stackrestore calls during hlfir.concat
bufferization? It might be inefficient, but we will get correct code at least. The redundant stacksave/stackrestore might be optimized after that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to require making the pass a module pass. This is quite unfortunate.
Yeah if we can avoid that it would probably be better.
Would that be possible to insert stacksave/stackrestore calls during hlfir.concat bufferization? It might be inefficient, but we will get correct code at least. The redundant stacksave/stackrestore might be optimized after that.
Yes we can be then we would miss other case (if any) of fir.alloca with dynamic size in loops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they do not have any side effects that prevent moving them around fir.alloca
Are you sure? It seems the ops do not define any side effect interface, which means they can have any side effects and cannot be moved by a generic pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, Jean.
@@ -51,6 +53,7 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> { | |||
matchAndRewrite(DoLoopOp loop, | |||
mlir::PatternRewriter &rewriter) const override { | |||
auto loc = loop.getLoc(); | |||
bool hasAllocas = !loop.getBody()->getOps<fir::AllocaOp>().empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also note that @VijayKandiah is going to make changes in the code-gen to hoist constant-sized allocas to an entry block, so stacksave/stackrestore might be avoided in such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This should be refined to count only dyn-sized alloca.
#95309 move to a specific pass |
Some passes in the flang pipeline are creating `fir.alloca` operation like `hlfir.concat`. When these allocas are located in a loop, the stack can quickly be used too much leading to segfaults. This behavior can be seen in https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90 This patch insert a call to LLVM stacksave/stackrestore in the body of the loop to reclaim the alloca in its scope. This PR is an alternative implementation to #95173
Some passes in the flang pipeline are creating
fir.alloca
operation likehlfir.concat
. When these allocas are located in a loop, the stack can quickly be used too much leading to segfaults.This behavior can be seen in https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90
This patch insert a call to LLVM stacksave/stackrestore in the body of the loop to reclaim the alloca in its scope.
note: this would require another solution for unstructured loop that are not lowered to fir.do_loop.