Skip to content

[flang] Insert stacksave/stackrestore when alloca are present in loops #95173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

clementval
Copy link
Contributor

Some passes in the flang pipeline are creating fir.alloca operation like hlfir.concat. When these allocas are located in a loop, the stack can quickly be used too much leading to segfaults.

This behavior can be seen in https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90

This patch insert a call to LLVM stacksave/stackrestore in the body of the loop to reclaim the alloca in its scope.

note: this would require another solution for unstructured loop that are not lowered to fir.do_loop.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jun 11, 2024
@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)

Changes

Some passes in the flang pipeline are creating fir.alloca operation like hlfir.concat. When these allocas are located in a loop, the stack can quickly be used too much leading to segfaults.

This behavior can be seen in https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90

This patch insert a call to LLVM stacksave/stackrestore in the body of the loop to reclaim the alloca in its scope.

note: this would require another solution for unstructured loop that are not lowered to fir.do_loop.


Full diff: https://github.com/llvm/llvm-project/pull/95173.diff

2 Files Affected:

  • (modified) flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp (+19)
  • (modified) flang/test/Fir/loop01.fir (+19)
diff --git a/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp b/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
index a233e7fbdcd1e..e75803d9571cb 100644
--- a/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
+++ b/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
@@ -6,6 +6,8 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/LowLevelIntrinsics.h"
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROps.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
@@ -51,6 +53,7 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
   matchAndRewrite(DoLoopOp loop,
                   mlir::PatternRewriter &rewriter) const override {
     auto loc = loop.getLoc();
+    bool hasAllocas = !loop.getBody()->getOps<fir::AllocaOp>().empty();
     mlir::arith::IntegerOverflowFlags flags{};
     if (setNSW)
       flags = bitEnumSet(flags, mlir::arith::IntegerOverflowFlags::nsw);
@@ -72,6 +75,22 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
         rewriter.splitBlock(conditionalBlock, conditionalBlock->begin());
     auto *lastBlock = &loop.getRegion().back();
 
+    // Insert stacksave/stackrestore if there is fir.alloca operation in the
+    // loop.
+    if (hasAllocas) {
+      auto mod = loop.getOperation()->getParentOfType<mlir::ModuleOp>();
+      fir::FirOpBuilder builder(rewriter, mod);
+      builder.setInsertionPointToStart(firstBlock);
+      mlir::func::FuncOp stackSave = fir::factory::getLlvmStackSave(builder);
+      mlir::func::FuncOp stackRestore =
+          fir::factory::getLlvmStackRestore(builder);
+      mlir::Value stackPtr =
+          builder.create<fir::CallOp>(loc, stackSave).getResult(0);
+      auto *terminator = lastBlock->getTerminator();
+      builder.setInsertionPoint(terminator);
+      builder.create<fir::CallOp>(loc, stackRestore, stackPtr);
+    }
+
     // Move the blocks from the DoLoopOp between initBlock and endBlock
     rewriter.inlineRegionBefore(loop.getRegion(), endBlock);
 
diff --git a/flang/test/Fir/loop01.fir b/flang/test/Fir/loop01.fir
index c1cbb522c378c..55de3bd67b32b 100644
--- a/flang/test/Fir/loop01.fir
+++ b/flang/test/Fir/loop01.fir
@@ -542,3 +542,22 @@ func.func @y5(%lo : index, %up : index) -> index {
 // NSW:           fir.call @f3(%[[VAL_7]]) : (i16) -> ()
 // NSW:           return %[[VAL_5]] : index
 // NSW:         }
+
+// -----
+
+func.func @alloca_in_loop(%lb : index, %ub : index, %step : index, %b : i1, %addr : !fir.ref<index>) {
+  fir.do_loop %iv = %lb to %ub step %step unordered {
+    %0 = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+  }
+  return
+}
+
+// CHECK-LABEL:  func.func @alloca_in_loop
+// CHECK: ^bb1
+// CHECK: ^bb2
+// CHECK:   %[[STACKPTR:.*]] = fir.call @llvm.stacksave.p0() : () -> !fir.ref<i8>
+// CHECK:   fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+// CHECK:   fir.call @llvm.stackrestore.p0(%[[STACKPTR]]) : (!fir.ref<i8>) -> ()
+// CHECK:   cf.br ^bb1
+// CHECK: ^bb3:
+// CHECK:   return

auto mod = loop.getOperation()->getParentOfType<mlir::ModuleOp>();
fir::FirOpBuilder builder(rewriter, mod);
builder.setInsertionPointToStart(firstBlock);
mlir::func::FuncOp stackSave = fir::factory::getLlvmStackSave(builder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to require making the pass a module pass. This is quite unfortunate.

We can try to use LLVM dialect stacksave/stackrestore intrinsics operations, but they do not look too reliable to me, e.g. they do not have any side effects that prevent moving them around fir.alloca. At the same time, fir.call seems to have the same problem.

Would that be possible to insert stacksave/stackrestore calls during hlfir.concat bufferization? It might be inefficient, but we will get correct code at least. The redundant stacksave/stackrestore might be optimized after that.

Copy link
Contributor Author

@clementval clementval Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to require making the pass a module pass. This is quite unfortunate.

Yeah if we can avoid that it would probably be better.

Would that be possible to insert stacksave/stackrestore calls during hlfir.concat bufferization? It might be inefficient, but we will get correct code at least. The redundant stacksave/stackrestore might be optimized after that.

Yes we can be then we would miss other case (if any) of fir.alloca with dynamic size in loops.

Copy link
Contributor

@jeanPerier jeanPerier Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they do not have any side effects that prevent moving them around fir.alloca

Are you sure? It seems the ops do not define any side effect interface, which means they can have any side effects and cannot be moved by a generic pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, Jean.

@@ -51,6 +53,7 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
matchAndRewrite(DoLoopOp loop,
mlir::PatternRewriter &rewriter) const override {
auto loc = loop.getLoc();
bool hasAllocas = !loop.getBody()->getOps<fir::AllocaOp>().empty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also note that @VijayKandiah is going to make changes in the code-gen to hoist constant-sized allocas to an entry block, so stacksave/stackrestore might be avoided in such cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This should be refined to count only dyn-sized alloca.

@clementval
Copy link
Contributor Author

#95309 move to a specific pass

@clementval clementval closed this Jun 12, 2024
@clementval clementval deleted the stack_loop branch June 12, 2024 21:49
clementval added a commit that referenced this pull request Jun 14, 2024
Some passes in the flang pipeline are creating `fir.alloca` operation
like `hlfir.concat`. When these allocas are located in a loop, the stack
can quickly be used too much leading to segfaults.

This behavior can be seen in
https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90

This patch insert a call to LLVM stacksave/stackrestore in the body of
the loop to reclaim the alloca in its scope.

This PR is an alternative implementation to #95173
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants