[flang] Insert stacksave/stackrestore when alloca are present in loops #95173

clementval · 2024-06-11T21:17:16Z

Some passes in the flang pipeline create fir.alloca operations, for example during hlfir.concat bufferization. When these allocas are located in a loop, stack usage can grow quickly and lead to segfaults.

This behavior can be seen in https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90

This patch inserts calls to LLVM stacksave/stackrestore in the body of the loop to reclaim the allocas made in its scope.

Note: unstructured loops, which are not lowered to fir.do_loop, would require another solution.
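To illustrate why saving and restoring the stack pointer once per iteration bounds stack usage, here is a minimal, self-contained sketch. It does not use the flang or LLVM APIs: `ToyStack` and `peakUsage` are hypothetical names for a toy model in which an alloca only bumps a counter, `save` stands in for llvm.stacksave, and `restore` stands in for llvm.stackrestore.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a stack frame (hypothetical, not the LLVM implementation):
// an alloca only bumps a byte counter, and nothing is freed until restore.
struct ToyStack {
  std::size_t top = 0;       // bytes currently in use
  std::size_t highWater = 0; // peak usage observed so far

  std::size_t save() const { return top; }         // models llvm.stacksave
  void restore(std::size_t saved) { top = saved; } // models llvm.stackrestore
  void allocaBytes(std::size_t n) {                // models an alloca in a loop body
    top += n;
    if (top > highWater)
      highWater = top;
  }
};

// Runs `iterations` loop iterations that each allocate `bytes` on the toy
// stack, and returns the peak usage. With `reclaim` set, each iteration
// restores the stack pointer saved at the start of the body.
std::size_t peakUsage(std::size_t iterations, std::size_t bytes, bool reclaim) {
  ToyStack stack;
  for (std::size_t i = 0; i < iterations; ++i) {
    std::size_t saved = stack.save(); // start of the loop body
    stack.allocaBytes(bytes);
    if (reclaim)
      stack.restore(saved);           // just before the back edge
  }
  return stack.highWater;
}
```

In this model, `peakUsage(1000, 64, false)` grows linearly with the trip count, while `peakUsage(1000, 64, true)` stays at the size of a single iteration's allocation.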

llvmbot · 2024-06-11T21:26:28Z

@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタインクレメン) (clementval)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/95173.diff

2 Files Affected:

(modified) flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp (+19)
(modified) flang/test/Fir/loop01.fir (+19)

diff --git a/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp b/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
index a233e7fbdcd1e..e75803d9571cb 100644
--- a/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
+++ b/flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp
@@ -6,6 +6,8 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "flang/Optimizer/Builder/FIRBuilder.h"
+#include "flang/Optimizer/Builder/LowLevelIntrinsics.h"
 #include "flang/Optimizer/Dialect/FIRDialect.h"
 #include "flang/Optimizer/Dialect/FIROps.h"
 #include "flang/Optimizer/Dialect/FIROpsSupport.h"
@@ -51,6 +53,7 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
   matchAndRewrite(DoLoopOp loop,
                   mlir::PatternRewriter &rewriter) const override {
     auto loc = loop.getLoc();
+    bool hasAllocas = !loop.getBody()->getOps<fir::AllocaOp>().empty();
     mlir::arith::IntegerOverflowFlags flags{};
     if (setNSW)
       flags = bitEnumSet(flags, mlir::arith::IntegerOverflowFlags::nsw);
@@ -72,6 +75,22 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
         rewriter.splitBlock(conditionalBlock, conditionalBlock->begin());
     auto *lastBlock = &loop.getRegion().back();
 
+    // Insert stacksave/stackrestore if there is fir.alloca operation in the
+    // loop.
+    if (hasAllocas) {
+      auto mod = loop.getOperation()->getParentOfType<mlir::ModuleOp>();
+      fir::FirOpBuilder builder(rewriter, mod);
+      builder.setInsertionPointToStart(firstBlock);
+      mlir::func::FuncOp stackSave = fir::factory::getLlvmStackSave(builder);
+      mlir::func::FuncOp stackRestore =
+          fir::factory::getLlvmStackRestore(builder);
+      mlir::Value stackPtr =
+          builder.create<fir::CallOp>(loc, stackSave).getResult(0);
+      auto *terminator = lastBlock->getTerminator();
+      builder.setInsertionPoint(terminator);
+      builder.create<fir::CallOp>(loc, stackRestore, stackPtr);
+    }
+
     // Move the blocks from the DoLoopOp between initBlock and endBlock
     rewriter.inlineRegionBefore(loop.getRegion(), endBlock);
 
diff --git a/flang/test/Fir/loop01.fir b/flang/test/Fir/loop01.fir
index c1cbb522c378c..55de3bd67b32b 100644
--- a/flang/test/Fir/loop01.fir
+++ b/flang/test/Fir/loop01.fir
@@ -542,3 +542,22 @@ func.func @y5(%lo : index, %up : index) -> index {
 // NSW:           fir.call @f3(%[[VAL_7]]) : (i16) -> ()
 // NSW:           return %[[VAL_5]] : index
 // NSW:         }
+
+// -----
+
+func.func @alloca_in_loop(%lb : index, %ub : index, %step : index, %b : i1, %addr : !fir.ref<index>) {
+  fir.do_loop %iv = %lb to %ub step %step unordered {
+    %0 = fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+  }
+  return
+}
+
+// CHECK-LABEL:  func.func @alloca_in_loop
+// CHECK: ^bb1
+// CHECK: ^bb2
+// CHECK:   %[[STACKPTR:.*]] = fir.call @llvm.stacksave.p0() : () -> !fir.ref<i8>
+// CHECK:   fir.alloca !fir.box<!fir.heap<!fir.char<1,?>>>
+// CHECK:   fir.call @llvm.stackrestore.p0(%[[STACKPTR]]) : (!fir.ref<i8>) -> ()
+// CHECK:   cf.br ^bb1
+// CHECK: ^bb3:
+// CHECK:   return

vzakhari · 2024-06-11T21:39:23Z

flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp

+      auto mod = loop.getOperation()->getParentOfType<mlir::ModuleOp>();
+      fir::FirOpBuilder builder(rewriter, mod);
+      builder.setInsertionPointToStart(firstBlock);
+      mlir::func::FuncOp stackSave = fir::factory::getLlvmStackSave(builder);


This seems to require making the pass a module pass. This is quite unfortunate.

We can try to use LLVM dialect stacksave/stackrestore intrinsics operations, but they do not look too reliable to me, e.g. they do not have any side effects that prevent moving them around fir.alloca. At the same time, fir.call seems to have the same problem.

Would that be possible to insert stacksave/stackrestore calls during hlfir.concat bufferization? It might be inefficient, but we will get correct code at least. The redundant stacksave/stackrestore might be optimized after that.

clementval · 2024-06-11

> This seems to require making the pass a module pass. This is quite unfortunate.

Yeah, if we can avoid that it would probably be better.

> Would that be possible to insert stacksave/stackrestore calls during hlfir.concat bufferization? It might be inefficient, but we will get correct code at least. The redundant stacksave/stackrestore might be optimized after that.

Yes, we can, but then we would miss other cases (if any) of fir.alloca with dynamic size in loops.

jeanPerier · 2024-06-12

> they do not have any side effects that prevent moving them around fir.alloca

Are you sure? It seems the ops do not define any side effect interface, which means they can have any side effects and cannot be moved by a generic pass.

vzakhari · 2024-06-12

You are right, Jean.

vzakhari · 2024-06-11T21:39:34Z

flang/lib/Optimizer/Transforms/ControlFlowConverter.cpp

@@ -51,6 +53,7 @@ class CfgLoopConv : public mlir::OpRewritePattern<fir::DoLoopOp> {
  matchAndRewrite(DoLoopOp loop,
                  mlir::PatternRewriter &rewriter) const override {
    auto loc = loop.getLoc();
+    bool hasAllocas = !loop.getBody()->getOps<fir::AllocaOp>().empty();


Please also note that @VijayKandiah is going to make changes in the code-gen to hoist constant-sized allocas to an entry block, so stacksave/stackrestore might be avoided in such cases.

clementval · 2024-06-11

Yes. This should be refined to count only dynamically sized allocas.
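As a rough sketch of the refinement discussed here, the check could look at whether any alloca in the loop body has a size known only at run time. The code below is a standalone model, not the FIR API: `AllocaInfo`, `isDynamicallySized`, and `needsStackReclaim` are hypothetical names, and the two flags are assumed to correspond to an alloca carrying runtime shape operands or runtime length parameters.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the properties of an alloca that matter for this
// decision; this is an illustration only, not the fir::AllocaOp interface.
struct AllocaInfo {
  bool hasShapeOperands; // extents supplied at run time
  bool hasLenParams;     // length parameters supplied at run time
};

// An alloca is treated as dynamically sized if any part of its size is only
// known at run time.
bool isDynamicallySized(const AllocaInfo &a) {
  return a.hasShapeOperands || a.hasLenParams;
}

// Returns true only when at least one alloca in the loop body is dynamically
// sized. Constant-sized allocas are ignored, on the assumption that they can
// be hoisted to the entry block instead.
bool needsStackReclaim(const std::vector<AllocaInfo> &loopAllocas) {
  for (const AllocaInfo &a : loopAllocas)
    if (isDynamicallySized(a))
      return true;
  return false;
}
```

With such a predicate, a loop that contains only constant-sized allocas would not get a stacksave/stackrestore pair.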

clementval · 2024-06-12T20:49:08Z

Moved to a specific pass in #95309.

Some passes in the flang pipeline create `fir.alloca` operations, for example during `hlfir.concat` bufferization. When these allocas are located in a loop, stack usage can grow quickly and lead to segfaults. This behavior can be seen in json-fortran's jf_test_36.F90 (https://github.com/jacobwilliams/json-fortran/blob/master/src/tests/jf_test_36.F90). This patch inserts calls to LLVM stacksave/stackrestore in the body of the loop to reclaim the allocas made in its scope. This PR is an alternative implementation to #95173.

[flang] Insert stacksave/stackrestore when alloca are present in loops

ec2499b

clementval requested review from jeanPerier and vzakhari June 11, 2024 21:17

llvmbot added the flang and flang:fir-hlfir labels Jun 11, 2024

vzakhari reviewed Jun 11, 2024

clementval mentioned this pull request Jun 12, 2024

[flang] Add stack reclaim pass to reclaim allocas in loop #95309

Merged

clementval closed this Jun 12, 2024

clementval deleted the stack_loop branch June 12, 2024 21:49
