[flang] Fetch the initial reduction value from the input array. #136790

vzakhari · 2025-04-23T00:14:12Z

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.

Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.

llvmbot · 2025-04-23T00:14:43Z

@llvm/pr-subscribers-flang-fir-hlfir

Author: Slava Zakharin (vzakhari)

Changes

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.

Patch is 55.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136790.diff

5 Files Affected:

(modified) flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp (+219-40)
(modified) flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir (+123)
(modified) flang/test/HLFIR/simplify-hlfir-intrinsics-maxval.fir (+84)
(modified) flang/test/HLFIR/simplify-hlfir-intrinsics-minloc.fir (+123)
(modified) flang/test/HLFIR/simplify-hlfir-intrinsics-minval.fir (+84)

diff --git a/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp b/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
index e9d820adbd22b..54746a45b1aec 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
@@ -232,7 +232,17 @@ class ReductionAsElementalConverter {
   /// by the reduction loop. In general, there is a single
   /// loop-carried reduction value (e.g. for SUM), but, for example,
   /// MAXLOC/MINLOC implementation uses multiple reductions.
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() = 0;
+  /// \p oneBasedIndices contains any array indices predefined
+  /// before the reduction loop, i.e. it is empty for total
+  /// reductions, and contains the one-based indices of the wrapping
+  /// hlfir.elemental.
+  /// \p extents are the pre-computed extents of the input array.
+  /// For total reductions, \p extents holds extents of all dimensions.
+  /// For partial reductions, \p extents holds a single extent
+  /// of the DIM dimension.
+  virtual llvm::SmallVector<mlir::Value>
+  genReductionInitValues(mlir::ValueRange oneBasedIndices,
+                         const llvm::SmallVectorImpl<mlir::Value> &extents) = 0;
 
   /// Perform reduction(s) update given a single input array's element
   /// identified by \p array and \p oneBasedIndices coordinates.
@@ -396,6 +406,54 @@ genMinMaxComparison(mlir::Location loc, fir::FirOpBuilder &builder,
   llvm_unreachable("unsupported type");
 }
 
+// Generate a predicate value indicating that an array with the given
+// extents is not empty.
+static mlir::Value
+genIsNotEmptyArrayExtents(mlir::Location loc, fir::FirOpBuilder &builder,
+                          const llvm::SmallVectorImpl<mlir::Value> &extents) {
+  mlir::Value isNotEmpty = builder.createBool(loc, true);
+  for (auto extent : extents) {
+    mlir::Value zero =
+        fir::factory::createZeroValue(builder, loc, extent.getType());
+    mlir::Value cmp = builder.create<mlir::arith::CmpIOp>(
+        loc, mlir::arith::CmpIPredicate::ne, extent, zero);
+    isNotEmpty = builder.create<mlir::arith::AndIOp>(loc, isNotEmpty, cmp);
+  }
+  return isNotEmpty;
+}
+
+// Helper method for MIN/MAX LOC/VAL reductions.
+// It returns a vector of indices such that they address
+// the first element of an array (in case of total reduction)
+// or its section (in case of partial reduction).
+//
+// If case of total reduction oneBasedIndices must be empty,
+// otherwise, they contain the one based indices of the wrapping
+// hlfir.elemental.
+// Basically, the method adds the necessary number of constant-one
+// indices into oneBasedIndices.
+static llvm::SmallVector<mlir::Value> genFirstElementIndicesForReduction(
+    mlir::Location loc, fir::FirOpBuilder &builder, bool isTotalReduction,
+    mlir::FailureOr<int64_t> dim, unsigned rank,
+    mlir::ValueRange oneBasedIndices) {
+  llvm::SmallVector<mlir::Value> indices{oneBasedIndices};
+  mlir::Value one =
+      builder.createIntegerConstant(loc, builder.getIndexType(), 1);
+  if (isTotalReduction) {
+    assert(oneBasedIndices.size() == 0 &&
+           "wrong number of indices for total reduction");
+    // Set indices to all-ones.
+    indices.append(rank, one);
+  } else {
+    assert(oneBasedIndices.size() == rank - 1 &&
+           "there must be RANK-1 indices for partial reduction");
+    assert(mlir::succeeded(dim) && "partial reduction with invalid DIM");
+    // Insert constant-one index at DIM dimension.
+    indices.insert(indices.begin() + *dim - 1, one);
+  }
+  return indices;
+}
+
 /// Implementation of ReductionAsElementalConverter interface
 /// for MAXLOC/MINLOC.
 template <typename T>
@@ -410,6 +468,9 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
   //   * 1 reduction value holding the current MIN/MAX.
   //   * 1 boolean indicating whether it is the first time
   //     the mask is true.
+  //
+  // If precomputeFirst() returns true, then the boolean loop-carried
+  // value is not used.
   static constexpr unsigned maxNumReductions = Fortran::common::maxRank + 2;
   static constexpr bool isMax = std::is_same_v<T, hlfir::MaxlocOp>;
   using Base = ReductionAsElementalConverter;
@@ -444,7 +505,9 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
     return getResultRank() == 0 || !getDim();
   }
 
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final;
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      mlir::ValueRange oneBasedIndices,
+      const llvm::SmallVectorImpl<mlir::Value> &extents) final;
   virtual llvm::SmallVector<mlir::Value>
   reduceOneElement(const llvm::SmallVectorImpl<mlir::Value> &currentValue,
                    hlfir::Entity array, mlir::ValueRange oneBasedIndices) final;
@@ -460,8 +523,12 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
 
   void
   checkReductions(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
-    assert(reductions.size() == getNumCoors() + 2 &&
-           "invalid number of reductions for MINLOC/MAXLOC");
+    if (precomputeFirst())
+      assert(reductions.size() == getNumCoors() + 1 &&
+             "invalid number of reductions for MINLOC/MAXLOC");
+    else
+      assert(reductions.size() == getNumCoors() + 2 &&
+             "invalid number of reductions for MINLOC/MAXLOC");
   }
 
   mlir::Value
@@ -473,13 +540,52 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
   mlir::Value
   getIsFirst(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
     checkReductions(reductions);
+    assert(!precomputeFirst() && "IsFirst predicate must not be used");
     return reductions[getNumCoors() + 1];
   }
+
+  // Return true iff the reductions can be initialized
+  // by reading the first element of the array (or its section).
+  // If it returns false, then we use an auxiliary boolean
+  // to identify the very first reduction update.
+  bool precomputeFirst() const { return !getMask(); }
 };
 
 template <typename T>
 llvm::SmallVector<mlir::Value>
-MinMaxlocAsElementalConverter<T>::genReductionInitValues() {
+MinMaxlocAsElementalConverter<T>::genReductionInitValues(
+    mlir::ValueRange oneBasedIndices,
+    const llvm::SmallVectorImpl<mlir::Value> &extents) {
+  fir::IfOp ifOp;
+  if (precomputeFirst()) {
+    // Check if we can load the value of the first element in the array
+    // or its section (for partial reduction).
+    assert(extents.size() == getNumCoors() &&
+           "wrong number of extents for MINLOC/MAXLOC reduction");
+    mlir::Value isNotEmpty = genIsNotEmptyArrayExtents(loc, builder, extents);
+
+    llvm::SmallVector<mlir::Value> indices = genFirstElementIndicesForReduction(
+        loc, builder, isTotalReduction(), getConstDim(), getSourceRank(),
+        oneBasedIndices);
+
+    llvm::SmallVector<mlir::Type> ifTypes(getNumCoors(),
+                                          getResultElementType());
+    ifTypes.push_back(getSourceElementType());
+    ifOp = builder.create<fir::IfOp>(loc, ifTypes, isNotEmpty,
+                                     /*withElseRegion=*/true);
+    builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+    mlir::Value one =
+        builder.createIntegerConstant(loc, getResultElementType(), 1);
+    llvm::SmallVector<mlir::Value> results(getNumCoors(), one);
+    mlir::Value minMaxFirst =
+        hlfir::loadElementAt(loc, builder, hlfir::Entity{getSource()}, indices);
+    results.push_back(minMaxFirst);
+    builder.create<fir::ResultOp>(loc, results);
+
+    // In the 'else' block use default init values.
+    builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
+  }
+
   // Initial value for the coordinate(s) is zero.
   mlir::Value zeroCoor =
       fir::factory::createZeroValue(builder, loc, getResultElementType());
@@ -490,11 +596,17 @@ MinMaxlocAsElementalConverter<T>::genReductionInitValues() {
       genMinMaxInitValue<isMax>(loc, builder, getSourceElementType());
   result.push_back(minMaxInit);
 
-  // Initial value for isFirst predicate. It is switched to false,
-  // when the reduction update dynamically happens inside the reduction
-  // loop.
-  mlir::Value trueVal = builder.createBool(loc, true);
-  result.push_back(trueVal);
+  if (ifOp) {
+    builder.create<fir::ResultOp>(loc, result);
+    builder.setInsertionPointAfter(ifOp);
+    result = ifOp.getResults();
+  } else {
+    // Initial value for isFirst predicate. It is switched to false,
+    // when the reduction update dynamically happens inside the reduction
+    // loop.
+    mlir::Value trueVal = builder.createBool(loc, true);
+    result.push_back(trueVal);
+  }
 
   return result;
 }
@@ -509,9 +621,12 @@ MinMaxlocAsElementalConverter<T>::reduceOneElement(
       hlfir::loadElementAt(loc, builder, array, oneBasedIndices);
   mlir::Value cmp = genMinMaxComparison<isMax>(loc, builder, elementValue,
                                                getCurrentMinMax(currentValue));
-  // If isFirst is true, then do the reduction update regardless
-  // of the FP comparison.
-  cmp = builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+  if (!precomputeFirst()) {
+    // If isFirst is true, then do the reduction update regardless
+    // of the FP comparison.
+    cmp =
+        builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+  }
 
   llvm::SmallVector<mlir::Value> newIndices;
   int64_t dim = 1;
@@ -537,8 +652,10 @@ MinMaxlocAsElementalConverter<T>::reduceOneElement(
       loc, cmp, elementValue, getCurrentMinMax(currentValue));
   newIndices.push_back(newMinMax);
 
-  mlir::Value newIsFirst = builder.createBool(loc, false);
-  newIndices.push_back(newIsFirst);
+  if (!precomputeFirst()) {
+    mlir::Value newIsFirst = builder.createBool(loc, false);
+    newIndices.push_back(newIsFirst);
+  }
 
   assert(currentValue.size() == newIndices.size() &&
          "invalid number of updated reductions");
@@ -629,7 +746,8 @@ class MinMaxvalAsElementalConverter
   //
   // The boolean flag is used to replace the initial value
   // with the first input element even if it is NaN.
-  static constexpr unsigned numReductions = 2;
+  // If precomputeFirst() returns true, then the boolean loop-carried
+  // value is not used.
   static constexpr bool isMax = std::is_same_v<T, hlfir::MaxvalOp>;
   using Base = NumericReductionAsElementalConverterBase<T>;
 
@@ -646,19 +764,9 @@ class MinMaxvalAsElementalConverter
     return mlir::success();
   }
 
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
-    llvm::SmallVector<mlir::Value> result;
-    fir::FirOpBuilder &builder = this->builder;
-    mlir::Location loc = this->loc;
-    mlir::Value init =
-        genMinMaxInitValue<isMax>(loc, builder, this->getResultElementType());
-    result.push_back(init);
-    // Initial value for isFirst predicate. It is switched to false,
-    // when the reduction update dynamically happens inside the reduction
-    // loop.
-    result.push_back(builder.createBool(loc, true));
-    return result;
-  }
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      mlir::ValueRange oneBasedIndices,
+      const llvm::SmallVectorImpl<mlir::Value> &extents) final;
 
   virtual llvm::SmallVector<mlir::Value>
   reduceOneElement(const llvm::SmallVectorImpl<mlir::Value> &currentValue,
@@ -673,12 +781,14 @@ class MinMaxvalAsElementalConverter
     mlir::Value currentMinMax = getCurrentMinMax(currentValue);
     mlir::Value cmp =
         genMinMaxComparison<isMax>(loc, builder, elementValue, currentMinMax);
-    cmp =
-        builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+    if (!precomputeFirst())
+      cmp = builder.create<mlir::arith::OrIOp>(loc, cmp,
+                                               getIsFirst(currentValue));
     mlir::Value newMinMax = builder.create<mlir::arith::SelectOp>(
         loc, cmp, elementValue, currentMinMax);
     result.push_back(newMinMax);
-    result.push_back(builder.createBool(loc, false));
+    if (!precomputeFirst())
+      result.push_back(builder.createBool(loc, false));
     return result;
   }
 
@@ -690,7 +800,7 @@ class MinMaxvalAsElementalConverter
 
   void
   checkReductions(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
-    assert(reductions.size() == numReductions &&
+    assert(reductions.size() == getNumReductions() &&
            "invalid number of reductions for MINVAL/MAXVAL");
   }
 
@@ -703,10 +813,70 @@ class MinMaxvalAsElementalConverter
   mlir::Value
   getIsFirst(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
     this->checkReductions(reductions);
+    assert(!precomputeFirst() && "IsFirst predicate must not be used");
     return reductions[1];
   }
+
+  // Return true iff the reductions can be initialized
+  // by reading the first element of the array (or its section).
+  // If it returns false, then we use an auxiliary boolean
+  // to identify the very first reduction update.
+  bool precomputeFirst() const { return !this->getMask(); }
+
+  std::size_t getNumReductions() const { return precomputeFirst() ? 1 : 2; }
 };
 
+template <typename T>
+llvm::SmallVector<mlir::Value>
+MinMaxvalAsElementalConverter<T>::genReductionInitValues(
+    mlir::ValueRange oneBasedIndices,
+    const llvm::SmallVectorImpl<mlir::Value> &extents) {
+  llvm::SmallVector<mlir::Value> result;
+  fir::FirOpBuilder &builder = this->builder;
+  mlir::Location loc = this->loc;
+
+  fir::IfOp ifOp;
+  if (precomputeFirst()) {
+    // Check if we can load the value of the first element in the array
+    // or its section (for partial reduction).
+    assert(extents.size() == this->isTotalReduction()
+               ? this->getSourceRank()
+               : 1u && "wrong number of extents for MINVAL/MAXVAL reduction");
+    mlir::Value isNotEmpty = genIsNotEmptyArrayExtents(loc, builder, extents);
+    llvm::SmallVector<mlir::Value> indices = genFirstElementIndicesForReduction(
+        loc, builder, this->isTotalReduction(), this->getConstDim(),
+        this->getSourceRank(), oneBasedIndices);
+
+    ifOp =
+        builder.create<fir::IfOp>(loc, this->getResultElementType(), isNotEmpty,
+                                  /*withElseRegion=*/true);
+    builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+    mlir::Value minMaxFirst = hlfir::loadElementAt(
+        loc, builder, hlfir::Entity{this->getSource()}, indices);
+    builder.create<fir::ResultOp>(loc, minMaxFirst);
+
+    // In the 'else' block use default init values.
+    builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
+  }
+
+  mlir::Value init =
+      genMinMaxInitValue<isMax>(loc, builder, this->getResultElementType());
+  result.push_back(init);
+
+  if (ifOp) {
+    builder.create<fir::ResultOp>(loc, result);
+    builder.setInsertionPointAfter(ifOp);
+    result = ifOp.getResults();
+  } else {
+    // Initial value for isFirst predicate. It is switched to false,
+    // when the reduction update dynamically happens inside the reduction
+    // loop.
+    result.push_back(builder.createBool(loc, true));
+  }
+
+  return result;
+}
+
 /// Reduction converter for SUM.
 class SumAsElementalConverter
     : public NumericReductionAsElementalConverterBase<hlfir::SumOp> {
@@ -717,7 +887,10 @@ class SumAsElementalConverter
       : Base{op, rewriter} {}
 
 private:
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+      [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+      final {
     return {
         fir::factory::createZeroValue(builder, loc, getResultElementType())};
   }
@@ -781,7 +954,10 @@ class AllAnyAsElementalConverter
       : Base{op, rewriter} {}
 
 private:
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+      [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+      final {
     return {this->builder.createBool(this->loc, isAll ? true : false)};
   }
   virtual llvm::SmallVector<mlir::Value>
@@ -819,7 +995,10 @@ class CountAsElementalConverter
       : Base{op, rewriter} {}
 
 private:
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+      [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+      final {
     return {
         fir::factory::createZeroValue(builder, loc, getResultElementType())};
   }
@@ -881,10 +1060,6 @@ mlir::LogicalResult ReductionAsElementalConverter::convert() {
     // Loop over all indices in the DIM dimension, and reduce all values.
     // If DIM is not present, do total reduction.
 
-    // Initial value for the reduction.
-    llvm::SmallVector<mlir::Value, 1> reductionInitValues =
-        genReductionInitValues();
-
     llvm::SmallVector<mlir::Value> extents;
     if (isTotalReduce)
       extents = arrayExtents;
@@ -892,6 +1067,10 @@ mlir::LogicalResult ReductionAsElementalConverter::convert() {
       extents.push_back(
           builder.createConvert(loc, builder.getIndexType(), dimExtent));
 
+    // Initial value for the reduction.
+    llvm::SmallVector<mlir::Value, 1> reductionInitValues =
+        genReductionInitValues(inputIndices, extents);
+
     auto genBody = [&](mlir::Location loc, fir::FirOpBuilder &builder,
                        mlir::ValueRange oneBasedIndices,
                        mlir::ValueRange reductionArgs)
diff --git a/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir b/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
index 4e9f5d0ebb08a..f917f99c2b9ab 100644
--- a/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
+++ b/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
@@ -294,6 +294,129 @@ func.func @test_partial_var(%input: !fir.box<!fir.array<?x?x?xf32>>, %mask: !fir
 // CHECK:           return %[[VAL_14]] : !hlfir.expr<?x?xi32>
 // CHECK:         }
 
+func.func @test_total_expr_nomask(%input: !hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32> {
+  %0 = hlfir.maxloc %input {fastmath = #arith.fastmath<reassoc>} : (!hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32>
+  return %0 : !hlfir.expr<3xi32>
+}
+// CHECK-LABEL:   func.func @test_total_expr_nomask(
+// CHECK-SAME:                                      %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32> {
+// CHECK:           %[[VAL_1:.*]] = arith.constant false
+// CHECK:           %[[VAL_2:.*]] = arith.constant 3 : index
+// CHECK:           %[[VAL_3:.*]] = arith.constant 2 : index
+// CHECK:           %[[VAL_4:.*]] = arith.constant -3.40282347E+38 : f32
+// CHECK:           %[[VAL_5:.*]] = arith.constant 0 : i32
+// CHECK:           %[[VAL_6:.*]] = arith.constant 1 : i32
+// CHECK:           %[[VAL_7:.*]] = arith.constant 1 : index
+// CHECK:           %[[VAL_8:.*]] = arith.constant 0 : index
+// CHECK:           %[[VAL_9:.*]] = fir.alloca !fir.array<3xi32>
+// CHECK:           %[[VAL_10:.*]] = hlfir.shape_of %[[VAL_0]] : (!hlfir.expr<?x?x?xf32>) -> !fir.shape<3>
+// CHECK:           %[[VAL_11:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 0 : index} : (!fir.shape<3>) -> index
+// CHECK:           %[[VAL_12:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 1 : index} : (!fir.shape<3>) -> index
+// CHECK:           %[[VAL_13:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 2 : index} : (!fir.shape<3>) -> index
+// CHECK:           %[[VAL_14:.*]] = arith.cmpi ne, %[[VAL_11]], %[[VAL_8]] : index
+// CHECK:           %[[VAL_15:.*]] = arith.cmpi ne, %[[VAL_12]], %[[VAL_8]] : index
+// CHECK:           %[[VAL_16:.*]] = arith.andi %[[VAL_14]], %[[VAL_15]] : i1
+// CHECK:           %[[VAL_17:.*]] = arith.cmpi ne, %[[VAL_13]], %[[VAL_8]] : index
+// CHECK:           %[[VAL_18:.*]] = arith.andi %[[VAL_16]], %[[VAL_17]] : i1
+// CHECK:           %[[VAL_19:.*]]:4 = fir.if %[[VAL_18]] -> (i32, i32, i32, f32) {
+// CHECK:             %[[VAL_20:.*]] = hlfir.apply %[[VAL_0]], %[[VAL_7]], %[[VAL_7]], %[[VAL_7]] : (!...
[truncated]

jeanPerier

Naïve question: why is a first element needed for numerical min/max val/loc?

Isn't there a comparison + iteration order that could be used so that the type representation min/max can be used as a starting point instead of some value from the input array?

For instance, assuming MAXLOC with BACK being absent, we could start with zero coordinates and a temporary max being the numerical min for that type, then the array could be iterated backwards (maybe that is a bad idea from a memory access perspective though) always replacing the result when something bigger or equal is found. Then there is no need to deal with the array size and mask.

Again, this is a naïve question, it may be a terrible idea from a performance perspective and just be incorrect with some corner cases (nans probably? although I think fcmp uge would work in my example to select the first NaN in array order as being the result location (assuming that is what we want)).

Otherwise, the implementation of your solution looks good to me.

vzakhari · 2025-04-23T18:08:56Z

Naïve question: why is a first element needed for numerical min/max val/loc?

Isn't there a comparison + iteration order that could be used so that the type representation min/max can be used as a starting point instead of some value from the input array?

For instance, assuming MAXLOC with BACK being absent, we could start with zero coordinates and a temporary max being the numerical min for that type, then the array could be iterated backwards (maybe that is a bad idea from a memory access perspective though) always replacing the result when something bigger or equal is found. Then there is no need to deal with the array size and mask.

Again, this is a naïve question, it may be a terrible idea from a performance perspective and just be incorrect with some corner cases (nans probably? although I think fcmp uge would work in my example to select the first NaN in array order as being the result location (assuming that is what we want)).

Otherwise, the implementation of your solution looks good to me.

Thank you for the review, Jean!

As you pointed out, I think we have to do it this way to properly handle NaNs. I believe your approach with uge will not work in this case:

input = (/NaN, 1.0/)
result = smallest;
result = (1.0 .uge. result) ? 1.0 : result; // yields 1.0
result (NaN .uge. result) ? NaN : result; // yields NaN

The actual result should be 1.0.

jeanPerier · 2025-04-24T12:02:20Z

As you pointed out, I think we have to do it this way to properly handle NaNs. I believe your approach with uge will not work in this case:
input = (/NaN, 1.0/)
result = smallest;
result = (1.0 .uge. result) ? 1.0 : result; // yields 1.0
result (NaN .uge. result) ? NaN : result; // yields NaN
The actual result should be 1.0.

I see, thanks, I expected NaN should be the result here, but that is indeed not what other compilers are doing (I am not quite sure what is the requirement with regards to NaN in the Fortran spec though...)... That indeed makes it impossible to find a comparison function+comparison order here because NaNs should be selected if this is this is the first element (to cover the all NaNs case), but not otherwise...

BTW, I see that IFX/Ifort and nvfortran/classic flang are a bit different in behavior and return -Inf for maxval on an all NaN inputs, so `they probably have an easier time getting optimal code here which could explain better perf (to get this result, it is possible to just init to -Inf and never select NaNs).

Are we sure about the requirement that all NaNs should be NaN and not -Inf for MAXVAL?

subroutine foo(x, y)
real :: x(10000), y
y = maxval(x)
end


real :: x(10000), y, zero
zero = 0.
x = 0./zero
call foo(x, y)
print *, y
end

vzakhari · 2025-04-24T14:54:27Z

We used to use +/-Inf in the OptimizedBufferization, but with my changes I started getting failures on some tests. It was probably in gfortran suite (which is following gfortran behavior, of course). I am not sure if I saw failures in other suites. I will try to recollect the results.

@klausler can you please comment on Jean's question above? The current HLFIR inlining follows the Fortran runtime implementation, but should we change both? I did not find any handling of NaNs in MINMAXLOCVAL specification in the standard, but it might be specified in some other section.

klausler · 2025-04-24T15:53:15Z

Extensions.md documents our treatment of NaN values with these transformational intrinsics:

For real `MAXVAL`, `MINVAL`, `MAXLOC`, and `MINLOC`, NaN values are essentially ignored unless there are some unmasked array entries and *all* of them are NaNs.

vzakhari · 2025-04-24T19:27:52Z

I do not see much consistency between the compilers for Jean's maxval example. Here is the results for x86:

nvfortran: -Inf
nagfor: -3.4028235E+38
gfortran/flang: NaN
ifx: -Infinity
xlf: -0.3402823466E+39

Results for an empty array:

nvfortran: -Inf
nagfor: -3.4028235E
gfortran/flang: -3.40282347E+38
ifx: -3.4028235E+38
xlf: -0.3402823466E+39

Array of all NaN with 1.0 in the middle: all compilers produce 1.0.

So it looks like Flang's implementation adds to the diversity by joining gfortran.

If I use +/-Inf init value, Flang will work as nvfortran for all-NaNs and empty arrays. Nvfortran empty arrays handling is not per standard. I can make Flang return +/-largest, so the results will be:
For all-NaNs: +/-Inf
For empty arrays: +/-largest

This implementation will only completely match ifx. On a positive note, it will probably be faster, because there will be just one unodered comparison between the current reduction value and the new element.

Again, there is not much consistency here, so I am okay with either implementation. Please let me know what you think.

vzakhari · 2025-04-24T19:36:40Z

Another options is to stick to xlf/nagfor:
For all-NaNs: +/-largest
For empty arrays: +/-largest

klausler · 2025-04-24T19:37:58Z

It was because of the lack of clear precedent that we defined the NaN treatment in a way that seemed the most sensible in a mathematical sense. NaNs are not ordered, and fail any comparison, so ignoring them for minval/maxval &c. is defensible. It's also the behavior specified for IEEE-754 floating-point maxNum/minNum operations (and their Fortran bindings).

vzakhari · 2025-04-24T19:56:05Z

Thanks! So it looks like we can switch to using +/-largest init and ignoring NaNs in both Fortran runtime and HLFIR/FIR inlining.

I will disable those gfortran tests that fail in llvm-test-suite.

klausler · 2025-04-24T20:08:35Z

Thanks! So it looks like we can switch to using +/-largest init and ignoring NaNs in both Fortran runtime and HLFIR/FIR inlining.

I will disable those gfortran tests that fail in llvm-test-suite.

Are there tests that are failing today with the runtime implementation?

vzakhari · 2025-04-24T20:10:06Z

I am not aware of any current failures. If we change the behavior, some gfortran tests will fail.

klausler · 2025-04-24T20:13:40Z

I am not aware of any current failures. If we change the behavior, some gfortran tests will fail.

I'm confused. Those tests are running today with the f18 runtime implementation of maximal, minval, maxloc, and minloc, yes? Are you changing the f18 runtime semantics for those routines? I thought you're modifying just the open-coded implementation in fir+hlfir of these functions.

vzakhari · 2025-04-24T20:16:20Z

Yes, those tests run successfuly with F18 runtime and with HLFIR inlining.

Jean suggested changing/simplifying HLFIR inlining. In order to do that and keep getting consitent results for -O0 and -O2, I believe we need to change F18 runtime as well. These changes will cause gfortran tests to fail.

vzakhari · 2025-04-24T20:16:43Z

Created an RFC: https://discourse.llvm.org/t/rfc-ignore-nans-for-minval-maxval-minloc-maxloc/86036

vzakhari · 2025-04-24T22:25:15Z

Peter made a good point that we should return a NaN for all-NaNs case. I think for MINLOC/MAXLOC this means that we return the position of the first unmasked NaN.

So I do not see how I can change my PR to make the code faster, in general. For nnan case, though, I think we can always select +/-LARGEST, and we do not need to read the init value from the array. That should provide cleaner/faster code for -ffast-math users.

klausler · 2025-04-24T23:39:17Z

return the position of the first unmasked NaN.

Or last, of course, depending on BACK=.

jeanPerier

Thanks for looking into our requirement rational here. LGTM then.

tblah

LGTM. Thanks for sticking with this through the difficult issues in the RFC.

vzakhari · 2025-04-28T17:40:59Z

Thank you for the reviews and the discussions! Can you please review additional change to simplify the generated code for 'nnan' case?

jeanPerier

Thank you Slava!

…#136790) Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.

vzakhari requested review from tblah and jeanPerier April 23, 2025 00:14

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Apr 23, 2025

jeanPerier reviewed Apr 23, 2025

View reviewed changes

jeanPerier approved these changes Apr 25, 2025

View reviewed changes

tblah approved these changes Apr 26, 2025

View reviewed changes

Simplified generated code for fast-math 'nnan'.

7380b75

vzakhari requested review from tblah and jeanPerier April 29, 2025 17:18

jeanPerier approved these changes Apr 30, 2025

View reviewed changes

vzakhari merged commit 7dad8b9 into llvm:main Apr 30, 2025
11 checks passed

[flang] Fetch the initial reduction value from the input array. #136790

[flang] Fetch the initial reduction value from the input array. #136790

Uh oh!

Conversation

vzakhari commented Apr 23, 2025

Uh oh!

llvmbot commented Apr 23, 2025

Uh oh!

jeanPerier left a comment

Choose a reason for hiding this comment

Uh oh!

vzakhari commented Apr 23, 2025

Uh oh!

jeanPerier commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

klausler commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

klausler commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

klausler commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

klausler commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

vzakhari commented Apr 24, 2025

Uh oh!

klausler commented Apr 24, 2025

Uh oh!

jeanPerier left a comment

Choose a reason for hiding this comment

Uh oh!

tblah left a comment

Choose a reason for hiding this comment

Uh oh!

vzakhari commented Apr 28, 2025

Uh oh!

jeanPerier left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!