Skip to content

[flang] Fetch the initial reduction value from the input array. #136790

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 30, 2025

Conversation

vzakhari
Copy link
Contributor

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.
@vzakhari vzakhari requested review from tblah and jeanPerier April 23, 2025 00:14
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Apr 23, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 23, 2025

@llvm/pr-subscribers-flang-fir-hlfir

Author: Slava Zakharin (vzakhari)

Changes

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.


Patch is 55.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136790.diff

5 Files Affected:

  • (modified) flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp (+219-40)
  • (modified) flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir (+123)
  • (modified) flang/test/HLFIR/simplify-hlfir-intrinsics-maxval.fir (+84)
  • (modified) flang/test/HLFIR/simplify-hlfir-intrinsics-minloc.fir (+123)
  • (modified) flang/test/HLFIR/simplify-hlfir-intrinsics-minval.fir (+84)
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp b/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
index e9d820adbd22b..54746a45b1aec 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
@@ -232,7 +232,17 @@ class ReductionAsElementalConverter {
   /// by the reduction loop. In general, there is a single
   /// loop-carried reduction value (e.g. for SUM), but, for example,
   /// MAXLOC/MINLOC implementation uses multiple reductions.
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() = 0;
+  /// \p oneBasedIndices contains any array indices predefined
+  /// before the reduction loop, i.e. it is empty for total
+  /// reductions, and contains the one-based indices of the wrapping
+  /// hlfir.elemental.
+  /// \p extents are the pre-computed extents of the input array.
+  /// For total reductions, \p extents holds extents of all dimensions.
+  /// For partial reductions, \p extents holds a single extent
+  /// of the DIM dimension.
+  virtual llvm::SmallVector<mlir::Value>
+  genReductionInitValues(mlir::ValueRange oneBasedIndices,
+                         const llvm::SmallVectorImpl<mlir::Value> &extents) = 0;
 
   /// Perform reduction(s) update given a single input array's element
   /// identified by \p array and \p oneBasedIndices coordinates.
@@ -396,6 +406,54 @@ genMinMaxComparison(mlir::Location loc, fir::FirOpBuilder &builder,
   llvm_unreachable("unsupported type");
 }
 
+// Generate a predicate value indicating that an array with the given
+// extents is not empty.
+static mlir::Value
+genIsNotEmptyArrayExtents(mlir::Location loc, fir::FirOpBuilder &builder,
+                          const llvm::SmallVectorImpl<mlir::Value> &extents) {
+  mlir::Value isNotEmpty = builder.createBool(loc, true);
+  for (auto extent : extents) {
+    mlir::Value zero =
+        fir::factory::createZeroValue(builder, loc, extent.getType());
+    mlir::Value cmp = builder.create<mlir::arith::CmpIOp>(
+        loc, mlir::arith::CmpIPredicate::ne, extent, zero);
+    isNotEmpty = builder.create<mlir::arith::AndIOp>(loc, isNotEmpty, cmp);
+  }
+  return isNotEmpty;
+}
+
+// Helper method for MIN/MAX LOC/VAL reductions.
+// It returns a vector of indices such that they address
+// the first element of an array (in case of total reduction)
+// or its section (in case of partial reduction).
+//
+// If case of total reduction oneBasedIndices must be empty,
+// otherwise, they contain the one based indices of the wrapping
+// hlfir.elemental.
+// Basically, the method adds the necessary number of constant-one
+// indices into oneBasedIndices.
+static llvm::SmallVector<mlir::Value> genFirstElementIndicesForReduction(
+    mlir::Location loc, fir::FirOpBuilder &builder, bool isTotalReduction,
+    mlir::FailureOr<int64_t> dim, unsigned rank,
+    mlir::ValueRange oneBasedIndices) {
+  llvm::SmallVector<mlir::Value> indices{oneBasedIndices};
+  mlir::Value one =
+      builder.createIntegerConstant(loc, builder.getIndexType(), 1);
+  if (isTotalReduction) {
+    assert(oneBasedIndices.size() == 0 &&
+           "wrong number of indices for total reduction");
+    // Set indices to all-ones.
+    indices.append(rank, one);
+  } else {
+    assert(oneBasedIndices.size() == rank - 1 &&
+           "there must be RANK-1 indices for partial reduction");
+    assert(mlir::succeeded(dim) && "partial reduction with invalid DIM");
+    // Insert constant-one index at DIM dimension.
+    indices.insert(indices.begin() + *dim - 1, one);
+  }
+  return indices;
+}
+
 /// Implementation of ReductionAsElementalConverter interface
 /// for MAXLOC/MINLOC.
 template <typename T>
@@ -410,6 +468,9 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
   //   * 1 reduction value holding the current MIN/MAX.
   //   * 1 boolean indicating whether it is the first time
   //     the mask is true.
+  //
+  // If precomputeFirst() returns true, then the boolean loop-carried
+  // value is not used.
   static constexpr unsigned maxNumReductions = Fortran::common::maxRank + 2;
   static constexpr bool isMax = std::is_same_v<T, hlfir::MaxlocOp>;
   using Base = ReductionAsElementalConverter;
@@ -444,7 +505,9 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
     return getResultRank() == 0 || !getDim();
   }
 
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final;
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      mlir::ValueRange oneBasedIndices,
+      const llvm::SmallVectorImpl<mlir::Value> &extents) final;
   virtual llvm::SmallVector<mlir::Value>
   reduceOneElement(const llvm::SmallVectorImpl<mlir::Value> &currentValue,
                    hlfir::Entity array, mlir::ValueRange oneBasedIndices) final;
@@ -460,8 +523,12 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
 
   void
   checkReductions(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
-    assert(reductions.size() == getNumCoors() + 2 &&
-           "invalid number of reductions for MINLOC/MAXLOC");
+    if (precomputeFirst())
+      assert(reductions.size() == getNumCoors() + 1 &&
+             "invalid number of reductions for MINLOC/MAXLOC");
+    else
+      assert(reductions.size() == getNumCoors() + 2 &&
+             "invalid number of reductions for MINLOC/MAXLOC");
   }
 
   mlir::Value
@@ -473,13 +540,52 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
   mlir::Value
   getIsFirst(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
     checkReductions(reductions);
+    assert(!precomputeFirst() && "IsFirst predicate must not be used");
     return reductions[getNumCoors() + 1];
   }
+
+  // Return true iff the reductions can be initialized
+  // by reading the first element of the array (or its section).
+  // If it returns false, then we use an auxiliary boolean
+  // to identify the very first reduction update.
+  bool precomputeFirst() const { return !getMask(); }
 };
 
 template <typename T>
 llvm::SmallVector<mlir::Value>
-MinMaxlocAsElementalConverter<T>::genReductionInitValues() {
+MinMaxlocAsElementalConverter<T>::genReductionInitValues(
+    mlir::ValueRange oneBasedIndices,
+    const llvm::SmallVectorImpl<mlir::Value> &extents) {
+  fir::IfOp ifOp;
+  if (precomputeFirst()) {
+    // Check if we can load the value of the first element in the array
+    // or its section (for partial reduction).
+    assert(extents.size() == getNumCoors() &&
+           "wrong number of extents for MINLOC/MAXLOC reduction");
+    mlir::Value isNotEmpty = genIsNotEmptyArrayExtents(loc, builder, extents);
+
+    llvm::SmallVector<mlir::Value> indices = genFirstElementIndicesForReduction(
+        loc, builder, isTotalReduction(), getConstDim(), getSourceRank(),
+        oneBasedIndices);
+
+    llvm::SmallVector<mlir::Type> ifTypes(getNumCoors(),
+                                          getResultElementType());
+    ifTypes.push_back(getSourceElementType());
+    ifOp = builder.create<fir::IfOp>(loc, ifTypes, isNotEmpty,
+                                     /*withElseRegion=*/true);
+    builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+    mlir::Value one =
+        builder.createIntegerConstant(loc, getResultElementType(), 1);
+    llvm::SmallVector<mlir::Value> results(getNumCoors(), one);
+    mlir::Value minMaxFirst =
+        hlfir::loadElementAt(loc, builder, hlfir::Entity{getSource()}, indices);
+    results.push_back(minMaxFirst);
+    builder.create<fir::ResultOp>(loc, results);
+
+    // In the 'else' block use default init values.
+    builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
+  }
+
   // Initial value for the coordinate(s) is zero.
   mlir::Value zeroCoor =
       fir::factory::createZeroValue(builder, loc, getResultElementType());
@@ -490,11 +596,17 @@ MinMaxlocAsElementalConverter<T>::genReductionInitValues() {
       genMinMaxInitValue<isMax>(loc, builder, getSourceElementType());
   result.push_back(minMaxInit);
 
-  // Initial value for isFirst predicate. It is switched to false,
-  // when the reduction update dynamically happens inside the reduction
-  // loop.
-  mlir::Value trueVal = builder.createBool(loc, true);
-  result.push_back(trueVal);
+  if (ifOp) {
+    builder.create<fir::ResultOp>(loc, result);
+    builder.setInsertionPointAfter(ifOp);
+    result = ifOp.getResults();
+  } else {
+    // Initial value for isFirst predicate. It is switched to false,
+    // when the reduction update dynamically happens inside the reduction
+    // loop.
+    mlir::Value trueVal = builder.createBool(loc, true);
+    result.push_back(trueVal);
+  }
 
   return result;
 }
@@ -509,9 +621,12 @@ MinMaxlocAsElementalConverter<T>::reduceOneElement(
       hlfir::loadElementAt(loc, builder, array, oneBasedIndices);
   mlir::Value cmp = genMinMaxComparison<isMax>(loc, builder, elementValue,
                                                getCurrentMinMax(currentValue));
-  // If isFirst is true, then do the reduction update regardless
-  // of the FP comparison.
-  cmp = builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+  if (!precomputeFirst()) {
+    // If isFirst is true, then do the reduction update regardless
+    // of the FP comparison.
+    cmp =
+        builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+  }
 
   llvm::SmallVector<mlir::Value> newIndices;
   int64_t dim = 1;
@@ -537,8 +652,10 @@ MinMaxlocAsElementalConverter<T>::reduceOneElement(
       loc, cmp, elementValue, getCurrentMinMax(currentValue));
   newIndices.push_back(newMinMax);
 
-  mlir::Value newIsFirst = builder.createBool(loc, false);
-  newIndices.push_back(newIsFirst);
+  if (!precomputeFirst()) {
+    mlir::Value newIsFirst = builder.createBool(loc, false);
+    newIndices.push_back(newIsFirst);
+  }
 
   assert(currentValue.size() == newIndices.size() &&
          "invalid number of updated reductions");
@@ -629,7 +746,8 @@ class MinMaxvalAsElementalConverter
   //
   // The boolean flag is used to replace the initial value
   // with the first input element even if it is NaN.
-  static constexpr unsigned numReductions = 2;
+  // If precomputeFirst() returns true, then the boolean loop-carried
+  // value is not used.
   static constexpr bool isMax = std::is_same_v<T, hlfir::MaxvalOp>;
   using Base = NumericReductionAsElementalConverterBase<T>;
 
@@ -646,19 +764,9 @@ class MinMaxvalAsElementalConverter
     return mlir::success();
   }
 
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
-    llvm::SmallVector<mlir::Value> result;
-    fir::FirOpBuilder &builder = this->builder;
-    mlir::Location loc = this->loc;
-    mlir::Value init =
-        genMinMaxInitValue<isMax>(loc, builder, this->getResultElementType());
-    result.push_back(init);
-    // Initial value for isFirst predicate. It is switched to false,
-    // when the reduction update dynamically happens inside the reduction
-    // loop.
-    result.push_back(builder.createBool(loc, true));
-    return result;
-  }
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      mlir::ValueRange oneBasedIndices,
+      const llvm::SmallVectorImpl<mlir::Value> &extents) final;
 
   virtual llvm::SmallVector<mlir::Value>
   reduceOneElement(const llvm::SmallVectorImpl<mlir::Value> &currentValue,
@@ -673,12 +781,14 @@ class MinMaxvalAsElementalConverter
     mlir::Value currentMinMax = getCurrentMinMax(currentValue);
     mlir::Value cmp =
         genMinMaxComparison<isMax>(loc, builder, elementValue, currentMinMax);
-    cmp =
-        builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+    if (!precomputeFirst())
+      cmp = builder.create<mlir::arith::OrIOp>(loc, cmp,
+                                               getIsFirst(currentValue));
     mlir::Value newMinMax = builder.create<mlir::arith::SelectOp>(
         loc, cmp, elementValue, currentMinMax);
     result.push_back(newMinMax);
-    result.push_back(builder.createBool(loc, false));
+    if (!precomputeFirst())
+      result.push_back(builder.createBool(loc, false));
     return result;
   }
 
@@ -690,7 +800,7 @@ class MinMaxvalAsElementalConverter
 
   void
   checkReductions(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
-    assert(reductions.size() == numReductions &&
+    assert(reductions.size() == getNumReductions() &&
            "invalid number of reductions for MINVAL/MAXVAL");
   }
 
@@ -703,10 +813,70 @@ class MinMaxvalAsElementalConverter
   mlir::Value
   getIsFirst(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
     this->checkReductions(reductions);
+    assert(!precomputeFirst() && "IsFirst predicate must not be used");
     return reductions[1];
   }
+
+  // Return true iff the reductions can be initialized
+  // by reading the first element of the array (or its section).
+  // If it returns false, then we use an auxiliary boolean
+  // to identify the very first reduction update.
+  bool precomputeFirst() const { return !this->getMask(); }
+
+  std::size_t getNumReductions() const { return precomputeFirst() ? 1 : 2; }
 };
 
+template <typename T>
+llvm::SmallVector<mlir::Value>
+MinMaxvalAsElementalConverter<T>::genReductionInitValues(
+    mlir::ValueRange oneBasedIndices,
+    const llvm::SmallVectorImpl<mlir::Value> &extents) {
+  llvm::SmallVector<mlir::Value> result;
+  fir::FirOpBuilder &builder = this->builder;
+  mlir::Location loc = this->loc;
+
+  fir::IfOp ifOp;
+  if (precomputeFirst()) {
+    // Check if we can load the value of the first element in the array
+    // or its section (for partial reduction).
+    assert(extents.size() == this->isTotalReduction()
+               ? this->getSourceRank()
+               : 1u && "wrong number of extents for MINVAL/MAXVAL reduction");
+    mlir::Value isNotEmpty = genIsNotEmptyArrayExtents(loc, builder, extents);
+    llvm::SmallVector<mlir::Value> indices = genFirstElementIndicesForReduction(
+        loc, builder, this->isTotalReduction(), this->getConstDim(),
+        this->getSourceRank(), oneBasedIndices);
+
+    ifOp =
+        builder.create<fir::IfOp>(loc, this->getResultElementType(), isNotEmpty,
+                                  /*withElseRegion=*/true);
+    builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+    mlir::Value minMaxFirst = hlfir::loadElementAt(
+        loc, builder, hlfir::Entity{this->getSource()}, indices);
+    builder.create<fir::ResultOp>(loc, minMaxFirst);
+
+    // In the 'else' block use default init values.
+    builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
+  }
+
+  mlir::Value init =
+      genMinMaxInitValue<isMax>(loc, builder, this->getResultElementType());
+  result.push_back(init);
+
+  if (ifOp) {
+    builder.create<fir::ResultOp>(loc, result);
+    builder.setInsertionPointAfter(ifOp);
+    result = ifOp.getResults();
+  } else {
+    // Initial value for isFirst predicate. It is switched to false,
+    // when the reduction update dynamically happens inside the reduction
+    // loop.
+    result.push_back(builder.createBool(loc, true));
+  }
+
+  return result;
+}
+
 /// Reduction converter for SUM.
 class SumAsElementalConverter
     : public NumericReductionAsElementalConverterBase<hlfir::SumOp> {
@@ -717,7 +887,10 @@ class SumAsElementalConverter
       : Base{op, rewriter} {}
 
 private:
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+      [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+      final {
     return {
         fir::factory::createZeroValue(builder, loc, getResultElementType())};
   }
@@ -781,7 +954,10 @@ class AllAnyAsElementalConverter
       : Base{op, rewriter} {}
 
 private:
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+      [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+      final {
     return {this->builder.createBool(this->loc, isAll ? true : false)};
   }
   virtual llvm::SmallVector<mlir::Value>
@@ -819,7 +995,10 @@ class CountAsElementalConverter
       : Base{op, rewriter} {}
 
 private:
-  virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+  virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+      [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+      [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+      final {
     return {
         fir::factory::createZeroValue(builder, loc, getResultElementType())};
   }
@@ -881,10 +1060,6 @@ mlir::LogicalResult ReductionAsElementalConverter::convert() {
     // Loop over all indices in the DIM dimension, and reduce all values.
     // If DIM is not present, do total reduction.
 
-    // Initial value for the reduction.
-    llvm::SmallVector<mlir::Value, 1> reductionInitValues =
-        genReductionInitValues();
-
     llvm::SmallVector<mlir::Value> extents;
     if (isTotalReduce)
       extents = arrayExtents;
@@ -892,6 +1067,10 @@ mlir::LogicalResult ReductionAsElementalConverter::convert() {
       extents.push_back(
           builder.createConvert(loc, builder.getIndexType(), dimExtent));
 
+    // Initial value for the reduction.
+    llvm::SmallVector<mlir::Value, 1> reductionInitValues =
+        genReductionInitValues(inputIndices, extents);
+
     auto genBody = [&](mlir::Location loc, fir::FirOpBuilder &builder,
                        mlir::ValueRange oneBasedIndices,
                        mlir::ValueRange reductionArgs)
diff --git a/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir b/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
index 4e9f5d0ebb08a..f917f99c2b9ab 100644
--- a/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
+++ b/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
@@ -294,6 +294,129 @@ func.func @test_partial_var(%input: !fir.box<!fir.array<?x?x?xf32>>, %mask: !fir
 // CHECK:           return %[[VAL_14]] : !hlfir.expr<?x?xi32>
 // CHECK:         }
 
+func.func @test_total_expr_nomask(%input: !hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32> {
+  %0 = hlfir.maxloc %input {fastmath = #arith.fastmath<reassoc>} : (!hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32>
+  return %0 : !hlfir.expr<3xi32>
+}
+// CHECK-LABEL:   func.func @test_total_expr_nomask(
+// CHECK-SAME:                                      %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32> {
+// CHECK:           %[[VAL_1:.*]] = arith.constant false
+// CHECK:           %[[VAL_2:.*]] = arith.constant 3 : index
+// CHECK:           %[[VAL_3:.*]] = arith.constant 2 : index
+// CHECK:           %[[VAL_4:.*]] = arith.constant -3.40282347E+38 : f32
+// CHECK:           %[[VAL_5:.*]] = arith.constant 0 : i32
+// CHECK:           %[[VAL_6:.*]] = arith.constant 1 : i32
+// CHECK:           %[[VAL_7:.*]] = arith.constant 1 : index
+// CHECK:           %[[VAL_8:.*]] = arith.constant 0 : index
+// CHECK:           %[[VAL_9:.*]] = fir.alloca !fir.array<3xi32>
+// CHECK:           %[[VAL_10:.*]] = hlfir.shape_of %[[VAL_0]] : (!hlfir.expr<?x?x?xf32>) -> !fir.shape<3>
+// CHECK:           %[[VAL_11:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 0 : index} : (!fir.shape<3>) -> index
+// CHECK:           %[[VAL_12:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 1 : index} : (!fir.shape<3>) -> index
+// CHECK:           %[[VAL_13:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 2 : index} : (!fir.shape<3>) -> index
+// CHECK:           %[[VAL_14:.*]] = arith.cmpi ne, %[[VAL_11]], %[[VAL_8]] : index
+// CHECK:           %[[VAL_15:.*]] = arith.cmpi ne, %[[VAL_12]], %[[VAL_8]] : index
+// CHECK:           %[[VAL_16:.*]] = arith.andi %[[VAL_14]], %[[VAL_15]] : i1
+// CHECK:           %[[VAL_17:.*]] = arith.cmpi ne, %[[VAL_13]], %[[VAL_8]] : index
+// CHECK:           %[[VAL_18:.*]] = arith.andi %[[VAL_16]], %[[VAL_17]] : i1
+// CHECK:           %[[VAL_19:.*]]:4 = fir.if %[[VAL_18]] -> (i32, i32, i32, f32) {
+// CHECK:             %[[VAL_20:.*]] = hlfir.apply %[[VAL_0]], %[[VAL_7]], %[[VAL_7]], %[[VAL_7]] : (!...
[truncated]

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naïve question: why is a first element needed for numerical min/max val/loc?

Isn't there a comparison + iteration order that could be used so that the type representation min/max can be used as a starting point instead of some value from the input array?

For instance, assuming MAXLOC with BACK being absent, we could start with zero coordinates and a temporary max being the numerical min for that type, then the array could be iterated backwards (maybe that is a bad idea from a memory access perspective though) always replacing the result when something bigger or equal is found. Then there is no need to deal with the array size and mask.

Again, this is a naïve question, it may be a terrible idea from a performance perspective and just be incorrect with some corner cases (nans probably? although I think fcmp uge would work in my example to select the first NaN in array order as being the result location (assuming that is what we want)).

Otherwise, the implementation of your solution looks good to me.

@vzakhari
Copy link
Contributor Author

Naïve question: why is a first element needed for numerical min/max val/loc?

Isn't there a comparison + iteration order that could be used so that the type representation min/max can be used as a starting point instead of some value from the input array?

For instance, assuming MAXLOC with BACK being absent, we could start with zero coordinates and a temporary max being the numerical min for that type, then the array could be iterated backwards (maybe that is a bad idea from a memory access perspective though) always replacing the result when something bigger or equal is found. Then there is no need to deal with the array size and mask.

Again, this is a naïve question, it may be a terrible idea from a performance perspective and just be incorrect with some corner cases (nans probably? although I think fcmp uge would work in my example to select the first NaN in array order as being the result location (assuming that is what we want)).

Otherwise, the implementation of your solution looks good to me.

Thank you for the review, Jean!

As you pointed out, I think we have to do it this way to properly handle NaNs. I believe your approach with uge will not work in this case:

input = (/NaN, 1.0/)
result = smallest;
result = (1.0 .uge. result) ? 1.0 : result; // yields 1.0
result (NaN .uge. result) ? NaN : result; // yields NaN

The actual result should be 1.0.

@jeanPerier
Copy link
Contributor

As you pointed out, I think we have to do it this way to properly handle NaNs. I believe your approach with uge will not work in this case:

input = (/NaN, 1.0/)
result = smallest;
result = (1.0 .uge. result) ? 1.0 : result; // yields 1.0
result (NaN .uge. result) ? NaN : result; // yields NaN

The actual result should be 1.0.

I see, thanks, I expected NaN should be the result here, but that is indeed not what other compilers are doing (I am not quite sure what is the requirement with regards to NaN in the Fortran spec though...)... That indeed makes it impossible to find a comparison function+comparison order here because NaNs should be selected if this is this is the first element (to cover the all NaNs case), but not otherwise...

BTW, I see that IFX/Ifort and nvfortran/classic flang are a bit different in behavior and return -Inf for maxval on an all NaN inputs, so `they probably have an easier time getting optimal code here which could explain better perf (to get this result, it is possible to just init to -Inf and never select NaNs).

Are we sure about the requirement that all NaNs should be NaN and not -Inf for MAXVAL?

subroutine foo(x, y)
real :: x(10000), y
y = maxval(x)
end


real :: x(10000), y, zero
zero = 0.
x = 0./zero
call foo(x, y)
print *, y
end

@vzakhari
Copy link
Contributor Author

We used to use +/-Inf in the OptimizedBufferization, but with my changes I started getting failures on some tests. It was probably in gfortran suite (which is following gfortran behavior, of course). I am not sure if I saw failures in other suites. I will try to recollect the results.

@klausler can you please comment on Jean's question above? The current HLFIR inlining follows the Fortran runtime implementation, but should we change both? I did not find any handling of NaNs in MINMAXLOCVAL specification in the standard, but it might be specified in some other section.

@klausler
Copy link
Contributor

Extensions.md documents our treatment of NaN values with these transformational intrinsics:

For real `MAXVAL`, `MINVAL`, `MAXLOC`, and `MINLOC`, NaN values are essentially ignored unless there are some unmasked array entries and *all* of them are NaNs.

@vzakhari
Copy link
Contributor Author

I do not see much consistency between the compilers for Jean's maxval example. Here is the results for x86:

  • nvfortran: -Inf
  • nagfor: -3.4028235E+38
  • gfortran/flang: NaN
  • ifx: -Infinity
  • xlf: -0.3402823466E+39

Results for an empty array:

  • nvfortran: -Inf
  • nagfor: -3.4028235E
  • gfortran/flang: -3.40282347E+38
  • ifx: -3.4028235E+38
  • xlf: -0.3402823466E+39

Array of all NaN with 1.0 in the middle: all compilers produce 1.0.

So it looks like Flang's implementation adds to the diversity by joining gfortran.

If I use +/-Inf init value, Flang will work as nvfortran for all-NaNs and empty arrays. Nvfortran empty arrays handling is not per standard. I can make Flang return +/-largest, so the results will be:
For all-NaNs: +/-Inf
For empty arrays: +/-largest

This implementation will only completely match ifx. On a positive note, it will probably be faster, because there will be just one unodered comparison between the current reduction value and the new element.

Again, there is not much consistency here, so I am okay with either implementation. Please let me know what you think.

@vzakhari
Copy link
Contributor Author

Another options is to stick to xlf/nagfor:
For all-NaNs: +/-largest
For empty arrays: +/-largest

@klausler
Copy link
Contributor

It was because of the lack of clear precedent that we defined the NaN treatment in a way that seemed the most sensible in a mathematical sense. NaNs are not ordered, and fail any comparison, so ignoring them for minval/maxval &c. is defensible. It's also the behavior specified for IEEE-754 floating-point maxNum/minNum operations (and their Fortran bindings).

@vzakhari
Copy link
Contributor Author

Thanks! So it looks like we can switch to using +/-largest init and ignoring NaNs in both Fortran runtime and HLFIR/FIR inlining.

I will disable those gfortran tests that fail in llvm-test-suite.

@klausler
Copy link
Contributor

Thanks! So it looks like we can switch to using +/-largest init and ignoring NaNs in both Fortran runtime and HLFIR/FIR inlining.

I will disable those gfortran tests that fail in llvm-test-suite.

Are there tests that are failing today with the runtime implementation?

@vzakhari
Copy link
Contributor Author

I am not aware of any current failures. If we change the behavior, some gfortran tests will fail.

@klausler
Copy link
Contributor

I am not aware of any current failures. If we change the behavior, some gfortran tests will fail.

I'm confused. Those tests are running today with the f18 runtime implementation of maximal, minval, maxloc, and minloc, yes? Are you changing the f18 runtime semantics for those routines? I thought you're modifying just the open-coded implementation in fir+hlfir of these functions.

@vzakhari
Copy link
Contributor Author

Yes, those tests run successfuly with F18 runtime and with HLFIR inlining.

Jean suggested changing/simplifying HLFIR inlining. In order to do that and keep getting consitent results for -O0 and -O2, I believe we need to change F18 runtime as well. These changes will cause gfortran tests to fail.

@vzakhari
Copy link
Contributor Author

@vzakhari
Copy link
Contributor Author

Peter made a good point that we should return a NaN for all-NaNs case. I think for MINLOC/MAXLOC this means that we return the position of the first unmasked NaN.

So I do not see how I can change my PR to make the code faster, in general. For nnan case, though, I think we can always select +/-LARGEST, and we do not need to read the init value from the array. That should provide cleaner/faster code for -ffast-math users.

@klausler
Copy link
Contributor

return the position of the first unmasked NaN.

Or last, of course, depending on BACK=.

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into our requirement rational here. LGTM then.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for sticking with this through the difficult issues in the RFC.

@vzakhari
Copy link
Contributor Author

Thank you for the reviews and the discussions! Can you please review additional change to simplify the generated code for 'nnan' case?

@vzakhari vzakhari requested review from tblah and jeanPerier April 29, 2025 17:18
Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Slava!

@vzakhari vzakhari merged commit 7dad8b9 into llvm:main Apr 30, 2025
11 checks passed
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…#136790)

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…#136790)

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…#136790)

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
…#136790)

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
…#136790)

Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).

This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants