-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[flang] Fetch the initial reduction value from the input array. #136790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.
@llvm/pr-subscribers-flang-fir-hlfir Author: Slava Zakharin (vzakhari) ChangesInstead of using loop-carried IsFirst predicate, we can fetch This patch does the manual peeling, which only works for Patch is 55.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/136790.diff 5 Files Affected:
diff --git a/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp b/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
index e9d820adbd22b..54746a45b1aec 100644
--- a/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
+++ b/flang/lib/Optimizer/HLFIR/Transforms/SimplifyHLFIRIntrinsics.cpp
@@ -232,7 +232,17 @@ class ReductionAsElementalConverter {
/// by the reduction loop. In general, there is a single
/// loop-carried reduction value (e.g. for SUM), but, for example,
/// MAXLOC/MINLOC implementation uses multiple reductions.
- virtual llvm::SmallVector<mlir::Value> genReductionInitValues() = 0;
+ /// \p oneBasedIndices contains any array indices predefined
+ /// before the reduction loop, i.e. it is empty for total
+ /// reductions, and contains the one-based indices of the wrapping
+ /// hlfir.elemental.
+ /// \p extents are the pre-computed extents of the input array.
+ /// For total reductions, \p extents holds extents of all dimensions.
+ /// For partial reductions, \p extents holds a single extent
+ /// of the DIM dimension.
+ virtual llvm::SmallVector<mlir::Value>
+ genReductionInitValues(mlir::ValueRange oneBasedIndices,
+ const llvm::SmallVectorImpl<mlir::Value> &extents) = 0;
/// Perform reduction(s) update given a single input array's element
/// identified by \p array and \p oneBasedIndices coordinates.
@@ -396,6 +406,54 @@ genMinMaxComparison(mlir::Location loc, fir::FirOpBuilder &builder,
llvm_unreachable("unsupported type");
}
+// Generate a predicate value indicating that an array with the given
+// extents is not empty.
+static mlir::Value
+genIsNotEmptyArrayExtents(mlir::Location loc, fir::FirOpBuilder &builder,
+ const llvm::SmallVectorImpl<mlir::Value> &extents) {
+ mlir::Value isNotEmpty = builder.createBool(loc, true);
+ for (auto extent : extents) {
+ mlir::Value zero =
+ fir::factory::createZeroValue(builder, loc, extent.getType());
+ mlir::Value cmp = builder.create<mlir::arith::CmpIOp>(
+ loc, mlir::arith::CmpIPredicate::ne, extent, zero);
+ isNotEmpty = builder.create<mlir::arith::AndIOp>(loc, isNotEmpty, cmp);
+ }
+ return isNotEmpty;
+}
+
+// Helper method for MIN/MAX LOC/VAL reductions.
+// It returns a vector of indices such that they address
+// the first element of an array (in case of total reduction)
+// or its section (in case of partial reduction).
+//
+// If case of total reduction oneBasedIndices must be empty,
+// otherwise, they contain the one based indices of the wrapping
+// hlfir.elemental.
+// Basically, the method adds the necessary number of constant-one
+// indices into oneBasedIndices.
+static llvm::SmallVector<mlir::Value> genFirstElementIndicesForReduction(
+ mlir::Location loc, fir::FirOpBuilder &builder, bool isTotalReduction,
+ mlir::FailureOr<int64_t> dim, unsigned rank,
+ mlir::ValueRange oneBasedIndices) {
+ llvm::SmallVector<mlir::Value> indices{oneBasedIndices};
+ mlir::Value one =
+ builder.createIntegerConstant(loc, builder.getIndexType(), 1);
+ if (isTotalReduction) {
+ assert(oneBasedIndices.size() == 0 &&
+ "wrong number of indices for total reduction");
+ // Set indices to all-ones.
+ indices.append(rank, one);
+ } else {
+ assert(oneBasedIndices.size() == rank - 1 &&
+ "there must be RANK-1 indices for partial reduction");
+ assert(mlir::succeeded(dim) && "partial reduction with invalid DIM");
+ // Insert constant-one index at DIM dimension.
+ indices.insert(indices.begin() + *dim - 1, one);
+ }
+ return indices;
+}
+
/// Implementation of ReductionAsElementalConverter interface
/// for MAXLOC/MINLOC.
template <typename T>
@@ -410,6 +468,9 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
// * 1 reduction value holding the current MIN/MAX.
// * 1 boolean indicating whether it is the first time
// the mask is true.
+ //
+ // If precomputeFirst() returns true, then the boolean loop-carried
+ // value is not used.
static constexpr unsigned maxNumReductions = Fortran::common::maxRank + 2;
static constexpr bool isMax = std::is_same_v<T, hlfir::MaxlocOp>;
using Base = ReductionAsElementalConverter;
@@ -444,7 +505,9 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
return getResultRank() == 0 || !getDim();
}
- virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final;
+ virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+ mlir::ValueRange oneBasedIndices,
+ const llvm::SmallVectorImpl<mlir::Value> &extents) final;
virtual llvm::SmallVector<mlir::Value>
reduceOneElement(const llvm::SmallVectorImpl<mlir::Value> ¤tValue,
hlfir::Entity array, mlir::ValueRange oneBasedIndices) final;
@@ -460,8 +523,12 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
void
checkReductions(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
- assert(reductions.size() == getNumCoors() + 2 &&
- "invalid number of reductions for MINLOC/MAXLOC");
+ if (precomputeFirst())
+ assert(reductions.size() == getNumCoors() + 1 &&
+ "invalid number of reductions for MINLOC/MAXLOC");
+ else
+ assert(reductions.size() == getNumCoors() + 2 &&
+ "invalid number of reductions for MINLOC/MAXLOC");
}
mlir::Value
@@ -473,13 +540,52 @@ class MinMaxlocAsElementalConverter : public ReductionAsElementalConverter {
mlir::Value
getIsFirst(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
checkReductions(reductions);
+ assert(!precomputeFirst() && "IsFirst predicate must not be used");
return reductions[getNumCoors() + 1];
}
+
+ // Return true iff the reductions can be initialized
+ // by reading the first element of the array (or its section).
+ // If it returns false, then we use an auxiliary boolean
+ // to identify the very first reduction update.
+ bool precomputeFirst() const { return !getMask(); }
};
template <typename T>
llvm::SmallVector<mlir::Value>
-MinMaxlocAsElementalConverter<T>::genReductionInitValues() {
+MinMaxlocAsElementalConverter<T>::genReductionInitValues(
+ mlir::ValueRange oneBasedIndices,
+ const llvm::SmallVectorImpl<mlir::Value> &extents) {
+ fir::IfOp ifOp;
+ if (precomputeFirst()) {
+ // Check if we can load the value of the first element in the array
+ // or its section (for partial reduction).
+ assert(extents.size() == getNumCoors() &&
+ "wrong number of extents for MINLOC/MAXLOC reduction");
+ mlir::Value isNotEmpty = genIsNotEmptyArrayExtents(loc, builder, extents);
+
+ llvm::SmallVector<mlir::Value> indices = genFirstElementIndicesForReduction(
+ loc, builder, isTotalReduction(), getConstDim(), getSourceRank(),
+ oneBasedIndices);
+
+ llvm::SmallVector<mlir::Type> ifTypes(getNumCoors(),
+ getResultElementType());
+ ifTypes.push_back(getSourceElementType());
+ ifOp = builder.create<fir::IfOp>(loc, ifTypes, isNotEmpty,
+ /*withElseRegion=*/true);
+ builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+ mlir::Value one =
+ builder.createIntegerConstant(loc, getResultElementType(), 1);
+ llvm::SmallVector<mlir::Value> results(getNumCoors(), one);
+ mlir::Value minMaxFirst =
+ hlfir::loadElementAt(loc, builder, hlfir::Entity{getSource()}, indices);
+ results.push_back(minMaxFirst);
+ builder.create<fir::ResultOp>(loc, results);
+
+ // In the 'else' block use default init values.
+ builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
+ }
+
// Initial value for the coordinate(s) is zero.
mlir::Value zeroCoor =
fir::factory::createZeroValue(builder, loc, getResultElementType());
@@ -490,11 +596,17 @@ MinMaxlocAsElementalConverter<T>::genReductionInitValues() {
genMinMaxInitValue<isMax>(loc, builder, getSourceElementType());
result.push_back(minMaxInit);
- // Initial value for isFirst predicate. It is switched to false,
- // when the reduction update dynamically happens inside the reduction
- // loop.
- mlir::Value trueVal = builder.createBool(loc, true);
- result.push_back(trueVal);
+ if (ifOp) {
+ builder.create<fir::ResultOp>(loc, result);
+ builder.setInsertionPointAfter(ifOp);
+ result = ifOp.getResults();
+ } else {
+ // Initial value for isFirst predicate. It is switched to false,
+ // when the reduction update dynamically happens inside the reduction
+ // loop.
+ mlir::Value trueVal = builder.createBool(loc, true);
+ result.push_back(trueVal);
+ }
return result;
}
@@ -509,9 +621,12 @@ MinMaxlocAsElementalConverter<T>::reduceOneElement(
hlfir::loadElementAt(loc, builder, array, oneBasedIndices);
mlir::Value cmp = genMinMaxComparison<isMax>(loc, builder, elementValue,
getCurrentMinMax(currentValue));
- // If isFirst is true, then do the reduction update regardless
- // of the FP comparison.
- cmp = builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+ if (!precomputeFirst()) {
+ // If isFirst is true, then do the reduction update regardless
+ // of the FP comparison.
+ cmp =
+ builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+ }
llvm::SmallVector<mlir::Value> newIndices;
int64_t dim = 1;
@@ -537,8 +652,10 @@ MinMaxlocAsElementalConverter<T>::reduceOneElement(
loc, cmp, elementValue, getCurrentMinMax(currentValue));
newIndices.push_back(newMinMax);
- mlir::Value newIsFirst = builder.createBool(loc, false);
- newIndices.push_back(newIsFirst);
+ if (!precomputeFirst()) {
+ mlir::Value newIsFirst = builder.createBool(loc, false);
+ newIndices.push_back(newIsFirst);
+ }
assert(currentValue.size() == newIndices.size() &&
"invalid number of updated reductions");
@@ -629,7 +746,8 @@ class MinMaxvalAsElementalConverter
//
// The boolean flag is used to replace the initial value
// with the first input element even if it is NaN.
- static constexpr unsigned numReductions = 2;
+ // If precomputeFirst() returns true, then the boolean loop-carried
+ // value is not used.
static constexpr bool isMax = std::is_same_v<T, hlfir::MaxvalOp>;
using Base = NumericReductionAsElementalConverterBase<T>;
@@ -646,19 +764,9 @@ class MinMaxvalAsElementalConverter
return mlir::success();
}
- virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
- llvm::SmallVector<mlir::Value> result;
- fir::FirOpBuilder &builder = this->builder;
- mlir::Location loc = this->loc;
- mlir::Value init =
- genMinMaxInitValue<isMax>(loc, builder, this->getResultElementType());
- result.push_back(init);
- // Initial value for isFirst predicate. It is switched to false,
- // when the reduction update dynamically happens inside the reduction
- // loop.
- result.push_back(builder.createBool(loc, true));
- return result;
- }
+ virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+ mlir::ValueRange oneBasedIndices,
+ const llvm::SmallVectorImpl<mlir::Value> &extents) final;
virtual llvm::SmallVector<mlir::Value>
reduceOneElement(const llvm::SmallVectorImpl<mlir::Value> ¤tValue,
@@ -673,12 +781,14 @@ class MinMaxvalAsElementalConverter
mlir::Value currentMinMax = getCurrentMinMax(currentValue);
mlir::Value cmp =
genMinMaxComparison<isMax>(loc, builder, elementValue, currentMinMax);
- cmp =
- builder.create<mlir::arith::OrIOp>(loc, cmp, getIsFirst(currentValue));
+ if (!precomputeFirst())
+ cmp = builder.create<mlir::arith::OrIOp>(loc, cmp,
+ getIsFirst(currentValue));
mlir::Value newMinMax = builder.create<mlir::arith::SelectOp>(
loc, cmp, elementValue, currentMinMax);
result.push_back(newMinMax);
- result.push_back(builder.createBool(loc, false));
+ if (!precomputeFirst())
+ result.push_back(builder.createBool(loc, false));
return result;
}
@@ -690,7 +800,7 @@ class MinMaxvalAsElementalConverter
void
checkReductions(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
- assert(reductions.size() == numReductions &&
+ assert(reductions.size() == getNumReductions() &&
"invalid number of reductions for MINVAL/MAXVAL");
}
@@ -703,10 +813,70 @@ class MinMaxvalAsElementalConverter
mlir::Value
getIsFirst(const llvm::SmallVectorImpl<mlir::Value> &reductions) const {
this->checkReductions(reductions);
+ assert(!precomputeFirst() && "IsFirst predicate must not be used");
return reductions[1];
}
+
+ // Return true iff the reductions can be initialized
+ // by reading the first element of the array (or its section).
+ // If it returns false, then we use an auxiliary boolean
+ // to identify the very first reduction update.
+ bool precomputeFirst() const { return !this->getMask(); }
+
+ std::size_t getNumReductions() const { return precomputeFirst() ? 1 : 2; }
};
+template <typename T>
+llvm::SmallVector<mlir::Value>
+MinMaxvalAsElementalConverter<T>::genReductionInitValues(
+ mlir::ValueRange oneBasedIndices,
+ const llvm::SmallVectorImpl<mlir::Value> &extents) {
+ llvm::SmallVector<mlir::Value> result;
+ fir::FirOpBuilder &builder = this->builder;
+ mlir::Location loc = this->loc;
+
+ fir::IfOp ifOp;
+ if (precomputeFirst()) {
+ // Check if we can load the value of the first element in the array
+ // or its section (for partial reduction).
+ assert(extents.size() == this->isTotalReduction()
+ ? this->getSourceRank()
+ : 1u && "wrong number of extents for MINVAL/MAXVAL reduction");
+ mlir::Value isNotEmpty = genIsNotEmptyArrayExtents(loc, builder, extents);
+ llvm::SmallVector<mlir::Value> indices = genFirstElementIndicesForReduction(
+ loc, builder, this->isTotalReduction(), this->getConstDim(),
+ this->getSourceRank(), oneBasedIndices);
+
+ ifOp =
+ builder.create<fir::IfOp>(loc, this->getResultElementType(), isNotEmpty,
+ /*withElseRegion=*/true);
+ builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+ mlir::Value minMaxFirst = hlfir::loadElementAt(
+ loc, builder, hlfir::Entity{this->getSource()}, indices);
+ builder.create<fir::ResultOp>(loc, minMaxFirst);
+
+ // In the 'else' block use default init values.
+ builder.setInsertionPointToStart(&ifOp.getElseRegion().front());
+ }
+
+ mlir::Value init =
+ genMinMaxInitValue<isMax>(loc, builder, this->getResultElementType());
+ result.push_back(init);
+
+ if (ifOp) {
+ builder.create<fir::ResultOp>(loc, result);
+ builder.setInsertionPointAfter(ifOp);
+ result = ifOp.getResults();
+ } else {
+ // Initial value for isFirst predicate. It is switched to false,
+ // when the reduction update dynamically happens inside the reduction
+ // loop.
+ result.push_back(builder.createBool(loc, true));
+ }
+
+ return result;
+}
+
/// Reduction converter for SUM.
class SumAsElementalConverter
: public NumericReductionAsElementalConverterBase<hlfir::SumOp> {
@@ -717,7 +887,10 @@ class SumAsElementalConverter
: Base{op, rewriter} {}
private:
- virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+ virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+ [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+ [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+ final {
return {
fir::factory::createZeroValue(builder, loc, getResultElementType())};
}
@@ -781,7 +954,10 @@ class AllAnyAsElementalConverter
: Base{op, rewriter} {}
private:
- virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+ virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+ [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+ [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+ final {
return {this->builder.createBool(this->loc, isAll ? true : false)};
}
virtual llvm::SmallVector<mlir::Value>
@@ -819,7 +995,10 @@ class CountAsElementalConverter
: Base{op, rewriter} {}
private:
- virtual llvm::SmallVector<mlir::Value> genReductionInitValues() final {
+ virtual llvm::SmallVector<mlir::Value> genReductionInitValues(
+ [[maybe_unused]] mlir::ValueRange oneBasedIndices,
+ [[maybe_unused]] const llvm::SmallVectorImpl<mlir::Value> &extents)
+ final {
return {
fir::factory::createZeroValue(builder, loc, getResultElementType())};
}
@@ -881,10 +1060,6 @@ mlir::LogicalResult ReductionAsElementalConverter::convert() {
// Loop over all indices in the DIM dimension, and reduce all values.
// If DIM is not present, do total reduction.
- // Initial value for the reduction.
- llvm::SmallVector<mlir::Value, 1> reductionInitValues =
- genReductionInitValues();
-
llvm::SmallVector<mlir::Value> extents;
if (isTotalReduce)
extents = arrayExtents;
@@ -892,6 +1067,10 @@ mlir::LogicalResult ReductionAsElementalConverter::convert() {
extents.push_back(
builder.createConvert(loc, builder.getIndexType(), dimExtent));
+ // Initial value for the reduction.
+ llvm::SmallVector<mlir::Value, 1> reductionInitValues =
+ genReductionInitValues(inputIndices, extents);
+
auto genBody = [&](mlir::Location loc, fir::FirOpBuilder &builder,
mlir::ValueRange oneBasedIndices,
mlir::ValueRange reductionArgs)
diff --git a/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir b/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
index 4e9f5d0ebb08a..f917f99c2b9ab 100644
--- a/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
+++ b/flang/test/HLFIR/simplify-hlfir-intrinsics-maxloc.fir
@@ -294,6 +294,129 @@ func.func @test_partial_var(%input: !fir.box<!fir.array<?x?x?xf32>>, %mask: !fir
// CHECK: return %[[VAL_14]] : !hlfir.expr<?x?xi32>
// CHECK: }
+func.func @test_total_expr_nomask(%input: !hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32> {
+ %0 = hlfir.maxloc %input {fastmath = #arith.fastmath<reassoc>} : (!hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32>
+ return %0 : !hlfir.expr<3xi32>
+}
+// CHECK-LABEL: func.func @test_total_expr_nomask(
+// CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !hlfir.expr<?x?x?xf32>) -> !hlfir.expr<3xi32> {
+// CHECK: %[[VAL_1:.*]] = arith.constant false
+// CHECK: %[[VAL_2:.*]] = arith.constant 3 : index
+// CHECK: %[[VAL_3:.*]] = arith.constant 2 : index
+// CHECK: %[[VAL_4:.*]] = arith.constant -3.40282347E+38 : f32
+// CHECK: %[[VAL_5:.*]] = arith.constant 0 : i32
+// CHECK: %[[VAL_6:.*]] = arith.constant 1 : i32
+// CHECK: %[[VAL_7:.*]] = arith.constant 1 : index
+// CHECK: %[[VAL_8:.*]] = arith.constant 0 : index
+// CHECK: %[[VAL_9:.*]] = fir.alloca !fir.array<3xi32>
+// CHECK: %[[VAL_10:.*]] = hlfir.shape_of %[[VAL_0]] : (!hlfir.expr<?x?x?xf32>) -> !fir.shape<3>
+// CHECK: %[[VAL_11:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 0 : index} : (!fir.shape<3>) -> index
+// CHECK: %[[VAL_12:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 1 : index} : (!fir.shape<3>) -> index
+// CHECK: %[[VAL_13:.*]] = hlfir.get_extent %[[VAL_10]] {dim = 2 : index} : (!fir.shape<3>) -> index
+// CHECK: %[[VAL_14:.*]] = arith.cmpi ne, %[[VAL_11]], %[[VAL_8]] : index
+// CHECK: %[[VAL_15:.*]] = arith.cmpi ne, %[[VAL_12]], %[[VAL_8]] : index
+// CHECK: %[[VAL_16:.*]] = arith.andi %[[VAL_14]], %[[VAL_15]] : i1
+// CHECK: %[[VAL_17:.*]] = arith.cmpi ne, %[[VAL_13]], %[[VAL_8]] : index
+// CHECK: %[[VAL_18:.*]] = arith.andi %[[VAL_16]], %[[VAL_17]] : i1
+// CHECK: %[[VAL_19:.*]]:4 = fir.if %[[VAL_18]] -> (i32, i32, i32, f32) {
+// CHECK: %[[VAL_20:.*]] = hlfir.apply %[[VAL_0]], %[[VAL_7]], %[[VAL_7]], %[[VAL_7]] : (!...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naïve question: why is a first element needed for numerical min/max val/loc?
Isn't there a comparison + iteration order that could be used so that the type representation min/max can be used as a starting point instead of some value from the input array?
For instance, assuming MAXLOC with BACK being absent, we could start with zero coordinates and a temporary max being the numerical min for that type, then the array could be iterated backwards (maybe that is a bad idea from a memory access perspective though) always replacing the result when something bigger or equal is found. Then there is no need to deal with the array size and mask.
Again, this is a naïve question, it may be a terrible idea from a performance perspective and just be incorrect with some corner cases (nans probably? although I think fcmp uge
would work in my example to select the first NaN in array order as being the result location (assuming that is what we want)).
Otherwise, the implementation of your solution looks good to me.
Thank you for the review, Jean! As you pointed out, I think we have to do it this way to properly handle NaNs. I believe your approach with
The actual result should be 1.0. |
I see, thanks, I expected NaN should be the result here, but that is indeed not what other compilers are doing (I am not quite sure what is the requirement with regards to NaN in the Fortran spec though...)... That indeed makes it impossible to find a comparison function+comparison order here because NaNs should be selected if this is this is the first element (to cover the all NaNs case), but not otherwise... BTW, I see that IFX/Ifort and nvfortran/classic flang are a bit different in behavior and return Are we sure about the requirement that all NaNs should be NaN and not -Inf for MAXVAL?
|
We used to use +/-Inf in the @klausler can you please comment on Jean's question above? The current HLFIR inlining follows the Fortran runtime implementation, but should we change both? I did not find any handling of NaNs in MINMAXLOCVAL specification in the standard, but it might be specified in some other section. |
|
I do not see much consistency between the compilers for Jean's maxval example. Here is the results for x86:
Results for an empty array:
Array of all NaN with 1.0 in the middle: all compilers produce 1.0. So it looks like Flang's implementation adds to the diversity by joining gfortran. If I use +/-Inf init value, Flang will work as nvfortran for all-NaNs and empty arrays. Nvfortran empty arrays handling is not per standard. I can make Flang return +/-largest, so the results will be: This implementation will only completely match ifx. On a positive note, it will probably be faster, because there will be just one unodered comparison between the current reduction value and the new element. Again, there is not much consistency here, so I am okay with either implementation. Please let me know what you think. |
Another options is to stick to xlf/nagfor: |
It was because of the lack of clear precedent that we defined the NaN treatment in a way that seemed the most sensible in a mathematical sense. NaNs are not ordered, and fail any comparison, so ignoring them for minval/maxval &c. is defensible. It's also the behavior specified for IEEE-754 floating-point maxNum/minNum operations (and their Fortran bindings). |
Thanks! So it looks like we can switch to using +/-largest init and ignoring NaNs in both Fortran runtime and HLFIR/FIR inlining. I will disable those gfortran tests that fail in llvm-test-suite. |
Are there tests that are failing today with the runtime implementation? |
I am not aware of any current failures. If we change the behavior, some gfortran tests will fail. |
I'm confused. Those tests are running today with the f18 runtime implementation of maximal, minval, maxloc, and minloc, yes? Are you changing the f18 runtime semantics for those routines? I thought you're modifying just the open-coded implementation in fir+hlfir of these functions. |
Yes, those tests run successfuly with F18 runtime and with HLFIR inlining. Jean suggested changing/simplifying HLFIR inlining. In order to do that and keep getting consitent results for |
Peter made a good point that we should return a NaN for all-NaNs case. I think for MINLOC/MAXLOC this means that we return the position of the first unmasked NaN. So I do not see how I can change my PR to make the code faster, in general. For |
Or last, of course, depending on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into our requirement rational here. LGTM then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for sticking with this through the difficult issues in the RFC.
Thank you for the reviews and the discussions! Can you please review additional change to simplify the generated code for 'nnan' case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Slava!
…#136790) Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.
…#136790) Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.
…#136790) Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.
…#136790) Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.
…#136790) Instead of using loop-carried IsFirst predicate, we can fetch the initial reduction values for MIN/MAX LOC/VAL reductions from the array itself. This results in a little bit cleaner loop nests, especially, generated for total reductions. Otherwise, LLVM is able to peel the first iteration of the innermost loop, but the surroudings of the peeled code are executed multiple times withing the outer loop(s). This patch does the manual peeling, which only works for non-masked reductions where the input array is not empty.
Instead of using loop-carried IsFirst predicate, we can fetch
the initial reduction values for MIN/MAX LOC/VAL reductions
from the array itself. This results in a little bit cleaner
loop nests, especially, generated for total reductions.
Otherwise, LLVM is able to peel the first iteration of the innermost
loop, but the surroudings of the peeled code are executed
multiple times withing the outer loop(s).
This patch does the manual peeling, which only works for
non-masked reductions where the input array is not empty.