-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[flang][OpenMP][HLFIR] Support vector subscripted array sections for DEPEND #133892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The bot doesn't seem to run on stacked PRs: @llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir |
Ping for review. @vzakhari are you happy with the change to |
…133891) The statement context is used for lowering clauses for openmp operations using generalised helpers from flang lowering. The statement context stores closures which generate code for cleaning up temporary values generated by the lowering helper. These closures are run when the statement construct is destroyed. Keeping the statement context local to the clause or operation being lowered without any special handling was not correct because any cleanup code would be generated at the insertion point when that statement context went out of scope (which would in general be inside of the newly created container operation). It would be better to generate the cleanup code after the newly created operation (clause processing is synchronous even for deferred tasks). Currently supported clauses are mostly populated with simple scalar values that require no cleanup. Even the simple array sections added by #132994 needed no cleanup because indexing the right values of the array did not create any temporaries. Supporting array sections with vector indexing will generate hlfir.destroy operations for cleanup. This patch fixes where those will be created. Those hlfir.destroy operations don't generate any FIR (or LLVM) code, but the issue still exists theoretically. I wasn't able to find any clauses which have any cleanup to use to test this PR. It is probably NFC for the current lowering. This will be tested in [the PR adding vector subscripting of array sections](#133892).
…DEPEND The OpenMP runtime needs the base address of the array section to identify the dependency. If we just put the vector subscript through the usual HLFIR expression lowering, that would generate a new contiguous array representing the values of the elements in the array which was sectioned. We cannot use addresses from this array because these addresses would not match dependencies on the original array. For example ``` integer :: array(1024) integer :: indices(2) indices(1) = 1 indices(2) = 100 !$omp task depend(out: array(1:512)) !$omp end task !$omp task depend(in: array(indices)) !$omp end task ``` This requires taking the lowering path previously only used for ordered assignments to get the address of the elements in the original array which were indexed. This is done using `hlfir.elemental_addr`. e.g. ``` array(indices) = 2 ``` `hlfir.elemental_addr` is awkward to use because it (by design) doesn't return something like `!hlfir.expr<>` (like `hlfir.elemental`) and so it can't have a generic lowering: each place it is used has to carefully inline the contents of the operation and extract the needed address. For this reason, `hlfir.elemental_addr` was not previously allowed outside of these ordered assignments. In this commit I relax that restriction so that I can use `hlfir.elemental_addr` to lower the OpenMP DEPEND clause. The disadvantage of doing so is that there is no generic lowering supporting uses of `hlfir.elemental_addr` in arbitrary contexts. I think it is okay to have this gap in the dialect verifier because HLFIR is not a dialect shared with users outside of flang. One alternative solution would have been to provide my own more limited re-implementation of `HlfirDesignatorBuilder` which skipped `hlfir::elemental_addr`, instead inlining its body directly at the current insertion point applying indices only for the first element. This would have been difficult to maintain because designation in Fortran is complex.
9cc7354
to
e748dc5
Compare
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir Author: Tom Eccles (tblah) ChangesThe OpenMP runtime needs the base address of the array section to identify the dependency. If we just put the vector subscript through the usual HLFIR expression lowering, that would generate a new contiguous array representing the values of the elements in the array which was sectioned. We cannot use addresses from this array because these addresses would not match dependencies on the original array. For example
This requires taking the lowering path previously only used for ordered assignments to get the address of the elements in the original array which were indexed. This is done using
For this reason, One alternative solution would have been to provide my own more limited re-implementation of Full diff: https://github.com/llvm/llvm-project/pull/133892.diff 4 Files Affected:
diff --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index f69930d5b53b3..cf2c78d2a12b2 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -1419,28 +1419,26 @@ def hlfir_YieldOp : hlfir_Op<"yield", [Terminator, ParentOneOf<["RegionAssignOp"
let assemblyFormat = "$entity attr-dict `:` type($entity) custom<YieldOpCleanup>($cleanup)";
}
-def hlfir_ElementalAddrOp : hlfir_Op<"elemental_addr", [Terminator, HasParent<"RegionAssignOp">,
+def hlfir_ElementalAddrOp : hlfir_Op<"elemental_addr", [Terminator,
RecursiveMemoryEffects, RecursivelySpeculatable, hlfir_ElementalOpInterface,
AttrSizedOperandSegments]> {
let summary = "Yield the address of a vector subscripted variable inside an hlfir.region_assign";
let description = [{
- Special terminator node for the left-hand side region of an hlfir.region_assign
- to a vector subscripted entity.
+ Used to get the address of elements in an array which is being subscripted
+ by a vector.
It represents how the address of an element of such entity is computed given
one based indices.
It is very similar to hlfir.elemental, except that it does not produce an SSA
value because there is no hlfir type to describe a vector subscripted entity
- (the codegen of such type would be problematic). Hence, it is tightly linked
- to an hlfir.region_assign by its terminator property.
+ (the codegen of such type would be problematic).
An optional cleanup region may be provided if any of the subscript expressions
of the designator require a cleanup.
This allows documenting cleanups that cannot be generated after the vector
subscripted designator usage (that has not been materizaled yet). The cleanups
- will be evaluated after the assignment once the related
- hlfir.region_assign is lowered.
+ should be evaluated after the usage of the elemental addr.
Example: "X(VECTOR) = Y"
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index 44796994b244c..9432fe06b00b7 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -889,13 +889,55 @@ bool ClauseProcessor::processDepend(lower::SymMap &symMap,
} else if (evaluate::IsArrayElement(*object.ref())) {
// Array Section
SomeExpr expr = *object.ref();
- if (isVectorSubscript(expr))
- TODO(converter.getCurrentLocation(),
- "Vector subscripted array section for task dependency");
- hlfir::EntityWithAttributes entity = convertExprToHLFIR(
- converter.getCurrentLocation(), converter, expr, symMap, stmtCtx);
- dependVar = entity.getBase();
+ if (isVectorSubscript(expr)) {
+ // OpenMP needs the address of the first indexed element (required by
+ // the standard to be the lowest index) to identify the dependency. We
+ // don't need an accurate length for the array section because the
+ // OpenMP standard forbids overlapping array sections.
+
+ // Get a hlfir.elemental_addr op describing the address of the value
+ // indexed from the original array.
+ hlfir::ElementalAddrOp addrOp =
+ convertVectorSubscriptedExprToElementalAddr(
+ converter.getCurrentLocation(), converter, expr, symMap,
+ stmtCtx);
+ if (!addrOp.getCleanup().empty())
+ TODO(converter.getCurrentLocation(),
+ "Vector subscript in DEPEND clause requring a cleanup region");
+
+ // hlfir.elemental_addr doesn't have a normal lowering because it
+ // can't return a value. Instead we need to inline it here using
+ // values for the first element. Similar to hlfir::inlineElementalOp.
+
+ mlir::Value one = builder.createIntegerConstant(
+ converter.getCurrentLocation(), builder.getIndexType(), 1);
+ mlir::SmallVector<mlir::Value> oneBasedIndices;
+ oneBasedIndices.resize(addrOp.getIndices().size(), one);
+
+ mlir::IRMapping mapper;
+ mapper.map(addrOp.getIndices(), oneBasedIndices);
+ assert(addrOp.getElementalRegion().hasOneBlock());
+ mlir::Operation *newOp;
+ for (mlir::Operation &op :
+ addrOp.getElementalRegion().back().getOperations())
+ newOp = builder.clone(op, mapper);
+ auto yield = mlir::cast<hlfir::YieldOp>(newOp);
+
+ addrOp->erase();
+
+ if (!yield.getCleanup().empty())
+ TODO(converter.getCurrentLocation(),
+ "Vector subscript in DEPEND clause requring element cleanup");
+
+ dependVar = yield.getEntity();
+ yield->erase();
+ } else {
+ // Ordinary array section e.g. A(1:512:2)
+ hlfir::EntityWithAttributes entity = convertExprToHLFIR(
+ converter.getCurrentLocation(), converter, expr, symMap, stmtCtx);
+ dependVar = entity.getBase();
+ }
} else {
semantics::Symbol *sym = object.sym();
dependVar = converter.getSymbolAddress(*sym);
diff --git a/flang/test/Lower/OpenMP/Todo/depend-clause-vector-subscript-array-section.f90 b/flang/test/Lower/OpenMP/Todo/depend-clause-vector-subscript-array-section.f90
deleted file mode 100644
index f3bd58c8c559a..0000000000000
--- a/flang/test/Lower/OpenMP/Todo/depend-clause-vector-subscript-array-section.f90
+++ /dev/null
@@ -1,11 +0,0 @@
-! RUN: %not_todo_cmd bbc -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
-! RUN: %not_todo_cmd %flang_fc1 -emit-hlfir -fopenmp -o - %s 2>&1 | FileCheck %s
-
-! CHECK: Vector subscripted array section for task dependency
-subroutine vectorSubscriptArraySection(array, indices)
- integer :: array(:)
- integer :: indices(:)
-
- !$omp task depend (in: array(indices))
- !$omp end task
-end subroutine
diff --git a/flang/test/Lower/OpenMP/task-depend-array-section.f90 b/flang/test/Lower/OpenMP/task-depend-array-section.f90
index b364a5e06a29c..4033b8ed1abf3 100644
--- a/flang/test/Lower/OpenMP/task-depend-array-section.f90
+++ b/flang/test/Lower/OpenMP/task-depend-array-section.f90
@@ -49,3 +49,36 @@ subroutine assumedShape(array)
! CHECK: }
! CHECK: return
! CHECK: }
+
+subroutine vectorSubscriptArraySection(array, indices)
+ integer :: array(:)
+ integer :: indices(:)
+
+ !$omp task depend (in: array(indices))
+ !$omp end task
+end subroutine
+! CHECK-LABEL: func.func @_QPvectorsubscriptarraysection(
+! CHECK-SAME: %[[VAL_0:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !fir.box<!fir.array<?xi32>> {fir.bindc_name = "array"},
+! CHECK-SAME: %[[VAL_1:[0-9]+|[a-zA-Z$._-][a-zA-Z0-9$._-]*]]: !fir.box<!fir.array<?xi32>> {fir.bindc_name = "indices"}) {
+! CHECK: %[[VAL_2:.*]] = fir.dummy_scope : !fir.dscope
+! CHECK: %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_0]] dummy_scope %[[VAL_2]] {uniq_name = "_QFvectorsubscriptarraysectionEarray"} : (!fir.box<!fir.array<?xi32>>, !fir.dscope) -> (!fir.box<!fir.array<?xi32>>, !fir.box<!fir.array<?xi32>>)
+! CHECK: %[[VAL_4:.*]]:2 = hlfir.declare %[[VAL_1]] dummy_scope %[[VAL_2]] {uniq_name = "_QFvectorsubscriptarraysectionEindices"} : (!fir.box<!fir.array<?xi32>>, !fir.dscope) -> (!fir.box<!fir.array<?xi32>>, !fir.box<!fir.array<?xi32>>)
+! CHECK: %[[VAL_5:.*]] = arith.constant 0 : index
+! CHECK: %[[VAL_6:.*]]:3 = fir.box_dims %[[VAL_4]]#0, %[[VAL_5]] : (!fir.box<!fir.array<?xi32>>, index) -> (index, index, index)
+! CHECK: %[[VAL_7:.*]] = fir.shape %[[VAL_6]]#1 : (index) -> !fir.shape<1>
+! CHECK: %[[VAL_8:.*]] = hlfir.elemental %[[VAL_7]] unordered : (!fir.shape<1>) -> !hlfir.expr<?xi64> {
+! CHECK: ^bb0(%[[VAL_9:.*]]: index):
+! CHECK: %[[VAL_10:.*]] = hlfir.designate %[[VAL_4]]#0 (%[[VAL_9]]) : (!fir.box<!fir.array<?xi32>>, index) -> !fir.ref<i32>
+! CHECK: %[[VAL_11:.*]] = fir.load %[[VAL_10]] : !fir.ref<i32>
+! CHECK: %[[VAL_12:.*]] = fir.convert %[[VAL_11]] : (i32) -> i64
+! CHECK: hlfir.yield_element %[[VAL_12]] : i64
+! CHECK: }
+! CHECK: %[[VAL_13:.*]] = arith.constant 1 : index
+! CHECK: %[[VAL_14:.*]] = hlfir.apply %[[VAL_8]], %[[VAL_13]] : (!hlfir.expr<?xi64>, index) -> i64
+! CHECK: %[[VAL_15:.*]] = hlfir.designate %[[VAL_3]]#0 (%[[VAL_14]]) : (!fir.box<!fir.array<?xi32>>, i64) -> !fir.ref<i32>
+! CHECK: omp.task depend(taskdependin -> %[[VAL_15]] : !fir.ref<i32>) {
+! CHECK: omp.terminator
+! CHECK: }
+! CHECK: hlfir.destroy %[[VAL_8]] : !hlfir.expr<?xi64>
+! CHECK: return
+! CHECK: }
|
…tion point (#133891) The statement context is used for lowering clauses for openmp operations using generalised helpers from flang lowering. The statement context stores closures which generate code for cleaning up temporary values generated by the lowering helper. These closures are run when the statement construct is destroyed. Keeping the statement context local to the clause or operation being lowered without any special handling was not correct because any cleanup code would be generated at the insertion point when that statement context went out of scope (which would in general be inside of the newly created container operation). It would be better to generate the cleanup code after the newly created operation (clause processing is synchronous even for deferred tasks). Currently supported clauses are mostly populated with simple scalar values that require no cleanup. Even the simple array sections added by #132994 needed no cleanup because indexing the right values of the array did not create any temporaries. Supporting array sections with vector indexing will generate hlfir.destroy operations for cleanup. This patch fixes where those will be created. Those hlfir.destroy operations don't generate any FIR (or LLVM) code, but the issue still exists theoretically. I wasn't able to find any clauses which have any cleanup to use to test this PR. It is probably NFC for the current lowering. This will be tested in [the PR adding vector subscripting of array sections](llvm/llvm-project#133892).
Thank you for the changes, Tom! It feels a bit hacky, but I think it is an acceptable reuse of the designator builder code. At the same time, I wonder why you need to modify the operation definition at all. It seems to be short-lived, and it should not reach the verification between the insertion and deleteion in your code. Would that be possible to keep the operation definition unchanged, and have a comment in the OpenMP code saying that this is a non-standard usage of the operation, and that it will fail if we do not manage to erase it? |
Nice catch Slava. I was hitting the verifier because I was printing it out while I worked on it and I missed that it wasn't needed in the final version of the code. Thanks for taking a look! |
Oh, right, it would fail if anyone try to print IR in the process. That is a problem, and I think it is worth a comment around this code. It seems that the best approach would be to reuse the code for the designator generation, but not the operation. We should discuss this with Jean, when he is back. For the time being, I am okay with the change (plase add a comment explaining potential verification failures). |
* origin/main: (199 commits) [NFC][AsmPrinter] Refactor AsmPrinter and AArch64AsmPrinter to prepare for jump table partitions on aarch64 (llvm#125993) [HEXAGON] Fix corner cases for hwloops pass (llvm#135439) [flang] Handle volatility in lowering and codegen (llvm#135311) [MLIR][Shape] Support >2 args in `shape.broadcast` folder (llvm#126808) [DirectX] Use scalar arguments for @llvm.dx.dot intrinsics (llvm#134570) Remove the redundant check for "WeakPtr" in isSmartPtrClass to fix the issue 135612. (llvm#135629) [BOLT] Support relative vtable (llvm#135449) [flang] Fix linking to libMLIR (llvm#135483) [AsmPrinter] Link .section_sizes to the correct section (llvm#135583) [ctxprof] Handle instrumenting functions with `musttail` calls (llvm#135121) [SystemZ] Consider VST/VL as SimpleBDXStore/Load (llvm#135623) [libc++][CI] Pin the XCode version. (llvm#135412) [lldb-dap] Fix win32 build. (llvm#135638) [Interp] Mark inline-virtual.cpp as unsupported with ASan (llvm#135402) [libc++] Removes the _LIBCPP_VERBOSE_ABORT_NOT_NOEXCEPT macro. (llvm#135494) [CVP] Add tests for ucmp/scmp with switch (NFC) [mlir][tosa] Align AbsOp example variable names (llvm#135268) [mlir][tosa] Align AddOp examples to spec (llvm#135266) [mlir][tosa] Align RFFT2d and FFT2d operator examples (llvm#135261) [flang][OpenMP][HLFIR] Support vector subscripted array sections for DEPEND (llvm#133892) ...
…DEPEND (llvm#133892) The OpenMP runtime needs the base address of the array section to identify the dependency. If we just put the vector subscript through the usual HLFIR expression lowering, that would generate a new contiguous array representing the values of the elements in the array which was sectioned. We cannot use addresses from this array because these addresses would not match dependencies on the original array. For example ``` integer :: array(1024) integer :: indices(2) indices(1) = 1 indices(2) = 100 !$omp task depend(out: array(1:512)) !$omp end task !$omp task depend(in: array(indices)) !$omp end task ``` This requires taking the lowering path previously only used for ordered assignments to get the address of the elements in the original array which were indexed. This is done using `hlfir.elemental_addr`. e.g. ``` array(indices) = 2 ``` `hlfir.elemental_addr` is awkward to use because it (by design) doesn't return something like `!hlfir.expr<>` (like `hlfir.elemental`) and so it can't have a generic lowering: each place it is used has to carefully inline the contents of the operation and extract the needed address. For this reason, `hlfir.elemental_addr` is not allowed outside of these ordered assignments. In this commit I ignore this restriction so that I can use `hlfir.elemental_addr` to lower the OpenMP DEPEND clause (this works because the operation is inlined and removed before the verifier runs). One alternative solution would have been to provide my own more limited re-implementation of `HlfirDesignatorBuilder` which skipped `hlfir::elemental_addr`, instead inlining its body directly at the current insertion point applying indices only for the first element. This would have been difficult to maintain because designation in Fortran is complex.
Sharing the lowering logic would bring more complexity I think. You would need to share the code here
Splitting the steps seems less complex to me than sharing the generation logic here. So I think your solution is rather clean, even though it implies generating a temporary hlfir.elemental_addr that is discared (in fact, it is not very different from how hlfir.elemental_addr are transformed into hlfir.elemental [here] (https://github.com/llvm/llvm-project/blob/c9eb1ffcfe7cf8c9751afe436a6fcab9cb5b9c9b/flang/lib/Lower/ConvertExprToHLFIR.cpp#L850C31-L850C71) when they appear outside of LHS). The extra generation/deletion compile time cost should be quite low, and vector subscripts are not that common anyway. My only suggestion would be to contain this hlfir.elemental creation/deletion pattern to ConvertExprToHLFIR.cpp in some kind of Note that an alternative would be to work at the evaluate::Expr level to build the designator of the first element and lower that, have you considered it? I am not sure that it would be cheaper/less complex (if that code is behind an API, we can easily go one way or the other anyway). |
This encapsulates implementation details of hlfir.elemental_addr inside of ConvertExprToHLFIR instead of leaking to OpenMP code. Requested here: llvm#133892 (comment)
Thanks for taking a look Jean, I have moved it into a helper in #137456 |
…7456) This encapsulates implementation details of hlfir.elemental_addr inside of ConvertExprToHLFIR instead of leaking to OpenMP code. Requested here: #133892 (comment)
…m#137456) This encapsulates implementation details of hlfir.elemental_addr inside of ConvertExprToHLFIR instead of leaking to OpenMP code. Requested here: llvm#133892 (comment)
…m#137456) This encapsulates implementation details of hlfir.elemental_addr inside of ConvertExprToHLFIR instead of leaking to OpenMP code. Requested here: llvm#133892 (comment)
…m#137456) This encapsulates implementation details of hlfir.elemental_addr inside of ConvertExprToHLFIR instead of leaking to OpenMP code. Requested here: llvm#133892 (comment)
…helper (#137456) This encapsulates implementation details of hlfir.elemental_addr inside of ConvertExprToHLFIR instead of leaking to OpenMP code. Requested here: llvm/llvm-project#133892 (comment)
This was added in - llvm#132230 - llvm#132994 - llvm#133892
…rted (#139081) This was added in - llvm/llvm-project#132230 - llvm/llvm-project#132994 - llvm/llvm-project#133892
…m#137456) This encapsulates implementation details of hlfir.elemental_addr inside of ConvertExprToHLFIR instead of leaking to OpenMP code. Requested here: llvm#133892 (comment)
The OpenMP runtime needs the base address of the array section to identify the dependency.
If we just put the vector subscript through the usual HLFIR expression lowering, that would generate a new contiguous array representing the values of the elements in the array which was sectioned. We cannot use addresses from this array because these addresses would not match dependencies on the original array. For example
This requires taking the lowering path previously only used for ordered assignments to get the address of the elements in the original array which were indexed. This is done using
hlfir.elemental_addr
. e.g.hlfir.elemental_addr
is awkward to use because it (by design) doesn't return something like!hlfir.expr<>
(likehlfir.elemental
) and so it can't have a generic lowering: each place it is used has to carefully inline the contents of the operation and extract the needed address.For this reason,
hlfir.elemental_addr
is not allowed outside of these ordered assignments. In this commit I ignore this restriction so that I can usehlfir.elemental_addr
to lower the OpenMP DEPEND clause (this works because the operation is inlined and removed before the verifier runs).One alternative solution would have been to provide my own more limited re-implementation of
HlfirDesignatorBuilder
which skippedhlfir::elemental_addr
, instead inlining its body directly at the current insertion point applying indices only for the first element. This would have been difficult to maintain because designation in Fortran is complex.