[flang][OpenMP] Enable delayed privatization by default omp.wsloop #122471

Merged 1 commit on Jan 16, 2025

@ergawy (Member) commented on Jan 10, 2025

This enables delayed privatization by default for omp.wsloop ops, with one caveat: I had to work around the "impure" alloc region issue, which is being resolved at the moment. The workaround detects whether the alloc region's argument is used in the region and, at the same time, is defined in a block that does not dominate the chosen alloca insertion point. If so, we move the alloca insertion point to just below the defining instruction of the alloc region argument. For such cases, this basically reverts to the non-delayed-privatization behavior.
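
The decision described above can be sketched on a toy CFG. This is only an illustration under stated assumptions, not the code added to OpenMPToLLVMIRTranslation.cpp: `Position`, `PrivatizerInfo`, `dominates`, and `pickAllocaPoint` are hypothetical names, blocks are plain integers, and dominance is checked at block granularity using an immediate-dominator array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical position in a toy CFG: a block id plus an instruction index.
struct Position {
  int block;
  int index;
};

// Hypothetical summary of one privatizer's alloc region.
struct PrivatizerInfo {
  bool allocRegionUsesArg; // is the region argument used inside the region?
  Position argDef;         // where the value bound to the argument is defined
};

// Returns true if block `a` dominates block `b`. `idom[x]` is the immediate
// dominator of block x; by convention the entry block is its own idom.
bool dominates(const std::vector<int> &idom, int a, int b) {
  assert(a >= 0 && a < static_cast<int>(idom.size()));
  assert(b >= 0 && b < static_cast<int>(idom.size()));
  while (true) {
    if (a == b)
      return true;
    if (idom[b] == b) // reached the entry block without meeting `a`
      return false;
    b = idom[b];
  }
}

// If the alloc region uses its argument and the argument's defining block
// does not dominate the chosen insertion point, fall back to allocating
// right below the defining instruction. Otherwise keep the chosen point.
Position pickAllocaPoint(const std::vector<int> &idom, Position chosen,
                         const PrivatizerInfo &info) {
  if (info.allocRegionUsesArg &&
      !dominates(idom, info.argDef.block, chosen.block))
    return Position{info.argDef.block, info.argDef.index + 1};
  return chosen;
}
```

With `idom = {0, 0, 0, 1}` and a chosen point in the entry block, an argument that is used and defined in block 3 moves the insertion point into block 3, one slot after the definition. An unused argument leaves the chosen point unchanged.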

@llvmbot added the mlir:llvm, mlir, flang, mlir:openmp, flang:fir-hlfir, and flang:openmp labels on Jan 10, 2025
@llvmbot (Member) commented on Jan 10, 2025

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-mlir-openmp

Author: Kareem Ergawy (ergawy)

Changes

This enables delayed privatization by default for omp.wsloop ops, with one caveat: I had to work around the "impure" alloc region issue, which is being resolved at the moment. The workaround detects whether the alloc region's argument is used in the region and, at the same time, is defined in a block that does not dominate the chosen alloca insertion point. If so, we move the alloca insertion point to just below the defining instruction of the alloc region argument. For such cases, this basically reverts to the non-delayed-privatization behavior.


Patch is 251.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122471.diff

73 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+1-1)
  • (modified) flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90 (+2-2)
  • (modified) flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90 (+1-2)
  • (modified) flang/test/Lower/OpenMP/associate.f90 (+2-2)
  • (modified) flang/test/Lower/OpenMP/copyin.f90 (+4-8)
  • (modified) flang/test/Lower/OpenMP/critical.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/default-clause-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/default-clause.f90 (+6-10)
  • (modified) flang/test/Lower/OpenMP/hlfir-wsloop.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/lastprivate-allocatable.f90 (+4-5)
  • (modified) flang/test/Lower/OpenMP/lastprivate-commonblock.f90 (+2-4)
  • (modified) flang/test/Lower/OpenMP/lastprivate-iv.f90 (+5-10)
  • (modified) flang/test/Lower/OpenMP/location.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/order-clause.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 (+17-30)
  • (modified) flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90 (+5-7)
  • (modified) flang/test/Lower/OpenMP/parallel-private-clause.f90 (+10-33)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction3.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-firstpriv.f90 (+9-27)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-lastpriv.f90 (+26-44)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop.f90 (+21-45)
  • (modified) flang/test/Lower/OpenMP/private-derived-type.f90 (+12-10)
  • (modified) flang/test/Lower/OpenMP/stop-stmt-in-region.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/target.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/unstructured.f90 (+10-14)
  • (modified) flang/test/Lower/OpenMP/wsloop-chunks.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-collapse.f90 (+6-10)
  • (modified) flang/test/Lower/OpenMP/wsloop-monotonic.f90 (+1-2)
  • (modified) flang/test/Lower/OpenMP/wsloop-nonmonotonic.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-ordered.f90 (+2-2)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-add-byref.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-add.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 (+4-6)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-array.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-iand-byref.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-iand.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ieor-byref.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ieor.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ior-byref.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ior.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-and-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-and.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-eqv-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-eqv.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-neqv-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-neqv.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-or-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-or.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-2-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-2.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min2.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-mul-byref.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-mul.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-multi.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-schedule.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-unstructured.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-variable.f90 (+7-10)
  • (modified) flang/test/Lower/OpenMP/wsloop.f90 (+6-9)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+56)
  • (added) mlir/test/Target/LLVMIR/openmp-wsloop-private-late-alloca-workaround.mlir (+50)
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index cd4b25a17722c1..2ce753039880c1 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -2079,7 +2079,7 @@ static void genStandaloneDo(lower::AbstractConverter &converter,
 
   DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
                            /*shouldCollectPreDeterminedSymbols=*/true,
-                           enableDelayedPrivatizationStaging, symTable);
+                           enableDelayedPrivatization, symTable);
   dsp.processStep1(&wsloopClauseOps);
 
   mlir::omp::LoopNestOperands loopNestClauseOps;
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
index 66fd120085c782..c98850b8000d36 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
@@ -1,6 +1,6 @@
-! RUN: %flang_fc1 -emit-hlfir -fopenmp -mmlir --openmp-enable-delayed-privatization-staging \
+! RUN: %flang_fc1 -emit-hlfir -fopenmp \
 ! RUN:   -o - %s 2>&1 | FileCheck %s
-! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization-staging -o - %s 2>&1 \
+! RUN: bbc -emit-hlfir -fopenmp  -o - %s 2>&1 \
 ! RUN:   | FileCheck %s
 
 subroutine wsloop_private
diff --git a/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90 b/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90
index 77a1304f39a488..10879c53dc0c58 100644
--- a/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90
+++ b/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90
@@ -9,11 +9,10 @@
 ! The string "EXPECTED" denotes the expected FIR
 
 ! CHECK: omp.parallel  private(@{{.*}} %{{.*}} -> %[[PRIVATE_Y:.*]], @{{.*}} %{{.*}} -> %[[PRIVATE_Y:.*]] : !fir.ref<i32>, !fir.ref<i32>) {
-! CHECK: %[[TEMP:.*]] = fir.alloca i32 {bindc_name = "x", pinned, {{.*}}}
 ! CHECK: %[[const_1:.*]] = arith.constant 1 : i32
 ! CHECK: %[[const_2:.*]] = arith.constant 10 : i32
 ! CHECK: %[[const_3:.*]] = arith.constant 1 : i32
-! CHECK: omp.wsloop {
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[TEMP:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[const_1]]) to (%[[const_2]]) inclusive step (%[[const_3]]) {
 ! CHECK: fir.store %[[ARG]] to %[[TEMP]] : !fir.ref<i32>
 ! EXPECTED: %[[temp_1:.*]] = fir.load %[[PRIVATE_Z]] : !fir.ref<i32>
diff --git a/flang/test/Lower/OpenMP/associate.f90 b/flang/test/Lower/OpenMP/associate.f90
index 4964890a6842c1..d497b4ade782e0 100644
--- a/flang/test/Lower/OpenMP/associate.f90
+++ b/flang/test/Lower/OpenMP/associate.f90
@@ -6,12 +6,12 @@
 !CHECK:         omp.parallel {
 !CHECK-NOT:       hlfir.declare {{.*}} {uniq_name = "_QFtest_parallel_assocEa"}
 !CHECK-NOT:       hlfir.declare {{.*}} {uniq_name = "_QFtest_parallel_assocEb"}
-!CHECK:           omp.wsloop {
+!CHECK:           omp.wsloop private({{.*}}) {
 !CHECK:           }
 !CHECK:         }
 !CHECK:         omp.parallel {{.*}} {
 !CHECK-NOT:       hlfir.declare {{.*}} {uniq_name = "_QFtest_parallel_assocEb"}
-!CHECK:           omp.wsloop {
+!CHECK:           omp.wsloop private({{.*}}) {
 !CHECK:           }
 !CHECK:         }
 subroutine test_parallel_assoc()
diff --git a/flang/test/Lower/OpenMP/copyin.f90 b/flang/test/Lower/OpenMP/copyin.f90
index f3d147c10668f4..af41442e6870f6 100644
--- a/flang/test/Lower/OpenMP/copyin.f90
+++ b/flang/test/Lower/OpenMP/copyin.f90
@@ -154,14 +154,13 @@ subroutine copyin_derived_type()
 
 ! CHECK:             omp.barrier
 
-! CHECK:             %[[VAL_6:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-! CHECK:             %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_6]] {uniq_name = "_QFcombined_parallel_worksharing_loopEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
 ! CHECK:             %[[VAL_11:.*]] = arith.constant 1 : i32
 ! CHECK:             %[[VAL_12:.*]] = fir.load %[[VAL_9]]#0 : !fir.ref<i32>
 ! CHECK:             %[[VAL_13:.*]] = arith.constant 1 : i32
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %[[VAL_6:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT:          omp.loop_nest (%[[VAL_14:.*]]) : i32 = (%[[VAL_11]]) to (%[[VAL_12]]) inclusive step (%[[VAL_13]]) {
+! CHECK:                 %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_6]] {uniq_name = "_QFcombined_parallel_worksharing_loopEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 ! CHECK:                 fir.store %[[VAL_14]] to %[[VAL_7]]#1 : !fir.ref<i32>
 ! CHECK:                 fir.call @_QPsub4(%[[VAL_9]]#1) fastmath<contract> : (!fir.ref<i32>) -> ()
 ! CHECK:                 omp.yield
@@ -321,15 +320,12 @@ subroutine common_1()
 ! CHECK:             %[[VAL_33:.*]] = fir.load %[[VAL_18]]#0 : !fir.ref<i32>
 ! CHECK:             hlfir.assign %[[VAL_33]] to %[[VAL_31]]#0 : i32, !fir.ref<i32>
 ! CHECK:             omp.barrier
-
-! CHECK:             %[[VAL_19:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-! CHECK:             %[[VAL_20:.*]]:2 = hlfir.declare %[[VAL_19]] {uniq_name = "_QFcommon_2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 ! CHECK:             %[[VAL_34:.*]] = arith.constant 1 : i32
 ! CHECK:             %[[VAL_35:.*]] = fir.load %[[VAL_26]]#0 : !fir.ref<i32>
 ! CHECK:             %[[VAL_36:.*]] = arith.constant 1 : i32
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %[[VAL_19:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT:          omp.loop_nest (%[[VAL_37:.*]]) : i32 = (%[[VAL_34]]) to (%[[VAL_35]]) inclusive step (%[[VAL_36]]) {
+! CHECK:             %[[VAL_20:.*]]:2 = hlfir.declare %[[VAL_19]] {uniq_name = "_QFcommon_2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 ! CHECK:                 fir.store %[[VAL_37]] to %[[VAL_20]]#1 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_38:.*]] = fir.load %[[VAL_31]]#0 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_39:.*]] = fir.load %[[VAL_20]]#0 : !fir.ref<i32>
diff --git a/flang/test/Lower/OpenMP/critical.f90 b/flang/test/Lower/OpenMP/critical.f90
index 051d3782106469..99a4426ab04533 100644
--- a/flang/test/Lower/OpenMP/critical.f90
+++ b/flang/test/Lower/OpenMP/critical.f90
@@ -38,11 +38,10 @@ subroutine predetermined_privatization()
   !CHECK: omp.parallel
   !$omp parallel do
 
-  !CHECK: %[[PRIV_I_ALLOC:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-  !CHECK: %[[PRIV_I_DECL:.*]]:2 = hlfir.declare %[[PRIV_I_ALLOC]]
   do i = 2, 10
-    !CHECK: omp.wsloop
+    !CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[PRIV_I_ALLOC:.*]] : !fir.ref<i32>)
     !CHECK: omp.loop_nest (%[[IV:[^[:space:]]+]])
+    !CHECK: %[[PRIV_I_DECL:.*]]:2 = hlfir.declare %[[PRIV_I_ALLOC]]
     !CHECK: fir.store %[[IV]] to %[[PRIV_I_DECL]]#1
     !CHECK: omp.critical
     !$omp critical
diff --git a/flang/test/Lower/OpenMP/default-clause-byref.f90 b/flang/test/Lower/OpenMP/default-clause-byref.f90
index 654c13ada9e39f..10e62005f42ba0 100644
--- a/flang/test/Lower/OpenMP/default-clause-byref.f90
+++ b/flang/test/Lower/OpenMP/default-clause-byref.f90
@@ -346,7 +346,7 @@ subroutine skipped_default_clause_checks()
        type(it)::iii
 
 !CHECK: omp.parallel {{.*}} {
-!CHECK: omp.wsloop reduction(byref @min_byref_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
+!CHECK: omp.wsloop private({{.*}}) reduction(byref @min_byref_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
 !CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) {{.*}} {
 !CHECK: omp.yield
 !CHECK: }
diff --git a/flang/test/Lower/OpenMP/default-clause.f90 b/flang/test/Lower/OpenMP/default-clause.f90
index c004813a911f73..fcc8d033eea0fa 100644
--- a/flang/test/Lower/OpenMP/default-clause.f90
+++ b/flang/test/Lower/OpenMP/default-clause.f90
@@ -284,16 +284,13 @@ subroutine nested_default_clause_test4
 !CHECK-LABEL: func @_QPnested_default_clause_test5
 !CHECK: omp.parallel {
 
-!CHECK: %[[X_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_test5Ex"}
-!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
-
-!CHECK: %[[LOOP_VAR_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
-
 !CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
 !CHECK: %[[CONST_UB:.*]] = arith.constant 50 : i32
 !CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[X_ALLOCA:.*]], @{{.*}} %{{.*}} -> %[[LOOP_VAR_ALLOCA:.*]] : !fir.ref<i32>, !fir.ref<i32>) {
 !CHECK: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
 !CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
 !CHECK: %[[LOADED_X:.*]] = fir.load %[[X_DECLARE]]#0 : !fir.ref<i32>
 !CHECK: %[[CONST:.*]] = arith.constant 1 : i32
@@ -321,13 +318,12 @@ subroutine nested_default_clause_test5
 
 !CHECK: %[[Z_VAR_DECLARE:.*]]:2 = hlfir.declare %[[Z_VAR]] {{.*}}
 
-!CHECK: %[[LOOP_VAR:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
-
 !CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
 !CHECK: %[[CONST_UB:.*]] = arith.constant 10 : i32
 !CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[LOOP_VAR:.*]] : !fir.ref<i32>) {
 !CHECK: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
 !CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
 !CHECK: %[[LOADED_X:.*]] = fir.load %[[X_VAR_DECLARE]]#0 : !fir.ref<i32>
 !CHECK: %[[CONST:.*]] = arith.constant 1 : i32
@@ -386,7 +382,7 @@ subroutine skipped_default_clause_checks()
        type(it)::iii
 
 !CHECK: omp.parallel {{.*}} {
-!CHECK: omp.wsloop reduction(@min_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
+!CHECK: omp.wsloop private({{.*}}) reduction(@min_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
 !CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) {{.*}} {
 !CHECK: omp.yield
 !CHECK: }
diff --git a/flang/test/Lower/OpenMP/hlfir-wsloop.f90 b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
index f7b0ba681efebf..786ab916d000c3 100644
--- a/flang/test/Lower/OpenMP/hlfir-wsloop.f90
+++ b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
@@ -10,12 +10,11 @@ subroutine simple_loop
   ! CHECK-DAG:     %[[WS_END:.*]] = arith.constant 9 : i32
   ! CHECK:  omp.parallel
   !$OMP PARALLEL
-  ! CHECK-DAG:     %[[ALLOCA_IV:.*]] = fir.alloca i32 {{{.*}}, pinned, {{.*}}}
-  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
-  ! CHECK:         omp.wsloop {
+  ! CHECK:         omp.wsloop private(@{{.*}} %{{.*}} -> %[[ALLOCA_IV:.*]] : !fir.ref<i32>) {
   ! CHECK-NEXT:      omp.loop_nest (%[[I:.*]]) : i32 = (%[[WS_ST]]) to (%[[WS_END]]) inclusive step (%[[WS_ST]]) {
   !$OMP DO
   do i=1, 9
+  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
   ! CHECK:             fir.store %[[I]] to %[[IV:.*]] : !fir.ref<i32>
   ! CHECK:             %[[LOAD_IV:.*]] = fir.load %[[IV]] : !fir.ref<i32>
   ! CHECK:             fir.call @_FortranAioOutputInteger32({{.*}}, %[[LOAD_IV]]) {{.*}}: (!fir.ref<i8>, i32) -> i1
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
index 6b7d849fde93ca..fd8338393dd880 100644
--- a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -8,12 +8,11 @@
 ! CHECK:           fir.store %[[VAL_2]] to %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK:           %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
 ! CHECK:           omp.parallel {
-!                    create original copy of private variable
-! CHECK:             %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
-! CHECK:             %[[VAL_17:.*]] = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
-! CHECK:             %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %{{.*}}, @{{.*}} %{{.*}} -> %[[VAL_17:.*]] : !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<i32>) {
 ! CHECK:               omp.loop_nest
+! CHECK:                   %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:                   %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+
 !                        [...]
 !                        if this is the last iteration
 ! CHECK:                 fir.if %{{.*}} {
diff --git a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90 b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
index faa3d3e053f345..c059382bf634c5 100644
--- a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
@@ -11,12 +11,10 @@
 !CHECK:      %[[CB_C_Y_COOR:.*]] = fir.coordinate_of %[[CB_C_REF_CVT]], %{{.*}} : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
 !CHECK:      %[[CB_C_Y_ADDR:.*]] = fir.convert %[[CB_C_Y_COOR]] : (!fir.ref<i8>) -> !fir.ref<f32>
 !CHECK:      %[[Y_DECL:.*]]:2 = hlfir.declare %[[CB_C_Y_ADDR]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_X_REF:.*]] = fir.alloca f32 {bindc_name = "x", pinned, uniq_name = "_QFlastprivate_commonEx"}
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[PRIVATE_X_REF:.*]], @{{.*}} %{{.*}} -> %[[PRIVATE_Y_REF:.*]], @{{.*}} %{{.*}} -> %{{.*}} : !{{.*}}, !{{.*}}, !{{.*}}) {
+!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:      %[[PRIVATE_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_X_REF]] {uniq_name = "_QFlastprivate_commonEx"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_Y_REF:.*]] = fir.alloca f32 {bindc_name = "y", pinned, uniq_name = "_QFlastprivate_commonEy"}
 !CHECK:      %[[PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Y_REF]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      omp.wsloop {
-!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:          %[[V:.*]] = arith.addi %[[I]], %{{.*}} : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
 !CHECK:          %[[NEG_STEP:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
diff --git a/flang/test/Lower/OpenMP/lastprivate-iv.f90 b/flang/test/Lower/OpenMP/lastprivate-iv.f90
index 63a81e818bc8ba..aacefd8b59c0f2 100644
--- a/flang/test/Lower/OpenMP/lastprivate-iv.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-iv.f90
@@ -6,14 +6,12 @@
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_incEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 4 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant 3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -42,15 +40,12 @@ subroutine lastprivate_iv_inc()
 
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_decEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 1 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant -3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -80,7 +75,7 @@ subroutine lastprivate_iv_dec()
 subroutine lastprivate_iv_i1
   integer*1 :: i1
   i1=0
-!CHECK:    omp.wsloop
+!CHECK:    omp.wsloop private({{.*}})
 !CHECK:      omp.loop_nest
 !CHECK:        fir.if %{{.*}} {
 !CHECK:          %[[I8_VAL:.*]] = fir.convert %{{.*}} : (i32) -> i8
diff --git a/flang/test/Lower/OpenMP/location.f90 b/flang/test/Lower/OpenMP/location.f90
index 2dab22a1c1f90d..fc7dd434998638 100644
--- a/flang/test/Lower/OpenMP/location.f90
+++ b/flang/test/Lower/OpenMP/location.f90
@@ -28,7 +28,7 @@ subroutine sub_target()
 
 !CHECK-LABEL: sub_loop
 subroutine sub_loop()
-!CHECK: omp.wsloop {
+!CHECK: omp.wsloop private({{.*}}) {
 !CHECK-NEXT: omp.loop_nest {{.*}} {
   !$omp do
   do i=1,10
diff --git a/flang/test/Lower/OpenMP/order-clause.f90 b/flang/test/Lower/OpenMP/order-clause.f90
index 717d9740c56f80..75b3a27bbd7fe1 100644
--- a/flang/test/Lower/OpenMP/order-clause.f90
+++ b/flang/test/Lower/OpenMP/order-clause.f90
@@ -20,15 +20,15 @@ end subroutine simd_order
 
 !CHECK-LABEL:   func.func @_QPdo_order() {
 subroutine do_order
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(reproducible:concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(unconstrained:concurrent) {
+   !CHECK: omp.wsloop order(unconstrained:concurrent) private({{.*}}) {
    !$omp do order(unconstrained:concurrent)
    do i = 1, 10
    end do
diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
index 86309a24f91a03..531413c124f81b 100644
--- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
@@ -10,12 +10,12 @@
 !CHECK-DAG: %[[ARG1_DECL:.*]]:2 = hlfir.declare %[[ARG1_REF]] typeparams %[[FIVE]] dummy_scope %{{[0-9]+}} {uniq_name = "_QFlastprivate_characterEarg1"} : (!fir.ref<!fir.char<1,5>>, index, !fir.dscope) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
 
 !CHECK: omp.parallel {
-!CHECK-DAG: %[[ARG1_PVT:.*]] = fir.alloca !fir.char<1,5> {bindc_name = "arg1", pin...
[truncated]
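
Most of the test updates in the truncated diff above follow one pattern: the `fir.alloca ... pinned` and `hlfir.declare` checks that used to precede `omp.wsloop {` go away, a `private(...)` clause appears on the op, and the `hlfir.declare` moves inside `omp.loop_nest`. The sketch below prints both shapes for a list of variables. It is a simplified text model of the CHECK lines, not valid MLIR and not how flang emits IR. The `_privatizer`, `_orig`, `_arg`, `_alloc`, and `_decl` suffixes are made-up placeholders for what the tests match with `{{.*}}`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one privatized variable.
struct PrivateVar {
  std::string name; // e.g. "i"
  std::string type; // e.g. "i32"
};

// Eager shape: allocation and declaration lines precede the loop op.
std::vector<std::string> eagerShape(const std::vector<PrivateVar> &vars) {
  std::vector<std::string> lines;
  for (const PrivateVar &v : vars) {
    assert(!v.name.empty());
    lines.push_back("%" + v.name + "_alloc = fir.alloca " + v.type + " {pinned}");
    lines.push_back("%" + v.name + "_decl = hlfir.declare %" + v.name + "_alloc");
  }
  lines.push_back("omp.wsloop {");
  lines.push_back("  omp.loop_nest ... {");
  return lines;
}

// Delayed shape: the op carries a private(...) clause, and the declaration
// of each private copy happens inside the loop nest on a region argument.
std::vector<std::string> delayedShape(const std::vector<PrivateVar> &vars) {
  std::vector<std::string> lines;
  if (vars.empty()) {
    lines.push_back("omp.wsloop {");
    lines.push_back("  omp.loop_nest ... {");
    return lines;
  }
  std::string clause, types;
  for (std::size_t i = 0; i < vars.size(); ++i) {
    assert(!vars[i].name.empty());
    if (i != 0) {
      clause += ", ";
      types += ", ";
    }
    clause += "@" + vars[i].name + "_privatizer %" + vars[i].name +
              "_orig -> %" + vars[i].name + "_arg";
    types += "!fir.ref<" + vars[i].type + ">";
  }
  lines.push_back("omp.wsloop private(" + clause + " : " + types + ") {");
  lines.push_back("  omp.loop_nest ... {");
  for (const PrivateVar &v : vars)
    lines.push_back("    %" + v.name + "_decl = hlfir.declare %" + v.name + "_arg");
  return lines;
}
```

Calling both functions with a single `{"i", "i32"}` entry gives a side-by-side view of the kind of change made to tests such as copyin.f90 and lastprivate-iv.f90.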

diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index cd4b25a17722c1..2ce753039880c1 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -2079,7 +2079,7 @@ static void genStandaloneDo(lower::AbstractConverter &converter,
 
   DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
                            /*shouldCollectPreDeterminedSymbols=*/true,
-                           enableDelayedPrivatizationStaging, symTable);
+                           enableDelayedPrivatization, symTable);
   dsp.processStep1(&wsloopClauseOps);
 
   mlir::omp::LoopNestOperands loopNestClauseOps;
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
index 66fd120085c782..c98850b8000d36 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
@@ -1,6 +1,6 @@
-! RUN: %flang_fc1 -emit-hlfir -fopenmp -mmlir --openmp-enable-delayed-privatization-staging \
+! RUN: %flang_fc1 -emit-hlfir -fopenmp \
 ! RUN:   -o - %s 2>&1 | FileCheck %s
-! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization-staging -o - %s 2>&1 \
+! RUN: bbc -emit-hlfir -fopenmp  -o - %s 2>&1 \
 ! RUN:   | FileCheck %s
 
 subroutine wsloop_private
diff --git a/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90 b/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90
index 77a1304f39a488..10879c53dc0c58 100644
--- a/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90
+++ b/flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90
@@ -9,11 +9,10 @@
 ! The string "EXPECTED" denotes the expected FIR
 
 ! CHECK: omp.parallel  private(@{{.*}} %{{.*}} -> %[[PRIVATE_Y:.*]], @{{.*}} %{{.*}} -> %[[PRIVATE_Y:.*]] : !fir.ref<i32>, !fir.ref<i32>) {
-! CHECK: %[[TEMP:.*]] = fir.alloca i32 {bindc_name = "x", pinned, {{.*}}}
 ! CHECK: %[[const_1:.*]] = arith.constant 1 : i32
 ! CHECK: %[[const_2:.*]] = arith.constant 10 : i32
 ! CHECK: %[[const_3:.*]] = arith.constant 1 : i32
-! CHECK: omp.wsloop {
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[TEMP:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[const_1]]) to (%[[const_2]]) inclusive step (%[[const_3]]) {
 ! CHECK: fir.store %[[ARG]] to %[[TEMP]] : !fir.ref<i32>
 ! EXPECTED: %[[temp_1:.*]] = fir.load %[[PRIVATE_Z]] : !fir.ref<i32>
diff --git a/flang/test/Lower/OpenMP/associate.f90 b/flang/test/Lower/OpenMP/associate.f90
index 4964890a6842c1..d497b4ade782e0 100644
--- a/flang/test/Lower/OpenMP/associate.f90
+++ b/flang/test/Lower/OpenMP/associate.f90
@@ -6,12 +6,12 @@
 !CHECK:         omp.parallel {
 !CHECK-NOT:       hlfir.declare {{.*}} {uniq_name = "_QFtest_parallel_assocEa"}
 !CHECK-NOT:       hlfir.declare {{.*}} {uniq_name = "_QFtest_parallel_assocEb"}
-!CHECK:           omp.wsloop {
+!CHECK:           omp.wsloop private({{.*}}) {
 !CHECK:           }
 !CHECK:         }
 !CHECK:         omp.parallel {{.*}} {
 !CHECK-NOT:       hlfir.declare {{.*}} {uniq_name = "_QFtest_parallel_assocEb"}
-!CHECK:           omp.wsloop {
+!CHECK:           omp.wsloop private({{.*}}) {
 !CHECK:           }
 !CHECK:         }
 subroutine test_parallel_assoc()
diff --git a/flang/test/Lower/OpenMP/copyin.f90 b/flang/test/Lower/OpenMP/copyin.f90
index f3d147c10668f4..af41442e6870f6 100644
--- a/flang/test/Lower/OpenMP/copyin.f90
+++ b/flang/test/Lower/OpenMP/copyin.f90
@@ -154,14 +154,13 @@ subroutine copyin_derived_type()
 
 ! CHECK:             omp.barrier
 
-! CHECK:             %[[VAL_6:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-! CHECK:             %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_6]] {uniq_name = "_QFcombined_parallel_worksharing_loopEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
 ! CHECK:             %[[VAL_11:.*]] = arith.constant 1 : i32
 ! CHECK:             %[[VAL_12:.*]] = fir.load %[[VAL_9]]#0 : !fir.ref<i32>
 ! CHECK:             %[[VAL_13:.*]] = arith.constant 1 : i32
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %[[VAL_6:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT:          omp.loop_nest (%[[VAL_14:.*]]) : i32 = (%[[VAL_11]]) to (%[[VAL_12]]) inclusive step (%[[VAL_13]]) {
+! CHECK:                 %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_6]] {uniq_name = "_QFcombined_parallel_worksharing_loopEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 ! CHECK:                 fir.store %[[VAL_14]] to %[[VAL_7]]#1 : !fir.ref<i32>
 ! CHECK:                 fir.call @_QPsub4(%[[VAL_9]]#1) fastmath<contract> : (!fir.ref<i32>) -> ()
 ! CHECK:                 omp.yield
@@ -321,15 +320,12 @@ subroutine common_1()
 ! CHECK:             %[[VAL_33:.*]] = fir.load %[[VAL_18]]#0 : !fir.ref<i32>
 ! CHECK:             hlfir.assign %[[VAL_33]] to %[[VAL_31]]#0 : i32, !fir.ref<i32>
 ! CHECK:             omp.barrier
-
-! CHECK:             %[[VAL_19:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-! CHECK:             %[[VAL_20:.*]]:2 = hlfir.declare %[[VAL_19]] {uniq_name = "_QFcommon_2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 ! CHECK:             %[[VAL_34:.*]] = arith.constant 1 : i32
 ! CHECK:             %[[VAL_35:.*]] = fir.load %[[VAL_26]]#0 : !fir.ref<i32>
 ! CHECK:             %[[VAL_36:.*]] = arith.constant 1 : i32
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %[[VAL_19:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT:          omp.loop_nest (%[[VAL_37:.*]]) : i32 = (%[[VAL_34]]) to (%[[VAL_35]]) inclusive step (%[[VAL_36]]) {
+! CHECK:             %[[VAL_20:.*]]:2 = hlfir.declare %[[VAL_19]] {uniq_name = "_QFcommon_2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 ! CHECK:                 fir.store %[[VAL_37]] to %[[VAL_20]]#1 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_38:.*]] = fir.load %[[VAL_31]]#0 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_39:.*]] = fir.load %[[VAL_20]]#0 : !fir.ref<i32>
diff --git a/flang/test/Lower/OpenMP/critical.f90 b/flang/test/Lower/OpenMP/critical.f90
index 051d3782106469..99a4426ab04533 100644
--- a/flang/test/Lower/OpenMP/critical.f90
+++ b/flang/test/Lower/OpenMP/critical.f90
@@ -38,11 +38,10 @@ subroutine predetermined_privatization()
   !CHECK: omp.parallel
   !$omp parallel do
 
-  !CHECK: %[[PRIV_I_ALLOC:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-  !CHECK: %[[PRIV_I_DECL:.*]]:2 = hlfir.declare %[[PRIV_I_ALLOC]]
   do i = 2, 10
-    !CHECK: omp.wsloop
+    !CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[PRIV_I_ALLOC:.*]] : !fir.ref<i32>)
     !CHECK: omp.loop_nest (%[[IV:[^[:space:]]+]])
+    !CHECK: %[[PRIV_I_DECL:.*]]:2 = hlfir.declare %[[PRIV_I_ALLOC]]
     !CHECK: fir.store %[[IV]] to %[[PRIV_I_DECL]]#1
     !CHECK: omp.critical
     !$omp critical
diff --git a/flang/test/Lower/OpenMP/default-clause-byref.f90 b/flang/test/Lower/OpenMP/default-clause-byref.f90
index 654c13ada9e39f..10e62005f42ba0 100644
--- a/flang/test/Lower/OpenMP/default-clause-byref.f90
+++ b/flang/test/Lower/OpenMP/default-clause-byref.f90
@@ -346,7 +346,7 @@ subroutine skipped_default_clause_checks()
        type(it)::iii
 
 !CHECK: omp.parallel {{.*}} {
-!CHECK: omp.wsloop reduction(byref @min_byref_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
+!CHECK: omp.wsloop private({{.*}}) reduction(byref @min_byref_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
 !CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) {{.*}} {
 !CHECK: omp.yield
 !CHECK: }
diff --git a/flang/test/Lower/OpenMP/default-clause.f90 b/flang/test/Lower/OpenMP/default-clause.f90
index c004813a911f73..fcc8d033eea0fa 100644
--- a/flang/test/Lower/OpenMP/default-clause.f90
+++ b/flang/test/Lower/OpenMP/default-clause.f90
@@ -284,16 +284,13 @@ subroutine nested_default_clause_test4
 !CHECK-LABEL: func @_QPnested_default_clause_test5
 !CHECK: omp.parallel {
 
-!CHECK: %[[X_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_test5Ex"}
-!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
-
-!CHECK: %[[LOOP_VAR_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
-
 !CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
 !CHECK: %[[CONST_UB:.*]] = arith.constant 50 : i32
 !CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[X_ALLOCA:.*]], @{{.*}} %{{.*}} -> %[[LOOP_VAR_ALLOCA:.*]] : !fir.ref<i32>, !fir.ref<i32>) {
 !CHECK: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
 !CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
 !CHECK: %[[LOADED_X:.*]] = fir.load %[[X_DECLARE]]#0 : !fir.ref<i32>
 !CHECK: %[[CONST:.*]] = arith.constant 1 : i32
@@ -321,13 +318,12 @@ subroutine nested_default_clause_test5
 
 !CHECK: %[[Z_VAR_DECLARE:.*]]:2 = hlfir.declare %[[Z_VAR]] {{.*}}
 
-!CHECK: %[[LOOP_VAR:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
-
 !CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
 !CHECK: %[[CONST_UB:.*]] = arith.constant 10 : i32
 !CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[LOOP_VAR:.*]] : !fir.ref<i32>) {
 !CHECK: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
 !CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
 !CHECK: %[[LOADED_X:.*]] = fir.load %[[X_VAR_DECLARE]]#0 : !fir.ref<i32>
 !CHECK: %[[CONST:.*]] = arith.constant 1 : i32
@@ -386,7 +382,7 @@ subroutine skipped_default_clause_checks()
        type(it)::iii
 
 !CHECK: omp.parallel {{.*}} {
-!CHECK: omp.wsloop reduction(@min_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
+!CHECK: omp.wsloop private({{.*}}) reduction(@min_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
 !CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) {{.*}} {
 !CHECK: omp.yield
 !CHECK: }
diff --git a/flang/test/Lower/OpenMP/hlfir-wsloop.f90 b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
index f7b0ba681efebf..786ab916d000c3 100644
--- a/flang/test/Lower/OpenMP/hlfir-wsloop.f90
+++ b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
@@ -10,12 +10,11 @@ subroutine simple_loop
   ! CHECK-DAG:     %[[WS_END:.*]] = arith.constant 9 : i32
   ! CHECK:  omp.parallel
   !$OMP PARALLEL
-  ! CHECK-DAG:     %[[ALLOCA_IV:.*]] = fir.alloca i32 {{{.*}}, pinned, {{.*}}}
-  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
-  ! CHECK:         omp.wsloop {
+  ! CHECK:         omp.wsloop private(@{{.*}} %{{.*}} -> %[[ALLOCA_IV:.*]] : !fir.ref<i32>) {
   ! CHECK-NEXT:      omp.loop_nest (%[[I:.*]]) : i32 = (%[[WS_ST]]) to (%[[WS_END]]) inclusive step (%[[WS_ST]]) {
   !$OMP DO
   do i=1, 9
+  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
   ! CHECK:             fir.store %[[I]] to %[[IV:.*]] : !fir.ref<i32>
   ! CHECK:             %[[LOAD_IV:.*]] = fir.load %[[IV]] : !fir.ref<i32>
   ! CHECK:             fir.call @_FortranAioOutputInteger32({{.*}}, %[[LOAD_IV]]) {{.*}}: (!fir.ref<i8>, i32) -> i1
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
index 6b7d849fde93ca..fd8338393dd880 100644
--- a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -8,12 +8,11 @@
 ! CHECK:           fir.store %[[VAL_2]] to %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK:           %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
 ! CHECK:           omp.parallel {
-!                    create original copy of private variable
-! CHECK:             %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
-! CHECK:             %[[VAL_17:.*]] = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
-! CHECK:             %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %{{.*}}, @{{.*}} %{{.*}} -> %[[VAL_17:.*]] : !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<i32>) {
 ! CHECK:               omp.loop_nest
+! CHECK:                   %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:                   %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+
 !                        [...]
 !                        if this is the last iteration
 ! CHECK:                 fir.if %{{.*}} {
diff --git a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90 b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
index faa3d3e053f345..c059382bf634c5 100644
--- a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
@@ -11,12 +11,10 @@
 !CHECK:      %[[CB_C_Y_COOR:.*]] = fir.coordinate_of %[[CB_C_REF_CVT]], %{{.*}} : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
 !CHECK:      %[[CB_C_Y_ADDR:.*]] = fir.convert %[[CB_C_Y_COOR]] : (!fir.ref<i8>) -> !fir.ref<f32>
 !CHECK:      %[[Y_DECL:.*]]:2 = hlfir.declare %[[CB_C_Y_ADDR]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_X_REF:.*]] = fir.alloca f32 {bindc_name = "x", pinned, uniq_name = "_QFlastprivate_commonEx"}
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[PRIVATE_X_REF:.*]], @{{.*}} %{{.*}} -> %[[PRIVATE_Y_REF:.*]], @{{.*}} %{{.*}} -> %{{.*}} : !{{.*}}, !{{.*}}, !{{.*}}) {
+!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:      %[[PRIVATE_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_X_REF]] {uniq_name = "_QFlastprivate_commonEx"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_Y_REF:.*]] = fir.alloca f32 {bindc_name = "y", pinned, uniq_name = "_QFlastprivate_commonEy"}
 !CHECK:      %[[PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Y_REF]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      omp.wsloop {
-!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:          %[[V:.*]] = arith.addi %[[I]], %{{.*}} : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
 !CHECK:          %[[NEG_STEP:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
diff --git a/flang/test/Lower/OpenMP/lastprivate-iv.f90 b/flang/test/Lower/OpenMP/lastprivate-iv.f90
index 63a81e818bc8ba..aacefd8b59c0f2 100644
--- a/flang/test/Lower/OpenMP/lastprivate-iv.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-iv.f90
@@ -6,14 +6,12 @@
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_incEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 4 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant 3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -42,15 +40,12 @@ subroutine lastprivate_iv_inc()
 
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_decEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 1 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant -3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -80,7 +75,7 @@ subroutine lastprivate_iv_dec()
 subroutine lastprivate_iv_i1
   integer*1 :: i1
   i1=0
-!CHECK:    omp.wsloop
+!CHECK:    omp.wsloop private({{.*}})
 !CHECK:      omp.loop_nest
 !CHECK:        fir.if %{{.*}} {
 !CHECK:          %[[I8_VAL:.*]] = fir.convert %{{.*}} : (i32) -> i8
diff --git a/flang/test/Lower/OpenMP/location.f90 b/flang/test/Lower/OpenMP/location.f90
index 2dab22a1c1f90d..fc7dd434998638 100644
--- a/flang/test/Lower/OpenMP/location.f90
+++ b/flang/test/Lower/OpenMP/location.f90
@@ -28,7 +28,7 @@ subroutine sub_target()
 
 !CHECK-LABEL: sub_loop
 subroutine sub_loop()
-!CHECK: omp.wsloop {
+!CHECK: omp.wsloop private({{.*}}) {
 !CHECK-NEXT: omp.loop_nest {{.*}} {
   !$omp do
   do i=1,10
diff --git a/flang/test/Lower/OpenMP/order-clause.f90 b/flang/test/Lower/OpenMP/order-clause.f90
index 717d9740c56f80..75b3a27bbd7fe1 100644
--- a/flang/test/Lower/OpenMP/order-clause.f90
+++ b/flang/test/Lower/OpenMP/order-clause.f90
@@ -20,15 +20,15 @@ end subroutine simd_order
 
 !CHECK-LABEL:   func.func @_QPdo_order() {
 subroutine do_order
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(reproducible:concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(unconstrained:concurrent) {
+   !CHECK: omp.wsloop order(unconstrained:concurrent) private({{.*}}) {
    !$omp do order(unconstrained:concurrent)
    do i = 1, 10
    end do
diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
index 86309a24f91a03..531413c124f81b 100644
--- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
@@ -10,12 +10,12 @@
 !CHECK-DAG: %[[ARG1_DECL:.*]]:2 = hlfir.declare %[[ARG1_REF]] typeparams %[[FIVE]] dummy_scope %{{[0-9]+}} {uniq_name = "_QFlastprivate_characterEarg1"} : (!fir.ref<!fir.char<1,5>>, index, !fir.dscope) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
 
 !CHECK: omp.parallel {
-!CHECK-DAG: %[[ARG1_PVT:.*]] = fir.alloca !fir.char<1,5> {bindc_name = "arg1", pin...
[truncated]

@llvmbot

llvmbot commented Jan 10, 2025

@llvm/pr-subscribers-flang-openmp

Author: Kareem Ergawy (ergawy)

Changes

This enables delayed privatization by default for omp.wsloop ops, with one caveat: I had to work around the "impure" alloc region issue that is being resolved at the moment. The workaround detects whether the alloc region's argument is used in the region and, at the same time, is defined in a block that does not dominate the chosen alloca insertion point. If so, we move the alloca insertion point below the defining instruction of the alloc region argument. This basically reverts to the non-delayed-privatization behavior.
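
The decision described above can be illustrated with a small, self-contained sketch. This is a toy model and not the code added to OpenMPToLLVMIRTranslation.cpp: `Position`, `dominates`, and `chooseAllocaPoint` are hypothetical names, dominance is modeled with a plain immediate-dominator array, and the check is done at block granularity only (instruction order inside a single block is ignored), following the wording of the description.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for illustration; none of these names exist in LLVM/MLIR.
struct Position {
  int block; // index of the basic block
  int index; // instruction index inside that block
};

// idom[b] is the immediate dominator of block b. The entry block is its
// own immediate dominator. Returns true if block a dominates block b.
bool dominates(const std::vector<int> &idom, int a, int b) {
  while (true) {
    if (a == b)
      return true;
    if (idom[b] == b) // reached the entry block without meeting a
      return false;
    b = idom[b];
  }
}

// Pick where the private allocas go. If the alloc region argument is used
// inside the alloc region and its defining block does not dominate the
// default insertion point, fall back to inserting right after the
// defining instruction.
Position chooseAllocaPoint(const std::vector<int> &idom, Position defaultPoint,
                           Position argDef, bool argUsedInAllocRegion) {
  if (argUsedInAllocRegion &&
      !dominates(idom, argDef.block, defaultPoint.block))
    return {argDef.block, argDef.index + 1};
  return defaultPoint;
}
```

In this model, with three blocks chained entry -> 1 -> 2, an argument defined in block 2 and used in the alloc region pulls the insertion point down from the entry block to just after its definition, while an unused argument leaves the default insertion point untouched.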


Patch is 251.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122471.diff

diff --git a/flang/test/Lower/OpenMP/copyin.f90 b/flang/test/Lower/OpenMP/copyin.f90
index f3d147c10668f4..af41442e6870f6 100644
--- a/flang/test/Lower/OpenMP/copyin.f90
+++ b/flang/test/Lower/OpenMP/copyin.f90
@@ -154,14 +154,13 @@ subroutine copyin_derived_type()
 
 ! CHECK:             omp.barrier
 
-! CHECK:             %[[VAL_6:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-! CHECK:             %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_6]] {uniq_name = "_QFcombined_parallel_worksharing_loopEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
 ! CHECK:             %[[VAL_11:.*]] = arith.constant 1 : i32
 ! CHECK:             %[[VAL_12:.*]] = fir.load %[[VAL_9]]#0 : !fir.ref<i32>
 ! CHECK:             %[[VAL_13:.*]] = arith.constant 1 : i32
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %[[VAL_6:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT:          omp.loop_nest (%[[VAL_14:.*]]) : i32 = (%[[VAL_11]]) to (%[[VAL_12]]) inclusive step (%[[VAL_13]]) {
+! CHECK:                 %[[VAL_7:.*]]:2 = hlfir.declare %[[VAL_6]] {uniq_name = "_QFcombined_parallel_worksharing_loopEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 ! CHECK:                 fir.store %[[VAL_14]] to %[[VAL_7]]#1 : !fir.ref<i32>
 ! CHECK:                 fir.call @_QPsub4(%[[VAL_9]]#1) fastmath<contract> : (!fir.ref<i32>) -> ()
 ! CHECK:                 omp.yield
@@ -321,15 +320,12 @@ subroutine common_1()
 ! CHECK:             %[[VAL_33:.*]] = fir.load %[[VAL_18]]#0 : !fir.ref<i32>
 ! CHECK:             hlfir.assign %[[VAL_33]] to %[[VAL_31]]#0 : i32, !fir.ref<i32>
 ! CHECK:             omp.barrier
-
-! CHECK:             %[[VAL_19:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-! CHECK:             %[[VAL_20:.*]]:2 = hlfir.declare %[[VAL_19]] {uniq_name = "_QFcommon_2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 ! CHECK:             %[[VAL_34:.*]] = arith.constant 1 : i32
 ! CHECK:             %[[VAL_35:.*]] = fir.load %[[VAL_26]]#0 : !fir.ref<i32>
 ! CHECK:             %[[VAL_36:.*]] = arith.constant 1 : i32
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %[[VAL_19:.*]] : !fir.ref<i32>) {
 ! CHECK-NEXT:          omp.loop_nest (%[[VAL_37:.*]]) : i32 = (%[[VAL_34]]) to (%[[VAL_35]]) inclusive step (%[[VAL_36]]) {
+! CHECK:             %[[VAL_20:.*]]:2 = hlfir.declare %[[VAL_19]] {uniq_name = "_QFcommon_2Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 ! CHECK:                 fir.store %[[VAL_37]] to %[[VAL_20]]#1 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_38:.*]] = fir.load %[[VAL_31]]#0 : !fir.ref<i32>
 ! CHECK:                 %[[VAL_39:.*]] = fir.load %[[VAL_20]]#0 : !fir.ref<i32>
diff --git a/flang/test/Lower/OpenMP/critical.f90 b/flang/test/Lower/OpenMP/critical.f90
index 051d3782106469..99a4426ab04533 100644
--- a/flang/test/Lower/OpenMP/critical.f90
+++ b/flang/test/Lower/OpenMP/critical.f90
@@ -38,11 +38,10 @@ subroutine predetermined_privatization()
   !CHECK: omp.parallel
   !$omp parallel do
 
-  !CHECK: %[[PRIV_I_ALLOC:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-  !CHECK: %[[PRIV_I_DECL:.*]]:2 = hlfir.declare %[[PRIV_I_ALLOC]]
   do i = 2, 10
-    !CHECK: omp.wsloop
+    !CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[PRIV_I_ALLOC:.*]] : !fir.ref<i32>)
     !CHECK: omp.loop_nest (%[[IV:[^[:space:]]+]])
+    !CHECK: %[[PRIV_I_DECL:.*]]:2 = hlfir.declare %[[PRIV_I_ALLOC]]
     !CHECK: fir.store %[[IV]] to %[[PRIV_I_DECL]]#1
     !CHECK: omp.critical
     !$omp critical
diff --git a/flang/test/Lower/OpenMP/default-clause-byref.f90 b/flang/test/Lower/OpenMP/default-clause-byref.f90
index 654c13ada9e39f..10e62005f42ba0 100644
--- a/flang/test/Lower/OpenMP/default-clause-byref.f90
+++ b/flang/test/Lower/OpenMP/default-clause-byref.f90
@@ -346,7 +346,7 @@ subroutine skipped_default_clause_checks()
        type(it)::iii
 
 !CHECK: omp.parallel {{.*}} {
-!CHECK: omp.wsloop reduction(byref @min_byref_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
+!CHECK: omp.wsloop private({{.*}}) reduction(byref @min_byref_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
 !CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) {{.*}} {
 !CHECK: omp.yield
 !CHECK: }
diff --git a/flang/test/Lower/OpenMP/default-clause.f90 b/flang/test/Lower/OpenMP/default-clause.f90
index c004813a911f73..fcc8d033eea0fa 100644
--- a/flang/test/Lower/OpenMP/default-clause.f90
+++ b/flang/test/Lower/OpenMP/default-clause.f90
@@ -284,16 +284,13 @@ subroutine nested_default_clause_test4
 !CHECK-LABEL: func @_QPnested_default_clause_test5
 !CHECK: omp.parallel {
 
-!CHECK: %[[X_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "x", pinned, uniq_name = "_QFnested_default_clause_test5Ex"}
-!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
-
-!CHECK: %[[LOOP_VAR_ALLOCA:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
-
 !CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
 !CHECK: %[[CONST_UB:.*]] = arith.constant 50 : i32
 !CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[X_ALLOCA:.*]], @{{.*}} %{{.*}} -> %[[LOOP_VAR_ALLOCA:.*]] : !fir.ref<i32>, !fir.ref<i32>) {
 !CHECK: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: %[[X_DECLARE:.*]]:2 = hlfir.declare %[[X_ALLOCA]] {{.*}}
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR_ALLOCA]] {{.*}}
 !CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
 !CHECK: %[[LOADED_X:.*]] = fir.load %[[X_DECLARE]]#0 : !fir.ref<i32>
 !CHECK: %[[CONST:.*]] = arith.constant 1 : i32
@@ -321,13 +318,12 @@ subroutine nested_default_clause_test5
 
 !CHECK: %[[Z_VAR_DECLARE:.*]]:2 = hlfir.declare %[[Z_VAR]] {{.*}}
 
-!CHECK: %[[LOOP_VAR:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
-
 !CHECK: %[[CONST_LB:.*]] = arith.constant 1 : i32
 !CHECK: %[[CONST_UB:.*]] = arith.constant 10 : i32
 !CHECK: %[[CONST_STEP:.*]] = arith.constant 1 : i32
+! CHECK: omp.wsloop private(@{{.*}} %{{.*}} -> %[[LOOP_VAR:.*]] : !fir.ref<i32>) {
 !CHECK: omp.loop_nest (%[[ARG:.*]]) : i32 = (%[[CONST_LB]]) to (%[[CONST_UB]]) inclusive step (%[[CONST_STEP]]) {
+!CHECK: %[[LOOP_VAR_DECLARE:.*]]:2 = hlfir.declare %[[LOOP_VAR]] {{.*}}
 !CHECK: fir.store %[[ARG]] to %[[LOOP_VAR_DECLARE]]#1 : !fir.ref<i32>
 !CHECK: %[[LOADED_X:.*]] = fir.load %[[X_VAR_DECLARE]]#0 : !fir.ref<i32>
 !CHECK: %[[CONST:.*]] = arith.constant 1 : i32
@@ -386,7 +382,7 @@ subroutine skipped_default_clause_checks()
        type(it)::iii
 
 !CHECK: omp.parallel {{.*}} {
-!CHECK: omp.wsloop reduction(@min_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
+!CHECK: omp.wsloop private({{.*}}) reduction(@min_i32 %[[VAL_Z_DECLARE]]#0 -> %[[PRV:.+]] : !fir.ref<i32>) {
 !CHECK-NEXT: omp.loop_nest (%[[ARG:.*]]) {{.*}} {
 !CHECK: omp.yield
 !CHECK: }
diff --git a/flang/test/Lower/OpenMP/hlfir-wsloop.f90 b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
index f7b0ba681efebf..786ab916d000c3 100644
--- a/flang/test/Lower/OpenMP/hlfir-wsloop.f90
+++ b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
@@ -10,12 +10,11 @@ subroutine simple_loop
   ! CHECK-DAG:     %[[WS_END:.*]] = arith.constant 9 : i32
   ! CHECK:  omp.parallel
   !$OMP PARALLEL
-  ! CHECK-DAG:     %[[ALLOCA_IV:.*]] = fir.alloca i32 {{{.*}}, pinned, {{.*}}}
-  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
-  ! CHECK:         omp.wsloop {
+  ! CHECK:         omp.wsloop private(@{{.*}} %{{.*}} -> %[[ALLOCA_IV:.*]] : !fir.ref<i32>) {
   ! CHECK-NEXT:      omp.loop_nest (%[[I:.*]]) : i32 = (%[[WS_ST]]) to (%[[WS_END]]) inclusive step (%[[WS_ST]]) {
   !$OMP DO
   do i=1, 9
+  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
   ! CHECK:             fir.store %[[I]] to %[[IV:.*]] : !fir.ref<i32>
   ! CHECK:             %[[LOAD_IV:.*]] = fir.load %[[IV]] : !fir.ref<i32>
   ! CHECK:             fir.call @_FortranAioOutputInteger32({{.*}}, %[[LOAD_IV]]) {{.*}}: (!fir.ref<i8>, i32) -> i1
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
index 6b7d849fde93ca..fd8338393dd880 100644
--- a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -8,12 +8,11 @@
 ! CHECK:           fir.store %[[VAL_2]] to %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK:           %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
 ! CHECK:           omp.parallel {
-!                    create original copy of private variable
-! CHECK:             %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
-! CHECK:             %[[VAL_17:.*]] = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
-! CHECK:             %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %{{.*}}, @{{.*}} %{{.*}} -> %[[VAL_17:.*]] : !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<i32>) {
 ! CHECK:               omp.loop_nest
+! CHECK:                   %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:                   %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+
 !                        [...]
 !                        if this is the last iteration
 ! CHECK:                 fir.if %{{.*}} {
diff --git a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90 b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
index faa3d3e053f345..c059382bf634c5 100644
--- a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
@@ -11,12 +11,10 @@
 !CHECK:      %[[CB_C_Y_COOR:.*]] = fir.coordinate_of %[[CB_C_REF_CVT]], %{{.*}} : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
 !CHECK:      %[[CB_C_Y_ADDR:.*]] = fir.convert %[[CB_C_Y_COOR]] : (!fir.ref<i8>) -> !fir.ref<f32>
 !CHECK:      %[[Y_DECL:.*]]:2 = hlfir.declare %[[CB_C_Y_ADDR]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_X_REF:.*]] = fir.alloca f32 {bindc_name = "x", pinned, uniq_name = "_QFlastprivate_commonEx"}
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[PRIVATE_X_REF:.*]], @{{.*}} %{{.*}} -> %[[PRIVATE_Y_REF:.*]], @{{.*}} %{{.*}} -> %{{.*}} : !{{.*}}, !{{.*}}, !{{.*}}) {
+!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:      %[[PRIVATE_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_X_REF]] {uniq_name = "_QFlastprivate_commonEx"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_Y_REF:.*]] = fir.alloca f32 {bindc_name = "y", pinned, uniq_name = "_QFlastprivate_commonEy"}
 !CHECK:      %[[PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Y_REF]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      omp.wsloop {
-!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:          %[[V:.*]] = arith.addi %[[I]], %{{.*}} : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
 !CHECK:          %[[NEG_STEP:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
diff --git a/flang/test/Lower/OpenMP/lastprivate-iv.f90 b/flang/test/Lower/OpenMP/lastprivate-iv.f90
index 63a81e818bc8ba..aacefd8b59c0f2 100644
--- a/flang/test/Lower/OpenMP/lastprivate-iv.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-iv.f90
@@ -6,14 +6,12 @@
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_incEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 4 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant 3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -42,15 +40,12 @@ subroutine lastprivate_iv_inc()
 
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_decEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 1 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant -3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -80,7 +75,7 @@ subroutine lastprivate_iv_dec()
 subroutine lastprivate_iv_i1
   integer*1 :: i1
   i1=0
-!CHECK:    omp.wsloop
+!CHECK:    omp.wsloop private({{.*}})
 !CHECK:      omp.loop_nest
 !CHECK:        fir.if %{{.*}} {
 !CHECK:          %[[I8_VAL:.*]] = fir.convert %{{.*}} : (i32) -> i8
diff --git a/flang/test/Lower/OpenMP/location.f90 b/flang/test/Lower/OpenMP/location.f90
index 2dab22a1c1f90d..fc7dd434998638 100644
--- a/flang/test/Lower/OpenMP/location.f90
+++ b/flang/test/Lower/OpenMP/location.f90
@@ -28,7 +28,7 @@ subroutine sub_target()
 
 !CHECK-LABEL: sub_loop
 subroutine sub_loop()
-!CHECK: omp.wsloop {
+!CHECK: omp.wsloop private({{.*}}) {
 !CHECK-NEXT: omp.loop_nest {{.*}} {
   !$omp do
   do i=1,10
diff --git a/flang/test/Lower/OpenMP/order-clause.f90 b/flang/test/Lower/OpenMP/order-clause.f90
index 717d9740c56f80..75b3a27bbd7fe1 100644
--- a/flang/test/Lower/OpenMP/order-clause.f90
+++ b/flang/test/Lower/OpenMP/order-clause.f90
@@ -20,15 +20,15 @@ end subroutine simd_order
 
 !CHECK-LABEL:   func.func @_QPdo_order() {
 subroutine do_order
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(reproducible:concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(unconstrained:concurrent) {
+   !CHECK: omp.wsloop order(unconstrained:concurrent) private({{.*}}) {
    !$omp do order(unconstrained:concurrent)
    do i = 1, 10
    end do
diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
index 86309a24f91a03..531413c124f81b 100644
--- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
@@ -10,12 +10,12 @@
 !CHECK-DAG: %[[ARG1_DECL:.*]]:2 = hlfir.declare %[[ARG1_REF]] typeparams %[[FIVE]] dummy_scope %{{[0-9]+}} {uniq_name = "_QFlastprivate_characterEarg1"} : (!fir.ref<!fir.char<1,5>>, index, !fir.dscope) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
 
 !CHECK: omp.parallel {
-!CHECK-DAG: %[[ARG1_PVT:.*]] = fir.alloca !fir.char<1,5> {bindc_name = "arg1", pin...
[truncated]


llvmbot commented Jan 10, 2025

@llvm/pr-subscribers-mlir

Author: Kareem Ergawy (ergawy)

Changes

This enables delayed privatization by default for omp.wsloop ops, with one caveat: I had to work around the "impure" alloc region issue that is being resolved at the moment. The workaround detects whether the alloc region's argument is used in the region and, at the same time, is defined in a block that does not dominate the chosen alloca insertion point. If so, we move the alloca insertion point below the defining instruction of the alloc region argument. This basically reverts to the non-delayed-privatization behavior.
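
To make the workaround described above more concrete, here is a minimal, self-contained C++ sketch of the decision it makes. This is a toy model under stated assumptions, not the code added to OpenMPToLLVMIRTranslation.cpp: `ToyCfg`, `InsertPoint`, and `chooseAllocaInsertPoint` are hypothetical names, and dominance is modeled with a plain immediate-dominator array rather than the MLIR or LLVM dominance utilities.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy control-flow graph (hypothetical): idom[b] is the immediate dominator
// of block b. The entry block is recorded as its own immediate dominator.
struct ToyCfg {
  std::vector<std::size_t> idom;

  // Returns true if block `a` dominates block `b`.
  // Every block dominates itself.
  bool dominates(std::size_t a, std::size_t b) const {
    assert(a < idom.size() && b < idom.size());
    while (true) {
      if (a == b)
        return true;
      if (idom[b] == b) // reached the entry block without meeting `a`
        return false;
      b = idom[b];
    }
  }
};

// A position in the toy CFG: a block plus an instruction index inside it.
struct InsertPoint {
  std::size_t block;
  std::size_t index;
};

// Hypothetical helper mirroring the description in the PR text:
// - `chosen` is the alloca insertion point picked by default;
// - `argDef` is where the alloc region's argument is defined;
// - `argUsedInAllocRegion` says whether the alloc region reads that argument.
// If the argument is used and its defining block does not dominate the block
// of the chosen insertion point, the insertion point is moved to just below
// the defining instruction. Otherwise the chosen point is kept.
InsertPoint chooseAllocaInsertPoint(const ToyCfg &cfg, InsertPoint chosen,
                                    InsertPoint argDef,
                                    bool argUsedInAllocRegion) {
  if (!argUsedInAllocRegion)
    return chosen;
  if (cfg.dominates(argDef.block, chosen.block))
    return chosen;
  return InsertPoint{argDef.block, argDef.index + 1};
}
```

In this sketch, with blocks chained as 0 -> 1 -> 2, an argument defined at index 3 of block 2 and a chosen insertion point in block 0 yields an insertion point at index 4 of block 2. If the alloc region does not use the argument, or the defining block already dominates the chosen block, the chosen point is returned unchanged.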


Patch is 251.21 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/122471.diff

73 Files Affected:

  • (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+1-1)
  • (modified) flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90 (+2-2)
  • (modified) flang/test/Lower/OpenMP/Todo/omp-default-clause-inner-loop.f90 (+1-2)
  • (modified) flang/test/Lower/OpenMP/associate.f90 (+2-2)
  • (modified) flang/test/Lower/OpenMP/copyin.f90 (+4-8)
  • (modified) flang/test/Lower/OpenMP/critical.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/default-clause-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/default-clause.f90 (+6-10)
  • (modified) flang/test/Lower/OpenMP/hlfir-wsloop.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/lastprivate-allocatable.f90 (+4-5)
  • (modified) flang/test/Lower/OpenMP/lastprivate-commonblock.f90 (+2-4)
  • (modified) flang/test/Lower/OpenMP/lastprivate-iv.f90 (+5-10)
  • (modified) flang/test/Lower/OpenMP/location.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/order-clause.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 (+17-30)
  • (modified) flang/test/Lower/OpenMP/parallel-private-clause-fixes.f90 (+5-7)
  • (modified) flang/test/Lower/OpenMP/parallel-private-clause.f90 (+10-33)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction-allocatable-array.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction-pointer-array.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/parallel-reduction3.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-firstpriv.f90 (+9-27)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-lastpriv.f90 (+26-44)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-reduction-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop-reduction.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/parallel-wsloop.f90 (+21-45)
  • (modified) flang/test/Lower/OpenMP/private-derived-type.f90 (+12-10)
  • (modified) flang/test/Lower/OpenMP/stop-stmt-in-region.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/target.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/unstructured.f90 (+10-14)
  • (modified) flang/test/Lower/OpenMP/wsloop-chunks.f90 (+3-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-collapse.f90 (+6-10)
  • (modified) flang/test/Lower/OpenMP/wsloop-monotonic.f90 (+1-2)
  • (modified) flang/test/Lower/OpenMP/wsloop-nonmonotonic.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-ordered.f90 (+2-2)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-add-byref.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-add.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-allocatable-array-minmax.f90 (+4-6)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-allocatable.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-array-assumed-shape.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-array.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-array2.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-iand-byref.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-iand.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ieor-byref.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ieor.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ior-byref.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-ior.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-and-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-and.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-eqv-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-eqv.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-neqv-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-neqv.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-or-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-logical-or.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-2-byref.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-2.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-max.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min-byref.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min.f90 (+6-9)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-min2.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-mul-byref.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-mul.f90 (+14-21)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-multi.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-multiple-clauses.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-reduction-pointer.f90 (+2-3)
  • (modified) flang/test/Lower/OpenMP/wsloop-schedule.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-unstructured.f90 (+1-1)
  • (modified) flang/test/Lower/OpenMP/wsloop-variable.f90 (+7-10)
  • (modified) flang/test/Lower/OpenMP/wsloop.f90 (+6-9)
  • (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+56)
  • (added) mlir/test/Target/LLVMIR/openmp-wsloop-private-late-alloca-workaround.mlir (+50)
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index cd4b25a17722c1..2ce753039880c1 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -2079,7 +2079,7 @@ static void genStandaloneDo(lower::AbstractConverter &converter,
 
   DataSharingProcessor dsp(converter, semaCtx, item->clauses, eval,
                            /*shouldCollectPreDeterminedSymbols=*/true,
-                           enableDelayedPrivatizationStaging, symTable);
+                           enableDelayedPrivatization, symTable);
   dsp.processStep1(&wsloopClauseOps);
 
   mlir::omp::LoopNestOperands loopNestClauseOps;
diff --git a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90 b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
index 66fd120085c782..c98850b8000d36 100644
--- a/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
+++ b/flang/test/Lower/OpenMP/DelayedPrivatization/wsloop.f90
@@ -1,6 +1,6 @@
-! RUN: %flang_fc1 -emit-hlfir -fopenmp -mmlir --openmp-enable-delayed-privatization-staging \
+! RUN: %flang_fc1 -emit-hlfir -fopenmp \
 ! RUN:   -o - %s 2>&1 | FileCheck %s
-! RUN: bbc -emit-hlfir -fopenmp --openmp-enable-delayed-privatization-staging -o - %s 2>&1 \
+! RUN: bbc -emit-hlfir -fopenmp  -o - %s 2>&1 \
 ! RUN:   | FileCheck %s
 
 subroutine wsloop_private
diff --git a/flang/test/Lower/OpenMP/hlfir-wsloop.f90 b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
index f7b0ba681efebf..786ab916d000c3 100644
--- a/flang/test/Lower/OpenMP/hlfir-wsloop.f90
+++ b/flang/test/Lower/OpenMP/hlfir-wsloop.f90
@@ -10,12 +10,11 @@ subroutine simple_loop
   ! CHECK-DAG:     %[[WS_END:.*]] = arith.constant 9 : i32
   ! CHECK:  omp.parallel
   !$OMP PARALLEL
-  ! CHECK-DAG:     %[[ALLOCA_IV:.*]] = fir.alloca i32 {{{.*}}, pinned, {{.*}}}
-  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
-  ! CHECK:         omp.wsloop {
+  ! CHECK:         omp.wsloop private(@{{.*}} %{{.*}} -> %[[ALLOCA_IV:.*]] : !fir.ref<i32>) {
   ! CHECK-NEXT:      omp.loop_nest (%[[I:.*]]) : i32 = (%[[WS_ST]]) to (%[[WS_END]]) inclusive step (%[[WS_ST]]) {
   !$OMP DO
   do i=1, 9
+  ! CHECK:         %[[IV:.*]]    = fir.declare %[[ALLOCA_IV]] {uniq_name = "_QFsimple_loopEi"} : (!fir.ref<i32>) -> !fir.ref<i32>
   ! CHECK:             fir.store %[[I]] to %[[IV:.*]] : !fir.ref<i32>
   ! CHECK:             %[[LOAD_IV:.*]] = fir.load %[[IV]] : !fir.ref<i32>
   ! CHECK:             fir.call @_FortranAioOutputInteger32({{.*}}, %[[LOAD_IV]]) {{.*}}: (!fir.ref<i8>, i32) -> i1
diff --git a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90 b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
index 6b7d849fde93ca..fd8338393dd880 100644
--- a/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-allocatable.f90
@@ -8,12 +8,11 @@
 ! CHECK:           fir.store %[[VAL_2]] to %[[VAL_0]] : !fir.ref<!fir.box<!fir.heap<i32>>>
 ! CHECK:           %[[VAL_3:.*]]:2 = hlfir.declare %[[VAL_0]] {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
 ! CHECK:           omp.parallel {
-!                    create original copy of private variable
-! CHECK:             %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
-! CHECK:             %[[VAL_17:.*]] = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
-! CHECK:             %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-! CHECK:             omp.wsloop {
+! CHECK:             omp.wsloop private(@{{.*}} %{{.*}} -> %{{.*}}, @{{.*}} %{{.*}} -> %[[VAL_17:.*]] : !fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<i32>) {
 ! CHECK:               omp.loop_nest
+! CHECK:                   %[[VAL_16:.*]]:2 = hlfir.declare %{{.*}} {fortran_attrs = {{.*}}<allocatable>, uniq_name = "_QFEa"} : (!fir.ref<!fir.box<!fir.heap<i32>>>) -> (!fir.ref<!fir.box<!fir.heap<i32>>>, !fir.ref<!fir.box<!fir.heap<i32>>>)
+! CHECK:                   %[[VAL_18:.*]]:2 = hlfir.declare %[[VAL_17]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+
 !                        [...]
 !                        if this is the last iteration
 ! CHECK:                 fir.if %{{.*}} {
diff --git a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90 b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
index faa3d3e053f345..c059382bf634c5 100644
--- a/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-commonblock.f90
@@ -11,12 +11,10 @@
 !CHECK:      %[[CB_C_Y_COOR:.*]] = fir.coordinate_of %[[CB_C_REF_CVT]], %{{.*}} : (!fir.ref<!fir.array<?xi8>>, index) -> !fir.ref<i8>
 !CHECK:      %[[CB_C_Y_ADDR:.*]] = fir.convert %[[CB_C_Y_COOR]] : (!fir.ref<i8>) -> !fir.ref<f32>
 !CHECK:      %[[Y_DECL:.*]]:2 = hlfir.declare %[[CB_C_Y_ADDR]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_X_REF:.*]] = fir.alloca f32 {bindc_name = "x", pinned, uniq_name = "_QFlastprivate_commonEx"}
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[PRIVATE_X_REF:.*]], @{{.*}} %{{.*}} -> %[[PRIVATE_Y_REF:.*]], @{{.*}} %{{.*}} -> %{{.*}} : !{{.*}}, !{{.*}}, !{{.*}}) {
+!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:      %[[PRIVATE_X_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_X_REF]] {uniq_name = "_QFlastprivate_commonEx"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      %[[PRIVATE_Y_REF:.*]] = fir.alloca f32 {bindc_name = "y", pinned, uniq_name = "_QFlastprivate_commonEy"}
 !CHECK:      %[[PRIVATE_Y_DECL:.*]]:2 = hlfir.declare %[[PRIVATE_Y_REF]] {uniq_name = "_QFlastprivate_commonEy"} : (!fir.ref<f32>) -> (!fir.ref<f32>, !fir.ref<f32>)
-!CHECK:      omp.wsloop {
-!CHECK-NEXT:   omp.loop_nest (%[[I:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) inclusive step (%{{.*}}) {
 !CHECK:          %[[V:.*]] = arith.addi %[[I]], %{{.*}} : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
 !CHECK:          %[[NEG_STEP:.*]] = arith.cmpi slt, %{{.*}}, %[[C0]] : i32
diff --git a/flang/test/Lower/OpenMP/lastprivate-iv.f90 b/flang/test/Lower/OpenMP/lastprivate-iv.f90
index 63a81e818bc8ba..aacefd8b59c0f2 100644
--- a/flang/test/Lower/OpenMP/lastprivate-iv.f90
+++ b/flang/test/Lower/OpenMP/lastprivate-iv.f90
@@ -6,14 +6,12 @@
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_incEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 4 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant 3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_incEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -42,15 +40,12 @@ subroutine lastprivate_iv_inc()
 
 !CHECK:      %[[I2_MEM:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFlastprivate_iv_decEi"}
 !CHECK:      %[[I2:.*]]:2 = hlfir.declare %[[I2_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
-!CHECK:      %[[I_MEM:.*]] = fir.alloca i32 {bindc_name = "i", pinned, {{.*}}}
-!CHECK:      %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
-
 !CHECK:      %[[LB:.*]] = arith.constant 10 : i32
 !CHECK:      %[[UB:.*]] = arith.constant 1 : i32
 !CHECK:      %[[STEP:.*]]  = arith.constant -3 : i32
-!CHECK:      omp.wsloop {
+!CHECK:      omp.wsloop private(@{{.*}} %{{.*}} -> %[[I_MEM:.*]] : !fir.ref<i32>) {
 !CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+!CHECK:          %[[I:.*]]:2 = hlfir.declare %[[I_MEM]] {uniq_name = "_QFlastprivate_iv_decEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
 !CHECK:          fir.store %[[IV]] to %[[I]]#1 : !fir.ref<i32>
 !CHECK:          %[[V:.*]] = arith.addi %[[IV]], %[[STEP]] : i32
 !CHECK:          %[[C0:.*]] = arith.constant 0 : i32
@@ -80,7 +75,7 @@ subroutine lastprivate_iv_dec()
 subroutine lastprivate_iv_i1
   integer*1 :: i1
   i1=0
-!CHECK:    omp.wsloop
+!CHECK:    omp.wsloop private({{.*}})
 !CHECK:      omp.loop_nest
 !CHECK:        fir.if %{{.*}} {
 !CHECK:          %[[I8_VAL:.*]] = fir.convert %{{.*}} : (i32) -> i8
diff --git a/flang/test/Lower/OpenMP/location.f90 b/flang/test/Lower/OpenMP/location.f90
index 2dab22a1c1f90d..fc7dd434998638 100644
--- a/flang/test/Lower/OpenMP/location.f90
+++ b/flang/test/Lower/OpenMP/location.f90
@@ -28,7 +28,7 @@ subroutine sub_target()
 
 !CHECK-LABEL: sub_loop
 subroutine sub_loop()
-!CHECK: omp.wsloop {
+!CHECK: omp.wsloop private({{.*}}) {
 !CHECK-NEXT: omp.loop_nest {{.*}} {
   !$omp do
   do i=1,10
diff --git a/flang/test/Lower/OpenMP/order-clause.f90 b/flang/test/Lower/OpenMP/order-clause.f90
index 717d9740c56f80..75b3a27bbd7fe1 100644
--- a/flang/test/Lower/OpenMP/order-clause.f90
+++ b/flang/test/Lower/OpenMP/order-clause.f90
@@ -20,15 +20,15 @@ end subroutine simd_order
 
 !CHECK-LABEL:   func.func @_QPdo_order() {
 subroutine do_order
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(reproducible:concurrent) {
+   !CHECK: omp.wsloop order(reproducible:concurrent) private({{.*}}) {
    !$omp do order(reproducible:concurrent)
    do i = 1, 10
    end do
-   !CHECK: omp.wsloop order(unconstrained:concurrent) {
+   !CHECK: omp.wsloop order(unconstrained:concurrent) private({{.*}}) {
    !$omp do order(unconstrained:concurrent)
    do i = 1, 10
    end do
diff --git a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90 b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
index 86309a24f91a03..531413c124f81b 100644
--- a/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
+++ b/flang/test/Lower/OpenMP/parallel-lastprivate-clause-scalar.f90
@@ -10,12 +10,12 @@
 !CHECK-DAG: %[[ARG1_DECL:.*]]:2 = hlfir.declare %[[ARG1_REF]] typeparams %[[FIVE]] dummy_scope %{{[0-9]+}} {uniq_name = "_QFlastprivate_characterEarg1"} : (!fir.ref<!fir.char<1,5>>, index, !fir.dscope) -> (!fir.ref<!fir.char<1,5>>, !fir.ref<!fir.char<1,5>>)
 
 !CHECK: omp.parallel {
-!CHECK-DAG: %[[ARG1_PVT:.*]] = fir.alloca !fir.char<1,5> {bindc_name = "arg1", pin...
[truncated]

@ergawy ergawy force-pushed the enable_dealyed_priv_ws.loop branch 2 times, most recently from 98eaa4b to 705865a Compare January 10, 2025 15:16
@ergawy
Member Author

ergawy commented Jan 13, 2025

Ping! Can you give this PR a look 👀? Especially the "temporary" workaround until @tblah finishes updating the omp.private op (I don't want to rush him, hence the workaround).

Member

@skatrak skatrak left a comment

Thank you Kareem, the changes to Flang lowering and test updates LGTM. The LLVM IR translation workaround seems fine as well, but perhaps someone else can give it a look since I'm not so familiar with that part.

@ergawy ergawy force-pushed the enable_dealyed_priv_ws.loop branch 2 times, most recently from b79bcf5 to a50e46f Compare January 13, 2025 14:32
github-actions bot commented Jan 13, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@ergawy ergawy force-pushed the enable_dealyed_priv_ws.loop branch 2 times, most recently from 5754fe0 to 423fdaf Compare January 13, 2025 15:39
@tblah
Contributor

tblah commented Jan 14, 2025

I think the workaround looks okay.

I found a bug (I think) unrelated to the workaround:

program first_and_lastprivate
  integer i
  integer :: x(100) = 1
  !$omp parallel do firstprivate(x) lastprivate(x)
    do i=1,1
      if (any(x/=1)) then
        stop 1
      end if
    end do
  !$omp end parallel do
end program

This only reproduces when lastprivate is used.
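
For readers more at home in C/C++ OpenMP, the reproducer above can be approximated as follows. This is only a rough analogue, not something taken from the PR: the function name `first_and_lastprivate` and the `reduction` used to report the result are assumptions for illustration. The point it makes is the expected semantics: with `firstprivate(x)`, each thread's private `x` starts as a copy of the original, so the check inside the loop should never fire.

```cpp
#include <algorithm>

// Returns 0 when every element of the private copy of x still equals 1 inside
// the loop body, and 1 otherwise (mirroring `stop 1` in the Fortran program).
int first_and_lastprivate() {
  int x[100];
  std::fill(x, x + 100, 1);  // integer :: x(100) = 1
  int bad = 0;
  // If the compiler is not invoked with OpenMP enabled, the pragma is ignored
  // and the loop simply runs serially on the original x.
#pragma omp parallel for firstprivate(x) lastprivate(x) reduction(| : bad)
  for (int i = 1; i <= 1; ++i) {
    // any(x /= 1)
    if (std::any_of(x, x + 100, [](int v) { return v != 1; }))
      bad |= 1;
  }
  return bad;
}
```

A correct implementation should return 0 here; a non-zero result corresponds to hitting `stop 1` in the Fortran version.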

@ergawy
Member Author

ergawy commented Jan 14, 2025

I found a bug (I think) unrelated to the workaround:

Seems to be working fine with me: flang -S -emit-llvm -fopenmp -o /tmp/test.ll /tmp/test.f90. Did I miss something?

Contributor

@luporl luporl left a comment

LG. There are 2 typos in the title: privatizaiton and defualt.

@ergawy ergawy changed the title [flang][OpenMP] Enable delayed privatizaiton by defualt omp.wsloop [flang][OpenMP] Enable delayed privatization by default omp.wsloop Jan 15, 2025
@tblah
Contributor

tblah commented Jan 15, 2025

I found a bug (I think) unrelated to the workaround:

Seems to be working fine with me: flang -S -emit-llvm -fopenmp -o /tmp/test.ll /tmp/test.f90. Did I miss something?

Sorry I didn't explain: the generated code is wrong. On my system I hit the stop 1 when running the code.

@ergawy
Member Author

ergawy commented Jan 15, 2025

I found a bug (I think) unrelated to the workaround:

Seems to be working fine with me: flang -S -emit-llvm -fopenmp -o /tmp/test.ll /tmp/test.f90. Did I miss something?

Sorry I didn't explain: the generated code is wrong. On my system I hit the stop 1 when running the code.

Ah, I see. I will take a look in a bit then ...

@ergawy
Member Author

ergawy commented Jan 15, 2025

I found a bug (I think) unrelated to the workaround:

It is indeed a bug and a weird one 🤔. When lastprivate is used, the arguments of the copy block in the privatizer are switched:

copy {
^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
  %0 = fir.load %arg1 : !fir.ref<i32>
  hlfir.assign %0 to %arg0 : i32, !fir.ref<i32>
  omp.yield(%arg1 : !fir.ref<i32>)
}

so it stores from the private copy to the original host one and not the other way around like it should. The weird thing is, the lastprivate update is emitted properly in the body of the op itself.

Might be a problem with how values are managed by the symbol table. But not sure yet, looking ....
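
To make the direction problem concrete, here is a small C++ transliteration of the copy block above, next to the direction a `firstprivate` copy would need. It is a sketch only: the function names are made up, plain `int *` stands in for `!fir.ref<i32>`, and it assumes (as the comment above implies) that `%arg0` is the original host variable and `%arg1` is the private copy.

```cpp
// Transliteration of the copy block as dumped above.
// Assumption: arg0 = original (host) variable, arg1 = private copy.
inline int *copyRegionAsDumped(int *arg0, int *arg1) {
  int loaded = *arg1;  // %0 = fir.load %arg1
  *arg0 = loaded;      // hlfir.assign %0 to %arg0
  return arg1;         // omp.yield(%arg1)
}

// Direction a firstprivate copy needs: initialise the private copy from the
// host value and leave the host value untouched.
inline int *copyRegionExpected(const int *arg0, int *arg1) {
  int loaded = *arg0;  // load the host value
  *arg1 = loaded;      // store it into the private copy
  return arg1;         // yield the private copy
}
```

With a host value of 1 and a private copy holding a placeholder such as -1, the dumped version leaves the private copy at -1 and overwrites the host value, which would explain the reproducer reaching `stop 1`; the expected version leaves both at 1.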

@luporl
Contributor

luporl commented Jan 15, 2025

It is indeed a bug and a weird one 🤔. When lastprivate is used, the arguments of the copy block in the privatizer are switched:

copyHostAssociateVar() switches the copy direction when sym has Flag::OmpLastPrivate set (and copyAssignIP is not null).

@ergawy
Member Author

ergawy commented Jan 15, 2025

It is indeed a bug and a weird one 🤔. When lastprivate is used, the arguments of the copy block in the privatizer are switched:

copyHostAssociateVar() switches the copy direction when sym has Flag::OmpLastPrivate set (and copyAssignIP is not null).

Yup, I had fun finding that out :). I have a fix and will open a pr soon.

This enables delayed privatization by default for `omp.wsloop` ops, with
one caveat! I had to work around the "impure" alloc region issue that
is being resolved at the moment. The workaround detects whether the alloc
region's argument is used in the region and at the same time defined in
a block that does not dominate the chosen alloca insertion point. If so,
we move the alloca insertion point below the defining instruction of the
alloc region argument. This basically reverts to the
non-delayed-privatization behavior.
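
The decision described in this commit message can be sketched on a toy CFG. Everything below is hypothetical and only models the stated rule; `ToyCfg`, `Position`, and `chooseAllocaIP` are illustrative names, not the MLIR or LLVM APIs used by the actual patch, and dominance is checked at block granularity as in the description.

```cpp
#include <vector>

// Toy CFG: idom[b] is the immediate dominator of block b. The entry block is
// its own immediate dominator.
struct ToyCfg {
  std::vector<int> idom;

  // True when block `a` dominates block `b` (every block dominates itself).
  bool dominates(int a, int b) const {
    while (true) {
      if (a == b) return true;
      if (idom[b] == b) return false;  // reached the entry block
      b = idom[b];
    }
  }
};

// A position inside a block: `index` counts operations from the block start.
struct Position {
  int block;
  int index;
};

// Hypothetical model of the workaround: keep the chosen alloca insertion point
// unless the alloc region uses its argument and that argument is defined in a
// block that does not dominate the chosen point. In that case, insert right
// below the defining operation instead.
Position chooseAllocaIP(const ToyCfg &cfg, Position chosen, Position argDef,
                        bool argUsedInAllocRegion) {
  if (argUsedInAllocRegion && !cfg.dominates(argDef.block, chosen.block))
    return {argDef.block, argDef.index + 1};
  return chosen;
}
```

For example, with three blocks where block 0 is the entry, block 1 follows it, and block 2 follows block 1, a chosen insertion point in block 0 and an argument defined in block 2 yields a position just after the definition in block 2. If the argument is unused in the alloc region, or is defined in a dominating block, the chosen point is kept.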
@ergawy
Member Author

ergawy commented Jan 16, 2025

Created a follow up PR with a possible fix: #123162.

@ergawy
Member Author

ergawy commented Jan 16, 2025

@tblah @luporl are you folks fine with merging this PR given the follow up one for lastprivate?

Contributor

@luporl luporl left a comment

LGTM

@ergawy ergawy merged commit 6f82408 into llvm:main Jan 16, 2025
8 checks passed
ergawy added a commit that referenced this pull request Jan 16, 2025
…r` (#123162)

This fixes a bug when the same variable is used in `firstprivate` and
`lastprivate` clauses on the same construct. The issue boils down to the
fact that `copyHostAssociateVar` was deciding the direction of the copy
assignment (i.e. the `lhs` and `rhs`) based on whether the `copyAssignIP`
parameter is set. This is not the best way to do it since it is not
related to whether we are doing a copy from the host to the localized copy
or the other way around. When we set the insertion point for `firstprivate`
in delayed privatization, this resulted in switching the direction of the
copy assignment. Instead, this PR adds a new parameter to explicitly tell
the function the direction of the assignment.
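
The design change can be illustrated with a simplified model. This is not the flang code: `CopyDirection`, `Var`, `inferDirection`, and `copyVar` are hypothetical names, and the inferred-direction rule is a paraphrase of what the discussion says about the old behavior (direction flips when the symbol is `lastprivate` and an insertion point is supplied).

```cpp
enum class CopyDirection { HostToPrivate, PrivateToHost };

// Stand-in for a variable with a host value and a privatized copy.
struct Var {
  int host;
  int priv;
};

// Simplified model of the old heuristic as described in the discussion: the
// direction is inferred from unrelated state instead of being stated.
inline CopyDirection inferDirection(bool isLastPrivate, bool hasCopyAssignIP) {
  return (isLastPrivate && hasCopyAssignIP) ? CopyDirection::PrivateToHost
                                            : CopyDirection::HostToPrivate;
}

// Model of the fix: the caller states the direction explicitly.
inline void copyVar(Var &v, CopyDirection dir) {
  if (dir == CopyDirection::HostToPrivate)
    v.priv = v.host;  // firstprivate initialisation
  else
    v.host = v.priv;  // lastprivate write-back
}
```

In this model, a variable that is both `firstprivate` and `lastprivate` gets the wrong direction for its `firstprivate` copy as soon as an insertion point is passed, because `inferDirection(true, true)` yields `PrivateToHost`. Passing `CopyDirection::HostToPrivate` explicitly avoids coupling the direction to whether an insertion point happens to be set.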

This is a follow up PR for
#122471, only the latest commit
is relevant.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 16, 2025
…AssociateVar` (#123162)

ergawy added a commit to ergawy/llvm-project that referenced this pull request Jan 17, 2025
…wsloop` (llvm#122471)"

This seems to have caused some regressions in Fujitsu's test-suite:
https://linaro.atlassian.net/browse/LLVM-1521

This reverts commit 6f82408.
@ergawy
Member Author

ergawy commented Jan 17, 2025

Reverted while investigating the Fujitsu regressions: #123324

ergawy added a commit to ergawy/llvm-project that referenced this pull request Jan 20, 2025
…wsloop` (llvm#122471)"

This seems to have caused some regressions in Fujitsu's test-suite:
https://linaro.atlassian.net/browse/LLVM-1521

This reverts commit 6f82408.
ergawy added a commit to ergawy/llvm-project that referenced this pull request Jan 22, 2025
…wsloop` (llvm#122471)"

This seems to have caused some regressions in Fujitsu's test-suite:
https://linaro.atlassian.net/browse/LLVM-1521

This reverts commit 6f82408.
ergawy added a commit that referenced this pull request Jan 22, 2025
…wsloop` (#122471)" (#123324)

This seems to have caused some regressions in Fujitsu's test-suite:
https://linaro.atlassian.net/browse/LLVM-1521

This reverts commit 6f82408.
ergawy added a commit that referenced this pull request Feb 6, 2025
…125732)

Reapplies #122471

This is based on #125699, only
the latest commit is relevant.

With changes in this PR and the parent one, the previously reported
failures in the Fujitsu(*) test suite should hopefully be resolved (I
verified all the 14 reported failures and they pass now).

(*) https://linaro.atlassian.net/browse/LLVM-1521
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
…lvm#125732)
