Skip to content

[llvm][OpenMP] Handle complex types in atomic read #111377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 23, 2024

Conversation

NimishMishra
Copy link
Contributor

This patch adds functionality for atomically reading llvm.struct types.

Fixes: #93441

@llvmbot
Copy link
Member

llvmbot commented Oct 7, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-flang-openmp

Author: None (NimishMishra)

Changes

This patch adds functionality for atomically reading llvm.struct types.

Fixes: #93441


Full diff: https://github.com/llvm/llvm-project/pull/111377.diff

2 Files Affected:

  • (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+13-1)
  • (modified) mlir/test/Target/LLVMIR/openmp-llvm.mlir (+22)
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 47cc6ff7655caf..b2c1ea6b805fd6 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -7822,7 +7822,7 @@ OpenMPIRBuilder::createAtomicRead(const LocationDescription &Loc,
          "OMP Atomic expects a pointer to target memory");
   Type *XElemTy = X.ElemTy;
   assert((XElemTy->isFloatingPointTy() || XElemTy->isIntegerTy() ||
-          XElemTy->isPointerTy()) &&
+          XElemTy->isPointerTy() || XElemTy->isStructTy()) &&
          "OMP atomic read expected a scalar type");
 
   Value *XRead = nullptr;
@@ -7832,6 +7832,18 @@ OpenMPIRBuilder::createAtomicRead(const LocationDescription &Loc,
         Builder.CreateLoad(XElemTy, X.Var, X.IsVolatile, "omp.atomic.read");
     XLD->setAtomic(AO);
     XRead = cast<Value>(XLD);
+  } else if (XElemTy->isStructTy()) {
+    LoadInst *OldVal = Builder.CreateLoad(XElemTy, X.Var, "omp.atomic.read");
+    OldVal->setAtomic(AO);
+    const DataLayout &LoadDL = OldVal->getModule()->getDataLayout();
+    unsigned LoadSize =
+        LoadDL.getTypeStoreSize(OldVal->getPointerOperand()->getType());
+    OpenMPIRBuilder::AtomicInfo atomicInfo(
+        &Builder, XElemTy, LoadSize * 8, LoadSize * 8, OldVal->getAlign(),
+        OldVal->getAlign(), true /* UseLibcall */, X.Var);
+    auto AtomicLoadRes = atomicInfo.EmitAtomicLoadLibcall(AO);
+    XRead = AtomicLoadRes.first;
+    OldVal->eraseFromParent();
   } else {
     // We need to perform atomic op as integer
     IntegerType *IntCastTy =
diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
index 5d76e87472dfe4..47a7e0a7634cdc 100644
--- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
@@ -1564,6 +1564,28 @@ llvm.func @_QPomp_atomic_capture_complex() {
 
 // -----
 
+// CHECK-LABEL: define void @omp_atomic_read_complex() {
+llvm.func @omp_atomic_read_complex(){
+
+// CHECK: %[[a:.*]] = alloca { float, float }, i64 1, align 8
+// CHECK: %[[b:.*]] = alloca { float, float }, i64 1, align 8
+// CHECK: %[[ATOMIC_TEMP_LOAD:.*]] = alloca { float, float }, align 8
+// CHECK: call void @__atomic_load(i64 8, ptr %[[b]], ptr %[[ATOMIC_TEMP_LOAD]], i32 0)
+// CHECK: %[[LOADED_VAL:.*]] = load { float, float }, ptr %[[ATOMIC_TEMP_LOAD]], align 8
+// CHECK: store { float, float } %[[LOADED_VAL]], ptr %[[a]], align 4
+// CHECK: ret void
+// CHECK: }
+
+    %0 = llvm.mlir.constant(1 : i64) : i64
+    %1 = llvm.alloca %0 x !llvm.struct<(f32, f32)> {bindc_name = "ib"} : (i64) -> !llvm.ptr
+    %2 = llvm.mlir.constant(1 : i64) : i64
+    %3 = llvm.alloca %2 x !llvm.struct<(f32, f32)> {bindc_name = "ia"} : (i64) -> !llvm.ptr
+    omp.atomic.read %1 = %3 : !llvm.ptr, !llvm.struct<(f32, f32)>
+    llvm.return
+}
+
+// -----
+
 // Checking an order-dependent operation when the order is `expr binop x`
 // CHECK-LABEL: @omp_atomic_update_ordering
 // CHECK-SAME: (ptr %[[x:.*]], i32 %[[expr:.*]])

@tblah
Copy link
Contributor

tblah commented Oct 8, 2024

Thanks for your work on this.

One of our tests segfaults at runtime. Here is a minimal reproducer (that should loop forever)

program complex_read
  complex :: x = 0
  complex v
  !$omp parallel
  do
    !$omp atomic read
      v = x
    if (v /= x) exit
  end do
  !$omp end parallel
end program

I am testing on aarch64 (aws graviton 3) and using the atomic implementation from compiler-rt.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tblah Is the segfault with this patch or without it?

OpenMPIRBuilder::AtomicInfo atomicInfo(
&Builder, XElemTy, LoadSize * 8, LoadSize * 8, OldVal->getAlign(),
OldVal->getAlign(), true /* UseLibcall */, X.Var);
auto AtomicLoadRes = atomicInfo.EmitAtomicLoadLibcall(AO);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Libcalls should only be used if the device does not support atomicrmw of that size. E.g. when the struct is only 4 bytes long, emit atomicrmw instead of call __atomic_load.

I think this is part of the larger issue of handling atomics properly. For the immediate problem, this should work, but I suggest to add a TODO/FIXME comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Meinersbur I have added a FIXME for this point. Thanks. If everything else looks okay, can we merge this?

@tblah
Copy link
Contributor

tblah commented Oct 8, 2024

@tblah Is the segfault with this patch or without it?

The segfault is with this patch. Without this patch flang-new hits an assertion failure.

@NimishMishra
Copy link
Contributor Author

Thanks @tblah for testing this PR. Without this patch, I believe the assertion would be related to emitting an atomic read on non-integer types.

I'll investigate on this and get back.

@NimishMishra
Copy link
Contributor Author

NimishMishra commented Oct 18, 2024

Hi @tblah,

Thanks for the test. I could reproduce the issue with my patch.

GDB shows the segfault is in atomic_load. I have a couple of questions regarding the test that can help in triaging this better:

I was not able to reproduce the segfault on the following test, which is strange, because I did not expect the segfault to depend on do

program complex_read
  complex :: x = 0
  complex v
  !$omp parallel
    !$omp atomic read
      v = x
  !$omp end parallel
end program

Moreover, the following test (where I remove the if), I observe a crash.

program complex_read
  complex :: x = 0
  complex v
  !$omp parallel
  do
    !$omp atomic read
      v = x
  end do
  !$omp end parallel
end program

Taking a step further, if I remove the atomic read (i.e. the following test), the crash still happens.

program complex_read
  complex :: x = 0
  complex v
  !$omp parallel
  do
      v = x
  end do
  !$omp end parallel
end program

@tblah
Copy link
Contributor

tblah commented Oct 18, 2024

Thanks for taking a look.

GDB shows the segfault is in atomic_load. I have a couple of questions regarding the test that can help in triaging this better:

I have tried this using both gnu libatomic and compiler-rt so I expect the bug is in what we are passing to atomic_load rather than atomic_load itself.

I get the same results on your three examples:

  1. Compiles and runs correctly
  2. compiler crash in llvm::CodeExtractor::findAllocas
  3. compiler crash in llvm::CodeExtractor::findAllocas

It isn't clear to me if the runtime crash in my test is the same bug or not. It might be. I have seen those compiler crashes before while making changes to mlir/lib/Target/LLVMIR/OpenMPToLLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp - if we generate malformed IR in the callbacks passed into OpenMPIRBuilder the code extractor can crash while trying to outline the openmp region.

I don't have a machine available to confirm, but I would guess (2) and (3) are reproducible on x86?

@tblah
Copy link
Contributor

tblah commented Oct 18, 2024

The LLVM IR for your final example right before the code extractor runs is as follows:

define void @_QQmain() #0 {
  %tid.addr = alloca i32, align 4
  %zero.addr = alloca i32, align 4
  %1 = alloca { float, float }, i64 1, align 8
  br label %entry

entry:                                            ; preds = %0
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:                                    ; preds = %entry
  %tid.addr.local = alloca i32, align 4
  %tid = load i32, ptr %tid.addr.local, align 4
  %tid.addr.use = load i32, ptr %tid.addr, align 4
  %zero.addr.use = load i32, ptr %zero.addr, align 4
  br label %omp.reduction.init

omp.reduction.init:                               ; preds = %omp.par.entry
  br label %omp.par.region

omp.par.region:                                   ; preds = %omp.reduction.init
  br label %omp.par.region1

omp.par.region2:                                  ; preds = %omp.par.region2, %omp.par.region1
  %2 = load { float, float }, ptr @_QFEx, align 4
  store { float, float } %2, ptr %1, align 4
  br label %omp.par.region2

omp.par.region1:                                  ; preds = %omp.par.region
  br label %omp.par.region2

omp.region.cont:                                  ; No predecessors!
  br label %omp.par.pre_finalize

omp.par.pre_finalize:                             ; preds = %omp.region.cont
  br label %omp.par.outlined.exit

omp.par.outlined.exit:                            ; preds = %omp.par.pre_finalize
  br label %omp.par.exit.split

omp.par.exit.split:                               ; preds = %omp.par.outlined.exit
  unreachable
}

See the controlflow graph with omp.par.region2 unconditionally branching to itself for the infinite do loop.

My guess is that this is a different bug that the code extractor can't handle infinite loops in the control flow graph - it is looking for live out values from that CFG region but there's no way out of the region.

A much simpler example also crashes the compiler:

program infinite_loop
  integer :: i
  !$omp parallel
  do
    i = 0
  end do
  !$omp end parallel
end program

Surprisingly, equivalent C code does not crash the compiler.

In conclusion I think the compiler crash and the executable crash are different bugs. I will create a ticket for the compiler crash

@NimishMishra
Copy link
Contributor Author

Hi @tblah,

I have tested out this PR with working on #112908, and I do not see segfault (once we remove do). Is atomic read still segfaulting on aarch64?

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the updates.

I still have a few failing tests on my end that I need to minimize and figure out if they are really bugs with the atomic implementation. But I think in the meantime it is better to merge this because it does work for most cases.

@NimishMishra NimishMishra merged commit 645e6f1 into llvm:main Oct 23, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Flang][OpenMP] Compilation error when capture-statement described in atomic read construct contains a different type
4 participants