Skip to content

[llvm][OpenMP] Add implicit cast to omp.atomic.read #114659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 17, 2025

Conversation

NimishMishra
Copy link
Contributor

Should the operands of omp.atomic.read differ, emit an implicit cast. In case of struct arguments, extract the 0-th index, emit an implicit cast if required, and store at the destination.

Fixes #112908

@llvmbot
Copy link
Member

llvmbot commented Nov 2, 2024

@llvm/pr-subscribers-mlir
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-flang-openmp

Author: None (NimishMishra)

Changes

Should the operands of omp.atomic.read differ, emit an implicit cast. In case of struct arguments, extract the 0-th index, emit an implicit cast if required, and store at the destination.

Fixes #112908


Full diff: https://github.com/llvm/llvm-project/pull/114659.diff

2 Files Affected:

  • (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+30)
  • (modified) mlir/test/Target/LLVMIR/openmp-llvm.mlir (+71)
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index d2e4dc1c85dfd2..9e69b730884ff6 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -265,6 +265,32 @@ computeOpenMPScheduleType(ScheduleKind ClauseKind, bool HasChunks,
   return Result;
 }
 
+/// Emit an implicit cast to convert \p XRead to type of variable \p V
+static llvm::Value *emitImplicitCast(IRBuilder<> &Builder, llvm::Value *XRead,
+                                     llvm::Value *V) {
+  llvm::Type *XReadType = XRead->getType();
+  llvm::Type *VType = V->getType();
+  if (llvm::AllocaInst *vAlloca = dyn_cast<llvm::AllocaInst>(V))
+    VType = vAlloca->getAllocatedType();
+
+  if (XReadType->isStructTy() && VType->isStructTy())
+    // No need to extract or convert. A direct
+    // `store` will suffice.
+    return XRead;
+
+  if (XRead->getType()->isStructTy())
+    XRead = Builder.CreateExtractValue(XRead, /*Idxs=*/0);
+  if (VType->isIntegerTy() && XReadType->isFloatingPointTy())
+    XRead = Builder.CreateFPToSI(XRead, VType);
+  else if (VType->isFloatingPointTy() && XReadType->isIntegerTy())
+    XRead = Builder.CreateSIToFP(XRead, VType);
+  else if (VType->isIntegerTy() && XReadType->isIntegerTy())
+    XRead = Builder.CreateIntCast(XRead, VType, true);
+  else if (VType->isFloatingPointTy() && XReadType->isFloatingPointTy())
+    XRead = Builder.CreateFPCast(XRead, VType);
+  return XRead;
+}
+
 /// Make \p Source branch to \p Target.
 ///
 /// Handles two situations:
@@ -8076,6 +8102,8 @@ OpenMPIRBuilder::createAtomicRead(const LocationDescription &Loc,
     }
   }
   checkAndEmitFlushAfterAtomic(Loc, AO, AtomicKind::Read);
+  if (XRead->getType() != V.Var->getType())
+    XRead = emitImplicitCast(Builder, XRead, V.Var);
   Builder.CreateStore(XRead, V.Var, V.IsVolatile);
   return Builder.saveIP();
 }
@@ -8360,6 +8388,8 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createAtomicCapture(
     return AtomicResult.takeError();
   Value *CapturedVal =
       (IsPostfixUpdate ? AtomicResult->first : AtomicResult->second);
+  if (CapturedVal->getType() != V.Var->getType())
+    CapturedVal = emitImplicitCast(Builder, CapturedVal, V.Var);
   Builder.CreateStore(CapturedVal, V.Var, V.IsVolatile);
 
   checkAndEmitFlushAfterAtomic(Loc, AO, AtomicKind::Capture);
diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
index 49f9f3562c78b5..68dd9a66652b1c 100644
--- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
@@ -1368,6 +1368,77 @@ llvm.func @omp_atomic_read(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr) -> () {
 
 // -----
 
+// CHECK-LABEL: @omp_atomic_read_implicit_cast
+llvm.func @omp_atomic_read_implicit_cast () {
+//CHECK: %[[Z:.*]] = alloca float, i64 1, align 4
+//CHECK: %[[Y:.*]] = alloca double, i64 1, align 8
+//CHECK: %[[X:.*]] = alloca [2 x { float, float }], i64 1, align 8
+//CHECK: %[[W:.*]] = alloca i32, i64 1, align 4
+//CHECK: %[[X_ELEMENT:.*]] = getelementptr { float, float }, ptr %3, i64 0
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x f32 {bindc_name = "z"} : (i64) -> !llvm.ptr
+  %2 = llvm.mlir.constant(1 : i64) : i64
+  %3 = llvm.alloca %2 x f64 {bindc_name = "y"} : (i64) -> !llvm.ptr
+  %4 = llvm.mlir.constant(1 : i64) : i64
+  %5 = llvm.alloca %4 x !llvm.array<2 x struct<(f32, f32)>> {bindc_name = "x"} : (i64) -> !llvm.ptr
+  %6 = llvm.mlir.constant(1 : i64) : i64
+  %7 = llvm.alloca %6 x i32 {bindc_name = "w"} : (i64) -> !llvm.ptr
+  %8 = llvm.mlir.constant(1 : index) : i64
+  %9 = llvm.mlir.constant(2 : index) : i64
+  %10 = llvm.mlir.constant(1 : i64) : i64
+  %11 = llvm.mlir.constant(0 : i64) : i64
+  %12 = llvm.sub %8, %10 overflow<nsw> : i64
+  %13 = llvm.mul %12, %10 overflow<nsw> : i64
+  %14 = llvm.mul %13, %10 overflow<nsw> : i64
+  %15 = llvm.add %14, %11 overflow<nsw> : i64
+  %16 = llvm.mul %10, %9 overflow<nsw> : i64
+  %17 = llvm.getelementptr %5[%15] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.struct<(f32, f32)>
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = alloca { float, float }, align 8
+//CHECK: call void @__atomic_load(i64 8, ptr %[[X_ELEMENT]], ptr %[[ATOMIC_LOAD_TEMP]], i32 0)
+//CHECK: %[[LOAD:.*]] = load { float, float }, ptr %[[ATOMIC_LOAD_TEMP]], align 8
+//CHECK: %[[EXT:.*]] = extractvalue { float, float } %[[LOAD]], 0
+//CHECK: store float %[[EXT]], ptr %[[Y]], align 4
+  omp.atomic.read %3 = %17 : !llvm.ptr, !llvm.struct<(f32, f32)>
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[Z]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i32 %[[ATOMIC_LOAD_TEMP]] to float
+//CHECK: %[[LOAD:.*]] = fpext float %[[CAST]] to double
+//CHECK: store double %[[LOAD]], ptr %[[Y]], align 8
+  omp.atomic.read %3 = %1 : !llvm.ptr, f32
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[W]] monotonic, align 4
+//CHECK: %[[LOAD:.*]] = sitofp i32 %[[ATOMIC_LOAD_TEMP]] to double
+//CHECK: store double %[[LOAD]], ptr %[[Y]], align 8
+  omp.atomic.read %3 = %7 : !llvm.ptr, i32
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i64, ptr %[[Y]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i64 %[[ATOMIC_LOAD_TEMP]] to double
+//CHECK: %[[LOAD:.*]] = fptrunc double %[[CAST]] to float
+//CHECK: store float %[[LOAD]], ptr %[[Z]], align 4
+  omp.atomic.read %1 = %3 : !llvm.ptr, f64
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[W]] monotonic, align 4
+//CHECK: %[[LOAD:.*]] = sitofp i32 %[[ATOMIC_LOAD_TEMP]] to float
+//CHECK: store float %[[LOAD]], ptr %[[Z]], align 4
+  omp.atomic.read %1 = %7 : !llvm.ptr, i32
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i64, ptr %[[Y]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i64 %[[ATOMIC_LOAD_TEMP]] to double
+//CHECK: %[[LOAD:.*]] = fptosi double %[[CAST]] to i32
+//CHECK: store i32 %[[LOAD]], ptr %[[W]], align 4
+  omp.atomic.read %7 = %3 : !llvm.ptr, f64
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[Z]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i32 %[[ATOMIC_LOAD_TEMP]] to float
+//CHECK: %[[LOAD:.*]] = fptosi float %[[CAST]] to i32
+//CHECK: store i32 %[[LOAD]], ptr %[[W]], align 4
+  omp.atomic.read %7 = %1 : !llvm.ptr, f32
+  llvm.return
+}
+
+// -----
+
 // CHECK-LABEL: @omp_atomic_write
 // CHECK-SAME: (ptr %[[x:.*]], i32 %[[expr:.*]])
 llvm.func @omp_atomic_write(%x: !llvm.ptr, %expr: i32) -> () {

@NimishMishra NimishMishra changed the title [llvm] Add implicit cast to omp.atomic.read [llvm][OpenMP] Add implicit cast to omp.atomic.read Nov 2, 2024
Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for circling back and fixing this. I have a few minor comments

llvm::Type *XReadType = XRead->getType();
llvm::Type *VType = V->getType();
if (llvm::AllocaInst *vAlloca = dyn_cast<llvm::AllocaInst>(V))
VType = vAlloca->getAllocatedType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would AllocaInst::getAllocatedType be different to the value's type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the type of atomic pointer by default is ptr. We would like to know the real type (i.e. AllocatedType) in order to know whether we need a conversion or an extraction (in case of complex type).

// `store` will suffice.
return XRead;

if (XRead->getType()->isStructTy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use XReadType here?


if (XRead->getType()->isStructTy())
XRead = Builder.CreateExtractValue(XRead, /*Idxs=*/0);
if (VType->isIntegerTy() && XReadType->isFloatingPointTy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this can be else if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually if you consider the test case:

program atomic_array
  complex :: x(2)
  integer :: y

  x(1) = (1.0, 1.0)
  x(2) = (2.0, 2.0)
  !$omp parallel
    !$omp atomic read
      y = x(1)
  !$omp end parallel
  print *, y
end program

Then we need to extract the real component of complex into a float, and store it to integer. We thus need this check in place to ensure the conversion takes place.

Copy link
Member

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +284 to +286
XRead = Builder.CreateFPToSI(XRead, VType);
else if (VType->isFloatingPointTy() && XReadType->isIntegerTy())
XRead = Builder.CreateSIToFP(XRead, VType);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An atomic read should not have implicit semantics of converting between int and fp. That could cause some trouble, e.g. rounding.

Comment on lines +8105 to +8404
if (XRead->getType() != V.Var->getType())
XRead = emitImplicitCast(Builder, XRead, V.Var);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code above seems to already handle casting?

@NimishMishra
Copy link
Contributor Author

I wonder we should use #101966 instead which handles type punning consistently by itself

I think we can. But how do go ahead with changes in Clang codegen? Will there be opposition from community if we try to merge a solution like #101966?

@Meinersbur
Copy link
Member

I would remove the changes to Clang and AtomicsExpandPass (in the first iteration). Those were for checking whether the API is sufficiently general.

@Meinersbur
Copy link
Member

Meinersbur commented Nov 7, 2024

I added emitAtomicLoadBuiltin and emitAtomicStoreBuiltin into #101966, and use it within OpenMPIRBuilder. The patch is not ready yet, tests are not updated, tits probably not yet completely correct, and code duplication needs to be reduced.

However, I hope the benfits become clear:

  • Just one case instead of multiple depending on data type.
  • All data types (incl. complete) able to use the cmpxchg,load,store,atomicrmw etc instead of the libcall fallback. Only the data size matters, not the type.

@NimishMishra
Copy link
Contributor Author

I added emitAtomicLoadBuiltin and emitAtomicStoreBuiltin into #101966, and use it within OpenMPIRBuilder. The patch is not ready yet, tests are not updated, tits probably not yet completely correct, and code duplication needs to be reduced.

Thanks for your work on this. I went through the patch, and it does seem this approach is better. I see this PR can also help with sized libcalls.

I wonder whether work on this needs a separate PR, or whether we can merge with the current PR? Since, it is more generic and solves multiple issues at once. I am okay with picking this work and opening a new PR, where we can discuss improvements and iterate over it. Let me know what you think.

@NimishMishra
Copy link
Contributor Author

NimishMishra commented Nov 29, 2024

I wonder whether work on this needs a separate PR, or whether we can merge with the current PR? Since, it is more generic and solves multiple issues at once. I am okay with picking this work and opening a new PR, where we can discuss improvements and iterate over it. Let me know what you think.

I discussed this with @kiranktp. We will work on #101966 as a separate PR, since this approach can help with multiple issues (like sized libcalls in atomic).

If everyone is okay, we can merge this current PR for the time being. This would unblock progress towards removal of the experimental message currently. I'll revise this PR with @tblah's comments.

@tblah
Copy link
Contributor

tblah commented Jan 14, 2025

@NimishMishra is this patch still needed?

@NimishMishra NimishMishra force-pushed the atomic_read_implicit_cast branch from 08c34da to dcdc2b7 Compare January 14, 2025 11:48
@NimishMishra
Copy link
Contributor Author

@NimishMishra is this patch still needed?

I'm sorry; this fell off my radar. Yes, without this, the test-cases in #112908 will segfault. I have addressed your comments, and have added a TODO based on what Michael suggested (and added that to our internal tracking system too). I'll work on integrating Michael's solution. In the meantime, we can probably go with this.

Copy link
Contributor

@tblah tblah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@NimishMishra NimishMishra merged commit d7e48fb into llvm:main Jan 17, 2025
8 checks passed
@NimishMishra NimishMishra deleted the atomic_read_implicit_cast branch January 17, 2025 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[flang][OpenMP] support arrays of atomic numbers
4 participants