-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[llvm][OpenMP] Add implicit cast to omp.atomic.read #114659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm][OpenMP] Add implicit cast to omp.atomic.read #114659
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-flang-openmp Author: None (NimishMishra) ChangesShould the operands of Fixes #112908 Full diff: https://github.com/llvm/llvm-project/pull/114659.diff 2 Files Affected:
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index d2e4dc1c85dfd2..9e69b730884ff6 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -265,6 +265,32 @@ computeOpenMPScheduleType(ScheduleKind ClauseKind, bool HasChunks,
return Result;
}
+/// Emit an implicit cast to convert \p XRead to type of variable \p V
+static llvm::Value *emitImplicitCast(IRBuilder<> &Builder, llvm::Value *XRead,
+ llvm::Value *V) {
+ llvm::Type *XReadType = XRead->getType();
+ llvm::Type *VType = V->getType();
+ if (llvm::AllocaInst *vAlloca = dyn_cast<llvm::AllocaInst>(V))
+ VType = vAlloca->getAllocatedType();
+
+ if (XReadType->isStructTy() && VType->isStructTy())
+ // No need to extract or convert. A direct
+ // `store` will suffice.
+ return XRead;
+
+ if (XRead->getType()->isStructTy())
+ XRead = Builder.CreateExtractValue(XRead, /*Idxs=*/0);
+ if (VType->isIntegerTy() && XReadType->isFloatingPointTy())
+ XRead = Builder.CreateFPToSI(XRead, VType);
+ else if (VType->isFloatingPointTy() && XReadType->isIntegerTy())
+ XRead = Builder.CreateSIToFP(XRead, VType);
+ else if (VType->isIntegerTy() && XReadType->isIntegerTy())
+ XRead = Builder.CreateIntCast(XRead, VType, true);
+ else if (VType->isFloatingPointTy() && XReadType->isFloatingPointTy())
+ XRead = Builder.CreateFPCast(XRead, VType);
+ return XRead;
+}
+
/// Make \p Source branch to \p Target.
///
/// Handles two situations:
@@ -8076,6 +8102,8 @@ OpenMPIRBuilder::createAtomicRead(const LocationDescription &Loc,
}
}
checkAndEmitFlushAfterAtomic(Loc, AO, AtomicKind::Read);
+ if (XRead->getType() != V.Var->getType())
+ XRead = emitImplicitCast(Builder, XRead, V.Var);
Builder.CreateStore(XRead, V.Var, V.IsVolatile);
return Builder.saveIP();
}
@@ -8360,6 +8388,8 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createAtomicCapture(
return AtomicResult.takeError();
Value *CapturedVal =
(IsPostfixUpdate ? AtomicResult->first : AtomicResult->second);
+ if (CapturedVal->getType() != V.Var->getType())
+ CapturedVal = emitImplicitCast(Builder, CapturedVal, V.Var);
Builder.CreateStore(CapturedVal, V.Var, V.IsVolatile);
checkAndEmitFlushAfterAtomic(Loc, AO, AtomicKind::Capture);
diff --git a/mlir/test/Target/LLVMIR/openmp-llvm.mlir b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
index 49f9f3562c78b5..68dd9a66652b1c 100644
--- a/mlir/test/Target/LLVMIR/openmp-llvm.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-llvm.mlir
@@ -1368,6 +1368,77 @@ llvm.func @omp_atomic_read(%arg0 : !llvm.ptr, %arg1 : !llvm.ptr) -> () {
// -----
+// CHECK-LABEL: @omp_atomic_read_implicit_cast
+llvm.func @omp_atomic_read_implicit_cast () {
+//CHECK: %[[Z:.*]] = alloca float, i64 1, align 4
+//CHECK: %[[Y:.*]] = alloca double, i64 1, align 8
+//CHECK: %[[X:.*]] = alloca [2 x { float, float }], i64 1, align 8
+//CHECK: %[[W:.*]] = alloca i32, i64 1, align 4
+//CHECK: %[[X_ELEMENT:.*]] = getelementptr { float, float }, ptr %3, i64 0
+ %0 = llvm.mlir.constant(1 : i64) : i64
+ %1 = llvm.alloca %0 x f32 {bindc_name = "z"} : (i64) -> !llvm.ptr
+ %2 = llvm.mlir.constant(1 : i64) : i64
+ %3 = llvm.alloca %2 x f64 {bindc_name = "y"} : (i64) -> !llvm.ptr
+ %4 = llvm.mlir.constant(1 : i64) : i64
+ %5 = llvm.alloca %4 x !llvm.array<2 x struct<(f32, f32)>> {bindc_name = "x"} : (i64) -> !llvm.ptr
+ %6 = llvm.mlir.constant(1 : i64) : i64
+ %7 = llvm.alloca %6 x i32 {bindc_name = "w"} : (i64) -> !llvm.ptr
+ %8 = llvm.mlir.constant(1 : index) : i64
+ %9 = llvm.mlir.constant(2 : index) : i64
+ %10 = llvm.mlir.constant(1 : i64) : i64
+ %11 = llvm.mlir.constant(0 : i64) : i64
+ %12 = llvm.sub %8, %10 overflow<nsw> : i64
+ %13 = llvm.mul %12, %10 overflow<nsw> : i64
+ %14 = llvm.mul %13, %10 overflow<nsw> : i64
+ %15 = llvm.add %14, %11 overflow<nsw> : i64
+ %16 = llvm.mul %10, %9 overflow<nsw> : i64
+ %17 = llvm.getelementptr %5[%15] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.struct<(f32, f32)>
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = alloca { float, float }, align 8
+//CHECK: call void @__atomic_load(i64 8, ptr %[[X_ELEMENT]], ptr %[[ATOMIC_LOAD_TEMP]], i32 0)
+//CHECK: %[[LOAD:.*]] = load { float, float }, ptr %[[ATOMIC_LOAD_TEMP]], align 8
+//CHECK: %[[EXT:.*]] = extractvalue { float, float } %[[LOAD]], 0
+//CHECK: store float %[[EXT]], ptr %[[Y]], align 4
+ omp.atomic.read %3 = %17 : !llvm.ptr, !llvm.struct<(f32, f32)>
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[Z]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i32 %[[ATOMIC_LOAD_TEMP]] to float
+//CHECK: %[[LOAD:.*]] = fpext float %[[CAST]] to double
+//CHECK: store double %[[LOAD]], ptr %[[Y]], align 8
+ omp.atomic.read %3 = %1 : !llvm.ptr, f32
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[W]] monotonic, align 4
+//CHECK: %[[LOAD:.*]] = sitofp i32 %[[ATOMIC_LOAD_TEMP]] to double
+//CHECK: store double %[[LOAD]], ptr %[[Y]], align 8
+ omp.atomic.read %3 = %7 : !llvm.ptr, i32
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i64, ptr %[[Y]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i64 %[[ATOMIC_LOAD_TEMP]] to double
+//CHECK: %[[LOAD:.*]] = fptrunc double %[[CAST]] to float
+//CHECK: store float %[[LOAD]], ptr %[[Z]], align 4
+ omp.atomic.read %1 = %3 : !llvm.ptr, f64
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[W]] monotonic, align 4
+//CHECK: %[[LOAD:.*]] = sitofp i32 %[[ATOMIC_LOAD_TEMP]] to float
+//CHECK: store float %[[LOAD]], ptr %[[Z]], align 4
+ omp.atomic.read %1 = %7 : !llvm.ptr, i32
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i64, ptr %[[Y]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i64 %[[ATOMIC_LOAD_TEMP]] to double
+//CHECK: %[[LOAD:.*]] = fptosi double %[[CAST]] to i32
+//CHECK: store i32 %[[LOAD]], ptr %[[W]], align 4
+ omp.atomic.read %7 = %3 : !llvm.ptr, f64
+
+//CHECK: %[[ATOMIC_LOAD_TEMP:.*]] = load atomic i32, ptr %[[Z]] monotonic, align 4
+//CHECK: %[[CAST:.*]] = bitcast i32 %[[ATOMIC_LOAD_TEMP]] to float
+//CHECK: %[[LOAD:.*]] = fptosi float %[[CAST]] to i32
+//CHECK: store i32 %[[LOAD]], ptr %[[W]], align 4
+ omp.atomic.read %7 = %1 : !llvm.ptr, f32
+ llvm.return
+}
+
+// -----
+
// CHECK-LABEL: @omp_atomic_write
// CHECK-SAME: (ptr %[[x:.*]], i32 %[[expr:.*]])
llvm.func @omp_atomic_write(%x: !llvm.ptr, %expr: i32) -> () {
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for circling back and fixing this. I have a few minor comments
llvm::Type *XReadType = XRead->getType(); | ||
llvm::Type *VType = V->getType(); | ||
if (llvm::AllocaInst *vAlloca = dyn_cast<llvm::AllocaInst>(V)) | ||
VType = vAlloca->getAllocatedType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would AllocaInst::getAllocatedType be different to the value's type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the type of atomic pointer by default is ptr
. We would like to know the real type (i.e. AllocatedType
) in order to know whether we need a conversion or an extraction (in case of complex type).
// `store` will suffice. | ||
return XRead; | ||
|
||
if (XRead->getType()->isStructTy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use XReadType
here?
|
||
if (XRead->getType()->isStructTy()) | ||
XRead = Builder.CreateExtractValue(XRead, /*Idxs=*/0); | ||
if (VType->isIntegerTy() && XReadType->isFloatingPointTy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this can be else if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually if you consider the test case:
program atomic_array
complex :: x(2)
integer :: y
x(1) = (1.0, 1.0)
x(2) = (2.0, 2.0)
!$omp parallel
!$omp atomic read
y = x(1)
!$omp end parallel
print *, y
end program
Then we need to extract the real component of complex
into a float
, and store it to integer
. We thus need this check in place to ensure the conversion takes place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder we should use #101966 instead which handles type punning consistently by itself (https://github.com/llvm/llvm-project/blob/ee84c6bb3146d25f21c14d6d8e0abb794683f5ec/llvm/lib/Transforms/Utils/BuildBuiltins.cpp#L123-L131)
XRead = Builder.CreateFPToSI(XRead, VType); | ||
else if (VType->isFloatingPointTy() && XReadType->isIntegerTy()) | ||
XRead = Builder.CreateSIToFP(XRead, VType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An atomic read should not have implicit semantics of converting between int and fp. That could cause some trouble, e.g. rounding.
if (XRead->getType() != V.Var->getType()) | ||
XRead = emitImplicitCast(Builder, XRead, V.Var); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code above seems to already handle casting?
I would remove the changes to Clang and AtomicsExpandPass (in the first iteration). Those were for checking whether the API is sufficiently general. |
I added However, I hope the benfits become clear:
|
Thanks for your work on this. I went through the patch, and it does seem this approach is better. I see this PR can also help with sized libcalls. I wonder whether work on this needs a separate PR, or whether we can merge with the current PR? Since, it is more generic and solves multiple issues at once. I am okay with picking this work and opening a new PR, where we can discuss improvements and iterate over it. Let me know what you think. |
I discussed this with @kiranktp. We will work on #101966 as a separate PR, since this approach can help with multiple issues (like sized libcalls in atomic). If everyone is okay, we can merge this current PR for the time being. This would unblock progress towards removal of the experimental message currently. I'll revise this PR with @tblah's comments. |
@NimishMishra is this patch still needed? |
08c34da
to
dcdc2b7
Compare
I'm sorry; this fell off my radar. Yes, without this, the test-cases in #112908 will segfault. I have addressed your comments, and have added a TODO based on what Michael suggested (and added that to our internal tracking system too). I'll work on integrating Michael's solution. In the meantime, we can probably go with this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Should the operands of
omp.atomic.read
differ, emit an implicit cast. In case ofstruct
arguments, extract the 0-th index, emit an implicit cast if required, and store at the destination.Fixes #112908