-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[clang][OpenMP][SPIR-V] Fix AS of globals and set the default AS to 4 #135251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
Signed-off-by: Sarnie, Nick <[email protected]>
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Nick Sarnie (sarnex) ChangesBased on feedback from #134399, we use the address space map that sets the default AS to 4 for OpenMP SPIR-V offload. The new AS map had the wrong target AS for After this PR, I will work on simplifying the condition of the old default AS 0 map to eventually only be OCL with no generic addrspace, but there are many failures so I wanted to do it step by step, and this is the first one. There are relatively minor changes to OpenMP codegen, mostly just addrspacecasts (because globals are AS 1 in SPIR-V, so we need to cast to AS 4/no AS somewhat often) or use the correct address space to create a global. Full diff: https://github.com/llvm/llvm-project/pull/135251.diff 8 Files Affected:
diff --git a/clang/lib/Basic/Targets/SPIR.h b/clang/lib/Basic/Targets/SPIR.h
index 5ea727364d24b..0f4f74ac95749 100644
--- a/clang/lib/Basic/Targets/SPIR.h
+++ b/clang/lib/Basic/Targets/SPIR.h
@@ -57,10 +57,11 @@ static const unsigned SPIRDefIsPrivMap[] = {
// Used by both the SPIR and SPIR-V targets.
static const unsigned SPIRDefIsGenMap[] = {
4, // Default
- // OpenCL address space values for this map are dummy and they can't be used
- 0, // opencl_global
+ // Some OpenCL address space values for this map are dummy and they can't be
+ // used
+ 1, // opencl_global
0, // opencl_local
- 0, // opencl_constant
+ 2, // opencl_constant
0, // opencl_private
0, // opencl_generic
0, // opencl_global_device
@@ -216,7 +217,8 @@ class LLVM_LIBRARY_VISIBILITY BaseSPIRTargetInfo : public TargetInfo {
/*DefaultIsGeneric=*/Opts.SYCLIsDevice ||
// The address mapping from HIP/CUDA language for device code is only
// defined for SPIR-V.
- (getTriple().isSPIRV() && Opts.CUDAIsDevice));
+ (getTriple().isSPIRV() &&
+ (Opts.CUDAIsDevice || Opts.OpenMPIsTargetDevice)));
}
void setSupportedOpenCLOpts() override {
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 5736864d4cc6b..5780f1ded3259 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -2524,6 +2524,16 @@ void CGOpenMPRuntime::emitForDispatchInit(
Args);
}
+llvm::Value *CGOpenMPRuntime::createRuntimeFunctionArgAddrSpaceCast(
+ CodeGenFunction &CGF, llvm::FunctionCallee RuntimeFcn, size_t ArgIdx,
+ llvm::Value *Arg) {
+ llvm::Type *ParamTy = RuntimeFcn.getFunctionType()->getParamType(ArgIdx);
+ llvm::Type *ArgTy = Arg->getType();
+ if (!ParamTy->isPointerTy())
+ return Arg;
+ return CGF.Builder.CreateAddrSpaceCast(Arg, ParamTy);
+}
+
void CGOpenMPRuntime::emitForDispatchDeinit(CodeGenFunction &CGF,
SourceLocation Loc) {
if (!CGF.HaveInsertPoint())
@@ -2572,12 +2582,18 @@ static void emitForStaticInitCall(
ThreadId,
CGF.Builder.getInt32(addMonoNonMonoModifier(CGF.CGM, Schedule, M1,
M2)), // Schedule type
- Values.IL.emitRawPointer(CGF), // &isLastIter
- Values.LB.emitRawPointer(CGF), // &LB
- Values.UB.emitRawPointer(CGF), // &UB
- Values.ST.emitRawPointer(CGF), // &Stride
- CGF.Builder.getIntN(Values.IVSize, 1), // Incr
- Chunk // Chunk
+ CGOpenMPRuntime::createRuntimeFunctionArgAddrSpaceCast(
+ CGF, ForStaticInitFunction, 3,
+ Values.IL.emitRawPointer(CGF)), // &isLastIter
+ CGOpenMPRuntime::createRuntimeFunctionArgAddrSpaceCast(
+ CGF, ForStaticInitFunction, 4, Values.LB.emitRawPointer(CGF)), // &LB
+ CGOpenMPRuntime::createRuntimeFunctionArgAddrSpaceCast(
+ CGF, ForStaticInitFunction, 5, Values.UB.emitRawPointer(CGF)), // &UB
+ CGOpenMPRuntime::createRuntimeFunctionArgAddrSpaceCast(
+ CGF, ForStaticInitFunction, 6,
+ Values.ST.emitRawPointer(CGF)), // &Stride
+ CGF.Builder.getIntN(Values.IVSize, 1), // Incr
+ Chunk // Chunk
};
CGF.EmitRuntimeCall(ForStaticInitFunction, Args);
}
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index 4321712e1521d..c918c77b4266c 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -1657,6 +1657,11 @@ class CGOpenMPRuntime {
/// Returns true if the variable is a local variable in untied task.
bool isLocalVarInUntiedTask(CodeGenFunction &CGF, const VarDecl *VD) const;
+
+ static llvm::Value *
+ createRuntimeFunctionArgAddrSpaceCast(CodeGenFunction &CGF,
+ llvm::FunctionCallee RuntimeFcn,
+ size_t ArgIdx, llvm::Value *Arg);
};
/// Class supports emissionof SIMD-only code.
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index f697c13f4c522..0bfa49dee0c53 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -1217,11 +1217,13 @@ void CGOpenMPRuntimeGPU::emitParallelCall(CodeGenFunction &CGF,
CGBuilderTy &Bld = CGF.Builder;
llvm::Value *NumThreadsVal = NumThreads;
llvm::Function *WFn = WrapperFunctionsMap[OutlinedFn];
+ llvm::FunctionCallee RuntimeFn = OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_parallel_51);
llvm::Value *ID = llvm::ConstantPointerNull::get(CGM.Int8PtrTy);
if (WFn)
ID = Bld.CreateBitOrPointerCast(WFn, CGM.Int8PtrTy);
- llvm::Value *FnPtr = Bld.CreateBitOrPointerCast(OutlinedFn, CGM.Int8PtrTy);
-
+ llvm::Value *FnPtr = Bld.CreateAddrSpaceCast(OutlinedFn, CGM.Int8PtrTy);
+ FnPtr = Bld.CreateBitOrPointerCast(FnPtr, CGM.Int8PtrTy);
// Create a private scope that will globalize the arguments
// passed from the outside of the target region.
// TODO: Is that needed?
@@ -1268,14 +1270,15 @@ void CGOpenMPRuntimeGPU::emitParallelCall(CodeGenFunction &CGF,
IfCondVal,
NumThreadsVal,
llvm::ConstantInt::get(CGF.Int32Ty, -1),
- FnPtr,
- ID,
- Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
- CGF.VoidPtrPtrTy),
+ createRuntimeFunctionArgAddrSpaceCast(CGF, RuntimeFn, 5, FnPtr),
+ createRuntimeFunctionArgAddrSpaceCast(CGF, RuntimeFn, 6, ID),
+ createRuntimeFunctionArgAddrSpaceCast(
+ CGF, RuntimeFn, 7,
+ Bld.CreateBitOrPointerCast(CapturedVarsAddrs.emitRawPointer(CGF),
+ CGF.VoidPtrPtrTy)),
llvm::ConstantInt::get(CGM.SizeTy, CapturedVars.size())};
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_parallel_51),
- Args);
+
+ CGF.EmitRuntimeCall(RuntimeFn, Args);
};
RegionCodeGenTy RCG(ParallelGen);
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 156f64bb5f508..78fd65750fc02 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -585,6 +585,8 @@ static llvm::Function *emitOutlinedFunctionPrologue(
F->removeFnAttr(llvm::Attribute::NoInline);
F->addFnAttr(llvm::Attribute::AlwaysInline);
}
+ if (CGM.getTriple().isSPIRV())
+ F->setCallingConv(llvm::CallingConv::SPIR_FUNC);
// Generate the function.
CGF.StartFunction(CD, Ctx.VoidTy, F, FuncInfo, TargetArgs,
diff --git a/clang/test/OpenMP/spirv_target_addrspace.c b/clang/test/OpenMP/spirv_target_addrspace.c
new file mode 100644
index 0000000000000..9e5eeff73eed6
--- /dev/null
+++ b/clang/test/OpenMP/spirv_target_addrspace.c
@@ -0,0 +1,21 @@
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=spirv64 -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=spirv64 -fopenmp-is-target-device -triple spirv64 -fopenmp-host-ir-file-path %t-host.bc -emit-llvm %s -o - | FileCheck %s
+
+extern int fcn(const char[]);
+
+#pragma omp declare target
+// CHECK: @global = addrspace(1) global i32 0, align 4
+// CHECK: @.str = private unnamed_addr addrspace(1) constant [4 x i8] c"foo\00", align 1
+int global = 0;
+#pragma omp end declare target
+int main() {
+ // CHECK: = call i32 @__kmpc_target_init(ptr addrspacecast (ptr addrspace(1) @__omp_offloading_{{.*}}_kernel_environment to ptr), ptr %{{.*}})
+ #pragma omp target
+ {
+ for(int i = 0; i < 1024; i++)
+ global++;
+ fcn("foo");
+ }
+ return global;
+}
+
diff --git a/clang/test/OpenMP/spirv_target_addrspace_simd.c b/clang/test/OpenMP/spirv_target_addrspace_simd.c
new file mode 100644
index 0000000000000..31b00ab555596
--- /dev/null
+++ b/clang/test/OpenMP/spirv_target_addrspace_simd.c
@@ -0,0 +1,23 @@
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-unknown -fopenmp-targets=spirv64 -emit-llvm-bc %s -o %t-host.bc
+// RUN: %clang_cc1 -fopenmp -fopenmp-targets=spirv64 -fopenmp-is-target-device -triple spirv64 -fopenmp-host-ir-file-path %t-host.bc -emit-llvm %s -o - | FileCheck %s
+
+int main() {
+ int x = 0;
+
+#pragma omp target teams distribute parallel for simd
+ for(int i = 0; i < 1024; i++)
+ x+=i;
+ return x;
+}
+
+// CHECK: @[[#STRLOC:]] = private unnamed_addr addrspace(1) constant [{{.*}} x i8] c{{.*}}, align 1
+// CHECK: @[[#IDENT:]] = private unnamed_addr addrspace(1) constant %struct.ident_t { i32 {{.*}}, i32 2050, i32 {{.*}}, i32 {{.*}}, ptr addrspacecast (ptr addrspace(1) @[[#STRLOC]] to ptr) }, align 8
+// CHECK: define internal spir_func void @__omp_offloading_{{.*}}_omp_outlined(ptr addrspace(4) noalias noundef {{.*}}., ptr addrspace(4) noalias noundef {{.*}}, i64 noundef {{.*}}) #{{.*}} {
+// CHECK: = load ptr addrspace(4), ptr addrspace(4) %{{.*}}, align 8
+// CHECK: = load i32, ptr addrspace(4) %{{.*}}, align 4
+// CHECK: = addrspacecast ptr addrspace(4) %{{.*}} to ptr
+// CHECK: = addrspacecast ptr addrspace(4) %{{.*}} to ptr
+// CHECK: = addrspacecast ptr addrspace(4) %{{.*}} to ptr
+// CHECK: = addrspacecast ptr addrspace(4) %{{.*}} to ptr
+// CHECK: call spir_func void @__kmpc_distribute_static_init{{.*}}(ptr addrspacecast (ptr addrspace(1) @[[#IDENT]] to ptr), i32 %{{.*}}, i32 {{.*}}, ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}, i32 {{.*}}, i32 %{{.*}})
+// CHECK: call spir_func void @__kmpc_distribute_static_fini{{.*}}(ptr addrspacecast (ptr addrspace(1) @[[#IDENT]] to ptr), i32 %{{.*}})
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 13b727d226738..e7dc82acb9201 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -910,6 +910,14 @@ Constant *OpenMPIRBuilder::getOrCreateIdent(Constant *SrcLocStr,
ConstantInt::get(Int32, uint32_t(LocFlags)),
ConstantInt::get(Int32, Reserve2Flags),
ConstantInt::get(Int32, SrcLocStrSize), SrcLocStr};
+
+ size_t SrcLocStrArgIdx = 4;
+ if (OpenMPIRBuilder::Ident->getElementType(SrcLocStrArgIdx)
+ ->getPointerAddressSpace() !=
+ IdentData[SrcLocStrArgIdx]->getType()->getPointerAddressSpace())
+ IdentData[SrcLocStrArgIdx] = ConstantExpr::getAddrSpaceCast(
+ SrcLocStr, OpenMPIRBuilder::Ident->getElementType(SrcLocStrArgIdx));
+
Constant *Initializer =
ConstantStruct::get(OpenMPIRBuilder::Ident, IdentData);
@@ -950,8 +958,9 @@ Constant *OpenMPIRBuilder::getOrCreateSrcLocStr(StringRef LocStr,
GV.getInitializer() == Initializer)
return SrcLocStr = ConstantExpr::getPointerCast(&GV, Int8Ptr);
- SrcLocStr = Builder.CreateGlobalString(LocStr, /* Name */ "",
- /* AddressSpace */ 0, &M);
+ SrcLocStr = Builder.CreateGlobalString(
+ LocStr, /* Name */ "",
+ M.getDataLayout().getDefaultGlobalsAddressSpace(), &M);
}
return SrcLocStr;
}
|
Signed-off-by: Sarnie, Nick <[email protected]>
// CHECK: = addrspacecast ptr addrspace(4) %{{.*}} to ptr | ||
// CHECK: = addrspacecast ptr addrspace(4) %{{.*}} to ptr | ||
// CHECK: = addrspacecast ptr addrspace(4) %{{.*}} to ptr | ||
// CHECK: call spir_func void @__kmpc_distribute_static_init{{.*}}(ptr addrspacecast (ptr addrspace(1) @[[#IDENT]] to ptr), i32 %{{.*}}, i32 {{.*}}, ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}, ptr %{{.*}}, i32 {{.*}}, i32 %{{.*}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks wrong. Casts from global to private are not allowed in SPIR and SPIR-V.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussing this offline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! Could you help me understand how does this change affect CUDA SPIRV?
I'm going to separate out the CUDA SPIRV regression fix as I need to fix some problems with this PR, will add you as a reviewer. |
Do you have a timeline? It would be nice to get this fixed soon. |
llvm::Value *FnPtr = Bld.CreateBitOrPointerCast(OutlinedFn, CGM.Int8PtrTy); | ||
|
||
llvm::Value *FnPtr = Bld.CreateAddrSpaceCast(OutlinedFn, CGM.Int8PtrTy); | ||
FnPtr = Bld.CreateBitOrPointerCast(FnPtr, CGM.Int8PtrTy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems spurious with opaque pointers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this one is annoying, so we are trying to cast a fcn ptr to int8 ptr (both just opaque now as you said), but the int8 ptr type is addrspace 4, and the function is addrspace 0, so we can't cast the function to the ptr in a single cast because we need to deal with the addrspace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the bitcast won't do anything in IR with opaque pointers, at this point (and would also be a NOP with untyped ptrs in SPIRV), I believe, so we could just start cleaning some up. I was not suggesting that the AS cast be removed, but rather that it be made unnecessary by emitting OutlinedFn in the "right" AS from the get-go (which, in this case, might simply boil down to defining P4 in the DataLayout string?). However, that runs afoul of something that I personally find odd in the SPIR-V spec, which we might need to discuss elsewhere / offline, which is the fact that it appears that a choice was made to put pointers to function in private (https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/ff7db6ef1c71860772079b7c26c56b3d016e4205/lib/SPIRV/SPIRVReader.cpp#L377). That seems like a recipe for great pain, and rather quaint considering where code usually ends up being stored as well as the fact that some targets play tricks with pointers to private (e.g. making them 32-bit only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it, thanks. Let me investigate more, will continue to use this PR for updates. Hopefully the root cause fix works because I definitely don't want to have to add addrcasts everywhere
Overall this seems a bit AS cast heavy in the OMP parts, I wonder if we have a chance to figure out if we cannot just emit things in the right ASes from the get-go? This is the path we took when cleaning up some of this stuff in Clang for C/C++, and whilst it was definitely more painful than casting locally for the person authoring the patch, it ended up being somewhat tidier/more robust. Otherwise, I have the same thought from the CUDA-fix PR, which is it might be worthwhile to "undummify" this AS map altogether. |
Thanks for the feedback, I'll work to address it. |
Based on feedback from #134399, we use the address space map that sets the default AS to 4 for OpenMP SPIR-V offload.
The new AS map had the wrong target AS for
opencl_global
andopencl_constant
, which is what we end up using to get the AS during OpenMP target codegen, so I updated it to match the old default AS 0 map.After this PR, I will work on simplifying the condition of the old default AS 0 map to eventually only be OCL with no generic addrspace, but there are many failures so I wanted to do it step by step, and this is the first one.
There are relatively minor changes to OpenMP codegen, mostly just addrspacecasts (because globals are AS 1 in SPIR-V, so we need to cast to AS 4/no AS somewhat often) or use the correct address space to create a global.