-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[OpenMP] Ensure the actual kernel is annotated with launch bounds #99927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMP] Ensure the actual kernel is annotated with launch bounds #99927
Conversation
@llvm/pr-subscribers-offload @llvm/pr-subscribers-clang-codegen Author: Johannes Doerfert (jdoerfert) ChangesIn debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function. Patch is 464.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99927.diff 8 Files Affected:
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index f73d32de7c484..450371aef12b9 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -639,27 +639,42 @@ CodeGenFunction::GenerateOpenMPCapturedStmtFunction(const CapturedStmt &S,
// Build the argument list.
bool NeedWrapperFunction =
getDebugInfo() && CGM.getCodeGenOpts().hasReducedDebugInfo();
- FunctionArgList Args;
- llvm::MapVector<const Decl *, std::pair<const VarDecl *, Address>> LocalAddrs;
- llvm::DenseMap<const Decl *, std::pair<const Expr *, llvm::Value *>> VLASizes;
+ FunctionArgList Args, WrapperArgs;
+ llvm::MapVector<const Decl *, std::pair<const VarDecl *, Address>> LocalAddrs,
+ WrapperLocalAddrs;
+ llvm::DenseMap<const Decl *, std::pair<const Expr *, llvm::Value *>> VLASizes,
+ WrapperVLASizes;
SmallString<256> Buffer;
llvm::raw_svector_ostream Out(Buffer);
Out << CapturedStmtInfo->getHelperName();
- if (NeedWrapperFunction)
+
+ CodeGenFunction WrapperCGF(CGM, /*suppressNewContext=*/true);
+ llvm::Function *WrapperF = nullptr;
+ if (NeedWrapperFunction) {
+ // Emit the final kernel early to allow attributes to be added by the
+ // OpenMPI-IR-Builder.
+ FunctionOptions WrapperFO(&S, /*UIntPtrCastRequired=*/true,
+ /*RegisterCastedArgsOnly=*/true,
+ CapturedStmtInfo->getHelperName(), Loc);
+ WrapperCGF.CapturedStmtInfo = CapturedStmtInfo;
+ WrapperF =
+ emitOutlinedFunctionPrologue(WrapperCGF, Args, LocalAddrs, VLASizes,
+ WrapperCGF.CXXThisValue, WrapperFO);
Out << "_debug__";
+ }
FunctionOptions FO(&S, !NeedWrapperFunction, /*RegisterCastedArgsOnly=*/false,
Out.str(), Loc);
- llvm::Function *F = emitOutlinedFunctionPrologue(*this, Args, LocalAddrs,
- VLASizes, CXXThisValue, FO);
+ llvm::Function *F = emitOutlinedFunctionPrologue(
+ *this, WrapperArgs, WrapperLocalAddrs, WrapperVLASizes, CXXThisValue, FO);
CodeGenFunction::OMPPrivateScope LocalScope(*this);
- for (const auto &LocalAddrPair : LocalAddrs) {
+ for (const auto &LocalAddrPair : WrapperLocalAddrs) {
if (LocalAddrPair.second.first) {
LocalScope.addPrivate(LocalAddrPair.second.first,
LocalAddrPair.second.second);
}
}
(void)LocalScope.Privatize();
- for (const auto &VLASizePair : VLASizes)
+ for (const auto &VLASizePair : WrapperVLASizes)
VLASizeMap[VLASizePair.second.first] = VLASizePair.second.second;
PGO.assignRegionCounters(GlobalDecl(CD), F);
CapturedStmtInfo->EmitBody(*this, CD->getBody());
@@ -668,17 +683,10 @@ CodeGenFunction::GenerateOpenMPCapturedStmtFunction(const CapturedStmt &S,
if (!NeedWrapperFunction)
return F;
- FunctionOptions WrapperFO(&S, /*UIntPtrCastRequired=*/true,
- /*RegisterCastedArgsOnly=*/true,
- CapturedStmtInfo->getHelperName(), Loc);
- CodeGenFunction WrapperCGF(CGM, /*suppressNewContext=*/true);
- WrapperCGF.CapturedStmtInfo = CapturedStmtInfo;
- Args.clear();
- LocalAddrs.clear();
- VLASizes.clear();
- llvm::Function *WrapperF =
- emitOutlinedFunctionPrologue(WrapperCGF, Args, LocalAddrs, VLASizes,
- WrapperCGF.CXXThisValue, WrapperFO);
+ // Reverse the order.
+ WrapperF->removeFromParent();
+ F->getParent()->getFunctionList().insertAfter(F->getIterator(), WrapperF);
+
llvm::SmallVector<llvm::Value *, 4> CallArgs;
auto *PI = F->arg_begin();
for (const auto *Arg : Args) {
diff --git a/clang/test/OpenMP/ompx_attributes_codegen.cpp b/clang/test/OpenMP/ompx_attributes_codegen.cpp
index 87eb2913537ba..6c163c1875171 100644
--- a/clang/test/OpenMP/ompx_attributes_codegen.cpp
+++ b/clang/test/OpenMP/ompx_attributes_codegen.cpp
@@ -3,15 +3,17 @@
// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm-bc %s -o %t-ppc-host.bc
// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=AMD
// RUN: %clang_cc1 -target-cpu gfx900 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=AMD
+// RUN: %clang_cc1 -target-cpu gfx900 -fopenmp -x c++ -std=c++11 -triple amdgcn-amd-amdhsa -fopenmp-targets=amdgcn-amd-amdhsa -dwarf-version=5 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=AMD
// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple nvptx64 -fopenmp-targets=nvptx64 -emit-llvm %s -fopenmp-is-target-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=NVIDIA
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple nvptx64 -fopenmp-targets=nvptx64 -emit-llvm %s -fopenmp-is-target-device -dwarf-version=5 -fopenmp-host-ir-file-path %t-ppc-host.bc -o - | FileCheck %s --check-prefix=NVIDIA
// expected-no-diagnostics
// Check that the target attributes are set on the generated kernel
void func() {
- // AMD: amdgpu_kernel void @__omp_offloading[[HASH:.*]]_l16(ptr {{[^,]+}}) #0
- // AMD: amdgpu_kernel void @__omp_offloading[[HASH:.*]]_l18(ptr {{[^,]+}})
- // AMD: amdgpu_kernel void @__omp_offloading[[HASH:.*]]_l20(ptr {{[^,]+}}) #4
+ // AMD: amdgpu_kernel void @__omp_offloading[[HASH:.*]]_l18(ptr {{[^,]+}}) #0
+ // AMD: amdgpu_kernel void @__omp_offloading[[HASH:.*]]_l20(ptr {{[^,]+}})
+ // AMD: amdgpu_kernel void @__omp_offloading[[HASH:.*]]_l22(ptr {{[^,]+}}) #4
#pragma omp target ompx_attribute([[clang::amdgpu_flat_work_group_size(10, 20)]])
{}
@@ -35,6 +37,6 @@ void func() {
// NVIDIA: "omp_target_thread_limit"="20"
// NVIDIA: "omp_target_thread_limit"="45"
// NVIDIA: "omp_target_thread_limit"="17"
-// NVIDIA: !{ptr @__omp_offloading[[HASH1:.*]]_l16, !"maxntidx", i32 20}
-// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l18, !"maxntidx", i32 45}
-// NVIDIA: !{ptr @__omp_offloading[[HASH3:.*]]_l20, !"maxntidx", i32 17}
+// NVIDIA: !{ptr @__omp_offloading[[HASH1:.*]]_l18, !"maxntidx", i32 20}
+// NVIDIA: !{ptr @__omp_offloading[[HASH2:.*]]_l20, !"maxntidx", i32 45}
+// NVIDIA: !{ptr @__omp_offloading[[HASH3:.*]]_l22, !"maxntidx", i32 17}
diff --git a/clang/test/OpenMP/parallel_codegen.cpp b/clang/test/OpenMP/parallel_codegen.cpp
index 9082f1c3232af..9cdb1da996152 100644
--- a/clang/test/OpenMP/parallel_codegen.cpp
+++ b/clang/test/OpenMP/parallel_codegen.cpp
@@ -115,7 +115,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1
// CHECK1-NEXT: [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
// CHECK1-NEXT: invoke void @_Z3fooIiEvT_(i32 noundef [[TMP2]])
-// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
+// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
// CHECK1: invoke.cont:
// CHECK1-NEXT: [[TMP3:%.*]] = load i32, ptr @global, align 4
// CHECK1-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1
@@ -123,7 +123,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: ret void
// CHECK1: terminate.lpad:
// CHECK1-NEXT: [[TMP4:%.*]] = landingpad { ptr, i32 }
-// CHECK1-NEXT: catch ptr null
+// CHECK1-NEXT: catch ptr null
// CHECK1-NEXT: [[TMP5:%.*]] = extractvalue { ptr, i32 } [[TMP4]], 0
// CHECK1-NEXT: call void @__clang_call_terminate(ptr [[TMP5]]) #[[ATTR6:[0-9]+]]
// CHECK1-NEXT: unreachable
@@ -186,7 +186,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1
// CHECK1-NEXT: [[TMP3:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
// CHECK1-NEXT: invoke void @_Z3fooIiEvT_(i32 noundef [[TMP3]])
-// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
+// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
// CHECK1: invoke.cont:
// CHECK1-NEXT: [[TMP4:%.*]] = load i32, ptr [[TMP2]], align 4
// CHECK1-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1
@@ -194,7 +194,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: ret void
// CHECK1: terminate.lpad:
// CHECK1-NEXT: [[TMP5:%.*]] = landingpad { ptr, i32 }
-// CHECK1-NEXT: catch ptr null
+// CHECK1-NEXT: catch ptr null
// CHECK1-NEXT: [[TMP6:%.*]] = extractvalue { ptr, i32 } [[TMP5]], 0
// CHECK1-NEXT: call void @__clang_call_terminate(ptr [[TMP6]]) #[[ATTR6]]
// CHECK1-NEXT: unreachable
@@ -233,7 +233,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1
// CHECK1-NEXT: [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
// CHECK1-NEXT: invoke void @_Z3fooIiEvT_(i32 noundef [[TMP2]])
-// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
+// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
// CHECK1: invoke.cont:
// CHECK1-NEXT: [[TMP3:%.*]] = load i32, ptr @global, align 4
// CHECK1-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1
@@ -241,7 +241,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: ret void
// CHECK1: terminate.lpad:
// CHECK1-NEXT: [[TMP4:%.*]] = landingpad { ptr, i32 }
-// CHECK1-NEXT: catch ptr null
+// CHECK1-NEXT: catch ptr null
// CHECK1-NEXT: [[TMP5:%.*]] = extractvalue { ptr, i32 } [[TMP4]], 0
// CHECK1-NEXT: call void @__clang_call_terminate(ptr [[TMP5]]) #[[ATTR6]]
// CHECK1-NEXT: unreachable
@@ -278,7 +278,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: [[TMP1:%.*]] = load i64, ptr [[VLA_ADDR]], align 8
// CHECK1-NEXT: [[TMP2:%.*]] = load ptr, ptr [[TMP0]], align 8
// CHECK1-NEXT: invoke void @_Z3fooIPPcEvT_(ptr noundef [[TMP2]])
-// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
+// CHECK1-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]]
// CHECK1: invoke.cont:
// CHECK1-NEXT: [[TMP3:%.*]] = load ptr, ptr [[VAR]], align 8
// CHECK1-NEXT: [[TMP4:%.*]] = mul nsw i64 0, [[TMP1]]
@@ -287,7 +287,7 @@ int main (int argc, char **argv) {
// CHECK1-NEXT: ret void
// CHECK1: terminate.lpad:
// CHECK1-NEXT: [[TMP5:%.*]] = landingpad { ptr, i32 }
-// CHECK1-NEXT: catch ptr null
+// CHECK1-NEXT: catch ptr null
// CHECK1-NEXT: [[TMP6:%.*]] = extractvalue { ptr, i32 } [[TMP5]], 0
// CHECK1-NEXT: call void @__clang_call_terminate(ptr [[TMP6]]) #[[ATTR6]]
// CHECK1-NEXT: unreachable
@@ -311,20 +311,20 @@ int main (int argc, char **argv) {
// CHECK2-NEXT: [[__VLA_EXPR0:%.*]] = alloca i64, align 8
// CHECK2-NEXT: store i32 0, ptr [[RETVAL]], align 4
// CHECK2-NEXT: store i32 [[ARGC]], ptr [[ARGC_ADDR]], align 4
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[ARGC_ADDR]], metadata [[META18:![0-9]+]], metadata !DIExpression()), !dbg [[DBG19:![0-9]+]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[ARGC_ADDR]], metadata [[META18:![0-9]+]], metadata !DIExpression()), !dbg [[DBG19:![0-9]+]]
// CHECK2-NEXT: store ptr [[ARGV]], ptr [[ARGV_ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[ARGV_ADDR]], metadata [[META20:![0-9]+]], metadata !DIExpression()), !dbg [[DBG21:![0-9]+]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[ARGV_ADDR]], metadata [[META20:![0-9]+]], metadata !DIExpression()), !dbg [[DBG21:![0-9]+]]
// CHECK2-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARGC_ADDR]], align 4, !dbg [[DBG22:![0-9]+]]
// CHECK2-NEXT: [[TMP1:%.*]] = zext i32 [[TMP0]] to i64, !dbg [[DBG23:![0-9]+]]
// CHECK2-NEXT: [[TMP2:%.*]] = call ptr @llvm.stacksave.p0(), !dbg [[DBG23]]
// CHECK2-NEXT: store ptr [[TMP2]], ptr [[SAVED_STACK]], align 8, !dbg [[DBG23]]
// CHECK2-NEXT: [[VLA:%.*]] = alloca i32, i64 [[TMP1]], align 16, !dbg [[DBG23]]
// CHECK2-NEXT: store i64 [[TMP1]], ptr [[__VLA_EXPR0]], align 8, !dbg [[DBG23]]
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[__VLA_EXPR0]], metadata [[META24:![0-9]+]], metadata !DIExpression()), !dbg [[DBG26:![0-9]+]]
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[VLA]], metadata [[META27:![0-9]+]], metadata !DIExpression()), !dbg [[DBG31:![0-9]+]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[__VLA_EXPR0]], metadata [[META24:![0-9]+]], metadata !DIExpression()), !dbg [[DBG26:![0-9]+]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[VLA]], metadata [[META27:![0-9]+]], metadata !DIExpression()), !dbg [[DBG31:![0-9]+]]
// CHECK2-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB1:[0-9]+]], i32 2, ptr @main.omp_outlined, i64 [[TMP1]], ptr [[VLA]]), !dbg [[DBG32:![0-9]+]]
-// CHECK2-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB5:[0-9]+]], i32 1, ptr @main.omp_outlined.2, i64 [[TMP1]]), !dbg [[DBG33:![0-9]+]]
-// CHECK2-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB9:[0-9]+]], i32 2, ptr @main.omp_outlined.4, i64 [[TMP1]], ptr [[VLA]]), !dbg [[DBG34:![0-9]+]]
+// CHECK2-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB5:[0-9]+]], i32 1, ptr @main.omp_outlined.1, i64 [[TMP1]]), !dbg [[DBG33:![0-9]+]]
+// CHECK2-NEXT: call void (ptr, i32, ptr, ...) @__kmpc_fork_call(ptr @[[GLOB9:[0-9]+]], i32 2, ptr @main.omp_outlined.3, i64 [[TMP1]], ptr [[VLA]]), !dbg [[DBG34:![0-9]+]]
// CHECK2-NEXT: [[TMP3:%.*]] = load ptr, ptr [[ARGV_ADDR]], align 8, !dbg [[DBG35:![0-9]+]]
// CHECK2-NEXT: [[CALL:%.*]] = call noundef i32 @_Z5tmainIPPcEiT_(ptr noundef [[TMP3]]), !dbg [[DBG36:![0-9]+]]
// CHECK2-NEXT: store i32 [[CALL]], ptr [[RETVAL]], align 4, !dbg [[DBG37:![0-9]+]]
@@ -342,19 +342,19 @@ int main (int argc, char **argv) {
// CHECK2-NEXT: [[VLA_ADDR:%.*]] = alloca i64, align 8
// CHECK2-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK2-NEXT: store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[DOTGLOBAL_TID__ADDR]], metadata [[META47:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48:![0-9]+]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[DOTGLOBAL_TID__ADDR]], metadata [[META47:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48:![0-9]+]]
// CHECK2-NEXT: store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[DOTBOUND_TID__ADDR]], metadata [[META49:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[DOTBOUND_TID__ADDR]], metadata [[META49:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48]]
// CHECK2-NEXT: store i64 [[VLA]], ptr [[VLA_ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[VLA_ADDR]], metadata [[META50:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[VLA_ADDR]], metadata [[META50:![0-9]+]], metadata !DIExpression()), !dbg [[DBG48]]
// CHECK2-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[A_ADDR]], metadata [[META51:![0-9]+]], metadata !DIExpression()), !dbg [[DBG52:![0-9]+]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[A_ADDR]], metadata [[META51:![0-9]+]], metadata !DIExpression()), !dbg [[DBG52:![0-9]+]]
// CHECK2-NEXT: [[TMP0:%.*]] = load i64, ptr [[VLA_ADDR]], align 8, !dbg [[DBG53:![0-9]+]]
// CHECK2-NEXT: [[TMP1:%.*]] = load ptr, ptr [[A_ADDR]], align 8, !dbg [[DBG53]]
// CHECK2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1, !dbg [[DBG54:![0-9]+]]
// CHECK2-NEXT: [[TMP2:%.*]] = load i32, ptr [[ARRAYIDX]], align 4, !dbg [[DBG54]]
// CHECK2-NEXT: invoke void @_Z3fooIiEvT_(i32 noundef [[TMP2]])
-// CHECK2-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]], !dbg [[DBG53]]
+// CHECK2-NEXT: to label [[INVOKE_CONT:%.*]] unwind label [[TERMINATE_LPAD:%.*]], !dbg [[DBG53]]
// CHECK2: invoke.cont:
// CHECK2-NEXT: [[TMP3:%.*]] = load i32, ptr @global, align 4, !dbg [[DBG55:![0-9]+]]
// CHECK2-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i64 1, !dbg [[DBG56:![0-9]+]]
@@ -362,53 +362,53 @@ int main (int argc, char **argv) {
// CHECK2-NEXT: ret void, !dbg [[DBG55]]
// CHECK2: terminate.lpad:
// CHECK2-NEXT: [[TMP4:%.*]] = landingpad { ptr, i32 }
-// CHECK2-NEXT: catch ptr null, !dbg [[DBG53]]
+// CHECK2-NEXT: catch ptr null, !dbg [[DBG53]]
// CHECK2-NEXT: [[TMP5:%.*]] = extractvalue { ptr, i32 } [[TMP4]], 0, !dbg [[DBG53]]
// CHECK2-NEXT: call void @__clang_call_terminate(ptr [[TMP5]]) #[[ATTR7:[0-9]+]], !dbg [[DBG53]]
// CHECK2-NEXT: unreachable, !dbg [[DBG53]]
//
//
-// CHECK2-LABEL: define {{[^@]+}}@_Z3fooIiEvT_
-// CHECK2-SAME: (i32 noundef [[ARGC:%.*]]) #[[ATTR4:[0-9]+]] comdat !dbg [[DBG58:![0-9]+]] {
-// CHECK2-NEXT: entry:
-// CHECK2-NEXT: [[ARGC_ADDR:%.*]] = alloca i32, align 4
-// CHECK2-NEXT: store i32 [[ARGC]], ptr [[ARGC_ADDR]], align 4
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[ARGC_ADDR]], metadata [[META63:![0-9]+]], metadata !DIExpression()), !dbg [[DBG64:![0-9]+]]
-// CHECK2-NEXT: ret void, !dbg [[DBG65:![0-9]+]]
-//
-//
-// CHECK2-LABEL: define {{[^@]+}}@__clang_call_terminate
-// CHECK2-SAME: (ptr noundef [[TMP0:%.*]]) #[[ATTR5:[0-9]+]] comdat {
-// CHECK2-NEXT: [[TMP2:%.*]] = call ptr @__cxa_begin_catch(ptr [[TMP0]]) #[[ATTR6:[0-9]+]]
-// CHECK2-NEXT: call void @_ZSt9terminatev() #[[ATTR7]]
-// CHECK2-NEXT: unreachable
-//
-//
// CHECK2-LABEL: define {{[^@]+}}@main.omp_outlined
-// CHECK2-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], i64 noundef [[VLA:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]]) #[[ATTR3]] !dbg [[DBG66:![0-9]+]] {
+// CHECK2-SAME: (ptr noalias noundef [[DOTGLOBAL_TID_:%.*]], ptr noalias noundef [[DOTBOUND_TID_:%.*]], i64 noundef [[VLA:%.*]], ptr noundef nonnull align 4 dereferenceable(4) [[A:%.*]]) #[[ATTR3]] !dbg [[DBG58:![0-9]+]] {
// CHECK2-NEXT: entry:
// CHECK2-NEXT: [[DOTGLOBAL_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK2-NEXT: [[DOTBOUND_TID__ADDR:%.*]] = alloca ptr, align 8
// CHECK2-NEXT: [[VLA_ADDR:%.*]] = alloca i64, align 8
// CHECK2-NEXT: [[A_ADDR:%.*]] = alloca ptr, align 8
// CHECK2-NEXT: store ptr [[DOTGLOBAL_TID_]], ptr [[DOTGLOBAL_TID__ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[DOTGLOBAL_TID__ADDR]], metadata [[META67:![0-9]+]], metadata !DIExpression()), !dbg [[DBG68:![0-9]+]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[DOTGLOBAL_TID__ADDR]], metadata [[META59:![0-9]+]], metadata !DIExpression()), !dbg [[DBG60:![0-9]+]]
// CHECK2-NEXT: store ptr [[DOTBOUND_TID_]], ptr [[DOTBOUND_TID__ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[DOTBOUND_TID__ADDR]], metadata [[META69:![0-9]+]], metadata !DIExpression()), !dbg [[DBG68]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[DOTBOUND_TID__ADDR]], metadata [[META61:![0-9]+]], metadata !DIExpression()), !dbg [[DBG60]]
// CHECK2-NEXT: store i64 [[VLA]], ptr [[VLA_ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[VLA_ADDR]], metadata [[META70:![0-9]+]], metadata !DIExpression()), !dbg [[DBG68]]
+// CHECK2-NEXT: tail call void @llvm.dbg.declare(metadata ptr [[VLA_ADDR]], metadata [[META62:![0-9]+]], metadata !DIExpression()), !dbg [[DBG60]]
// CHECK2-NEXT: store ptr [[A]], ptr [[A_ADDR]], align 8
-// CHECK2-NEXT: call void @llvm.dbg.declare(metadata ptr [[A_ADDR]], metadata [[META71:![0-9]+]], met...
[truncated]
|
// We need to strip the debug prefix to get the correct kernel name. | ||
StringRef KernelName = Kernel->getName(); | ||
const std::string DebugPrefix = "_debug__"; | ||
if (KernelName.ends_with(DebugPrefix)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should emit some module metadata instead of just comparing strings. Though I suppose OpenMP doesn't let the user name their kernels so it's not broken like it would be in CUDA or HIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky. Honestly, I don't think the wrapper is a great idea, but I want us to not miss out on the performance for debug builds in the next release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I don't think I ever understood what the wrapper was for in debug mode.
d499a84
to
eabe6b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we did this debugging stuff before, so nothing really changed here.
reverts: needs post merge work 3c8efd7 [OpenMP] Ensure the actual kernel is annotated with launch bounds (llvm#99927) Change-Id: I4abe79f9146ae4ce49b169d6b05eb86138229848
…9927) Summary: In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60251128
…vm#99927) In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function. Change-Id: I0370e5b0dabefc3089d81891d3ed25b8634d7220
In debug mode there is a wrapper (the kernel) around the function in which we generate the kernel code. We worked around this before to get the correct kernel name, but now we really distinguish both to attach the launch bounds to the kernel, not the inner function.