Commit 7bc9d95
[AMDGPU] Introduce "amdgpu-sw-lower-lds" pass to lower LDS accesses. (#87265)
This PR introduces a new pass, "amdgpu-sw-lower-lds". The pass lowers local data store (LDS) uses in kernel and non-kernel functions in the module to use dynamically allocated global memory; a packed LDS layout is emulated in global memory. The memory instructions lowered from LDS to global memory are then instrumented by the address sanitizer to catch addressing errors. The pass only works when the address sanitizer is enabled and has already instrumented the IR; it detects instrumented IR through the "nosanitize_address" module flag. For a kernel, LDS accesses can be static or dynamic, and either direct (accessed within the kernel) or indirect (accessed through non-kernel functions).

**Replacement of kernel LDS accesses:**
- All the LDS accesses corresponding to a kernel are packed together: static LDS accesses are allocated first, followed by dynamic LDS. The total size with alignment is calculated. A new LDS global, called "SW LDS", is created for the kernel and carries the attribute "amdgpu-lds-size" whose value is the calculated size. All LDS accesses in the module are replaced by a GEP with an offset into the "SW LDS".
- A new "llvm.amdgcn.<kernel>.dynlds" global is created per kernel that accesses dynamic LDS. It is marked as used by the kernel and has MD_absolute_symbol metadata set to the total static LDS size, since dynamic LDS allocation starts after all static LDS allocations.
- A device global memory buffer equal to the total LDS size is allocated. In the kernel prologue, a single work-item of the work-group performs a "malloc" and stores the pointer to the allocation in the "SW LDS". To store the offsets corresponding to all LDS accesses, another global variable is created, called "SW LDS metadata" in this pass.
  - **SW LDS:** an LDS global of ptr type named "llvm.amdgcn.sw.lds.<kernel-name>".
  - **SW LDS metadata:** a global of struct type with n members, where n is the number of LDS globals accessed by the kernel (direct and indirect). Each member is itself a struct of type {i32, i32, i32}: the first field is the offset, the second the size of the LDS global being replaced, and the third the total aligned size. It is named "llvm.amdgcn.sw.lds.<kernel-name>.md". The global has an initializer with the static LDS offsets and sizes filled in. For dynamic LDS entries, the offset is initialized to the end offset of the preceding static LDS allocation and the size to zero; these values are updated within the kernel, since the kernel can query the dynamic LDS size allocated at runtime through the "hidden_dynamic_lds_size" hidden kernel argument.
- In the kernel epilogue, the allocated memory is freed by the same single work-item.

**Replacement of non-kernel LDS accesses:**
- Multiple kernels can reach the same non-kernel function. All kernels accessing LDS through non-kernels are sorted and assigned a kernel-id, and all LDS globals accessed by non-kernels are sorted. This information is used to build two tables:
  - **Base table:** has a single row, indexed by kernel ID. Each element of the row is the ptr of the "SW LDS" variable created for that kernel.
  - **Offset table:** has multiple rows and columns. Rows are indexed 0 to (n-1), where n is the total number of kernels accessing LDS through non-kernels. Each row has m elements, where m is the total number of unique LDS globals accessed by all non-kernels. Each element in a row is the ptr to the replacement that the corresponding kernel made for that LDS global.
- An LDS variable in a non-kernel function is replaced using these tables: a kernel-id query yields the ptr of the "SW LDS" for the corresponding kernel from the base table, the offset into that "SW LDS" comes from the corresponding element of the offset table, and together these give the replacement value. A sketch of the globals this scheme creates is shown right after this description.
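To make the layout described above concrete, here is a hedged sketch of the globals the pass creates for a hypothetical kernel `@k0` with one static and one dynamic LDS variable. It mirrors the shapes checked by the new tests in this commit; exact offsets, sizes, and alignments depend on the kernel's LDS variables.

```llvm
; Illustrative only; names and constants follow the test expectations below.
%llvm.amdgcn.sw.lds.k0.md.item = type { i32, i32, i32 }
%llvm.amdgcn.sw.lds.k0.md.type = type { %llvm.amdgcn.sw.lds.k0.md.item, %llvm.amdgcn.sw.lds.k0.md.item }

; Per-kernel "SW LDS": holds the device-memory pointer stored by the malloc
; done in the kernel prologue.
@llvm.amdgcn.sw.lds.k0 = internal addrspace(3) global ptr poison, no_sanitize_address, align 8

; Marker for dynamic LDS; its absolute_symbol metadata starts at the end of
; the static LDS region.
@llvm.amdgcn.k0.dynlds = external addrspace(3) global [0 x i8], no_sanitize_address, align 8

; "SW LDS metadata": one {offset, size, aligned size} entry per LDS global the
; kernel reaches. The dynamic entry starts with size 0 and is patched at
; runtime from the hidden_dynamic_lds_size kernel argument.
@llvm.amdgcn.sw.lds.k0.md = internal addrspace(1) global %llvm.amdgcn.sw.lds.k0.md.type {
  %llvm.amdgcn.sw.lds.k0.md.item { i32 0, i32 8, i32 32 },  ; static LDS entry
  %llvm.amdgcn.sw.lds.k0.md.item { i32 32, i32 0, i32 32 }  ; dynamic LDS entry
}, no_sanitize_address

; Base table used by non-kernel functions: kernel-id -> SW LDS. An offset
; table pointing at the metadata entries is emitted alongside it when
; non-kernels access LDS; see the first test in this commit.
@llvm.amdgcn.sw.lds.base.table = internal addrspace(1) constant [1 x ptr addrspace(3)]
  [ptr addrspace(3) @llvm.amdgcn.sw.lds.k0], no_sanitize_address
```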
1 parent 88f9ac3 · commit 7bc9d95

27 files changed: +5278 -0 lines

llvm/lib/Target/AMDGPU/AMDGPU.h

Lines changed: 11 additions & 0 deletions
@@ -268,6 +268,17 @@ struct AMDGPUAlwaysInlinePass : PassInfoMixin<AMDGPUAlwaysInlinePass> {
   bool GlobalOpt;
 };
 
+void initializeAMDGPUSwLowerLDSLegacyPass(PassRegistry &);
+extern char &AMDGPUSwLowerLDSLegacyPassID;
+ModulePass *
+createAMDGPUSwLowerLDSLegacyPass(const AMDGPUTargetMachine *TM = nullptr);
+
+struct AMDGPUSwLowerLDSPass : PassInfoMixin<AMDGPUSwLowerLDSPass> {
+  const AMDGPUTargetMachine &TM;
+  AMDGPUSwLowerLDSPass(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
+  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
 class AMDGPUCodeGenPreparePass
     : public PassInfoMixin<AMDGPUCodeGenPreparePass> {
 private:

llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ MODULE_PASS("amdgpu-always-inline", AMDGPUAlwaysInlinePass())
 MODULE_PASS("amdgpu-lower-buffer-fat-pointers",
             AMDGPULowerBufferFatPointersPass(*this))
 MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
+MODULE_PASS("amdgpu-sw-lower-lds", AMDGPUSwLowerLDSPass(*this))
 MODULE_PASS("amdgpu-lower-module-lds", AMDGPULowerModuleLDSPass(*this))
 MODULE_PASS("amdgpu-perf-hint",
             AMDGPUPerfHintAnalysisPass(
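Because the pass is registered under the string "amdgpu-sw-lower-lds", it can be exercised standalone through opt, exactly as the RUN lines in the new tests do. A minimal, illustrative driver module (the LDS global and kernel names here are made up for this example, not taken from the tests):

```llvm
; RUN: opt < %s -passes=amdgpu-sw-lower-lds -S -mtriple=amdgcn-amd-amdhsa

; A single static LDS variable used directly from a kernel.
@lds = internal addrspace(3) global [4 x i8] poison, align 4

define amdgpu_kernel void @k() sanitize_address {
  store i8 1, ptr addrspace(3) @lds, align 4
  ret void
}

; The pass only runs on IR the address sanitizer has already processed, which
; it detects via this module flag.
!llvm.module.flags = !{!0}
!0 = !{i32 4, !"nosanitize_address", i32 1}
```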

llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp

Lines changed: 1335 additions & 0 deletions
Large diffs are not rendered by default.

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Lines changed: 1 addition & 0 deletions
@@ -412,6 +412,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeSILoadStoreOptimizerPass(*PR);
   initializeAMDGPUCtorDtorLoweringLegacyPass(*PR);
   initializeAMDGPUAlwaysInlinePass(*PR);
+  initializeAMDGPUSwLowerLDSLegacyPass(*PR);
   initializeAMDGPUAttributorLegacyPass(*PR);
   initializeAMDGPUAnnotateKernelFeaturesPass(*PR);
   initializeAMDGPUAnnotateUniformValuesLegacyPass(*PR);

llvm/lib/Target/AMDGPU/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ add_llvm_target(AMDGPUCodeGen
   AMDGPULowerKernelArguments.cpp
   AMDGPULowerKernelAttributes.cpp
   AMDGPULowerModuleLDSPass.cpp
+  AMDGPUSwLowerLDS.cpp
   AMDGPUMachineFunction.cpp
   AMDGPUMachineModuleInfo.cpp
   AMDGPUMacroFusion.cpp

llvm/test/CodeGen/AMDGPU/amdgpu-sw-lower-lds-dynamic-indirect-access-asan.ll

Lines changed: 260 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 4
; RUN: opt < %s -passes=amdgpu-sw-lower-lds -amdgpu-asan-instrument-lds=false -S -mtriple=amdgcn-amd-amdhsa | FileCheck %s

; Test to check indirect dynamic LDS access through a non-kernel from kernel is lowered correctly.
@lds_1 = internal addrspace(3) global [1 x i8] poison, align 1
@lds_2 = internal addrspace(3) global [1 x i32] poison, align 2
@lds_3 = external addrspace(3) global [0 x i8], align 4
@lds_4 = external addrspace(3) global [0 x i8], align 8

;.
; CHECK: @llvm.amdgcn.sw.lds.k0 = internal addrspace(3) global ptr poison, no_sanitize_address, align 8, !absolute_symbol [[META0:![0-9]+]]
; CHECK: @llvm.amdgcn.k0.dynlds = external addrspace(3) global [0 x i8], no_sanitize_address, align 8, !absolute_symbol [[META1:![0-9]+]]
; CHECK: @llvm.amdgcn.sw.lds.k0.md = internal addrspace(1) global %llvm.amdgcn.sw.lds.k0.md.type { %llvm.amdgcn.sw.lds.k0.md.item { i32 0, i32 8, i32 32 }, %llvm.amdgcn.sw.lds.k0.md.item { i32 32, i32 1, i32 32 }, %llvm.amdgcn.sw.lds.k0.md.item { i32 64, i32 4, i32 32 }, %llvm.amdgcn.sw.lds.k0.md.item { i32 96, i32 0, i32 32 }, %llvm.amdgcn.sw.lds.k0.md.item { i32 128, i32 0, i32 32 } }, no_sanitize_address
; @llvm.amdgcn.sw.lds.base.table = internal addrspace(1) constant [1 x ptr addrspace(3)] [ptr addrspace(3) @llvm.amdgcn.sw.lds.k0], no_sanitize_address
; @llvm.amdgcn.sw.lds.offset.table = internal addrspace(1) constant [1 x [2 x ptr addrspace(1)]] [[2 x ptr addrspace(1)] [ptr addrspace(1) getelementptr inbounds (%llvm.amdgcn.sw.lds.k0.md.type, ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 3, i32 0), ptr addrspace(1) getelementptr inbounds (%llvm.amdgcn.sw.lds.k0.md.type, ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 4, i32 0)]], no_sanitize_address
;.
define void @use_variables() sanitize_address {
; CHECK-LABEL: define void @use_variables(
; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()
; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds [1 x ptr addrspace(3)], ptr addrspace(1) @llvm.amdgcn.sw.lds.base.table, i32 0, i32 [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = load ptr addrspace(3), ptr addrspace(1) [[TMP2]], align 4
; CHECK-NEXT: [[TMP7:%.*]] = load ptr addrspace(1), ptr addrspace(3) [[TMP4]], align 8
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [1 x [2 x ptr addrspace(1)]], ptr addrspace(1) @llvm.amdgcn.sw.lds.offset.table, i32 0, i32 [[TMP1]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = load ptr addrspace(1), ptr addrspace(1) [[TMP6]], align 8
; CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr addrspace(1) [[TMP5]], align 4
; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP4]], i32 [[TMP8]]
; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [1 x [2 x ptr addrspace(1)]], ptr addrspace(1) @llvm.amdgcn.sw.lds.offset.table, i32 0, i32 [[TMP1]], i32 1
; CHECK-NEXT: [[TMP12:%.*]] = load ptr addrspace(1), ptr addrspace(1) [[TMP11]], align 8
; CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr addrspace(1) [[TMP12]], align 4
; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds i8, ptr addrspace(3) [[TMP4]], i32 [[TMP10]]
; CHECK-NEXT: [[TMP13:%.*]] = ptrtoint ptr addrspace(3) [[TMP9]] to i32
; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP7]], i32 [[TMP13]]
; CHECK-NEXT: store i8 3, ptr addrspace(1) [[TMP14]], align 4
; CHECK-NEXT: [[TMP30:%.*]] = ptrtoint ptr addrspace(3) [[TMP15]] to i32
; CHECK-NEXT: [[TMP31:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP7]], i32 [[TMP30]]
; CHECK-NEXT: store i8 3, ptr addrspace(1) [[TMP31]], align 8
; CHECK-NEXT: ret void
;
  store i8 3, ptr addrspace(3) @lds_3, align 4
  store i8 3, ptr addrspace(3) @lds_4, align 8
  ret void
}

define amdgpu_kernel void @k0() sanitize_address {
; CHECK-LABEL: define amdgpu_kernel void @k0(
; CHECK-SAME: ) #[[ATTR1:[0-9]+]] !llvm.amdgcn.lds.kernel.id [[META2:![0-9]+]] {
; CHECK-NEXT: WId:
; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.workitem.id.y()
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.amdgcn.workitem.id.z()
; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP0]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP3]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[TMP4]], 0
; CHECK-NEXT: br i1 [[TMP5]], label [[MALLOC:%.*]], label [[TMP21:%.*]]
; CHECK: Malloc:
; CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE:%.*]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 2, i32 0), align 4
; CHECK-NEXT: [[TMP7:%.*]] = load i32, ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 2, i32 2), align 4
; CHECK-NEXT: [[TMP8:%.*]] = add i32 [[TMP9]], [[TMP7]]
; CHECK-NEXT: [[TMP6:%.*]] = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds ptr addrspace(4), ptr addrspace(4) [[TMP6]], i64 15
; CHECK-NEXT: store i32 [[TMP8]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 3, i32 0), align 4
; CHECK-NEXT: [[TMP11:%.*]] = load i32, ptr addrspace(4) [[TMP10]], align 4
; CHECK-NEXT: store i32 [[TMP11]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 3, i32 1), align 4
; CHECK-NEXT: [[TMP12:%.*]] = add i32 [[TMP11]], 7
; CHECK-NEXT: [[TMP13:%.*]] = udiv i32 [[TMP12]], 8
; CHECK-NEXT: [[TMP14:%.*]] = mul i32 [[TMP13]], 8
; CHECK-NEXT: store i32 [[TMP14]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 3, i32 2), align 4
; CHECK-NEXT: [[TMP15:%.*]] = add i32 [[TMP8]], [[TMP14]]
; CHECK-NEXT: store i32 [[TMP15]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 4, i32 0), align 4
; CHECK-NEXT: [[TMP27:%.*]] = load i32, ptr addrspace(4) [[TMP10]], align 4
; CHECK-NEXT: store i32 [[TMP27]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 4, i32 1), align 4
; CHECK-NEXT: [[TMP17:%.*]] = add i32 [[TMP27]], 7
; CHECK-NEXT: [[TMP18:%.*]] = udiv i32 [[TMP17]], 8
; CHECK-NEXT: [[TMP19:%.*]] = mul i32 [[TMP18]], 8
; CHECK-NEXT: store i32 [[TMP19]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 4, i32 2), align 4
; CHECK-NEXT: [[TMP28:%.*]] = add i32 [[TMP15]], [[TMP19]]
; CHECK-NEXT: [[TMP26:%.*]] = zext i32 [[TMP28]] to i64
; CHECK-NEXT: [[TMP22:%.*]] = call ptr @llvm.returnaddress(i32 0)
; CHECK-NEXT: [[TMP23:%.*]] = ptrtoint ptr [[TMP22]] to i64
; CHECK-NEXT: [[TMP35:%.*]] = call i64 @__asan_malloc_impl(i64 [[TMP26]], i64 [[TMP23]])
; CHECK-NEXT: [[TMP20:%.*]] = inttoptr i64 [[TMP35]] to ptr addrspace(1)
; CHECK-NEXT: store ptr addrspace(1) [[TMP20]], ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, align 8
; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP20]], i64 8
; CHECK-NEXT: [[TMP37:%.*]] = ptrtoint ptr addrspace(1) [[TMP36]] to i64
; CHECK-NEXT: call void @__asan_poison_region(i64 [[TMP37]], i64 24)
; CHECK-NEXT: [[TMP53:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP20]], i64 33
; CHECK-NEXT: [[TMP73:%.*]] = ptrtoint ptr addrspace(1) [[TMP53]] to i64
; CHECK-NEXT: call void @__asan_poison_region(i64 [[TMP73]], i64 31)
; CHECK-NEXT: [[TMP74:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP20]], i64 68
; CHECK-NEXT: [[TMP75:%.*]] = ptrtoint ptr addrspace(1) [[TMP74]] to i64
; CHECK-NEXT: call void @__asan_poison_region(i64 [[TMP75]], i64 28)
; CHECK-NEXT: br label [[TMP21]]
; CHECK: 32:
; CHECK-NEXT: [[XYZCOND:%.*]] = phi i1 [ false, [[WID:%.*]] ], [ true, [[MALLOC]] ]
; CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
; CHECK-NEXT: [[TMP31:%.*]] = load ptr addrspace(1), ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, align 8
; CHECK-NEXT: [[TMP24:%.*]] = load i32, ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 1, i32 0), align 4
; CHECK-NEXT: [[TMP25:%.*]] = getelementptr inbounds i8, ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, i32 [[TMP24]]
; CHECK-NEXT: [[TMP29:%.*]] = load i32, ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 2, i32 0), align 4
; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds i8, ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, i32 [[TMP29]]
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.k0.dynlds) ]
; CHECK-NEXT: call void @use_variables()
; CHECK-NEXT: [[TMP38:%.*]] = ptrtoint ptr addrspace(3) [[TMP25]] to i32
; CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP31]], i32 [[TMP38]]
; CHECK-NEXT: store i8 7, ptr addrspace(1) [[TMP39]], align 1
; CHECK-NEXT: [[TMP55:%.*]] = ptrtoint ptr addrspace(3) [[TMP30]] to i32
; CHECK-NEXT: [[TMP56:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP31]], i32 [[TMP55]]
; CHECK-NEXT: store i32 8, ptr addrspace(1) [[TMP56]], align 2
; CHECK-NEXT: br label [[CONDFREE1:%.*]]
; CHECK: CondFree:
; CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
; CHECK-NEXT: br i1 [[XYZCOND]], label [[FREE:%.*]], label [[END:%.*]]
; CHECK: Free:
; CHECK-NEXT: [[TMP32:%.*]] = call ptr @llvm.returnaddress(i32 0)
; CHECK-NEXT: [[TMP33:%.*]] = ptrtoint ptr [[TMP32]] to i64
; CHECK-NEXT: [[TMP34:%.*]] = ptrtoint ptr addrspace(1) [[TMP31]] to i64
; CHECK-NEXT: call void @__asan_free_impl(i64 [[TMP34]], i64 [[TMP33]])
; CHECK-NEXT: br label [[END]]
; CHECK: End:
; CHECK-NEXT: ret void
;
  call void @use_variables()
  store i8 7, ptr addrspace(3) @lds_1, align 1
  store i32 8, ptr addrspace(3) @lds_2, align 2
  ret void
}

!llvm.module.flags = !{!0}
!0 = !{i32 4, !"nosanitize_address", i32 1}

;.
; CHECK: attributes #[[ATTR0]] = { sanitize_address }
; CHECK: attributes #[[ATTR1]] = { sanitize_address "amdgpu-lds-size"="8,8" }
; CHECK: attributes #[[ATTR2:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) }
; CHECK: attributes #[[ATTR3:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
; CHECK: attributes #[[ATTR4:[0-9]+]] = { convergent nocallback nofree nounwind willreturn }
;.
; CHECK: [[META0]] = !{i32 0, i32 1}
; CHECK: [[META1]] = !{i32 8, i32 9}
; CHECK: [[META2]] = !{i32 0}
;.
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals all --version 4
; RUN: opt < %s -passes=amdgpu-sw-lower-lds -S -mtriple=amdgcn-amd-amdhsa | FileCheck %s

; Test to check if direct access of dynamic LDS in kernel is lowered correctly.
@lds_1 = external addrspace(3) global [0 x i8]
@lds_2 = external addrspace(3) global [0 x i8]

;.
; CHECK: @llvm.amdgcn.sw.lds.k0 = internal addrspace(3) global ptr poison, no_sanitize_address, align 1, !absolute_symbol [[META0:![0-9]+]]
; CHECK: @llvm.amdgcn.k0.dynlds = external addrspace(3) global [0 x i8], no_sanitize_address, align 1, !absolute_symbol [[META1:![0-9]+]]
; CHECK: @llvm.amdgcn.sw.lds.k0.md = internal addrspace(1) global %llvm.amdgcn.sw.lds.k0.md.type { %llvm.amdgcn.sw.lds.k0.md.item { i32 0, i32 8, i32 32 }, %llvm.amdgcn.sw.lds.k0.md.item { i32 32, i32 0, i32 32 } }, no_sanitize_address
;.
define amdgpu_kernel void @k0() sanitize_address {
; CHECK-LABEL: define amdgpu_kernel void @k0(
; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: WId:
; CHECK-NEXT: [[TMP0:%.*]] = call i32 @llvm.amdgcn.workitem.id.x()
; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.workitem.id.y()
; CHECK-NEXT: [[TMP2:%.*]] = call i32 @llvm.amdgcn.workitem.id.z()
; CHECK-NEXT: [[TMP3:%.*]] = or i32 [[TMP0]], [[TMP1]]
; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP3]], [[TMP2]]
; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i32 [[TMP4]], 0
; CHECK-NEXT: br i1 [[TMP5]], label [[MALLOC:%.*]], label [[TMP7:%.*]]
; CHECK: Malloc:
; CHECK-NEXT: [[TMP8:%.*]] = load i32, ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, align 4
; CHECK-NEXT: [[TMP9:%.*]] = load i32, ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE:%.*]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 0, i32 2), align 4
; CHECK-NEXT: [[TMP24:%.*]] = add i32 [[TMP8]], [[TMP9]]
; CHECK-NEXT: [[TMP20:%.*]] = call ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds ptr addrspace(4), ptr addrspace(4) [[TMP20]], i64 15
; CHECK-NEXT: store i32 [[TMP24]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 1, i32 0), align 4
; CHECK-NEXT: [[TMP13:%.*]] = load i32, ptr addrspace(4) [[TMP18]], align 4
; CHECK-NEXT: store i32 [[TMP13]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 1, i32 1), align 4
; CHECK-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 0
; CHECK-NEXT: [[TMP15:%.*]] = udiv i32 [[TMP14]], 1
; CHECK-NEXT: [[TMP16:%.*]] = mul i32 [[TMP15]], 1
; CHECK-NEXT: store i32 [[TMP16]], ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 1, i32 2), align 4
; CHECK-NEXT: [[TMP17:%.*]] = add i32 [[TMP24]], [[TMP16]]
; CHECK-NEXT: [[TMP21:%.*]] = zext i32 [[TMP17]] to i64
; CHECK-NEXT: [[TMP22:%.*]] = call ptr @llvm.returnaddress(i32 0)
; CHECK-NEXT: [[TMP23:%.*]] = ptrtoint ptr [[TMP22]] to i64
; CHECK-NEXT: [[TMP19:%.*]] = call i64 @__asan_malloc_impl(i64 [[TMP21]], i64 [[TMP23]])
; CHECK-NEXT: [[TMP6:%.*]] = inttoptr i64 [[TMP19]] to ptr addrspace(1)
; CHECK-NEXT: store ptr addrspace(1) [[TMP6]], ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, align 8
; CHECK-NEXT: [[TMP42:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP6]], i64 8
; CHECK-NEXT: [[TMP44:%.*]] = ptrtoint ptr addrspace(1) [[TMP42]] to i64
; CHECK-NEXT: call void @__asan_poison_region(i64 [[TMP44]], i64 24)
; CHECK-NEXT: br label [[TMP7]]
; CHECK: 23:
; CHECK-NEXT: [[XYZCOND:%.*]] = phi i1 [ false, [[WID:%.*]] ], [ true, [[MALLOC]] ]
; CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
; CHECK-NEXT: [[TMP28:%.*]] = load ptr addrspace(1), ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, align 8
; CHECK-NEXT: [[TMP10:%.*]] = load i32, ptr addrspace(1) getelementptr inbounds ([[LLVM_AMDGCN_SW_LDS_K0_MD_TYPE]], ptr addrspace(1) @llvm.amdgcn.sw.lds.k0.md, i32 0, i32 1, i32 0), align 4
; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, ptr addrspace(3) @llvm.amdgcn.sw.lds.k0, i32 [[TMP10]]
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.k0.dynlds) ]
; CHECK-NEXT: [[TMP45:%.*]] = ptrtoint ptr addrspace(3) [[TMP11]] to i32
; CHECK-NEXT: [[TMP46:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP28]], i32 [[TMP45]]
; CHECK-NEXT: [[TMP29:%.*]] = ptrtoint ptr addrspace(1) [[TMP46]] to i64
; CHECK-NEXT: [[TMP30:%.*]] = lshr i64 [[TMP29]], 3
; CHECK-NEXT: [[TMP31:%.*]] = add i64 [[TMP30]], 2147450880
; CHECK-NEXT: [[TMP32:%.*]] = inttoptr i64 [[TMP31]] to ptr
; CHECK-NEXT: [[TMP33:%.*]] = load i8, ptr [[TMP32]], align 1
; CHECK-NEXT: [[TMP34:%.*]] = icmp ne i8 [[TMP33]], 0
; CHECK-NEXT: [[TMP35:%.*]] = and i64 [[TMP29]], 7
; CHECK-NEXT: [[TMP36:%.*]] = trunc i64 [[TMP35]] to i8
; CHECK-NEXT: [[TMP37:%.*]] = icmp sge i8 [[TMP36]], [[TMP33]]
; CHECK-NEXT: [[TMP38:%.*]] = and i1 [[TMP34]], [[TMP37]]
; CHECK-NEXT: [[TMP39:%.*]] = call i64 @llvm.amdgcn.ballot.i64(i1 [[TMP38]])
; CHECK-NEXT: [[TMP40:%.*]] = icmp ne i64 [[TMP39]], 0
; CHECK-NEXT: br i1 [[TMP40]], label [[ASAN_REPORT:%.*]], label [[TMP43:%.*]], !prof [[PROF2:![0-9]+]]
; CHECK: asan.report:
; CHECK-NEXT: br i1 [[TMP38]], label [[TMP41:%.*]], label [[CONDFREE:%.*]]
; CHECK: 41:
; CHECK-NEXT: call void @__asan_report_store1(i64 [[TMP29]]) #[[ATTR6:[0-9]+]]
; CHECK-NEXT: call void @llvm.amdgcn.unreachable()
; CHECK-NEXT: br label [[CONDFREE]]
; CHECK: 42:
; CHECK-NEXT: br label [[TMP43]]
; CHECK: 43:
; CHECK-NEXT: store i8 7, ptr addrspace(1) [[TMP46]], align 4
; CHECK-NEXT: br label [[CONDFREE1:%.*]]
; CHECK: CondFree:
; CHECK-NEXT: call void @llvm.amdgcn.s.barrier()
; CHECK-NEXT: br i1 [[XYZCOND]], label [[FREE:%.*]], label [[END:%.*]]
; CHECK: Free:
; CHECK-NEXT: [[TMP25:%.*]] = call ptr @llvm.returnaddress(i32 0)
; CHECK-NEXT: [[TMP26:%.*]] = ptrtoint ptr [[TMP25]] to i64
; CHECK-NEXT: [[TMP27:%.*]] = ptrtoint ptr addrspace(1) [[TMP28]] to i64
; CHECK-NEXT: call void @__asan_free_impl(i64 [[TMP27]], i64 [[TMP26]])
; CHECK-NEXT: br label [[END]]
; CHECK: End:
; CHECK-NEXT: ret void
;
  store i8 7, ptr addrspace(3) @lds_1, align 4
  ;store i8 8, ptr addrspace(3) @lds_2, align 8
  ret void
}

!llvm.module.flags = !{!0}
!0 = !{i32 4, !"nosanitize_address", i32 1}

;.
; CHECK: attributes #[[ATTR0]] = { sanitize_address "amdgpu-lds-size"="8,8" }
; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nosync nounwind willreturn memory(none) }
; CHECK: attributes #[[ATTR2:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nounwind willreturn }
; CHECK: attributes #[[ATTR4:[0-9]+]] = { convergent nocallback nofree nounwind willreturn memory(none) }
; CHECK: attributes #[[ATTR5:[0-9]+]] = { convergent nocallback nofree nounwind }
; CHECK: attributes #[[ATTR6]] = { nomerge }
;.
; CHECK: [[META0]] = !{i32 0, i32 1}
; CHECK: [[META1]] = !{i32 8, i32 9}
; CHECK: [[PROF2]] = !{!"branch_weights", i32 1, i32 1048575}
;.
