
Commit dce3365

AlexVlxcjdb authored and committed
[clang][CodeGen][SPIR-V][AMDGPU] Tweak AMDGCNSPIRV ABI to allow for the correct handling of aggregates passed to kernels / functions. (llvm#102776)
The AMDGPU kernel ABI is not directly representable in SPIR-V, since it relies on passing aggregates `byref`, and SPIR-V only encodes `byval` (which the AMDGPU BE disallows for kernel arguments). As a temporary solution to this mismatch, we add special handling for AMDGCN flavoured SPIR-V, whereby aggregates are passed as direct, both to kernels and to normal functions. This is not ideal (there are pathological cases where performance is heavily impacted), but empirically robust and guaranteed to work as the AMDGPU BE retains handling of `direct` passing for legacy reasons. We will revisit this in the future, but as it stands it is enough to pass a wide array of integration tests and generates correct SPIR-V and correct reverse translation into LLVM IR. The amdgpu-kernel-arg-pointer-type test is updated via the automated script, and thus becomes quite noisy.
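The kernel-argument choice described in the message can be sketched as a small standalone model. This is only an illustration, not clang code: `Vendor`, `KernelPass`, `KernelArg` and `classifyKernelArg` are hypothetical names, the pointer address-space coercion step is left out, and `DeferToArgumentRules` stands for falling through to the ordinary argument classification.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not clang types.
enum class Vendor { AMD, Other };
enum class KernelPass { Direct, IndirectByVal, DeferToArgumentRules };

struct KernelArg {
  bool IsAggregate; // models the result of isAggregateTypeForABI(Ty)
};

// Models only the aggregate branch of the CUDA-device path: AMD-flavoured
// SPIR-V passes aggregates as direct, other vendors keep the byval copy.
constexpr KernelPass classifyKernelArg(Vendor V, bool CUDAIsDevice,
                                       KernelArg A) {
  if (CUDAIsDevice && A.IsAggregate)
    return V == Vendor::AMD ? KernelPass::Direct : KernelPass::IndirectByVal;
  // Everything else is handled by the non-kernel argument rules.
  return KernelPass::DeferToArgumentRules;
}

// Small usage demo: the same aggregate is classified differently per vendor.
inline void demoKernelArg() {
  assert(classifyKernelArg(Vendor::AMD, true, KernelArg{true}) ==
         KernelPass::Direct);
  assert(classifyKernelArg(Vendor::Other, true, KernelArg{true}) ==
         KernelPass::IndirectByVal);
}
```

In this model the vendor check is the only thing that separates the new `direct` path from the pre-existing `byval` path, which matches the intent stated in the message that the change is specific to AMDGCN flavoured SPIR-V.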
1 parent 970eb55 commit dce3365

3 files changed: +729 -70 lines changed

clang/lib/CodeGen/Targets/SPIR.cpp

Lines changed: 63 additions & 7 deletions
@@ -32,7 +32,9 @@ class SPIRVABIInfo : public CommonSPIRABIInfo {
   void computeInfo(CGFunctionInfo &FI) const override;

 private:
+  ABIArgInfo classifyReturnType(QualType RetTy) const;
   ABIArgInfo classifyKernelArgumentType(QualType Ty) const;
+  ABIArgInfo classifyArgumentType(QualType Ty) const;
 };
 } // end anonymous namespace
 namespace {
@@ -64,6 +66,27 @@ void CommonSPIRABIInfo::setCCs() {
   RuntimeCC = llvm::CallingConv::SPIR_FUNC;
 }

+ABIArgInfo SPIRVABIInfo::classifyReturnType(QualType RetTy) const {
+  if (getTarget().getTriple().getVendor() != llvm::Triple::AMD)
+    return DefaultABIInfo::classifyReturnType(RetTy);
+  if (!isAggregateTypeForABI(RetTy) || getRecordArgABI(RetTy, getCXXABI()))
+    return DefaultABIInfo::classifyReturnType(RetTy);
+
+  if (const RecordType *RT = RetTy->getAs<RecordType>()) {
+    const RecordDecl *RD = RT->getDecl();
+    if (RD->hasFlexibleArrayMember())
+      return DefaultABIInfo::classifyReturnType(RetTy);
+  }
+
+  // TODO: The AMDGPU ABI is non-trivial to represent in SPIR-V; in order to
+  // avoid encoding various architecture specific bits here we return everything
+  // as direct to retain type info for things like aggregates, for later perusal
+  // when translating back to LLVM/lowering in the BE. This is also why we
+  // disable flattening as the outcomes can mismatch between SPIR-V and AMDGPU.
+  // This will be revisited / optimised in the future.
+  return ABIArgInfo::getDirect(CGT.ConvertType(RetTy), 0u, nullptr, false);
+}
+
 ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
   if (getContext().getLangOpts().CUDAIsDevice) {
     // Coerce pointer arguments with default address space to CrossWorkGroup
@@ -78,18 +101,51 @@ ABIArgInfo SPIRVABIInfo::classifyKernelArgumentType(QualType Ty) const {
       return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
     }

-    // Force copying aggregate type in kernel arguments by value when
-    // compiling CUDA targeting SPIR-V. This is required for the object
-    // copied to be valid on the device.
-    // This behavior follows the CUDA spec
-    // https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing,
-    // and matches the NVPTX implementation.
-    if (isAggregateTypeForABI(Ty))
+    if (isAggregateTypeForABI(Ty)) {
+      if (getTarget().getTriple().getVendor() == llvm::Triple::AMD)
+        // TODO: The AMDGPU kernel ABI passes aggregates byref, which is not
+        // currently expressible in SPIR-V; SPIR-V passes aggregates byval,
+        // which the AMDGPU kernel ABI does not allow. Passing aggregates as
+        // direct works around this impedance mismatch, as it retains type info
+        // and can be correctly handled, post reverse-translation, by the AMDGPU
+        // BE, which has to support this CC for legacy OpenCL purposes. It can
+        // be brittle and does lead to performance degradation in certain
+        // pathological cases. This will be revisited / optimised in the future,
+        // once a way to deal with the byref/byval impedance mismatch is
+        // identified.
+        return ABIArgInfo::getDirect(LTy, 0, nullptr, false);
+      // Force copying aggregate type in kernel arguments by value when
+      // compiling CUDA targeting SPIR-V. This is required for the object
+      // copied to be valid on the device.
+      // This behavior follows the CUDA spec
+      // https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#global-function-argument-processing,
+      // and matches the NVPTX implementation.
       return getNaturalAlignIndirect(Ty, /* byval */ true);
+    }
   }
   return classifyArgumentType(Ty);
 }

+ABIArgInfo SPIRVABIInfo::classifyArgumentType(QualType Ty) const {
+  if (getTarget().getTriple().getVendor() != llvm::Triple::AMD)
+    return DefaultABIInfo::classifyArgumentType(Ty);
+  if (!isAggregateTypeForABI(Ty))
+    return DefaultABIInfo::classifyArgumentType(Ty);
+
+  // Records with non-trivial destructors/copy-constructors should not be
+  // passed by value.
+  if (auto RAA = getRecordArgABI(Ty, getCXXABI()))
+    return getNaturalAlignIndirect(Ty, RAA == CGCXXABI::RAA_DirectInMemory);
+
+  if (const RecordType *RT = Ty->getAs<RecordType>()) {
+    const RecordDecl *RD = RT->getDecl();
+    if (RD->hasFlexibleArrayMember())
+      return DefaultABIInfo::classifyArgumentType(Ty);
+  }
+
+  return ABIArgInfo::getDirect(CGT.ConvertType(Ty), 0u, nullptr, false);
+}
+
 void SPIRVABIInfo::computeInfo(CGFunctionInfo &FI) const {
   // The logic is same as in DefaultABIInfo with an exception on the kernel
   // arguments handling.
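The order of checks in the new `classifyArgumentType` and `classifyReturnType` can be summarised with a standalone sketch. It is a simplified model under stated assumptions, not the clang implementation: `TargetVendor`, `RecordABI`, `TypeFacts`, `ArgPass`, `classifyArgument` and `classifyReturn` are hypothetical names, and the three `TypeFacts` fields stand in for `isAggregateTypeForABI`, `getRecordArgABI` and `hasFlexibleArrayMember`.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; these are not clang types.
enum class TargetVendor { AMD, Other };
enum class RecordABI { Default, DirectInMemory, Indirect }; // models RAA_*
enum class ArgPass { Direct, IndirectByVal, IndirectNotByVal, DefaultABI };

struct TypeFacts {
  bool IsAggregate;            // models isAggregateTypeForABI
  RecordABI RAA;               // models getRecordArgABI
  bool HasFlexibleArrayMember; // models RecordDecl::hasFlexibleArrayMember
};

// Argument path: non-AMD and non-aggregate cases keep the default ABI,
// non-trivial records go indirect, flexible-array records keep the default,
// and every remaining aggregate is passed as direct.
constexpr ArgPass classifyArgument(TargetVendor V, TypeFacts T) {
  if (V != TargetVendor::AMD || !T.IsAggregate)
    return ArgPass::DefaultABI;
  if (T.RAA != RecordABI::Default)
    return T.RAA == RecordABI::DirectInMemory ? ArgPass::IndirectByVal
                                              : ArgPass::IndirectNotByVal;
  if (T.HasFlexibleArrayMember)
    return ArgPass::DefaultABI;
  return ArgPass::Direct;
}

// Return path: same shape, except non-trivial records also fall back to the
// default ABI instead of being made indirect here.
constexpr ArgPass classifyReturn(TargetVendor V, TypeFacts T) {
  if (V != TargetVendor::AMD || !T.IsAggregate ||
      T.RAA != RecordABI::Default || T.HasFlexibleArrayMember)
    return ArgPass::DefaultABI;
  return ArgPass::Direct;
}

// Small usage demo: a plain aggregate on an AMD target is direct both ways.
inline void demoArgAndReturn() {
  constexpr TypeFacts Plain{true, RecordABI::Default, false};
  assert(classifyArgument(TargetVendor::AMD, Plain) == ArgPass::Direct);
  assert(classifyReturn(TargetVendor::AMD, Plain) == ArgPass::Direct);
}
```

The sketch makes one asymmetry in the diff easy to see: a record with a non-trivial copy constructor or destructor is passed indirectly as an argument, but as a return value it is simply handed back to the default classification.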
