
[clang][CodeGen] sret args should always point to the alloca AS, so use that #114062


Merged (55 commits) on Feb 14, 2025.

Changes from 8 commits

Commits (55)
d2d2d3d
`sret` args should always point to the `alloca` AS, so we can use that.
AlexVlx Oct 29, 2024
693253d
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Oct 29, 2024
b5a7df0
Fix broken tests.
AlexVlx Oct 29, 2024
f6cff66
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Oct 29, 2024
2de33d4
Handle passing an `alloca`ed `sret` arg directly to a callee that exp…
AlexVlx Oct 30, 2024
6d9cb89
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Nov 1, 2024
b209d67
Add query for a possible target specific indirect arg AS.
AlexVlx Nov 2, 2024
ac6367b
Add more context to test.
AlexVlx Nov 2, 2024
c8f03e7
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Nov 4, 2024
24d8edb
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Nov 5, 2024
5ccd554
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Nov 6, 2024
9ff1d0d
Extend Indirect Args to carry an address space.
AlexVlx Nov 6, 2024
1c3e67c
Fix formatting.
AlexVlx Nov 6, 2024
c9288fc
Drop vestigial target hook.
AlexVlx Nov 7, 2024
99e03a2
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Nov 13, 2024
013790c
Tweak handling potential AS mismatches.
AlexVlx Nov 15, 2024
c4bdeab
Fix formatting.
AlexVlx Nov 15, 2024
5afb40e
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Nov 18, 2024
d07d63d
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Nov 22, 2024
eeb54e4
Remove lie.
AlexVlx Nov 24, 2024
6c0ef88
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Dec 4, 2024
abab201
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Dec 4, 2024
6e78db1
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Dec 4, 2024
7d45638
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Dec 5, 2024
f16d1d9
Generalise placing `sret`/returns in the alloca AS; remove risky defa…
AlexVlx Dec 5, 2024
0277516
Fix formatting.
AlexVlx Dec 5, 2024
056c9ec
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Dec 28, 2024
207a2ae
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Dec 29, 2024
d8bd7ab
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 5, 2025
f6c8e01
Add helper accessor for `LangAS::Default -> TargetAS` queries.
AlexVlx Jan 5, 2025
0f724f8
Align AMDGPU argument classification.
AlexVlx Jan 5, 2025
7158b8d
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 7, 2025
8f472f3
Tweak Swift's use of AS aware `getIndirect`.
AlexVlx Jan 7, 2025
2bdb085
Fix formatting.
AlexVlx Jan 7, 2025
99101fb
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 8, 2025
86093c2
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 8, 2025
4b47cd7
Remove helper, switch to using the AllocaAS for all indirects.
AlexVlx Jan 8, 2025
d103255
Fix Swift mismatch.
AlexVlx Jan 8, 2025
e325239
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 8, 2025
27ef889
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 14, 2025
260e96d
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 23, 2025
5227aef
Fix leftover LangAS::Default.
AlexVlx Jan 23, 2025
94b51d5
Fix leftover use of LangAS::Default.
AlexVlx Jan 23, 2025
53d8462
Apply formatting suggestions.
AlexVlx Jan 23, 2025
4d2b9f7
Fix formatting.
AlexVlx Jan 23, 2025
d9595fc
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 23, 2025
3acc4ff
Fix typo.
AlexVlx Jan 23, 2025
69b7937
Add test.
AlexVlx Jan 23, 2025
ddaccb8
Fix formatting (again).
AlexVlx Jan 23, 2025
f442024
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Jan 28, 2025
939af07
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Feb 3, 2025
05f0701
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Feb 10, 2025
3e10da3
Update clang/test/CodeGenOpenCL/implicit-addrspacecast-function-param…
arsenm Feb 13, 2025
0867735
Merge branch 'main' into sret_fixes
arsenm Feb 13, 2025
553ac57
Merge branch 'main' of https://github.com/llvm/llvm-project into sret…
AlexVlx Feb 13, 2025
8 changes: 8 additions & 0 deletions clang/include/clang/Basic/TargetInfo.h
@@ -1780,6 +1780,14 @@ class TargetInfo : public TransferrableTargetInfo,
return 0;
}

/// \returns Target specific address space for indirect (e.g. sret) arguments.
/// If such an address space exists, it must be convertible to and from the
/// alloca address space. If it does not, std::nullopt is returned and the
/// alloca address space will be used.
virtual std::optional<unsigned> getIndirectArgAddressSpace() const {
Contributor:

I would expect this to be a LangAS, since that's what our address-space conversion lowerings are generally expressed in terms of. This also has the advantage of avoiding a lot of heartache with implicit conversions around ABIInfo::getIndirect, since LangAS is a scoped enum. And LangAS::Default is a much more reasonable default argument for things like ABIArgInfo::getIndirect than IR addrspace 0.

Contributor:

This also shouldn't be optional. There always must be a definitive IR address space.

Also I'm not sure I follow why this is still necessary if you've modified getIndirect to carry the address space. RetAI should have this info now?

Contributor Author:

Err, I forgot to delete this. Re: unsigned vs LangAS, I used the former for symmetry with other interfaces. I agree with @rjmccall that LangAS would be safer / would make more sense; however, getIndirectAliased already uses the numeric form rather than the typed one.
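To make the LangAS suggestion above concrete, here is a minimal sketch of what a LangAS-based version of the hook could look like. The name getIndirectArgLangAS is hypothetical and not part of this patch (later commits in this PR drop the hook entirely):

/// Hypothetical LangAS-returning variant of the hook discussed above;
/// illustrative only, not the interface that lands.
virtual clang::LangAS getIndirectArgLangAS() const {
  // LangAS::Default avoids hard-coding IR addrspace 0 and sidesteps implicit
  // unsigned conversions around ABIInfo::getIndirect, since LangAS is a
  // scoped enum.
  return clang::LangAS::Default;
}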

return std::nullopt;
}

/// \returns If a target requires an address within a target specific address
/// space \p AddressSpace to be converted in order to be used, then return the
/// corresponding target specific DWARF address space.
35 changes: 23 additions & 12 deletions clang/lib/CodeGen/CGCall.cpp
@@ -1672,10 +1672,11 @@ CodeGenTypes::GetFunctionType(const CGFunctionInfo &FI) {

// Add type for sret argument.
if (IRFunctionArgs.hasSRetArg()) {
QualType Ret = FI.getReturnType();
unsigned AddressSpace = CGM.getTypes().getTargetAddressSpace(Ret);
auto AddressSpace = CGM.getTarget().getIndirectArgAddressSpace();
Contributor:

I would expect this to come from the ABIArgInfo/retAI, for the specific value and not a new side hook. Actually, is the address space already correct in retAI.getIndirectAddrSpace?

Contributor Author:

Sadly no, that's not usable here as-is; it only applies to byref args (IndirectAliased). I do wonder if we should extend Indirect to also carry an AS; maybe that's the natural solution here.

Contributor:

That's what I was thinking, yeah. There should be plenty of space for that without inflating ABIInfo, right?

Contributor Author:

I've taken a swipe at doing this; it's a bit noisier than I'd have hoped but it's mostly NFC except for AMDGPU, with other targets retaining current behaviour and having the option to adjust in the future. It does annoyingly add another relatively blind default use of AS0, but that was already there. Perhaps we can flip to defaulting to the AllocaAS with targets having the option to override the indirect AS when they classify a return type.
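As a rough illustration of that direction, the sketch below assumes a hypothetical address-space-carrying overload of ABIArgInfo::getIndirect; the parameter names and order are illustrative, not necessarily the exact interface this PR lands:

// Classify an indirect (sret) return so the slot defaults to the alloca
// address space, while a target can still choose a different AS when it
// classifies the return type.
ABIArgInfo classifyIndirectReturn(CodeGen::CodeGenModule &CGM,
                                  CharUnits Alignment) {
  unsigned AllocaAS = CGM.getDataLayout().getAllocaAddrSpace();
  // Hypothetical AS-aware overload; the pre-existing overload implicitly
  // assumed IR addrspace 0 for the indirect slot.
  return ABIArgInfo::getIndirect(Alignment, /*AddrSpace=*/AllocaAS,
                                 /*ByVal=*/false);
}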

if (!AddressSpace)
AddressSpace = getDataLayout().getAllocaAddrSpace();
ArgTypes[IRFunctionArgs.getSRetArgNo()] =
llvm::PointerType::get(getLLVMContext(), AddressSpace);
llvm::PointerType::get(getLLVMContext(), *AddressSpace);
}

// Add type for inalloca argument.
@@ -5145,7 +5146,6 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
// If the call returns a temporary with struct return, create a temporary
// alloca to hold the result, unless one is given to us.
Address SRetPtr = Address::invalid();
RawAddress SRetAlloca = RawAddress::invalid();
llvm::Value *UnusedReturnSizePtr = nullptr;
if (RetAI.isIndirect() || RetAI.isInAlloca() || RetAI.isCoerceAndExpand()) {
// For virtual function pointer thunks and musttail calls, we must always
@@ -5159,16 +5159,19 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
} else if (!ReturnValue.isNull()) {
SRetPtr = ReturnValue.getAddress();
} else {
SRetPtr = CreateMemTemp(RetTy, "tmp", &SRetAlloca);
SRetPtr = CreateMemTempWithoutCast(RetTy, "tmp");
if (HaveInsertPoint() && ReturnValue.isUnused()) {
llvm::TypeSize size =
CGM.getDataLayout().getTypeAllocSize(ConvertTypeForMem(RetTy));
UnusedReturnSizePtr = EmitLifetimeStart(size, SRetAlloca.getPointer());
UnusedReturnSizePtr = EmitLifetimeStart(size, SRetPtr.getBasePointer());
}
}
if (IRFunctionArgs.hasSRetArg()) {
// If the caller allocated the return slot, it is possible that the
// alloca was AS-cast to the default AS, so we ensure the cast is
// stripped before binding to the sret arg, which is in the alloca AS.
Contributor:

Sorry, what? It seems really wrong to be blindly stripping pointer casts here. Can you explain what code pattern is leading to us not having a pointer in the right address space?

Contributor Author (AlexVlx, Nov 14, 2024):

It's not really blind (albeit it might couple sret somewhat tightly with alloca); this is actually captured in current tests, e.g. CodeGen/sret.c, please see: https://gcc.godbolt.org/z/TWd83dbdE. This currently works because sret gets arbitrarily placed in the default AS; switching it over to anything else will break it. This happens when we receive a pre-allocated return value slot, which gets created in AggExprEmitter::withReturnValueSlot iff we cannot elide the temporary. That uses CreateMemTemp, which inserts a cast to the default AS. An alternative would be to instead use CreateMemTempWithoutCast and to also handle the case where the slot has been pre-allocated.
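For reference, a minimal reproducer of the pattern described above, mirroring clang/test/CodeGen/sret.c (the struct and function names here are illustrative):

// A large aggregate return forces an sret slot. On amdgcn-amd-amdhsa that
// slot is an addrspace(5) alloca; CreateMemTemp would previously
// addrspacecast it to the default AS before it was bound to the sret arg.
struct Big { long a, b, c, d, e, f, g, h, i, j; };
struct Big make_big(void);
void caller(void) {
  struct Big local = make_big(); // sret slot: an alloca, addrspace(5) on AMDGPU
}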

IRCallArgs[IRFunctionArgs.getSRetArgNo()] =
getAsNaturalPointerTo(SRetPtr, RetTy);
getAsNaturalPointerTo(SRetPtr, RetTy)->stripPointerCasts();
} else if (RetAI.isInAlloca()) {
Address Addr =
Builder.CreateStructGEP(ArgMemory, RetAI.getInAllocaFieldIndex());
@@ -5390,11 +5393,19 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
V->getType()->isIntegerTy())
V = Builder.CreateZExt(V, ArgInfo.getCoerceToType());

// If the argument doesn't match, perform a bitcast to coerce it. This
// can happen due to trivial type mismatches.
// If the argument doesn't match, we are either trying to pass an
// alloca-ed sret argument directly, and the alloca AS does not match
// the default AS, in which case we AS-cast it, or we have a trivial
// type mismatch, and thus perform a bitcast to coerce it.
Contributor:

Inserting the cast might not be correct. Might need to create another temporary with the other address space, and memcpy.

Is this only the inalloca case? That's the weird Windows-only thing?

Contributor Author:

No, this is not the inalloca case; it's the case where you have e.g. a C++ move ctor (Foo(Foo&&)), which in IR expands into a function taking two pointers to the default AS (this and a pointer to the moved-from arg). If you're moving into the sret arg, you end up trying to bind the sret slot as this, and you land here. See the no-elide-constructors test that's part of this PR.

Re: not inserting the cast, you're right, it's probably not correct to insert it blindly. I think the only thing we can safely handle is the case where the mismatched arg is a pointer to the default AS, and we should error out otherwise. The only mechanism we have for creating temporaries is alloca-ing them, and it's not even clear what it'd mean to create a temporary in some arbitrary AS. This is probably fine though, because I think the only offenders here would be the C++ ctors (perhaps member functions in general, at worst), as their IR signature is derived from the default AS, since there's no fixed argument type to inform it.

Contributor:

The cast is probably unavoidable here. You need to support flat addressing for any of C++ to work anyway, so that's fine for the GPU case.
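A cut-down reproducer of the move-constructor case described above; it closely follows the no-elide-constructors.cpp test added in this PR:

// With -fno-elide-constructors on amdgcn-amd-amdhsa, the sret slot is an
// addrspace(5) alloca, while the IR signature of X(X&&) expects generic
// (default-AS) pointers, so CodeGen must addrspacecast the slot before the
// constructor call.
class X {
public:
  X();
  X(const X &);
  X(X &&);
  ~X();
};

X Test() {
  X x;
  return x; // not elided: the sret slot is bound as `this` of X(X&&)
}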

if (FirstIRArg < IRFuncTy->getNumParams() &&
V->getType() != IRFuncTy->getParamType(FirstIRArg))
V = Builder.CreateBitCast(V, IRFuncTy->getParamType(FirstIRArg));
V->getType() != IRFuncTy->getParamType(FirstIRArg)) {
auto IRTy = IRFuncTy->getParamType(FirstIRArg);
auto MaybeSRetArg = dyn_cast_or_null<llvm::Argument>(V);
if (MaybeSRetArg && MaybeSRetArg->hasStructRetAttr())
V = Builder.CreateAddrSpaceCast(V, IRTy);
Contributor:

As a general rule, we try to use the target hook to perform address-space conversions. That target hook is expressed in terms of AST address spaces, which is one reason I think we need to thread a LangAS through. If we need to do the same for the other indirect cases, so be it.

Contributor Author:

That's fair, but I'm not entirely sure that isn't simply excessive here - we already have the types, and the only mismatch for sret can be in the AS, I believe; reverting to LangAS from target ASes seems a bit roundabout. I think @arsenm had a related objection to this cast being unconditional, which I haven't handled.

Contributor:

This is the prevailing existing practice in Clang CodeGen; you'll notice we do the same thing in CreateTempAlloca. We are trying to allow targets to completely own the lowering of address spaces to IR. The idea is that targets may want to distinguish address spaces in the frontend without distinguishing them in the backend, or they may decide that they need the address space conversion operation to be more complex than a simple IR addrspacecast.

Contributor Author:

OK, I think that I've found a possibly acceptable middle ground (both for this and your other objection). Note that I am not rejecting the fact that we probably want LangASes threaded through the indirect interfaces that deal with ASes, but it gives me a bit of trepidation to do it as part of this PR. I'd prefer to wrap this up, as it blocks some other work, and then open a separate PR/discussion around re-doing the interfaces.
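For context, the prevailing pattern being referred to looks roughly like the sketch below. It assumes the existing TargetCodeGenInfo::performAddrSpaceCast hook; the exact parameter list has varied across LLVM versions, so treat this as illustrative rather than as code from this patch:

// Let the target own the lowering of the conversion, mirroring what
// CreateTempAlloca does when the alloca AS differs from the default AS.
llvm::Value *castToParamAddrSpace(CodeGen::CodeGenFunction &CGF,
                                  llvm::Value *V, clang::LangAS SrcAS,
                                  clang::LangAS DstAS, llvm::Type *ParamTy) {
  if (SrcAS == DstAS)
    return V;
  // The target hook may emit a plain addrspacecast or something richer
  // (e.g. null-pointer remapping), depending on the target's AS map.
  // Signature shown is approximate; check TargetCodeGenInfo for the current one.
  return CGF.getTargetHooks().performAddrSpaceCast(CGF, V, SrcAS, DstAS,
                                                   ParamTy);
}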

else
V = Builder.CreateBitCast(V, IRTy);
Contributor:

I'm assuming this is a pointer bitcast, which isn't necessary anymore

}

if (ArgHasMaybeUndefAttr)
V = Builder.CreateFreeze(V);
@@ -5740,7 +5751,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
// pop this cleanup later on. Being eager about this is OK, since this
// temporary is 'invisible' outside of the callee.
if (UnusedReturnSizePtr)
pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, SRetAlloca,
pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, SRetPtr,
UnusedReturnSizePtr);

llvm::BasicBlock *InvokeDest = CannotThrow ? nullptr : getInvokeDest();
4 changes: 2 additions & 2 deletions clang/test/CodeGen/partial-reinitialization2.c
@@ -91,8 +91,8 @@ void test5(void)
// CHECK-LABEL: test6
void test6(void)
{
// CHECK: [[LP:%[a-z0-9]+]] = getelementptr{{.*}}%struct.LLP2P2, ptr{{.*}}, i32 0, i32 0
// CHECK: call {{.*}}get456789(ptr {{.*}}[[LP]])
// CHECK: [[VAR:%[a-z0-9]+]] = alloca
// CHECK: call {{.*}}get456789(ptr {{.*}}sret{{.*}} [[VAR]])

// CHECK: [[CALL:%[a-z0-9]+]] = call {{.*}}@get235()
// CHECK: store{{.*}}[[CALL]], {{.*}}[[TMP0:%[a-z0-9.]+]]
11 changes: 11 additions & 0 deletions clang/test/CodeGen/sret.c
@@ -1,23 +1,34 @@
// RUN: %clang_cc1 %s -Wno-strict-prototypes -emit-llvm -o - | FileCheck %s
// RUN: %clang_cc1 %s -Wno-strict-prototypes -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck --check-prefix=NONZEROALLOCAAS %s

struct abc {
long a;
long b;
long c;
long d;
long e;
long f;
long g;
long h;
long i;
long j;
};

struct abc foo1(void);
// CHECK-DAG: declare {{.*}} @foo1(ptr dead_on_unwind writable sret(%struct.abc)
// NONZEROALLOCAAS-DAG: declare {{.*}} @foo1(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
struct abc foo2();
// CHECK-DAG: declare {{.*}} @foo2(ptr dead_on_unwind writable sret(%struct.abc)
// NONZEROALLOCAAS-DAG: declare {{.*}} @foo2(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
struct abc foo3(void){}
// CHECK-DAG: define {{.*}} @foo3(ptr dead_on_unwind noalias writable sret(%struct.abc)
// NONZEROALLOCAAS-DAG: define {{.*}} @foo3(ptr addrspace(5) dead_on_unwind noalias writable sret(%struct.abc)

void bar(void) {
struct abc dummy1 = foo1();
// CHECK-DAG: call {{.*}} @foo1(ptr dead_on_unwind writable sret(%struct.abc)
// NONZEROALLOCAAS-DAG: call {{.*}} @foo1(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
struct abc dummy2 = foo2();
// CHECK-DAG: call {{.*}} @foo2(ptr dead_on_unwind writable sret(%struct.abc)
// NONZEROALLOCAAS-DAG: call {{.*}} @foo2(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
}
6 changes: 6 additions & 0 deletions clang/test/CodeGenCXX/no-elide-constructors.cpp
@@ -1,7 +1,9 @@
// RUN: %clang_cc1 -std=c++98 -triple i386-unknown-unknown -fno-elide-constructors -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-CXX98
// RUN: %clang_cc1 -std=c++11 -triple i386-unknown-unknown -fno-elide-constructors -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-CXX11
// RUN: %clang_cc1 -std=c++11 -triple amdgcn-amd-amdhsa -fno-elide-constructors -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK --check-prefix=CHECK-CXX11-NONZEROALLOCAAS
// RUN: %clang_cc1 -std=c++98 -triple i386-unknown-unknown -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-CXX98-ELIDE
// RUN: %clang_cc1 -std=c++11 -triple i386-unknown-unknown -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-CXX11-ELIDE
// RUN: %clang_cc1 -std=c++11 -triple amdgcn-amd-amdhsa -emit-llvm -o - %s | FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-CXX11-NONZEROALLOCAAS-ELIDE

// Reduced from PR12208
class X {
@@ -15,6 +17,7 @@ class X {
};

// CHECK-LABEL: define{{.*}} void @_Z4Testv(
// CHECK-SAME: ptr {{.*}}dead_on_unwind noalias writable sret([[CLASS_X:%.*]]) align 1 [[AGG_RESULT:%.*]])
X Test()
{
X x;
@@ -23,8 +26,11 @@ X Test()
// sret argument.
// CHECK-CXX98: call void @_ZN1XC1ERKS_(
// CHECK-CXX11: call void @_ZN1XC1EOS_(
// CHECK-CXX11-NONZEROALLOCAAS: [[TMP0:%.*]] = addrspacecast ptr addrspace(5) [[AGG_RESULT]] to ptr
// CHECK-CXX11-NONZEROALLOCAAS-NEXT: call void @_ZN1XC1EOS_(ptr noundef nonnull align 1 dereferenceable(1) [[TMP0]]
// CHECK-CXX98-ELIDE-NOT: call void @_ZN1XC1ERKS_(
// CHECK-CXX11-ELIDE-NOT: call void @_ZN1XC1EOS_(
// CHECK-CXX11-NONZEROALLOCAAS-ELIDE-NOT: call void @_ZN1XC1EOS_(

// Make sure that the destructor for X is called.
// FIXME: This call is present even in the -ELIDE runs, but is guarded by a
4 changes: 2 additions & 2 deletions clang/test/CodeGenOpenCL/addr-space-struct-arg.cl
@@ -250,7 +250,7 @@ kernel void ker(global Mat3X3 *in, global Mat4X4 *out) {
// AMDGCN-NEXT: ret void
//
// AMDGCN20-LABEL: define dso_local void @foo_large(
// AMDGCN20-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_MAT64X64:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32:%.*]]) align 4 [[TMP0:%.*]]) #[[ATTR0]] {
// AMDGCN20-SAME: ptr addrspace(5) dead_on_unwind noalias writable sret([[STRUCT_MAT64X64:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32:%.*]]) align 4 [[TMP0:%.*]]) #[[ATTR0]] {
// AMDGCN20-NEXT: [[ENTRY:.*:]]
// AMDGCN20-NEXT: [[COERCE:%.*]] = alloca [[STRUCT_MAT32X32]], align 4, addrspace(5)
// AMDGCN20-NEXT: [[IN:%.*]] = addrspacecast ptr addrspace(5) [[COERCE]] to ptr
@@ -335,7 +335,7 @@ Mat64X64 __attribute__((noinline)) foo_large(Mat32X32 in) {
// AMDGCN20-NEXT: [[TMP1:%.*]] = load ptr addrspace(1), ptr [[IN_ADDR_ASCAST]], align 8
// AMDGCN20-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [[STRUCT_MAT32X32]], ptr addrspace(1) [[TMP1]], i64 1
// AMDGCN20-NEXT: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 4 [[BYVAL_TEMP]], ptr addrspace(1) align 4 [[ARRAYIDX1]], i64 4096, i1 false)
// AMDGCN20-NEXT: call void @foo_large(ptr dead_on_unwind writable sret([[STRUCT_MAT64X64]]) align 4 [[TMP_ASCAST]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32]]) align 4 [[BYVAL_TEMP]]) #[[ATTR3]]
// AMDGCN20-NEXT: call void @foo_large(ptr addrspace(5) dead_on_unwind writable sret([[STRUCT_MAT64X64]]) align 4 [[TMP]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32]]) align 4 [[BYVAL_TEMP]]) #[[ATTR3]]
// AMDGCN20-NEXT: call void @llvm.memcpy.p1.p0.i64(ptr addrspace(1) align 4 [[ARRAYIDX]], ptr align 4 [[TMP_ASCAST]], i64 16384, i1 false)
// AMDGCN20-NEXT: ret void
//
4 changes: 2 additions & 2 deletions clang/test/CodeGenOpenCL/amdgpu-abi-struct-arg-byref.cl
@@ -91,7 +91,7 @@ kernel void ker(global Mat3X3 *in, global Mat4X4 *out) {
}

// AMDGCN-LABEL: define dso_local void @foo_large(
// AMDGCN-SAME: ptr dead_on_unwind noalias writable sret([[STRUCT_MAT64X64:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32:%.*]]) align 4 [[TMP0:%.*]]) #[[ATTR0]] {
// AMDGCN-SAME: ptr addrspace(5) dead_on_unwind noalias writable sret([[STRUCT_MAT64X64:%.*]]) align 4 [[AGG_RESULT:%.*]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32:%.*]]) align 4 [[TMP0:%.*]]) #[[ATTR0]] {
// AMDGCN-NEXT: [[ENTRY:.*:]]
// AMDGCN-NEXT: [[COERCE:%.*]] = alloca [[STRUCT_MAT32X32]], align 4, addrspace(5)
// AMDGCN-NEXT: [[IN:%.*]] = addrspacecast ptr addrspace(5) [[COERCE]] to ptr
@@ -120,7 +120,7 @@ Mat64X64 __attribute__((noinline)) foo_large(Mat32X32 in) {
// AMDGCN-NEXT: [[TMP1:%.*]] = load ptr addrspace(1), ptr [[IN_ADDR_ASCAST]], align 8
// AMDGCN-NEXT: [[ARRAYIDX1:%.*]] = getelementptr inbounds [[STRUCT_MAT32X32]], ptr addrspace(1) [[TMP1]], i64 1
// AMDGCN-NEXT: call void @llvm.memcpy.p5.p1.i64(ptr addrspace(5) align 4 [[BYVAL_TEMP]], ptr addrspace(1) align 4 [[ARRAYIDX1]], i64 4096, i1 false)
// AMDGCN-NEXT: call void @foo_large(ptr dead_on_unwind writable sret([[STRUCT_MAT64X64]]) align 4 [[TMP_ASCAST]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32]]) align 4 [[BYVAL_TEMP]]) #[[ATTR3]]
// AMDGCN-NEXT: call void @foo_large(ptr addrspace(5) dead_on_unwind writable sret([[STRUCT_MAT64X64]]) align 4 [[TMP]], ptr addrspace(5) noundef byref([[STRUCT_MAT32X32]]) align 4 [[BYVAL_TEMP]]) #[[ATTR3]]
// AMDGCN-NEXT: call void @llvm.memcpy.p1.p0.i64(ptr addrspace(1) align 4 [[ARRAYIDX]], ptr align 4 [[TMP_ASCAST]], i64 16384, i1 false)
// AMDGCN-NEXT: ret void
//