-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[clang][CodeGen] sret
args should always point to the alloca
AS, so use that
#114062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang Author: Alex Voicu (AlexVlx) Changes
Full diff: https://github.com/llvm/llvm-project/pull/114062.diff 3 Files Affected:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 8f4f5d3ed81601..56acfae7ae9e51 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1672,8 +1672,7 @@ CodeGenTypes::GetFunctionType(const CGFunctionInfo &FI) {
// Add type for sret argument.
if (IRFunctionArgs.hasSRetArg()) {
- QualType Ret = FI.getReturnType();
- unsigned AddressSpace = CGM.getTypes().getTargetAddressSpace(Ret);
+ unsigned AddressSpace = CGM.getDataLayout().getAllocaAddrSpace();
ArgTypes[IRFunctionArgs.getSRetArgNo()] =
llvm::PointerType::get(getLLVMContext(), AddressSpace);
}
@@ -5145,7 +5144,6 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
// If the call returns a temporary with struct return, create a temporary
// alloca to hold the result, unless one is given to us.
Address SRetPtr = Address::invalid();
- RawAddress SRetAlloca = RawAddress::invalid();
llvm::Value *UnusedReturnSizePtr = nullptr;
if (RetAI.isIndirect() || RetAI.isInAlloca() || RetAI.isCoerceAndExpand()) {
// For virtual function pointer thunks and musttail calls, we must always
@@ -5159,16 +5157,19 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
} else if (!ReturnValue.isNull()) {
SRetPtr = ReturnValue.getAddress();
} else {
- SRetPtr = CreateMemTemp(RetTy, "tmp", &SRetAlloca);
+ SRetPtr = CreateMemTempWithoutCast(RetTy, "tmp");
if (HaveInsertPoint() && ReturnValue.isUnused()) {
llvm::TypeSize size =
CGM.getDataLayout().getTypeAllocSize(ConvertTypeForMem(RetTy));
- UnusedReturnSizePtr = EmitLifetimeStart(size, SRetAlloca.getPointer());
+ UnusedReturnSizePtr = EmitLifetimeStart(size, SRetPtr.getBasePointer());
}
}
if (IRFunctionArgs.hasSRetArg()) {
+ // If the caller allocated the return slot, it is possible that the
+ // alloca was AS casted to the default as, so we ensure the cast is
+ // stripped before binding to the sret arg, which is in the allocaAS.
IRCallArgs[IRFunctionArgs.getSRetArgNo()] =
- getAsNaturalPointerTo(SRetPtr, RetTy);
+ getAsNaturalPointerTo(SRetPtr, RetTy)->stripPointerCasts();
} else if (RetAI.isInAlloca()) {
Address Addr =
Builder.CreateStructGEP(ArgMemory, RetAI.getInAllocaFieldIndex());
@@ -5740,7 +5741,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
// pop this cleanup later on. Being eager about this is OK, since this
// temporary is 'invisible' outside of the callee.
if (UnusedReturnSizePtr)
- pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, SRetAlloca,
+ pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, SRetPtr,
UnusedReturnSizePtr);
llvm::BasicBlock *InvokeDest = CannotThrow ? nullptr : getInvokeDest();
diff --git a/clang/test/CodeGen/partial-reinitialization2.c b/clang/test/CodeGen/partial-reinitialization2.c
index e709c1d4ad1ee1..7949a69555031e 100644
--- a/clang/test/CodeGen/partial-reinitialization2.c
+++ b/clang/test/CodeGen/partial-reinitialization2.c
@@ -91,8 +91,8 @@ void test5(void)
// CHECK-LABEL: test6
void test6(void)
{
- // CHECK: [[LP:%[a-z0-9]+]] = getelementptr{{.*}}%struct.LLP2P2, ptr{{.*}}, i32 0, i32 0
- // CHECK: call {{.*}}get456789(ptr {{.*}}[[LP]])
+ // CHECK: [[VAR:%[a-z0-9]+]] = alloca
+ // CHECK: call {{.*}}get456789(ptr {{.*}}sret{{.*}} [[VAR]])
// CHECK: [[CALL:%[a-z0-9]+]] = call {{.*}}@get235()
// CHECK: store{{.*}}[[CALL]], {{.*}}[[TMP0:%[a-z0-9.]+]]
diff --git a/clang/test/CodeGen/sret.c b/clang/test/CodeGen/sret.c
index 6d905e89b2c6fd..3b4914f29d2bfe 100644
--- a/clang/test/CodeGen/sret.c
+++ b/clang/test/CodeGen/sret.c
@@ -1,4 +1,5 @@
// RUN: %clang_cc1 %s -Wno-strict-prototypes -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 %s -Wno-strict-prototypes -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck --check-prefix=NONZEROALLOCAAS %s
struct abc {
long a;
@@ -6,18 +7,28 @@ struct abc {
long c;
long d;
long e;
+ long f;
+ long g;
+ long h;
+ long i;
+ long j;
};
struct abc foo1(void);
// CHECK-DAG: declare {{.*}} @foo1(ptr dead_on_unwind writable sret(%struct.abc)
+// NONZEROALLOCAAS-DAG: declare {{.*}} @foo1(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
struct abc foo2();
// CHECK-DAG: declare {{.*}} @foo2(ptr dead_on_unwind writable sret(%struct.abc)
+// NONZEROALLOCAAS-DAG: declare {{.*}} @foo2(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
struct abc foo3(void){}
// CHECK-DAG: define {{.*}} @foo3(ptr dead_on_unwind noalias writable sret(%struct.abc)
+// NONZEROALLOCAAS-DAG: define {{.*}} @foo3(ptr addrspace(5) dead_on_unwind noalias writable sret(%struct.abc)
void bar(void) {
struct abc dummy1 = foo1();
// CHECK-DAG: call {{.*}} @foo1(ptr dead_on_unwind writable sret(%struct.abc)
+ // NONZEROALLOCAAS-DAG: call {{.*}} @foo1(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
struct abc dummy2 = foo2();
// CHECK-DAG: call {{.*}} @foo2(ptr dead_on_unwind writable sret(%struct.abc)
+ // NONZEROALLOCAAS-DAG: call {{.*}} @foo2(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
}
|
Is this a target-independent decision? I could certainly imagine a target with a generic AS wanting to specify that indirect return addresses (and maybe even parameters?) should be in that rather than the alloca AS; among other things, it would allow return values to be used to initialize objects in arbitrary memory. In C and C-derived languages, maybe that just avoids a memcpy, but in C++ it avoids a potentially non-trivial move. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a target-independent decision? I could certainly imagine a target with a generic AS wanting to specify that indirect return addresses (and maybe even parameters?) should be in that rather than the alloca AS; among other things, it would allow return values to be used to initialize objects in arbitrary memory. In C and C-derived languages, maybe that just avoids a memcpy, but in C++ it avoids a potentially non-trivial move.
This is a good point that I had not fully considered; I'll (weakly) push back by pointing out that if a target wanted to do that, it'd have to change quite a few things because now we seem to pretty much alloca
storage for sret
args by default, and thus we just end up with some extra spurious AS casts / reliance on AS inference doing the right thing. However, your question made me go and look at C++ use cases, which unearthed a pretty big (general) challenge with this change: for cases where the alloca AS and the default AS differ, we can end up trying to pass an sret
arg directly to a callee that takes a pointer to the default AS. I've added a test that exposes this and a potential fix, but I'm not hyper keen on the latter - unfortunately, I see no better way to deal with this.
…ects a pointer to the default AS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in theory a target could want to do something else, but that would be an ABI lowering decision. It doesn't naturally come from a source level type. Supporting such a target would require more work, but given the current state of the world just using the alloca address space in this context is the most consistent and simplest decision.
clang/lib/CodeGen/CGCall.cpp
Outdated
else | ||
V = Builder.CreateBitCast(V, IRTy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming this is a pointer bitcast, which isn't necessary anymore
clang/lib/CodeGen/CGCall.cpp
Outdated
// If the argument doesn't match, we are either trying to pass an | ||
// alloca-ed sret argument directly, and the alloca AS does not match | ||
// the default AS, case in which we AS cast it, or we have a trivial | ||
// type mismatch, and thus perform a bitcast to coerce it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inserting the cast might not be correct. Might need to create another temporary with the other address space, and memcpy.
Is this only the inalloca case? That's the weird windows only thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is not the inalloca
case, it's the case when you have e.g. a C++ move ctor (Foo(Foo&&)
) which in IR expands into a function taking two pointers to the default AS (this
and a pointer to the moved from arg). If you're moving into the sret
arg, you try to bind this to this
, and you end up here. See the no-elide-constructors
test that's part of this PR.
Re: not inserting the cast, you're right it's probably not correct to insert it blindly. I think the only thing we can safely handle is if the mismatched arg is a pointer to the default AS, and should error out otherwise. The only mechanism we have for creating temporaries is alloca
ing them, and it's not even clear what it'd mean to create a temporary in some arbitrary AS. This is probably fine though because I think the only offenders here would be the C++ ctors (perhaps member functions in general, at worst), as their IR signature is derived from the default AS, as there's no fixed argument type to inform it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cast is probably unavoidable here. You need to support flat addressing for any of c++ to work anyway, so that's fine for the GPU case
@@ -23,8 +25,10 @@ X Test() | |||
// sret argument. | |||
// CHECK-CXX98: call void @_ZN1XC1ERKS_( | |||
// CHECK-CXX11: call void @_ZN1XC1EOS_( | |||
// CHECK-CXX11-NONZEROALLOCAAS: call void @_ZN1XC1EOS_( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more context checks here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made an attempt to.
clang/lib/CodeGen/CGCall.cpp
Outdated
// If the argument doesn't match, we are either trying to pass an | ||
// alloca-ed sret argument directly, and the alloca AS does not match | ||
// the default AS, case in which we AS cast it, or we have a trivial | ||
// type mismatch, and thus perform a bitcast to coerce it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cast is probably unavoidable here. You need to support flat addressing for any of c++ to work anyway, so that's fine for the GPU case
I agree that it doesn't meaningfully come from a source-level type and should be specified by the target lowering. I just want to make sure we write the new code in a way that plausibly supports the target ABI specifying something other than "it's always in the alloca AS". Can we put the required AS in the |
clang/lib/CodeGen/CGCall.cpp
Outdated
@@ -1672,10 +1672,11 @@ CodeGenTypes::GetFunctionType(const CGFunctionInfo &FI) { | |||
|
|||
// Add type for sret argument. | |||
if (IRFunctionArgs.hasSRetArg()) { | |||
QualType Ret = FI.getReturnType(); | |||
unsigned AddressSpace = CGM.getTypes().getTargetAddressSpace(Ret); | |||
auto AddressSpace = CGM.getTarget().getIndirectArgAddressSpace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this to come from the ABIArgInfo/retAI, for the specific value and not a new side hook. Actually, is the address space already correct in retAI.getIndirectAddrSpace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly no, that's not usable here as is, that's only for byref
args (IndirectAliased). I do wonder if we should extend Indirect to also carry an AS, maybe that's the natural solution here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I was thinking, yeah. There should be plenty of space for that without inflating ABIInfo
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a swipe at doing this; it's a bit noisier than I'd have hoped but it's mostly NFC except for AMDGPU, with other targets retaining current behaviour and having the option to adjust in the future. It does annoyingly add another relatively blind default use of AS0, but that was already there. Perhaps we can flip to defaulting to the AllocaAS with targets having the option to override the indirect AS when they classify a return type.
✅ With the latest revision this PR passed the C/C++ code formatter. |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/7/builds/10932 Here is the relevant piece of the build log for the reference
|
@AlexVlx I found this breaks my clang build, reduced the C++ source to the following reproducer https://godbolt.org/z/jGnvKeqvr. Please verify that it breaks for you as well and revert or fix. (The issue is the s() function call not using AS(5)) #pragma omp begin declare target
struct S {
~S() { };
};
S s();
struct E {
S foo;
E();
};
E::E() : foo(s()) {}
#pragma omp end declare target > ./bin/clang++ omp-bug.cpp -fopenmp --offload-arch=gfx1030 -nogpulib
clang-21: /home/jhuber/Documents/llvm/llvm-project/clang/lib/CodeGen/CGCall.cpp:5648: clang::CodeGen::RValue clang::CodeGen::CodeGenFunction::EmitCall(const clang::CodeGen::CGFunctionInfo&, const clang::CodeGen::CGCallee&, clang::CodeGen::ReturnValueSlot, const clang::CodeGen::CallArgList&, llvm::CallBase**, bool, clang::SourceLocation, bool): Assertion `IRCallArgs[i]->getType() == IRFuncTy->getParamType(i)' failed. |
…so use that (llvm#114062) `sret` arguments are always going to reside in the stack/`alloca` address space, which makes the current formulation where their AS is derived from the pointee somewhat quaint. This patch ensures that `sret` ends up pointing to the `alloca` AS in IR function signatures, and also guards agains trying to pass a casted `alloca`d pointer to a `sret` arg, which can happen for most languages, when compiled for targets that have a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a non-zero value (SPIR-V). A target could still choose to do something different here, by e.g. overriding `classifyReturnType` behaviour. In a broader sense, this patch extends non-aliased indirect args to also carry an AS, which leads to changing the `getIndirect()` interface. At the moment we're only using this for (indirect) returns, but it allows for future handling of indirect args themselves. We default to using the AllocaAS as that matches what Clang is currently doing, however if, in the future, a target would opt for e.g. placing indirect returns in some other storage, with another AS, this will require revisiting. --------- Co-authored-by: Matt Arsenault <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>
seeing breaks in downstream build of rocPRIM |
Probably need to revert the downstream revert of the original problematic patch |
Thank you for flagging this. Please see #127528. |
This removes a vestigial assertion, which would erroneously trigger even though we now correctly handle valid arg mismatches (<https://github.com/llvm/llvm-project/blob/2dda529838e622e7a79b1e26d2899f319fd7e379/clang/lib/CodeGen/CGCall.cpp#L5397>), after #114062 went in.
/cherry-pick 39ec9de |
/pull-request #127552 |
It's not quite that, we have a problem with the following pattern: https://gcc.godbolt.org/z/4bheaqYev. |
@AlexVlx The reproducer I posted above still crashes, just in a different place.
|
…so use that (llvm#114062) `sret` arguments are always going to reside in the stack/`alloca` address space, which makes the current formulation where their AS is derived from the pointee somewhat quaint. This patch ensures that `sret` ends up pointing to the `alloca` AS in IR function signatures, and also guards agains trying to pass a casted `alloca`d pointer to a `sret` arg, which can happen for most languages, when compiled for targets that have a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a non-zero value (SPIR-V). A target could still choose to do something different here, by e.g. overriding `classifyReturnType` behaviour. In a broader sense, this patch extends non-aliased indirect args to also carry an AS, which leads to changing the `getIndirect()` interface. At the moment we're only using this for (indirect) returns, but it allows for future handling of indirect args themselves. We default to using the AllocaAS as that matches what Clang is currently doing, however if, in the future, a target would opt for e.g. placing indirect returns in some other storage, with another AS, this will require revisiting. --------- Co-authored-by: Matt Arsenault <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>
This addresses two issues introduced by moving indirect args into an explicit AS (please see <#114062 (comment)> and <#114062 (comment)>): 1. Unconditionally stripping casts from a pre-allocated return slot was incorrect / insufficient (this is illustrated by the `amdgcn_sret_ctor.cpp` test); 2. Putting compiler manufactured sret args in a non default AS can lead to a C-cast (surprisingly) requiring an AS cast (this is illustrated by the `sret_cast_with_nonzero_alloca_as.cpp test). The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part).
This addresses two issues introduced by moving indirect args into an explicit AS (please see <llvm/llvm-project#114062 (comment)> and <llvm/llvm-project#114062 (comment)>): 1. Unconditionally stripping casts from a pre-allocated return slot was incorrect / insufficient (this is illustrated by the `amdgcn_sret_ctor.cpp` test); 2. Putting compiler manufactured sret args in a non default AS can lead to a C-cast (surprisingly) requiring an AS cast (this is illustrated by the `sret_cast_with_nonzero_alloca_as.cpp test). The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part).
…so use that (llvm#114062) `sret` arguments are always going to reside in the stack/`alloca` address space, which makes the current formulation where their AS is derived from the pointee somewhat quaint. This patch ensures that `sret` ends up pointing to the `alloca` AS in IR function signatures, and also guards agains trying to pass a casted `alloca`d pointer to a `sret` arg, which can happen for most languages, when compiled for targets that have a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a non-zero value (SPIR-V). A target could still choose to do something different here, by e.g. overriding `classifyReturnType` behaviour. In a broader sense, this patch extends non-aliased indirect args to also carry an AS, which leads to changing the `getIndirect()` interface. At the moment we're only using this for (indirect) returns, but it allows for future handling of indirect args themselves. We default to using the AllocaAS as that matches what Clang is currently doing, however if, in the future, a target would opt for e.g. placing indirect returns in some other storage, with another AS, this will require revisiting. --------- Co-authored-by: Matt Arsenault <[email protected]> Co-authored-by: Matt Arsenault <[email protected]>
This addresses two issues introduced by moving indirect args into an explicit AS (please see <llvm#114062 (comment)> and <llvm#114062 (comment)>): 1. Unconditionally stripping casts from a pre-allocated return slot was incorrect / insufficient (this is illustrated by the `amdgcn_sret_ctor.cpp` test); 2. Putting compiler manufactured sret args in a non default AS can lead to a C-cast (surprisingly) requiring an AS cast (this is illustrated by the `sret_cast_with_nonzero_alloca_as.cpp test). The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part).
This addresses two issues introduced by moving indirect args into an explicit AS (please see <llvm#114062 (comment)> and <llvm#114062 (comment)>): 1. Unconditionally stripping casts from a pre-allocated return slot was incorrect / insufficient (this is illustrated by the `amdgcn_sret_ctor.cpp` test); 2. Putting compiler manufactured sret args in a non default AS can lead to a C-cast (surprisingly) requiring an AS cast (this is illustrated by the `sret_cast_with_nonzero_alloca_as.cpp test). The way we handle (2), by subverting CK_BitCast emission iff a sret arg is involved, is quite naff, but I couldn't think of any other way to use a non default indirect AS and make this case work (hopefully this is a failure of imagination on my part).
) Recent community change llvm/llvm-project#114062 enabled the use of alloca address space for sret arguments. This causes several issues for sycl, particularly for the SPIR target where this leads to invalid address-space-casts from the local address space. The new option -foffload-use-alloca-addrspace-for-srets is TRUE by default (and produces the current community behavior) and is set to FALSE in sycl device compilation modes (where the prior behavior is retained). The commit also reverts some test changes made to reflect the current community behavior. --------- Co-authored-by: Premanand M Rao <[email protected]>
sret
arguments are always going to reside in the stack/alloca
address space, which makes the current formulation where their AS is derived from the pointee somewhat quaint. This patch ensures thatsret
ends up pointing to thealloca
AS in IR function signatures, and also guards agains trying to pass a castedalloca
d pointer to asret
arg, which can happen for most languages, when compiled for targets that have a non-zeroalloca
AS (e.g. AMDGCN) / mapLangAS::default
to a non-zero value (SPIR-V). A target could still choose to do something different here, by e.g. overridingclassifyReturnType
behaviour.In a broader sense, this patch extends non-aliased indirect args to also carry an AS, which leads to changing the
getIndirect()
interface. At the moment we're only using this for (indirect) returns, but it allows for future handling of indirect args themselves. We default to using the AllocaAS as that matches what Clang is currently doing, however if, in the future, a target would opt for e.g. placing indirect returns in some other storage, with another AS, this will require revisiting.