Skip to content

[clang][CodeGen] sret args should always point to the alloca AS, so use that #114062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 55 commits into from
Feb 14, 2025

Conversation

AlexVlx
Copy link
Contributor

@AlexVlx AlexVlx commented Oct 29, 2024

sret arguments are always going to reside in the stack/alloca address space, which makes the current formulation where their AS is derived from the pointee somewhat quaint. This patch ensures that sret ends up pointing to the alloca AS in IR function signatures, and also guards agains trying to pass a casted allocad pointer to a sret arg, which can happen for most languages, when compiled for targets that have a non-zero alloca AS (e.g. AMDGCN) / map LangAS::default to a non-zero value (SPIR-V). A target could still choose to do something different here, by e.g. overriding classifyReturnType behaviour.

In a broader sense, this patch extends non-aliased indirect args to also carry an AS, which leads to changing the getIndirect() interface. At the moment we're only using this for (indirect) returns, but it allows for future handling of indirect args themselves. We default to using the AllocaAS as that matches what Clang is currently doing, however if, in the future, a target would opt for e.g. placing indirect returns in some other storage, with another AS, this will require revisiting.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Oct 29, 2024
@AlexVlx AlexVlx requested review from rjmccall and arsenm October 29, 2024 14:27
@llvmbot
Copy link
Member

llvmbot commented Oct 29, 2024

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Alex Voicu (AlexVlx)

Changes

sret arguments are always going to reside in the stack/alloca address space, which makes the current formulation where their AS is derived from the pointee somewhat quaint. This patch ensures that sret ends up pointing to the alloca AS in IR function signatures, and also guards agains trying to pass a casted allocad pointer to a sret arg, which can happen for most languages, when compiled for targets that have a non-zero alloca AS (e.g. AMDGCN) / map LangAS::default to a non-zero value (SPIR-V).


Full diff: https://github.com/llvm/llvm-project/pull/114062.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+8-7)
  • (modified) clang/test/CodeGen/partial-reinitialization2.c (+2-2)
  • (modified) clang/test/CodeGen/sret.c (+11)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 8f4f5d3ed81601..56acfae7ae9e51 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1672,8 +1672,7 @@ CodeGenTypes::GetFunctionType(const CGFunctionInfo &FI) {
 
   // Add type for sret argument.
   if (IRFunctionArgs.hasSRetArg()) {
-    QualType Ret = FI.getReturnType();
-    unsigned AddressSpace = CGM.getTypes().getTargetAddressSpace(Ret);
+    unsigned AddressSpace = CGM.getDataLayout().getAllocaAddrSpace();
     ArgTypes[IRFunctionArgs.getSRetArgNo()] =
         llvm::PointerType::get(getLLVMContext(), AddressSpace);
   }
@@ -5145,7 +5144,6 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   // If the call returns a temporary with struct return, create a temporary
   // alloca to hold the result, unless one is given to us.
   Address SRetPtr = Address::invalid();
-  RawAddress SRetAlloca = RawAddress::invalid();
   llvm::Value *UnusedReturnSizePtr = nullptr;
   if (RetAI.isIndirect() || RetAI.isInAlloca() || RetAI.isCoerceAndExpand()) {
     // For virtual function pointer thunks and musttail calls, we must always
@@ -5159,16 +5157,19 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
     } else if (!ReturnValue.isNull()) {
       SRetPtr = ReturnValue.getAddress();
     } else {
-      SRetPtr = CreateMemTemp(RetTy, "tmp", &SRetAlloca);
+      SRetPtr = CreateMemTempWithoutCast(RetTy, "tmp");
       if (HaveInsertPoint() && ReturnValue.isUnused()) {
         llvm::TypeSize size =
             CGM.getDataLayout().getTypeAllocSize(ConvertTypeForMem(RetTy));
-        UnusedReturnSizePtr = EmitLifetimeStart(size, SRetAlloca.getPointer());
+        UnusedReturnSizePtr = EmitLifetimeStart(size, SRetPtr.getBasePointer());
       }
     }
     if (IRFunctionArgs.hasSRetArg()) {
+      // If the caller allocated the return slot, it is possible that the
+      // alloca was AS casted to the default as, so we ensure the cast is
+      // stripped before binding to the sret arg, which is in the allocaAS.
       IRCallArgs[IRFunctionArgs.getSRetArgNo()] =
-          getAsNaturalPointerTo(SRetPtr, RetTy);
+          getAsNaturalPointerTo(SRetPtr, RetTy)->stripPointerCasts();
     } else if (RetAI.isInAlloca()) {
       Address Addr =
           Builder.CreateStructGEP(ArgMemory, RetAI.getInAllocaFieldIndex());
@@ -5740,7 +5741,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
   // pop this cleanup later on. Being eager about this is OK, since this
   // temporary is 'invisible' outside of the callee.
   if (UnusedReturnSizePtr)
-    pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, SRetAlloca,
+    pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, SRetPtr,
                                          UnusedReturnSizePtr);
 
   llvm::BasicBlock *InvokeDest = CannotThrow ? nullptr : getInvokeDest();
diff --git a/clang/test/CodeGen/partial-reinitialization2.c b/clang/test/CodeGen/partial-reinitialization2.c
index e709c1d4ad1ee1..7949a69555031e 100644
--- a/clang/test/CodeGen/partial-reinitialization2.c
+++ b/clang/test/CodeGen/partial-reinitialization2.c
@@ -91,8 +91,8 @@ void test5(void)
 // CHECK-LABEL: test6
 void test6(void)
 {
-  // CHECK: [[LP:%[a-z0-9]+]] = getelementptr{{.*}}%struct.LLP2P2, ptr{{.*}}, i32 0, i32 0
-  // CHECK: call {{.*}}get456789(ptr {{.*}}[[LP]])
+  // CHECK: [[VAR:%[a-z0-9]+]] = alloca
+  // CHECK: call {{.*}}get456789(ptr {{.*}}sret{{.*}} [[VAR]])
 
   // CHECK: [[CALL:%[a-z0-9]+]] = call {{.*}}@get235()
   // CHECK: store{{.*}}[[CALL]], {{.*}}[[TMP0:%[a-z0-9.]+]]
diff --git a/clang/test/CodeGen/sret.c b/clang/test/CodeGen/sret.c
index 6d905e89b2c6fd..3b4914f29d2bfe 100644
--- a/clang/test/CodeGen/sret.c
+++ b/clang/test/CodeGen/sret.c
@@ -1,4 +1,5 @@
 // RUN: %clang_cc1 %s -Wno-strict-prototypes -emit-llvm -o - | FileCheck %s
+// RUN: %clang_cc1 %s -Wno-strict-prototypes -triple amdgcn-amd-amdhsa -emit-llvm -o - | FileCheck --check-prefix=NONZEROALLOCAAS %s
 
 struct abc {
  long a;
@@ -6,18 +7,28 @@ struct abc {
  long c;
  long d;
  long e;
+ long f;
+ long g;
+ long h;
+ long i;
+ long j;
 };
 
 struct abc foo1(void);
 // CHECK-DAG: declare {{.*}} @foo1(ptr dead_on_unwind writable sret(%struct.abc)
+// NONZEROALLOCAAS-DAG: declare {{.*}} @foo1(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
 struct abc foo2();
 // CHECK-DAG: declare {{.*}} @foo2(ptr dead_on_unwind writable sret(%struct.abc)
+// NONZEROALLOCAAS-DAG: declare {{.*}} @foo2(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
 struct abc foo3(void){}
 // CHECK-DAG: define {{.*}} @foo3(ptr dead_on_unwind noalias writable sret(%struct.abc)
+// NONZEROALLOCAAS-DAG: define {{.*}} @foo3(ptr addrspace(5) dead_on_unwind noalias writable sret(%struct.abc)
 
 void bar(void) {
   struct abc dummy1 = foo1();
   // CHECK-DAG: call {{.*}} @foo1(ptr dead_on_unwind writable sret(%struct.abc)
+  // NONZEROALLOCAAS-DAG: call {{.*}} @foo1(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
   struct abc dummy2 = foo2();
   // CHECK-DAG: call {{.*}} @foo2(ptr dead_on_unwind writable sret(%struct.abc)
+  // NONZEROALLOCAAS-DAG: call {{.*}} @foo2(ptr addrspace(5) dead_on_unwind writable sret(%struct.abc)
 }

@AlexVlx AlexVlx requested a review from bogner October 29, 2024 14:27
@AlexVlx AlexVlx removed the clang Clang issues not falling into any other category label Oct 29, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Oct 29, 2024
@rjmccall
Copy link
Contributor

Is this a target-independent decision? I could certainly imagine a target with a generic AS wanting to specify that indirect return addresses (and maybe even parameters?) should be in that rather than the alloca AS; among other things, it would allow return values to be used to initialize objects in arbitrary memory. In C and C-derived languages, maybe that just avoids a memcpy, but in C++ it avoids a potentially non-trivial move.

Copy link
Contributor Author

@AlexVlx AlexVlx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a target-independent decision? I could certainly imagine a target with a generic AS wanting to specify that indirect return addresses (and maybe even parameters?) should be in that rather than the alloca AS; among other things, it would allow return values to be used to initialize objects in arbitrary memory. In C and C-derived languages, maybe that just avoids a memcpy, but in C++ it avoids a potentially non-trivial move.

This is a good point that I had not fully considered; I'll (weakly) push back by pointing out that if a target wanted to do that, it'd have to change quite a few things because now we seem to pretty much alloca storage for sret args by default, and thus we just end up with some extra spurious AS casts / reliance on AS inference doing the right thing. However, your question made me go and look at C++ use cases, which unearthed a pretty big (general) challenge with this change: for cases where the alloca AS and the default AS differ, we can end up trying to pass an sret arg directly to a callee that takes a pointer to the default AS. I've added a test that exposes this and a potential fix, but I'm not hyper keen on the latter - unfortunately, I see no better way to deal with this.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that in theory a target could want to do something else, but that would be an ABI lowering decision. It doesn't naturally come from a source level type. Supporting such a target would require more work, but given the current state of the world just using the alloca address space in this context is the most consistent and simplest decision.

Comment on lines 5404 to 5405
else
V = Builder.CreateBitCast(V, IRTy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is a pointer bitcast, which isn't necessary anymore

Comment on lines 5394 to 5397
// If the argument doesn't match, we are either trying to pass an
// alloca-ed sret argument directly, and the alloca AS does not match
// the default AS, case in which we AS cast it, or we have a trivial
// type mismatch, and thus perform a bitcast to coerce it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inserting the cast might not be correct. Might need to create another temporary with the other address space, and memcpy.

Is this only the inalloca case? That's the weird windows only thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this is not the inalloca case, it's the case when you have e.g. a C++ move ctor (Foo(Foo&&)) which in IR expands into a function taking two pointers to the default AS (this and a pointer to the moved from arg). If you're moving into the sret arg, you try to bind this to this, and you end up here. See the no-elide-constructors test that's part of this PR.

Re: not inserting the cast, you're right it's probably not correct to insert it blindly. I think the only thing we can safely handle is if the mismatched arg is a pointer to the default AS, and should error out otherwise. The only mechanism we have for creating temporaries is allocaing them, and it's not even clear what it'd mean to create a temporary in some arbitrary AS. This is probably fine though because I think the only offenders here would be the C++ ctors (perhaps member functions in general, at worst), as their IR signature is derived from the default AS, as there's no fixed argument type to inform it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast is probably unavoidable here. You need to support flat addressing for any of c++ to work anyway, so that's fine for the GPU case

@@ -23,8 +25,10 @@ X Test()
// sret argument.
// CHECK-CXX98: call void @_ZN1XC1ERKS_(
// CHECK-CXX11: call void @_ZN1XC1EOS_(
// CHECK-CXX11-NONZEROALLOCAAS: call void @_ZN1XC1EOS_(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add more context checks here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made an attempt to.

Comment on lines 5394 to 5397
// If the argument doesn't match, we are either trying to pass an
// alloca-ed sret argument directly, and the alloca AS does not match
// the default AS, case in which we AS cast it, or we have a trivial
// type mismatch, and thus perform a bitcast to coerce it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast is probably unavoidable here. You need to support flat addressing for any of c++ to work anyway, so that's fine for the GPU case

@rjmccall
Copy link
Contributor

rjmccall commented Oct 30, 2024

I agree that it doesn't meaningfully come from a source-level type and should be specified by the target lowering. I just want to make sure we write the new code in a way that plausibly supports the target ABI specifying something other than "it's always in the alloca AS". Can we put the required AS in the ABIInfo for the result/parameter and then just make a best effort to perform AS conversions in all the places necessary? (It'd be a requirement that the alloca AS can be promoted into that AS; there's not really any other way it could be implemented.) And then, of course, if we miss a few then some target will have some bugs to fix, but we'll at least have tried.

@llvmbot llvmbot added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Nov 2, 2024
@@ -1672,10 +1672,11 @@ CodeGenTypes::GetFunctionType(const CGFunctionInfo &FI) {

// Add type for sret argument.
if (IRFunctionArgs.hasSRetArg()) {
QualType Ret = FI.getReturnType();
unsigned AddressSpace = CGM.getTypes().getTargetAddressSpace(Ret);
auto AddressSpace = CGM.getTarget().getIndirectArgAddressSpace();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to come from the ABIArgInfo/retAI, for the specific value and not a new side hook. Actually, is the address space already correct in retAI.getIndirectAddrSpace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly no, that's not usable here as is, that's only for byref args (IndirectAliased). I do wonder if we should extend Indirect to also carry an AS, maybe that's the natural solution here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I was thinking, yeah. There should be plenty of space for that without inflating ABIInfo, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a swipe at doing this; it's a bit noisier than I'd have hoped but it's mostly NFC except for AMDGPU, with other targets retaining current behaviour and having the option to adjust in the future. It does annoyingly add another relatively blind default use of AS0, but that was already there. Perhaps we can flip to defaulting to the AllocaAS with targets having the option to override the indirect AS when they classify a return type.

Copy link

github-actions bot commented Nov 6, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@AlexVlx AlexVlx merged commit 39ec9de into llvm:main Feb 14, 2025
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 14, 2025

LLVM Buildbot has detected a new failure on builder flang-runtime-cuda-clang running on as-builder-7 while building clang at step 10 "build-flang-runtime-FortranRuntime".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/7/builds/10932

Here is the relevant piece of the build log for the reference
Step 10 (build-flang-runtime-FortranRuntime) failure: cmake (failure)
...
      |                           ^~~~~~~~~~~~~~
1 warning generated.
In file included from /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-clang/llvm-project/flang/runtime/extrema.cpp:13:
In file included from /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-clang/llvm-project/flang/runtime/reduction-templates.h:24:
In file included from /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-clang/llvm-project/flang/runtime/numeric-templates.h:22:
In file included from /home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-clang/llvm-project/flang/runtime/tools.h:17:
/home/buildbot/worker/as-builder-7/ramdisk/flang-runtime-cuda-clang/llvm-project/flang/runtime/../include/flang/Runtime/freestanding-tools.h:115:27: warning: unused function 'MemmoveWrapper' [-Wunused-function]
  115 | static RT_API_ATTRS void *MemmoveWrapper(
      |                           ^~~~~~~~~~~~~~
1 warning generated.
command timed out: 1200 seconds without output running [b'cmake', b'--build', b'.', b'--target', b'FortranRuntime'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1786.866735

@jhuber6
Copy link
Contributor

jhuber6 commented Feb 14, 2025

@AlexVlx I found this breaks my clang build, reduced the C++ source to the following reproducer https://godbolt.org/z/jGnvKeqvr. Please verify that it breaks for you as well and revert or fix. (The issue is the s() function call not using AS(5))

#pragma omp begin declare target
struct S {
  ~S() { };
};
S s();
struct E {
  S foo;
  E();
};
E::E() : foo(s()) {}
#pragma omp end declare target
> ./bin/clang++ omp-bug.cpp -fopenmp --offload-arch=gfx1030 -nogpulib
clang-21: /home/jhuber/Documents/llvm/llvm-project/clang/lib/CodeGen/CGCall.cpp:5648: clang::CodeGen::RValue clang::CodeGen::CodeGenFunction::EmitCall(const clang::CodeGen::CGFunctionInfo&, const clang::CodeGen::CGCallee&, clang::CodeGen::ReturnValueSlot, const clang::CodeGen::CallArgList&, llvm::CallBase**, bool, clang::SourceLocation, bool): Assertion `IRCallArgs[i]->getType() == IRFuncTy->getParamType(i)' failed.

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
…so use that (llvm#114062)

`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards agains trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.

In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
@ronlieb
Copy link
Contributor

ronlieb commented Feb 16, 2025

seeing breaks in downstream build of rocPRIM

@arsenm
Copy link
Contributor

arsenm commented Feb 16, 2025

seeing breaks in downstream build of rocPRIM

Probably need to revert the downstream revert of the original problematic patch

@AlexVlx
Copy link
Contributor Author

AlexVlx commented Feb 17, 2025

@AlexVlx I found this breaks my clang build, reduced the C++ source to the following reproducer https://godbolt.org/z/jGnvKeqvr. Please verify that it breaks for you as well and revert or fix. (The issue is the s() function call not using AS(5))

#pragma omp begin declare target
struct S {
  ~S() { };
};
S s();
struct E {
  S foo;
  E();
};
E::E() : foo(s()) {}
#pragma omp end declare target
> ./bin/clang++ omp-bug.cpp -fopenmp --offload-arch=gfx1030 -nogpulib
clang-21: /home/jhuber/Documents/llvm/llvm-project/clang/lib/CodeGen/CGCall.cpp:5648: clang::CodeGen::RValue clang::CodeGen::CodeGenFunction::EmitCall(const clang::CodeGen::CGFunctionInfo&, const clang::CodeGen::CGCallee&, clang::CodeGen::ReturnValueSlot, const clang::CodeGen::CallArgList&, llvm::CallBase**, bool, clang::SourceLocation, bool): Assertion `IRCallArgs[i]->getType() == IRFuncTy->getParamType(i)' failed.

Thank you for flagging this. Please see #127528.

AlexVlx added a commit that referenced this pull request Feb 17, 2025
This removes a vestigial assertion, which would erroneously trigger even
though we now correctly handle valid arg mismatches
(<https://github.com/llvm/llvm-project/blob/2dda529838e622e7a79b1e26d2899f319fd7e379/clang/lib/CodeGen/CGCall.cpp#L5397>),
after #114062 went in.
@arsenm
Copy link
Contributor

arsenm commented Feb 18, 2025

/cherry-pick 39ec9de

@llvmbot
Copy link
Member

llvmbot commented Feb 18, 2025

/pull-request #127552

@AlexVlx
Copy link
Contributor Author

AlexVlx commented Feb 18, 2025

seeing breaks in downstream build of rocPRIM

Probably need to revert the downstream revert of the original problematic patch

It's not quite that, we have a problem with the following pattern: https://gcc.godbolt.org/z/4bheaqYev. r is a returned val so it will come from an / as the result of uncasted alloca, therefore its address will point to the AllocaAS. C and C++ casts yield bitcasts, not AS casts, so we end up with an C cast expr (see AST) that tries to bitcast from a pointer to the AllocaAS to a pointer to flat / generic, which is invalid. I'm not entirely yet sure how to fix this yet except going to an earlier iteration of this where we use casted allocas everywhere and handle sret a bit more noisily.

@jhuber6
Copy link
Contributor

jhuber6 commented Feb 18, 2025

@AlexVlx The reproducer I posted above still crashes, just in a different place.

clang-21: /home/jhuber/Documents/llvm/llvm-project/llvm/lib/IR/Instructions.cpp:744: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.

sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
…so use that (llvm#114062)

`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards agains trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.

In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
arsenm pushed a commit that referenced this pull request Feb 27, 2025
This addresses two issues introduced by moving indirect args into an
explicit AS (please see
<#114062 (comment)>
and
<#114062 (comment)>):

1. Unconditionally stripping casts from a pre-allocated return slot was
incorrect / insufficient (this is illustrated by the
`amdgcn_sret_ctor.cpp` test);
2. Putting compiler manufactured sret args in a non default AS can lead
to a C-cast (surprisingly) requiring an AS cast (this is illustrated by
the `sret_cast_with_nonzero_alloca_as.cpp test).

The way we handle (2), by subverting CK_BitCast emission iff a sret arg
is involved, is quite naff, but I couldn't think of any other way to use
a non default indirect AS and make this case work (hopefully this is a
failure of imagination on my part).
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 27, 2025
This addresses two issues introduced by moving indirect args into an
explicit AS (please see
<llvm/llvm-project#114062 (comment)>
and
<llvm/llvm-project#114062 (comment)>):

1. Unconditionally stripping casts from a pre-allocated return slot was
incorrect / insufficient (this is illustrated by the
`amdgcn_sret_ctor.cpp` test);
2. Putting compiler manufactured sret args in a non default AS can lead
to a C-cast (surprisingly) requiring an AS cast (this is illustrated by
the `sret_cast_with_nonzero_alloca_as.cpp test).

The way we handle (2), by subverting CK_BitCast emission iff a sret arg
is involved, is quite naff, but I couldn't think of any other way to use
a non default indirect AS and make this case work (hopefully this is a
failure of imagination on my part).
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 1, 2025
…so use that (llvm#114062)

`sret` arguments are always going to reside in the stack/`alloca`
address space, which makes the current formulation where their AS is
derived from the pointee somewhat quaint. This patch ensures that `sret`
ends up pointing to the `alloca` AS in IR function signatures, and also
guards agains trying to pass a casted `alloca`d pointer to a `sret` arg,
which can happen for most languages, when compiled for targets that have
a non-zero `alloca` AS (e.g. AMDGCN) / map `LangAS::default` to a
non-zero value (SPIR-V). A target could still choose to do something
different here, by e.g. overriding `classifyReturnType` behaviour.

In a broader sense, this patch extends non-aliased indirect args to also
carry an AS, which leads to changing the `getIndirect()` interface. At
the moment we're only using this for (indirect) returns, but it allows
for future handling of indirect args themselves. We default to using the
AllocaAS as that matches what Clang is currently doing, however if, in
the future, a target would opt for e.g. placing indirect returns in some
other storage, with another AS, this will require revisiting.

---------

Co-authored-by: Matt Arsenault <[email protected]>
Co-authored-by: Matt Arsenault <[email protected]>
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 1, 2025
This addresses two issues introduced by moving indirect args into an
explicit AS (please see
<llvm#114062 (comment)>
and
<llvm#114062 (comment)>):

1. Unconditionally stripping casts from a pre-allocated return slot was
incorrect / insufficient (this is illustrated by the
`amdgcn_sret_ctor.cpp` test);
2. Putting compiler manufactured sret args in a non default AS can lead
to a C-cast (surprisingly) requiring an AS cast (this is illustrated by
the `sret_cast_with_nonzero_alloca_as.cpp test).

The way we handle (2), by subverting CK_BitCast emission iff a sret arg
is involved, is quite naff, but I couldn't think of any other way to use
a non default indirect AS and make this case work (hopefully this is a
failure of imagination on my part).
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Mar 3, 2025
This addresses two issues introduced by moving indirect args into an
explicit AS (please see
<llvm#114062 (comment)>
and
<llvm#114062 (comment)>):

1. Unconditionally stripping casts from a pre-allocated return slot was
incorrect / insufficient (this is illustrated by the
`amdgcn_sret_ctor.cpp` test);
2. Putting compiler manufactured sret args in a non default AS can lead
to a C-cast (surprisingly) requiring an AS cast (this is illustrated by
the `sret_cast_with_nonzero_alloca_as.cpp test).

The way we handle (2), by subverting CK_BitCast emission iff a sret arg
is involved, is quite naff, but I couldn't think of any other way to use
a non default indirect AS and make this case work (hopefully this is a
failure of imagination on my part).
aelovikov-intel pushed a commit to intel/llvm that referenced this pull request Apr 22, 2025
)

Recent community change llvm/llvm-project#114062
enabled the use of
alloca address space for sret arguments. This causes several issues for
sycl, particularly for the SPIR
target where this leads to invalid address-space-casts from the local
address space.

The new option -foffload-use-alloca-addrspace-for-srets is TRUE by
default (and produces the current
community behavior) and is set to FALSE in sycl device compilation modes
(where the prior behavior is
retained). The commit also reverts some test changes made to reflect the
current community behavior.

---------

Co-authored-by: Premanand M Rao <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
Development

Successfully merging this pull request may close these issues.

7 participants