Skip to content

[clang][CodeGen][OpenCL] Fix alloca handling #113930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions clang/lib/CodeGen/CGCall.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1648,6 +1648,8 @@ CodeGenTypes::GetFunctionType(const CGFunctionInfo &FI) {
// Add type for sret argument.
if (IRFunctionArgs.hasSRetArg()) {
QualType Ret = FI.getReturnType();
if (CGM.getLangOpts().OpenCL)
Ret = getContext().getAddrSpaceQualType(Ret, LangAS::opencl_private);
unsigned AddressSpace = CGM.getTypes().getTargetAddressSpace(Ret);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like extra steps around directly calling getContext().getTargetAddressSpace(LangAS::opencl_private)

But more importantly, I would expect this to have been handled by the target API lowering code. The language wouldn't factor in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a reasonable expectation but it's not handled at the moment, and it appears that people just gave up on it and relied on the terrible PrivateIsDefault AS maps. I'm not really adding much functionality here, just ensuring that we can safely survive moving away from that.

ArgTypes[IRFunctionArgs.getSRetArgNo()] =
llvm::PointerType::get(getLLVMContext(), AddressSpace);
Expand Down
6 changes: 5 additions & 1 deletion clang/lib/CodeGen/CGExpr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -108,11 +108,15 @@ RawAddress CodeGenFunction::CreateTempAlloca(llvm::Type *Ty, CharUnits Align,
if (AllocaAddr)
*AllocaAddr = Alloca;
llvm::Value *V = Alloca.getPointer();
assert((!getLangOpts().OpenCL ||
CGM.getTarget().getTargetAddressSpace(getASTAllocaAddressSpace()) ==
CGM.getTarget().getTargetAddressSpace(LangAS::opencl_private)) &&
"For OpenCL allocas must allocate in the private address space!");
Comment on lines +106 to +109
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion probably does not hold for nvptx. Alloca returns a generic pointer and is later lowered to the whatever they call private + cast in the backend

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain? That's not how I'm reading its ASMAP, think this'd work just fine it aliases generic and private (and appears they designed their ASMap to handle precisely this case). Am I missing something in particular? The above is only a concern in Clang, once it's made it to the BE it's free to do whatever, really.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the nature of the PTX hack. They pretend everything is 0, and then fix up stack later. The backend then runs NVPTXLowerAlloca to introduce a bunch of casts to show that it really was an addrspace(5) pointer to begin with around most use contexts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this is safe / the assert won't flare and it matches existing behaviour, does it not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (but will be invalid if they ever fix up that hack, as there have been posts suggesting favor of in the past)

// Alloca always returns a pointer in alloca address space, which may
// be different from the type defined by the language. For example,
// in C++ the auto variables are in the default address space. Therefore
// cast alloca to the default address space when necessary.
if (getASTAllocaAddressSpace() != LangAS::Default) {
if (!getLangOpts().OpenCL && getASTAllocaAddressSpace() != LangAS::Default) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see why the language mode would factor in. The below code also seems more complex than necessary. I would expect there to be a language typed value, and there to be an IR value. You would simply insert the cast if they didn't match, so not sure what the performAddrSpaceCast hook is for. Is that a holdover from before addrspacecast existed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The language factors in here because you cannot multi-purpose / multi-map LangAS::Default, which is why we (and others) used the hacked AS map that had private as default. Maybe there's a more elegant solution to this that does things upstream, and perhaps we'll implement it later, but this is borderline NFC to deal with dropping the privateIsDefault hacked AS maps. As for the rest of the code, I've not authored it, so I cannot comment one way or the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What language type value? The issue is not with the allocated value's type, it's around the allocation's type, or rather how it is presented. This code has been there for years, I've not added it, I'm not going to touch it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole casting block should be deleted. A memory temporary should be the natural alloca type. Any user that wants the cast should handle this in the narrow use case.

This cannot be language dependent. The current behavior is ABI breaking for aggregates passed on the stack (see the device enqueue tests for an example, the byref temporary argument is broken to be a flat pointer). We are also just creating a lot of unnecessary casting spam for later passes to clean up in most contexts

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok , this is the version with the cast. I think the naming / default should be inverted, but that's an issue for another day.

There still shouldn't be modality here, and at least one other context is using the wrong version

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching CreateMemTemp to use CreateTempAllocaWithoutCast seems to help

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Baby step that fixes my immediate issue: #129837

auto DestAddrSpace = getContext().getTargetAddressSpace(LangAS::Default);
llvm::IRBuilderBase::InsertPointGuard IPG(Builder);
// When ArraySize is nullptr, alloca is inserted at AllocaInsertPt,
Expand Down
169 changes: 70 additions & 99 deletions clang/test/CodeGenOpenCL/addr-space-struct-arg.cl

Large diffs are not rendered by default.

35 changes: 15 additions & 20 deletions clang/test/CodeGenOpenCL/amdgcn-automatic-variable.cl
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,8 @@
// CL20-SAME: ptr noundef [[X:%.*]]) #[[ATTR0:[0-9]+]] {
// CL20-NEXT: [[ENTRY:.*:]]
// CL20-NEXT: [[X_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
// CL20-NEXT: [[X_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[X_ADDR]] to ptr
// CL20-NEXT: store ptr [[X]], ptr [[X_ADDR_ASCAST]], align 8
// CL20-NEXT: [[TMP0:%.*]] = load ptr, ptr [[X_ADDR_ASCAST]], align 8
// CL20-NEXT: store ptr [[X]], ptr addrspace(5) [[X_ADDR]], align 8
// CL20-NEXT: [[TMP0:%.*]] = load ptr, ptr addrspace(5) [[X_ADDR]], align 8
// CL20-NEXT: store i32 1, ptr [[TMP0]], align 4
// CL20-NEXT: ret void
//
Expand Down Expand Up @@ -55,22 +54,19 @@ void func1(int *x) {
// CL20-NEXT: [[LP1:%.*]] = alloca ptr, align 8, addrspace(5)
// CL20-NEXT: [[LP2:%.*]] = alloca ptr, align 8, addrspace(5)
// CL20-NEXT: [[LVC:%.*]] = alloca i32, align 4, addrspace(5)
// CL20-NEXT: store i32 1, ptr addrspace(5) [[LV1]], align 4
// CL20-NEXT: store i32 2, ptr addrspace(5) [[LV2]], align 4
// CL20-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [100 x i32], ptr addrspace(5) [[LA]], i64 0, i64 0
// CL20-NEXT: store i32 3, ptr addrspace(5) [[ARRAYIDX]], align 4
// CL20-NEXT: [[LV1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[LV1]] to ptr
// CL20-NEXT: [[LV2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[LV2]] to ptr
// CL20-NEXT: [[LA_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[LA]] to ptr
// CL20-NEXT: [[LP1_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[LP1]] to ptr
// CL20-NEXT: [[LP2_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[LP2]] to ptr
// CL20-NEXT: [[LVC_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[LVC]] to ptr
// CL20-NEXT: store i32 1, ptr [[LV1_ASCAST]], align 4
// CL20-NEXT: store i32 2, ptr [[LV2_ASCAST]], align 4
// CL20-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [100 x i32], ptr [[LA_ASCAST]], i64 0, i64 0
// CL20-NEXT: store i32 3, ptr [[ARRAYIDX]], align 4
// CL20-NEXT: store ptr [[LV1_ASCAST]], ptr [[LP1_ASCAST]], align 8
// CL20-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [100 x i32], ptr [[LA_ASCAST]], i64 0, i64 0
// CL20-NEXT: store ptr [[ARRAYDECAY]], ptr [[LP2_ASCAST]], align 8
// CL20-NEXT: call void @func1(ptr noundef [[LV1_ASCAST]]) #[[ATTR2:[0-9]+]]
// CL20-NEXT: store i32 4, ptr [[LVC_ASCAST]], align 4
// CL20-NEXT: store i32 4, ptr [[LV1_ASCAST]], align 4
// CL20-NEXT: store ptr [[LV1_ASCAST]], ptr addrspace(5) [[LP1]], align 8
// CL20-NEXT: [[ARRAYDECAY:%.*]] = getelementptr inbounds [100 x i32], ptr addrspace(5) [[LA]], i64 0, i64 0
// CL20-NEXT: [[ARRAYDECAY_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[ARRAYDECAY]] to ptr
// CL20-NEXT: store ptr [[ARRAYDECAY_ASCAST]], ptr addrspace(5) [[LP2]], align 8
// CL20-NEXT: [[LV1_ASCAST1:%.*]] = addrspacecast ptr addrspace(5) [[LV1]] to ptr
// CL20-NEXT: call void @func1(ptr noundef [[LV1_ASCAST1]]) #[[ATTR2:[0-9]+]]
// CL20-NEXT: store i32 4, ptr addrspace(5) [[LVC]], align 4
// CL20-NEXT: store i32 4, ptr addrspace(5) [[LV1]], align 4
// CL20-NEXT: ret void
//
void func2(void) {
Expand Down Expand Up @@ -102,8 +98,7 @@ void func2(void) {
// CL20-SAME: ) #[[ATTR0]] {
// CL20-NEXT: [[ENTRY:.*:]]
// CL20-NEXT: [[A:%.*]] = alloca [16 x [1 x float]], align 4, addrspace(5)
// CL20-NEXT: [[A_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[A]] to ptr
// CL20-NEXT: call void @llvm.memset.p0.i64(ptr align 4 [[A_ASCAST]], i8 0, i64 64, i1 false)
// CL20-NEXT: call void @llvm.memset.p5.i64(ptr addrspace(5) align 4 [[A]], i8 0, i64 64, i1 false)
// CL20-NEXT: ret void
//
void func3(void) {
Expand Down
Loading
Loading