Commit 234cc81

[LLVM][Coroutines] Create .noalloc variant of switch ABI coroutine ramp functions during CoroSplit (#99283)
This patch is episode two of the coroutine HALO improvement project published on discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

Previously, CoroElide depended on inlining, and its analysis did not work well with code generated by the C++ frontend due to the existence of many customization points. Issues have been reported upstream about how ineffective the original CoroElide was in real-world applications.

For C++ users, this set of patches aims to fix this problem by providing library authors and users deterministic HALO behaviour for some well-behaved coroutine `Task` types. The stack begins with a library-side attribute on the `Task` class that guarantees no unstructured concurrency when coroutines are awaited directly with `co_await` as a prvalue. This attribute on Task types gives us lifetime guarantees and makes the C++ FE capable of telling the ME which coroutine calls are elidable. We convey such information from the FE through the attribute `coro_elide_safe`.

This patch modifies CoroSplit to create a variant of the coroutine ramp function that
1) does not use a heap-allocated frame and instead takes an additional parameter as the pointer to the frame. This parameter is attributed with `dereferenceable` and `align` to convey the size and alignment requirements for the frame.
2) always stores the cleanup address instead of the destroy address for `coro.destroy()` actions.

In a later patch, we will have a new pass that runs right after CoroSplit to find usages of the callee coroutine attributed `coro_elide_safe` in presplit coroutine callers, allocate the frame on the caller's "stack", and transform those usages to call the `noalloc` ramp function variant.

(Note that I put quotes around the word "stack" here because, for a presplit coroutine, any alloca will be spilled into the frame when the coroutine is split.)

The C++ frontend attribute implementation that works with this change can be found at #99282. The pass that makes use of the new `noalloc` split can be found at #99285.
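The two differences listed above can be illustrated with a small standalone model. This is only a sketch in plain C++, not LLVM code: `Frame`, `ramp`, `ramp_noalloc`, and the counters are hypothetical names chosen for illustration, and the real frame layout is computed by CoroSplit.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a coroutine frame (illustration only).
struct Frame {
  void (*resume)(Frame *);
  void (*destroy)(Frame *); // what a coro.destroy() on this frame would call
  int state;
  bool cleaned_up;
};

int g_heap_allocs = 0;
int g_heap_frees = 0;

void resume_impl(Frame *f) { ++f->state; }

// Cleanup: tears down frame contents but leaves the memory alone.
void cleanup_impl(Frame *f) { f->cleaned_up = true; }

// Destroy: cleanup followed by deallocation of the frame.
void destroy_impl(Frame *f) {
  cleanup_impl(f);
  ++g_heap_frees;
  std::free(f);
}

// Models the ordinary ramp: allocates the frame and stores the destroy address.
Frame *ramp() {
  void *mem = std::malloc(sizeof(Frame));
  if (!mem)
    std::abort();
  ++g_heap_allocs;
  return new (mem) Frame{resume_impl, destroy_impl, 0, false};
}

// Models the .noalloc variant: the caller supplies storage through an extra
// trailing parameter, and the cleanup address is stored instead of destroy.
Frame *ramp_noalloc(void *storage) {
  return new (storage) Frame{resume_impl, cleanup_impl, 0, false};
}
```

A caller that knows the call is elidable would reserve suitably sized and aligned storage itself and pass it to `ramp_noalloc`. Destroying that frame then only runs cleanup, so nothing tries to free memory the caller owns.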
1 parent e17a39b commit 234cc81

5 files changed: 191 additions, 26 deletions
llvm/docs/Coroutines.rst

Lines changed: 18 additions & 0 deletions
@@ -2022,6 +2022,12 @@ The pass CoroSplit builds coroutine frame and outlines resume and destroy parts
 into separate functions. This pass also lowers `coro.await.suspend.void`_,
 `coro.await.suspend.bool`_ and `coro.await.suspend.handle`_ intrinsics.
 
+CoroAnnotationElide
+-------------------
+This pass finds all usages of coroutines that are "must elide" and replaces
+`coro.begin` intrinsic with an address of a coroutine frame placed on its caller
+and replaces `coro.alloc` and `coro.free` intrinsics with `false` and `null`
+respectively to remove the deallocation code.
 
 CoroElide
 ---------
@@ -2049,6 +2055,18 @@ the coroutine must reach the final suspend point when it get destroyed.
 
 This attribute only works for switched-resume coroutines now.
 
+coro_elide_safe
+---------------
+
+When a Call or Invoke instruction to switch ABI coroutine `f` is marked with
+`coro_elide_safe`, CoroSplitPass generates a `f.noalloc` ramp function.
+`f.noalloc` has one more argument than its original ramp function `f`, which is
+the pointer to the allocated frame. `f.noalloc` also suppressed any allocations
+or deallocations that may be guarded by `@llvm.coro.alloc` and `@llvm.coro.free`.
+
+CoroAnnotationElidePass performs the heap elision when possible. Note that for
+recursive or mutually recursive functions this elision is usually not possible.
+
 Metadata
 ========
 

llvm/lib/Transforms/Coroutines/CoroInternal.h

Lines changed: 7 additions & 0 deletions
@@ -26,6 +26,13 @@ bool declaresIntrinsics(const Module &M,
                          const std::initializer_list<StringRef>);
 void replaceCoroFree(CoroIdInst *CoroId, bool Elide);
 
+/// Replaces all @llvm.coro.alloc intrinsics calls associated with a given
+/// call @llvm.coro.id instruction with boolean value false.
+void suppressCoroAllocs(CoroIdInst *CoroId);
+/// Replaces CoroAllocs with boolean value false.
+void suppressCoroAllocs(LLVMContext &Context,
+                        ArrayRef<CoroAllocInst *> CoroAllocs);
+
 /// Attempts to rewrite the location operand of debug intrinsics in terms of
 /// the coroutine frame pointer, folding pointer offsets into the DIExpression
 /// of the intrinsic.

llvm/lib/Transforms/Coroutines/CoroSplit.cpp

Lines changed: 124 additions & 26 deletions
@@ -25,6 +25,7 @@
 #include "llvm/ADT/PriorityWorklist.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/Analysis/CFG.h"
@@ -1177,6 +1178,14 @@ static void updateAsyncFuncPointerContextSize(coro::Shape &Shape) {
   Shape.AsyncLowering.AsyncFuncPointer->setInitializer(NewFuncPtrStruct);
 }
 
+static TypeSize getFrameSizeForShape(coro::Shape &Shape) {
+  // In the same function all coro.sizes should have the same result type.
+  auto *SizeIntrin = Shape.CoroSizes.back();
+  Module *M = SizeIntrin->getModule();
+  const DataLayout &DL = M->getDataLayout();
+  return DL.getTypeAllocSize(Shape.FrameTy);
+}
+
 static void replaceFrameSizeAndAlignment(coro::Shape &Shape) {
   if (Shape.ABI == coro::ABI::Async)
     updateAsyncFuncPointerContextSize(Shape);
@@ -1192,10 +1201,8 @@ static void replaceFrameSizeAndAlignment(coro::Shape &Shape) {
 
   // In the same function all coro.sizes should have the same result type.
   auto *SizeIntrin = Shape.CoroSizes.back();
-  Module *M = SizeIntrin->getModule();
-  const DataLayout &DL = M->getDataLayout();
-  auto Size = DL.getTypeAllocSize(Shape.FrameTy);
-  auto *SizeConstant = ConstantInt::get(SizeIntrin->getType(), Size);
+  auto *SizeConstant =
+      ConstantInt::get(SizeIntrin->getType(), getFrameSizeForShape(Shape));
 
   for (CoroSizeInst *CS : Shape.CoroSizes) {
     CS->replaceAllUsesWith(SizeConstant);
@@ -1452,6 +1459,75 @@ struct SwitchCoroutineSplitter {
     setCoroInfo(F, Shape, Clones);
   }
 
+  // Create a variant of ramp function that does not perform heap allocation
+  // for a switch ABI coroutine.
+  //
+  // The newly split `.noalloc` ramp function has the following differences:
+  // - Has one additional frame pointer parameter in lieu of dynamic
+  //   allocation.
+  // - Suppressed allocations by replacing coro.alloc and coro.free.
+  static Function *createNoAllocVariant(Function &F, coro::Shape &Shape,
+                                        SmallVectorImpl<Function *> &Clones) {
+    assert(Shape.ABI == coro::ABI::Switch);
+    auto *OrigFnTy = F.getFunctionType();
+    auto OldParams = OrigFnTy->params();
+
+    SmallVector<Type *> NewParams;
+    NewParams.reserve(OldParams.size() + 1);
+    NewParams.append(OldParams.begin(), OldParams.end());
+    NewParams.push_back(PointerType::getUnqual(Shape.FrameTy));
+
+    auto *NewFnTy = FunctionType::get(OrigFnTy->getReturnType(), NewParams,
+                                      OrigFnTy->isVarArg());
+    Function *NoAllocF =
+        Function::Create(NewFnTy, F.getLinkage(), F.getName() + ".noalloc");
+
+    ValueToValueMapTy VMap;
+    unsigned int Idx = 0;
+    for (const auto &I : F.args()) {
+      VMap[&I] = NoAllocF->getArg(Idx++);
+    }
+    // We just appended the frame pointer as the last argument of the new
+    // function.
+    auto FrameIdx = NoAllocF->arg_size() - 1;
+    SmallVector<ReturnInst *, 4> Returns;
+    CloneFunctionInto(NoAllocF, &F, VMap,
+                      CloneFunctionChangeType::LocalChangesOnly, Returns);
+
+    if (Shape.CoroBegin) {
+      auto *NewCoroBegin =
+          cast_if_present<CoroBeginInst>(VMap[Shape.CoroBegin]);
+      auto *NewCoroId = cast<CoroIdInst>(NewCoroBegin->getId());
+      coro::replaceCoroFree(NewCoroId, /*Elide=*/true);
+      coro::suppressCoroAllocs(NewCoroId);
+      NewCoroBegin->replaceAllUsesWith(NoAllocF->getArg(FrameIdx));
+      NewCoroBegin->eraseFromParent();
+    }
+
+    Module *M = F.getParent();
+    M->getFunctionList().insert(M->end(), NoAllocF);
+
+    removeUnreachableBlocks(*NoAllocF);
+    auto NewAttrs = NoAllocF->getAttributes();
+    // When we elide allocation, we read these attributes to determine the
+    // frame size and alignment.
+    addFramePointerAttrs(NewAttrs, NoAllocF->getContext(), FrameIdx,
+                         Shape.FrameSize, Shape.FrameAlign,
+                         /*NoAlias=*/false);
+
+    NoAllocF->setAttributes(NewAttrs);
+
+    Clones.push_back(NoAllocF);
+    // Reset the original function's coro info, make the new noalloc variant
+    // connected to the original ramp function.
+    setCoroInfo(F, Shape, Clones);
+    // After copying, set the linkage to internal linkage. Original function
+    // may have different linkage, but optimization dependent on this function
+    // generally relies on LTO.
+    NoAllocF->setLinkage(llvm::GlobalValue::InternalLinkage);
+    return NoAllocF;
+  }
+
 private:
   // Create a resume clone by cloning the body of the original function, setting
   // new entry block and replacing coro.suspend an appropriate value to force
@@ -1910,6 +1986,33 @@ class PrettyStackTraceFunction : public PrettyStackTraceEntry {
 };
 } // namespace
 
+/// Remove calls to llvm.coro.end in the original function.
+static void removeCoroEndsFromRampFunction(const coro::Shape &Shape) {
+  if (Shape.ABI != coro::ABI::Switch) {
+    for (auto *End : Shape.CoroEnds) {
+      replaceCoroEnd(End, Shape, Shape.FramePtr, /*in resume*/ false, nullptr);
+    }
+  } else {
+    for (llvm::AnyCoroEndInst *End : Shape.CoroEnds) {
+      auto &Context = End->getContext();
+      End->replaceAllUsesWith(ConstantInt::getFalse(Context));
+      End->eraseFromParent();
+    }
+  }
+}
+
+static bool hasSafeElideCaller(Function &F) {
+  for (auto *U : F.users()) {
+    if (auto *CB = dyn_cast<CallBase>(U)) {
+      auto *Caller = CB->getFunction();
+      if (Caller && Caller->isPresplitCoroutine() &&
+          CB->hasFnAttr(llvm::Attribute::CoroElideSafe))
+        return true;
+    }
+  }
+  return false;
+}
+
 static coro::Shape
 splitCoroutine(Function &F, SmallVectorImpl<Function *> &Clones,
                TargetTransformInfo &TTI, bool OptimizeFrame,
@@ -1929,10 +2032,15 @@ splitCoroutine(Function &F, SmallVectorImpl<Function *> &Clones,
   simplifySuspendPoints(Shape);
   buildCoroutineFrame(F, Shape, TTI, MaterializableCallback);
   replaceFrameSizeAndAlignment(Shape);
+  bool isNoSuspendCoroutine = Shape.CoroSuspends.empty();
+
+  bool shouldCreateNoAllocVariant = !isNoSuspendCoroutine &&
+                                    Shape.ABI == coro::ABI::Switch &&
+                                    hasSafeElideCaller(F);
 
   // If there are no suspend points, no split required, just remove
   // the allocation and deallocation blocks, they are not needed.
-  if (Shape.CoroSuspends.empty()) {
+  if (isNoSuspendCoroutine) {
     handleNoSuspendCoroutine(Shape);
   } else {
     switch (Shape.ABI) {
@@ -1962,22 +2070,13 @@ splitCoroutine(Function &F, SmallVectorImpl<Function *> &Clones,
     coro::salvageDebugInfo(ArgToAllocaMap, *DDI, false /*UseEntryValue*/);
   for (DbgVariableRecord *DVR : DbgVariableRecords)
     coro::salvageDebugInfo(ArgToAllocaMap, *DVR, false /*UseEntryValue*/);
-  return Shape;
-}
 
-/// Remove calls to llvm.coro.end in the original function.
-static void removeCoroEndsFromRampFunction(const coro::Shape &Shape) {
-  if (Shape.ABI != coro::ABI::Switch) {
-    for (auto *End : Shape.CoroEnds) {
-      replaceCoroEnd(End, Shape, Shape.FramePtr, /*in resume*/ false, nullptr);
-    }
-  } else {
-    for (llvm::AnyCoroEndInst *End : Shape.CoroEnds) {
-      auto &Context = End->getContext();
-      End->replaceAllUsesWith(ConstantInt::getFalse(Context));
-      End->eraseFromParent();
-    }
-  }
+  removeCoroEndsFromRampFunction(Shape);
+
+  if (shouldCreateNoAllocVariant)
+    SwitchCoroutineSplitter::createNoAllocVariant(F, Shape, Clones);
+
+  return Shape;
 }
 
 static void updateCallGraphAfterCoroutineSplit(
@@ -2108,13 +2207,12 @@ PreservedAnalyses CoroSplitPass::run(LazyCallGraph::SCC &C,
     F.setSplittedCoroutine();
 
     SmallVector<Function *, 4> Clones;
-    auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
-    const coro::Shape Shape =
+    coro::Shape Shape =
         splitCoroutine(F, Clones, FAM.getResult<TargetIRAnalysis>(F),
                        OptimizeFrame, MaterializableCallback);
-    removeCoroEndsFromRampFunction(Shape);
     updateCallGraphAfterCoroutineSplit(*N, Shape, Clones, C, CG, AM, UR, FAM);
 
+    auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
     ORE.emit([&]() {
       return OptimizationRemark(DEBUG_TYPE, "CoroSplit", &F)
              << "Split '" << ore::NV("function", F.getName())
@@ -2130,9 +2228,9 @@ PreservedAnalyses CoroSplitPass::run(LazyCallGraph::SCC &C,
     }
   }
 
-    for (auto *PrepareFn : PrepareFns) {
-      replaceAllPrepares(PrepareFn, CG, C);
-    }
+  for (auto *PrepareFn : PrepareFns) {
+    replaceAllPrepares(PrepareFn, CG, C);
+  }
 
   return PreservedAnalyses::none();
 }
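
The condition under which splitCoroutine creates the `.noalloc` variant can be restated as a standalone predicate. The sketch below is a toy model in plain C++, assuming simplified stand-in types (`ToyABI`, `ToyCallSite`, `ToyCoroutine` are hypothetical and are not LLVM classes). It only mirrors the three checks visible in the diff: the coroutine has at least one suspend point, it uses the switch ABI, and at least one call site sits in a presplit coroutine and carries `coro_elide_safe`.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for illustration only; these are not LLVM types.
enum class ToyABI { Switch, Retcon, Async };

struct ToyCallSite {
  bool callerIsPresplitCoroutine; // models Caller->isPresplitCoroutine()
  bool hasCoroElideSafe;          // models CB->hasFnAttr(CoroElideSafe)
};

struct ToyCoroutine {
  ToyABI abi;
  unsigned numSuspends;
  std::vector<ToyCallSite> callSites;
};

// True if any call site is both inside a presplit coroutine and marked
// coro_elide_safe.
bool hasSafeElideCaller(const ToyCoroutine &f) {
  for (const ToyCallSite &cs : f.callSites)
    if (cs.callerIsPresplitCoroutine && cs.hasCoroElideSafe)
      return true;
  return false;
}

// Mirrors the shape of the condition computed in splitCoroutine.
bool shouldCreateNoAllocVariant(const ToyCoroutine &f) {
  bool isNoSuspendCoroutine = f.numSuspends == 0;
  return !isNoSuspendCoroutine && f.abi == ToyABI::Switch &&
         hasSafeElideCaller(f);
}
```

In this model a coroutine with no suspend points never gets a variant, which matches the diff: that case is handled by handleNoSuspendCoroutine, which already removes the allocation and deallocation blocks.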

llvm/lib/Transforms/Coroutines/Coroutines.cpp

Lines changed: 27 additions & 0 deletions
@@ -145,6 +145,33 @@ void coro::replaceCoroFree(CoroIdInst *CoroId, bool Elide) {
   }
 }
 
+void coro::suppressCoroAllocs(CoroIdInst *CoroId) {
+  SmallVector<CoroAllocInst *, 4> CoroAllocs;
+  for (User *U : CoroId->users())
+    if (auto *CA = dyn_cast<CoroAllocInst>(U))
+      CoroAllocs.push_back(CA);
+
+  if (CoroAllocs.empty())
+    return;
+
+  coro::suppressCoroAllocs(CoroId->getContext(), CoroAllocs);
+}
+
+// Replacing llvm.coro.alloc with false will suppress dynamic
+// allocation as it is expected for the frontend to generate the code that
+// looks like:
+//   id = coro.id(...)
+//   mem = coro.alloc(id) ? malloc(coro.size()) : 0;
+//   coro.begin(id, mem)
+void coro::suppressCoroAllocs(LLVMContext &Context,
+                              ArrayRef<CoroAllocInst *> CoroAllocs) {
+  auto *False = ConstantInt::getFalse(Context);
+  for (auto *CA : CoroAllocs) {
+    CA->replaceAllUsesWith(False);
+    CA->eraseFromParent();
+  }
+}
+
 static void clear(coro::Shape &Shape) {
   Shape.CoroBegin = nullptr;
   Shape.CoroEnds.clear();
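
The two suppressCoroAllocs overloads split the work into a collection step and a mutation step. One plausible reason for that structure is that erasing an instruction changes the user list being iterated. The sketch below models that collect-then-mutate pattern in plain C++ with toy types (`ToyInst`, `ToyCoroId`, `eraseUser`, and `suppressToyCoroAllocs` are hypothetical names, not LLVM APIs).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for illustration only; these are not LLVM classes.
struct ToyInst {
  enum Kind { CoroAlloc, CoroBegin, Other };
  Kind kind;
  bool erased;
  explicit ToyInst(Kind k) : kind(k), erased(false) {}
};

struct ToyCoroId {
  std::vector<ToyInst *> users;
};

// Erasing an instruction also removes it from the id's user list, which is
// why it must not happen while that list is being iterated.
void eraseUser(ToyCoroId &id, ToyInst *inst) {
  inst->erased = true;
  id.users.erase(std::remove(id.users.begin(), id.users.end(), inst),
                 id.users.end());
}

// Step 1: collect the matching users. Step 2: mutate using the snapshot.
// Returns the number of suppressed coro.alloc stand-ins.
std::size_t suppressToyCoroAllocs(ToyCoroId &id) {
  std::vector<ToyInst *> allocs;
  for (ToyInst *u : id.users)
    if (u->kind == ToyInst::CoroAlloc)
      allocs.push_back(u);

  for (ToyInst *a : allocs)
    eraseUser(id, a);
  return allocs.size();
}
```

After the call, only the non-alloc users remain attached to the id. In the real pass each erased `coro.alloc` is first replaced with the constant `false`, which is what makes the guarded `malloc` branch dead.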

llvm/test/Transforms/Coroutines/coro-split-00.ll

Lines changed: 15 additions & 0 deletions
@@ -32,6 +32,13 @@ suspend:
   ret ptr %hdl
 }
 
+; Make a safe_elide call to f and CoroSplit should generate the .noalloc variant
+define void @caller() presplitcoroutine {
+entry:
+  %ptr = call ptr @f() #1
+  ret void
+}
+
 ; CHECK-LABEL: @f() !func_sanitize !0 {
 ; CHECK: call ptr @malloc
 ; CHECK: @llvm.coro.begin(token %id, ptr %phi)
@@ -63,6 +70,13 @@ suspend:
 ; CHECK-NOT: call void @free(
 ; CHECK: ret void
 
+; CHECK-LABEL: @f.noalloc(ptr noundef nonnull align 8 dereferenceable(24) %{{.*}})
+; CHECK-NOT: call ptr @malloc
+; CHECK: call void @print(i32 0)
+; CHECK-NOT: call void @print(i32 1)
+; CHECK-NOT: call void @free(
+; CHECK: ret ptr %{{.*}}
+
 declare ptr @llvm.coro.free(token, ptr)
 declare i32 @llvm.coro.size.i32()
 declare i8 @llvm.coro.suspend(token, i1)
@@ -79,3 +93,4 @@ declare void @print(i32)
 declare void @free(ptr) willreturn allockind("free") "alloc-family"="malloc"
 
 !0 = !{i32 846595819, ptr null}
+attributes #1 = { coro_elide_safe }
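
The CHECK line for `@f.noalloc` shows the frame parameter carrying `align 8 dereferenceable(24)`, which is how the frame's size and alignment requirements reach the caller. As a rough illustration of what those two numbers demand of caller-provided storage, here is a small plain C++ check. `FrameRequirements` and `satisfies` are hypothetical helpers, not part of LLVM, and the later elision pass may use this information differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of what the parameter attributes convey.
struct FrameRequirements {
  std::size_t size;  // from dereferenceable(N)
  std::size_t align; // from align N
};

// Returns true if the buffer is non-null, large enough, and aligned as the
// frame parameter requires.
bool satisfies(const void *buf, std::size_t bufSize, FrameRequirements req) {
  auto addr = reinterpret_cast<std::uintptr_t>(buf);
  return buf != nullptr && bufSize >= req.size && addr % req.align == 0;
}
```

With the values from the test, a buffer declared as `alignas(8) unsigned char frame[24];` passes the check, while a smaller or misaligned buffer does not.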
