[LLVM][Coroutines] Perform HALO on "coro_must_elide" coroutines #98974
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang-modules
Author: Yuxuan Chen (yuxuanchen1997)
Patch is 75.86 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/98974.diff
39 Files Affected:
diff --git a/clang/include/clang/AST/ExprCXX.h b/clang/include/clang/AST/ExprCXX.h
index c2feac525c1ea..0cf62aee41b66 100644
--- a/clang/include/clang/AST/ExprCXX.h
+++ b/clang/include/clang/AST/ExprCXX.h
@@ -5082,7 +5082,8 @@ class CoroutineSuspendExpr : public Expr {
enum SubExpr { Operand, Common, Ready, Suspend, Resume, Count };
Stmt *SubExprs[SubExpr::Count];
- OpaqueValueExpr *OpaqueValue = nullptr;
+ OpaqueValueExpr *CommonExprOpaqueValue = nullptr;
+ OpaqueValueExpr *InplaceCallOpaqueValue = nullptr;
public:
// These types correspond to the three C++ 'await_suspend' return variants
@@ -5090,10 +5091,10 @@ class CoroutineSuspendExpr : public Expr {
CoroutineSuspendExpr(StmtClass SC, SourceLocation KeywordLoc, Expr *Operand,
Expr *Common, Expr *Ready, Expr *Suspend, Expr *Resume,
- OpaqueValueExpr *OpaqueValue)
+ OpaqueValueExpr *CommonExprOpaqueValue)
: Expr(SC, Resume->getType(), Resume->getValueKind(),
Resume->getObjectKind()),
- KeywordLoc(KeywordLoc), OpaqueValue(OpaqueValue) {
+ KeywordLoc(KeywordLoc), CommonExprOpaqueValue(CommonExprOpaqueValue) {
SubExprs[SubExpr::Operand] = Operand;
SubExprs[SubExpr::Common] = Common;
SubExprs[SubExpr::Ready] = Ready;
@@ -5128,7 +5129,16 @@ class CoroutineSuspendExpr : public Expr {
}
/// getOpaqueValue - Return the opaque value placeholder.
- OpaqueValueExpr *getOpaqueValue() const { return OpaqueValue; }
+ OpaqueValueExpr *getCommonExprOpaqueValue() const {
+ return CommonExprOpaqueValue;
+ }
+
+ OpaqueValueExpr *getInplaceCallOpaqueValue() const {
+ return InplaceCallOpaqueValue;
+ }
+ void setInplaceCallOpaqueValue(OpaqueValueExpr *E) {
+ InplaceCallOpaqueValue = E;
+ }
Expr *getReadyExpr() const {
return static_cast<Expr*>(SubExprs[SubExpr::Ready]);
@@ -5194,9 +5204,9 @@ class CoawaitExpr : public CoroutineSuspendExpr {
public:
CoawaitExpr(SourceLocation CoawaitLoc, Expr *Operand, Expr *Common,
Expr *Ready, Expr *Suspend, Expr *Resume,
- OpaqueValueExpr *OpaqueValue, bool IsImplicit = false)
+ OpaqueValueExpr *CommonExprOpaqueValue, bool IsImplicit = false)
: CoroutineSuspendExpr(CoawaitExprClass, CoawaitLoc, Operand, Common,
- Ready, Suspend, Resume, OpaqueValue) {
+ Ready, Suspend, Resume, CommonExprOpaqueValue) {
CoawaitBits.IsImplicit = IsImplicit;
}
@@ -5275,9 +5285,9 @@ class CoyieldExpr : public CoroutineSuspendExpr {
public:
CoyieldExpr(SourceLocation CoyieldLoc, Expr *Operand, Expr *Common,
Expr *Ready, Expr *Suspend, Expr *Resume,
- OpaqueValueExpr *OpaqueValue)
+ OpaqueValueExpr *CommonExprOpaqueValue)
: CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Operand, Common,
- Ready, Suspend, Resume, OpaqueValue) {}
+ Ready, Suspend, Resume, CommonExprOpaqueValue) {}
CoyieldExpr(SourceLocation CoyieldLoc, QualType Ty, Expr *Operand,
Expr *Common)
: CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Ty, Operand,
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 1293d0ddbc117..e482c9daf9fb3 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1217,6 +1217,14 @@ def CoroDisableLifetimeBound : InheritableAttr {
let SimpleHandler = 1;
}
+def CoroInplaceTask : InheritableAttr {
+ let Spellings = [Clang<"coro_inplace_task">];
+ let Subjects = SubjectList<[CXXRecord]>;
+ let LangOpts = [CPlusPlus];
+ let Documentation = [CoroInplaceTaskDoc];
+ let SimpleHandler = 1;
+}
+
// OSObject-based attributes.
def OSConsumed : InheritableParamAttr {
let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 09cf4f80bd999..21d59dedec578 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -8108,6 +8108,25 @@ but do not pass them to the underlying coroutine or pass them by value.
}];
}
+def CoroInplaceTaskDoc : Documentation {
+ let Category = DocCatDecl;
+ let Content = [{
+The ``[[clang::coro_inplace_task]]`` is a class attribute which can be applied
+to a coroutine return type.
+
+When a coroutine function that returns such a type calls another coroutine function,
+the compiler performs heap allocation elision when the following conditions are all met:
+- callee coroutine function returns a type that is annotated with ``[[clang::coro_inplace_task]]``.
+- The callee coroutine function is inlined.
+- In caller coroutine, the return value of the callee is a prvalue or an xvalue, and
+- The temporary expression containing the callee coroutine object is immediately co_awaited.
+
+The behavior is undefined if any of the following condition was met:
+- the caller coroutine is destroyed earlier than the callee coroutine.
+
+ }];
+}
+
def CountedByDocs : Documentation {
let Category = DocCatField;
let Content = [{
diff --git a/clang/lib/CodeGen/CGBlocks.cpp b/clang/lib/CodeGen/CGBlocks.cpp
index 066139b1c78c7..684fda7440731 100644
--- a/clang/lib/CodeGen/CGBlocks.cpp
+++ b/clang/lib/CodeGen/CGBlocks.cpp
@@ -1163,7 +1163,8 @@ llvm::Type *CodeGenModule::getGenericBlockLiteralType() {
}
RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,
- ReturnValueSlot ReturnValue) {
+ ReturnValueSlot ReturnValue,
+ llvm::CallBase **CallOrInvoke) {
const auto *BPT = E->getCallee()->getType()->castAs<BlockPointerType>();
llvm::Value *BlockPtr = EmitScalarExpr(E->getCallee());
llvm::Type *GenBlockTy = CGM.getGenericBlockLiteralType();
@@ -1220,7 +1221,7 @@ RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,
CGCallee Callee(CGCalleeInfo(), Func);
// And call the block.
- return EmitCall(FnInfo, Callee, ReturnValue, Args);
+ return EmitCall(FnInfo, Callee, ReturnValue, Args, CallOrInvoke);
}
Address CodeGenFunction::GetAddrOfBlockDecl(const VarDecl *variable) {
diff --git a/clang/lib/CodeGen/CGCUDARuntime.cpp b/clang/lib/CodeGen/CGCUDARuntime.cpp
index c14a9d3f2bbbc..1e1da1e2411a7 100644
--- a/clang/lib/CodeGen/CGCUDARuntime.cpp
+++ b/clang/lib/CodeGen/CGCUDARuntime.cpp
@@ -25,7 +25,8 @@ CGCUDARuntime::~CGCUDARuntime() {}
RValue CGCUDARuntime::EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
const CUDAKernelCallExpr *E,
- ReturnValueSlot ReturnValue) {
+ ReturnValueSlot ReturnValue,
+ llvm::CallBase **CallOrInvoke) {
llvm::BasicBlock *ConfigOKBlock = CGF.createBasicBlock("kcall.configok");
llvm::BasicBlock *ContBlock = CGF.createBasicBlock("kcall.end");
@@ -35,7 +36,7 @@ RValue CGCUDARuntime::EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
eval.begin(CGF);
CGF.EmitBlock(ConfigOKBlock);
- CGF.EmitSimpleCallExpr(E, ReturnValue);
+ CGF.EmitSimpleCallExpr(E, ReturnValue, CallOrInvoke);
CGF.EmitBranch(ContBlock);
CGF.EmitBlock(ContBlock);
diff --git a/clang/lib/CodeGen/CGCUDARuntime.h b/clang/lib/CodeGen/CGCUDARuntime.h
index 8030d632cc3d2..86f776004ee7c 100644
--- a/clang/lib/CodeGen/CGCUDARuntime.h
+++ b/clang/lib/CodeGen/CGCUDARuntime.h
@@ -21,6 +21,7 @@
#include "llvm/IR/GlobalValue.h"
namespace llvm {
+class CallBase;
class Function;
class GlobalVariable;
}
@@ -82,9 +83,10 @@ class CGCUDARuntime {
CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}
virtual ~CGCUDARuntime();
- virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
- const CUDAKernelCallExpr *E,
- ReturnValueSlot ReturnValue);
+ virtual RValue
+ EmitCUDAKernelCallExpr(CodeGenFunction &CGF, const CUDAKernelCallExpr *E,
+ ReturnValueSlot ReturnValue,
+ llvm::CallBase **CallOrInvoke = nullptr);
/// Emits a kernel launch stub.
virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) = 0;
diff --git a/clang/lib/CodeGen/CGCXXABI.h b/clang/lib/CodeGen/CGCXXABI.h
index 7dcc539111996..687ff7fb84444 100644
--- a/clang/lib/CodeGen/CGCXXABI.h
+++ b/clang/lib/CodeGen/CGCXXABI.h
@@ -485,11 +485,11 @@ class CGCXXABI {
llvm::PointerUnion<const CXXDeleteExpr *, const CXXMemberCallExpr *>;
/// Emit the ABI-specific virtual destructor call.
- virtual llvm::Value *EmitVirtualDestructorCall(CodeGenFunction &CGF,
- const CXXDestructorDecl *Dtor,
- CXXDtorType DtorType,
- Address This,
- DeleteOrMemberCallExpr E) = 0;
+ virtual llvm::Value *
+ EmitVirtualDestructorCall(CodeGenFunction &CGF, const CXXDestructorDecl *Dtor,
+ CXXDtorType DtorType, Address This,
+ DeleteOrMemberCallExpr E,
+ llvm::CallBase **CallOrInvoke) = 0;
virtual void adjustCallArgsForDestructorThunk(CodeGenFunction &CGF,
GlobalDecl GD,
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 0a595bb998d26..c56716fbd0590 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -2191,15 +2191,11 @@ static bool canEmitDelegateCallArgs(CodeGenFunction &CGF,
return true;
}
-void CodeGenFunction::EmitCXXConstructorCall(const CXXConstructorDecl *D,
- CXXCtorType Type,
- bool ForVirtualBase,
- bool Delegating,
- Address This,
- CallArgList &Args,
- AggValueSlot::Overlap_t Overlap,
- SourceLocation Loc,
- bool NewPointerIsChecked) {
+void CodeGenFunction::EmitCXXConstructorCall(
+ const CXXConstructorDecl *D, CXXCtorType Type, bool ForVirtualBase,
+ bool Delegating, Address This, CallArgList &Args,
+ AggValueSlot::Overlap_t Overlap, SourceLocation Loc,
+ bool NewPointerIsChecked, llvm::CallBase **CallOrInvoke) {
const CXXRecordDecl *ClassDecl = D->getParent();
if (!NewPointerIsChecked)
@@ -2247,7 +2243,7 @@ void CodeGenFunction::EmitCXXConstructorCall(const CXXConstructorDecl *D,
const CGFunctionInfo &Info = CGM.getTypes().arrangeCXXConstructorCall(
Args, D, Type, ExtraArgs.Prefix, ExtraArgs.Suffix, PassPrototypeArgs);
CGCallee Callee = CGCallee::forDirect(CalleePtr, GlobalDecl(D, Type));
- EmitCall(Info, Callee, ReturnValueSlot(), Args, nullptr, false, Loc);
+ EmitCall(Info, Callee, ReturnValueSlot(), Args, CallOrInvoke, false, Loc);
// Generate vtable assumptions if we're constructing a complete object
// with a vtable. We don't do this for base subobjects for two reasons:
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index a8a70186c2c5a..feeb152a1fcb5 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -12,9 +12,11 @@
#include "CGCleanup.h"
#include "CodeGenFunction.h"
-#include "llvm/ADT/ScopeExit.h"
+#include "clang/AST/ExprCXX.h"
#include "clang/AST/StmtCXX.h"
#include "clang/AST/StmtVisitor.h"
+#include "llvm/ADT/ScopeExit.h"
+#include "llvm/IR/Intrinsics.h"
using namespace clang;
using namespace CodeGen;
@@ -223,12 +225,22 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
CoroutineSuspendExpr const &S,
AwaitKind Kind, AggValueSlot aggSlot,
bool ignoreResult, bool forLValue) {
- auto *E = S.getCommonExpr();
+ auto &Builder = CGF.Builder;
- auto CommonBinder =
- CodeGenFunction::OpaqueValueMappingData::bind(CGF, S.getOpaqueValue(), E);
- auto UnbindCommonOnExit =
- llvm::make_scope_exit([&] { CommonBinder.unbind(CGF); });
+ // If S.getInplaceCallOpaqueValue() is null, we don't have a nested opaque
+ // value for common expression.
+ std::optional<CodeGenFunction::OpaqueValueMapping> OperandMapping;
+ if (auto *CallOV = S.getInplaceCallOpaqueValue()) {
+ auto *CE = cast<CallExpr>(CallOV->getSourceExpr());
+ llvm::CallBase *CallOrInvoke = nullptr;
+ LValue CallResult = CGF.EmitCallExprLValue(CE, &CallOrInvoke);
+ if (CallOrInvoke)
+ CallOrInvoke->addAnnotationMetadata("coro_must_elide");
+
+ OperandMapping.emplace(CGF, CallOV, CallResult);
+ }
+ CodeGenFunction::OpaqueValueMapping BindCommon(CGF,
+ S.getCommonExprOpaqueValue());
auto Prefix = buildSuspendPrefixStr(Coro, Kind);
BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
@@ -241,7 +253,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
// Otherwise, emit suspend logic.
CGF.EmitBlock(SuspendBlock);
- auto &Builder = CGF.Builder;
llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
@@ -256,7 +267,8 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
SuspendIntrinsicCallArgs.push_back(
- CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
+ CGF.getOrCreateOpaqueLValueMapping(S.getCommonExprOpaqueValue())
+ .getPointer(CGF));
SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin);
SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
@@ -455,7 +467,7 @@ CodeGenFunction::generateAwaitSuspendWrapper(Twine const &CoroName,
Builder.CreateLoad(GetAddrOfLocalVar(&FrameDecl));
auto AwaiterBinder = CodeGenFunction::OpaqueValueMappingData::bind(
- *this, S.getOpaqueValue(), AwaiterLValue);
+ *this, S.getCommonExprOpaqueValue(), AwaiterLValue);
auto *SuspendRet = EmitScalarExpr(S.getSuspendExpr());
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index af1e1a25d1d8b..0ca2f5bbe823b 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -5437,16 +5437,17 @@ RValue CodeGenFunction::EmitRValueForField(LValue LV,
//===--------------------------------------------------------------------===//
RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
- ReturnValueSlot ReturnValue) {
+ ReturnValueSlot ReturnValue,
+ llvm::CallBase **CallOrInvoke) {
// Builtins never have block type.
if (E->getCallee()->getType()->isBlockPointerType())
- return EmitBlockCallExpr(E, ReturnValue);
+ return EmitBlockCallExpr(E, ReturnValue, CallOrInvoke);
if (const auto *CE = dyn_cast<CXXMemberCallExpr>(E))
- return EmitCXXMemberCallExpr(CE, ReturnValue);
+ return EmitCXXMemberCallExpr(CE, ReturnValue, CallOrInvoke);
if (const auto *CE = dyn_cast<CUDAKernelCallExpr>(E))
- return EmitCUDAKernelCallExpr(CE, ReturnValue);
+ return EmitCUDAKernelCallExpr(CE, ReturnValue, CallOrInvoke);
// A CXXOperatorCallExpr is created even for explicit object methods, but
// these should be treated like static function call.
@@ -5454,7 +5455,7 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
if (const auto *MD =
dyn_cast_if_present<CXXMethodDecl>(CE->getCalleeDecl());
MD && MD->isImplicitObjectMemberFunction())
- return EmitCXXOperatorMemberCallExpr(CE, MD, ReturnValue);
+ return EmitCXXOperatorMemberCallExpr(CE, MD, ReturnValue, CallOrInvoke);
CGCallee callee = EmitCallee(E->getCallee());
@@ -5467,14 +5468,17 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
return EmitCXXPseudoDestructorExpr(callee.getPseudoDestructorExpr());
}
- return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue);
+ return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue,
+ /*Chain=*/nullptr, CallOrInvoke);
}
/// Emit a CallExpr without considering whether it might be a subclass.
RValue CodeGenFunction::EmitSimpleCallExpr(const CallExpr *E,
- ReturnValueSlot ReturnValue) {
+ ReturnValueSlot ReturnValue,
+ llvm::CallBase **CallOrInvoke) {
CGCallee Callee = EmitCallee(E->getCallee());
- return EmitCall(E->getCallee()->getType(), Callee, E, ReturnValue);
+ return EmitCall(E->getCallee()->getType(), Callee, E, ReturnValue,
+ /*Chain=*/nullptr, CallOrInvoke);
}
// Detect the unusual situation where an inline version is shadowed by a
@@ -5678,8 +5682,9 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) {
llvm_unreachable("bad evaluation kind");
}
-LValue CodeGenFunction::EmitCallExprLValue(const CallExpr *E) {
- RValue RV = EmitCallExpr(E);
+LValue CodeGenFunction::EmitCallExprLValue(const CallExpr *E,
+ llvm::CallBase **CallOrInvoke) {
+ RValue RV = EmitCallExpr(E, ReturnValueSlot(), CallOrInvoke);
if (!RV.isScalar())
return MakeAddrLValue(RV.getAggregateAddress(), E->getType(),
@@ -5802,9 +5807,11 @@ LValue CodeGenFunction::EmitStmtExprLValue(const StmtExpr *E) {
AlignmentSource::Decl);
}
-RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee,
- const CallExpr *E, ReturnValueSlot ReturnValue,
- llvm::Value *Chain) {
+RValue CodeGenFunction::EmitCall(QualType CalleeType,
+ const CGCallee &OrigCallee, const CallExpr *E,
+ ReturnValueSlot ReturnValue,
+ llvm::Value *Chain,
+ llvm::CallBase **CallOrInvoke) {
// Get the actual function type. The callee type will always be a pointer to
// function type or a block pointer type.
assert(CalleeType->isFunctionPointerType() &&
@@ -6015,8 +6022,8 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee
Address(Handle, Handle->getType(), CGM.getPointerAlign()));
Callee.setFunctionPointer(Stub);
}
- llvm::CallBase *CallOrInvoke = nullptr;
- RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &CallOrInvoke,
+ llvm::CallBase *LocalCallOrInvoke = nullptr;
+ RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &LocalCallOrInvoke,
E == MustTailCall, E->getExprLoc());
// Generate function declaration DISuprogram in order to be used
@@ -6025,11 +6032,13 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee
if (auto *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
FunctionArgList Args;
QualType ResTy = BuildFunctionArgList(CalleeDecl, Args);
- DI->EmitFuncDeclForCallSite(CallOrInvoke,
+ DI->EmitFuncDeclForCallSite(LocalCallOrInvoke,
DI->getFunctionType(CalleeDecl, ResTy, Args),
CalleeDecl);
}
}
+ if (CallOrInvoke)
+ *CallOrInvoke = LocalCallOrInvoke;
return Call;
}
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index 8eb6ab7381acb..1214bb054fb8d 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -84,23 +84,24 @@ commonEmitCXXMemberOrOperatorCall(CodeGenFunction &CGF, GlobalDecl GD,
RValue CodeGenFunction::EmitCXXMemberOrOperatorCall(
const CXXMethodDecl *MD, const CGCallee &Callee,
- Return...
[truncated]
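Much of the CodeGen churn in the diff above threads one optional out-parameter, `llvm::CallBase **CallOrInvoke`, through the call-emission helpers so that `emitSuspendExpression` can reach the call instruction it just emitted and tag it (in this revision, with `addAnnotationMetadata("coro_must_elide")`). The sketch below shows only that plumbing pattern under simplifying assumptions: `Call`, `Module`, `emitCall`, `emitCallExpr` and `emitAwaitedCall` are hypothetical stand-ins, not Clang or LLVM APIs.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::CallBase: only records its tags.
struct Call {
  std::vector<std::string> annotations;
};

// Hypothetical stand-in for the IR being built.
struct Module {
  std::deque<Call> calls;  // deque keeps element addresses stable on growth
};

// Innermost emitter: always creates the call, reports it only if asked.
int emitCall(Module &M, Call **CallOrInvoke = nullptr) {
  M.calls.emplace_back();
  Call *Local = &M.calls.back();  // plays the role of LocalCallOrInvoke
  if (CallOrInvoke)
    *CallOrInvoke = Local;
  return 42;  // stand-in for the emitted RValue
}

// Intermediate layer: forwards the out-parameter unchanged.
int emitCallExpr(Module &M, Call **CallOrInvoke = nullptr) {
  return emitCall(M, CallOrInvoke);
}

// The one caller that needs the instruction, like emitSuspendExpression.
int emitAwaitedCall(Module &M) {
  Call *C = nullptr;
  int Result = emitCallExpr(M, &C);
  if (C)
    C->annotations.push_back("coro_must_elide");
  return Result;
}
```

Defaulting the out-parameter to `nullptr` (as the patch does for `EmitCUDAKernelCallExpr`) leaves existing call sites untouched; only callers that need the instruction pass a non-null pointer.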
Force-pushed from 70f7fd9 to deb6a63.
Force-pushed from deb6a63 to 66f9d41.
Every time we change the coroutine semantics in LLVM IR, we need to update https://llvm.org/docs/Coroutines.html. Doing so also helps reviewers understand what you're doing. A high-level design for this, plus a lower-level introduction to what this patch does, would help reviewers too.
Some quick feedback:
I also think Attributes are neater if they work. Instructions don't have attributes, but CallBase does. I tried to add Attributes on the CallBase, but the
Implement noalloc copy add CoroAnnotationElidePass
Force-pushed from 8d229c6 to 12420f2.
@ChuanqiXu9, I have switched to using Attributes now.
Force-pushed from 12420f2 to f9877f2.
Force-pushed from f9877f2 to 8aca8e3.
It looks like the diff has some problems (it contains Clang changes). And what do you think about the suggestion to split out the middle-end patch?
I was trying to find a stacked-review solution. See https://reviewstack.dev/llvm/llvm-project/pull/98974; you can compare commits in that UI. What's the stacked-PR approach you were talking about?
For example, if you have two patches, A and B, and B depends on A, then you can send patch A to
Sure, can do. I will send out new stacked PRs.
This patch is the middle-end implementation for the coroutine HALO (heap allocation elision optimization) improvement project published on Discourse: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044/7
Previously, CoroElide depended on inlining, and its analysis did not work well with code generated by the C++ frontend because of the many customization points involved. Issues have been reported upstream showing how ineffective the original CoroElide was in real-world applications.
For C++ users, this set of patches aims to fix this problem by giving library authors and users deterministic HALO behaviour for some well-behaved coroutine `Task` types. The stack begins with a library-side attribute on the `Task` class that guarantees no unstructured concurrency when coroutines are `co_await`ed directly as prvalues. This attribute on `Task` types gives us lifetime guarantees and lets the C++ frontend (FE) tell the middle end (ME) which coroutine calls are elidable. The FE conveys this information through the attribute `coro_must_elide`.
This patch modifies CoroSplit to create a variant of the coroutine ramp function that:
1) does not use a heap-allocated frame, but instead takes an additional parameter that points to the frame. That parameter is attributed with `dereferenceable` and `align` to convey the size and alignment requirements of the frame;
2) always stores the cleanup address, instead of the destroy address, for `coro.destroy()` actions.
Additionally, a new pass runs right after CoroSplit. It finds uses of callee coroutines attributed `coro_must_elide` in presplit coroutine callers, allocates the callee frame on the caller's "stack", and transforms those uses into calls to the `noalloc` ramp function variant.
(Note: I put quotes around the word "stack" here because, for a presplit coroutine, any alloca will be spilled into the frame when it is split.)
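The ramp variants and the caller-side rewrite can be pictured together in plain C++. This is a sketch under simplifying assumptions, not what CoroSplit or the new pass actually emit; `Frame`, `rampHeap`, `rampNoAlloc`, `CallerFrame`, `callerBefore`, `callerAfter` and the counters are all hypothetical. The default ramp allocates the frame with `::operator new` and stores a destroy function that frees it; the `noalloc` ramp constructs the frame in caller-provided storage and stores a cleanup function that leaves that storage alone. After the rewrite, the storage sits inside the caller's own frame, so no heap allocation happens.

```cpp
#include <cassert>
#include <new>

// Toy callee coroutine frame (hypothetical; real frames are built by CoroSplit).
struct Frame {
  void (*destroyFn)(Frame *);  // what coro.destroy() would invoke
  int value;
};

static int heapAllocs = 0;  // frames allocated by rampHeap
static int liveFrames = 0;  // frames constructed but not yet torn down

// cleanup: tear down the frame's contents, keep its storage.
static void cleanupImpl(Frame *F) {
  F->~Frame();
  --liveFrames;
}

// destroy: cleanup, then free the heap storage.
static void destroyImpl(Frame *F) {
  cleanupImpl(F);
  ::operator delete(static_cast<void *>(F));
}

// Default ramp: heap-allocates the frame and stores destroy.
Frame *rampHeap(int v) {
  void *Mem = ::operator new(sizeof(Frame));
  ++heapAllocs;
  ++liveFrames;
  return new (Mem) Frame{&destroyImpl, v};
}

// noalloc ramp: the caller supplies at least sizeof(Frame) bytes aligned to
// alignof(Frame) (the dereferenceable/align promise); stores cleanup.
Frame *rampNoAlloc(void *FrameMem, int v) {
  ++liveFrames;
  return new (FrameMem) Frame{&cleanupImpl, v};
}

// Caller coroutine frame: the caller's "stack" slot for the callee frame
// ends up here once the caller itself is split.
struct CallerFrame {
  alignas(Frame) unsigned char calleeStorage[sizeof(Frame)];
  int result = 0;
};

// Before the pass: `co_await callee(v)` goes through the heap ramp.
int callerBefore(CallerFrame &C, int v) {
  Frame *F = rampHeap(v);
  C.result = F->value;
  F->destroyFn(F);  // destroy: cleanup + free
  return C.result;
}

// After the pass: the coro_must_elide call is rewritten to the noalloc ramp.
int callerAfter(CallerFrame &C, int v) {
  Frame *F = rampNoAlloc(C.calleeStorage, v);
  C.result = F->value;
  F->destroyFn(F);  // cleanup only; the storage belongs to C
  return C.result;
}
```

Storing cleanup rather than destroy in the `noalloc` variant is what keeps `callerAfter` correct: the storage belongs to the caller, so `coro.destroy()` must not free it. The same layout also shows why the attribute documentation in the diff makes it undefined behavior to destroy the caller coroutine before the callee: the callee's frame lives inside the caller's.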
The C++ frontend attribute implementation that works with this change can be found at #98971.
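To make the frontend side concrete, the toy check below encodes a subset of the conditions listed in the `[[clang::coro_inplace_task]]` documentation in the diff: the callee returns an annotated type, its result is a temporary (prvalue or xvalue), and that temporary is immediately `co_await`ed. It ignores the documentation's inlining condition, and `CallSite`, `ValueCategory` and `shouldMarkMustElide` are hypothetical names; this is a sketch, not Clang's actual Sema or CodeGen logic.

```cpp
#include <cassert>

// Value category of the callee call's result at the await site.
enum class ValueCategory { PRValue, XValue, LValue };

// Hypothetical summary of one call site inside a caller coroutine.
struct CallSite {
  bool returnTypeHasInplaceTaskAttr;  // callee returns a [[clang::coro_inplace_task]] type
  ValueCategory category;             // category of the call's result
  bool immediatelyCoAwaited;          // e.g. `co_await callee();`
};

// Toy eligibility check: should the FE tag this call `coro_must_elide`?
bool shouldMarkMustElide(const CallSite &CS) {
  bool isTemporary = CS.category == ValueCategory::PRValue ||
                     CS.category == ValueCategory::XValue;
  return CS.returnTypeHasInplaceTaskAttr && isTemporary &&
         CS.immediatelyCoAwaited;
}
```

Under this model, `co_await callee();` qualifies, while `auto t = callee(); co_await t;` awaits an lvalue and would not be marked `coro_must_elide`.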