Skip to content

[Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] #99282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 9, 2024

Conversation

yuxuanchen1997
Copy link
Member

@yuxuanchen1997 yuxuanchen1997 commented Jul 17, 2024

This patch is the frontend implementation of the coroutine elide improvement project detailed in this discourse post: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute [[clang::coro_await_elidable]]. This notion of await elidable task gives developers and library authors a certainty that coroutine heap elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen. Take this as an example:

Task foo();
Task bar() {
  co_await foo();
}

For CoroElide to happen, the ramp function of foo must be inlined into bar. This inlining happens after foo has been split but bar is usually still a presplit coroutine. If foo is indeed a coroutine, the inlined coro.id intrinsics of foo is visible within bar. CoroElide then runs an analysis to figure out whether the SSA value of coro.begin() of foo gets destroyed before bar terminates.

Task types are rarely simple enough for the destroy logic of the task to reference the SSA value from coro.begin() directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally originates from the philosophy that library implementations of Task types has the control over the structured concurrency guarantees we demand for elision to happen. That is, the lifetime for the callee's frame is shorter to that of the caller.

The [[clang::coro_await_elidable]] is a class attribute which can be applied to a coroutine return type.

When a coroutine function that returns such a type calls another coroutine function, the compiler performs heap allocation elision when the following conditions are all met:

  • callee coroutine function returns a type that is annotated with [[clang::coro_await_elidable]].
  • In caller coroutine, the return value of the callee is a prvalue that is immediately co_awaited.

From the C++ perspective, it makes sense because we can ensure the lifetime of elided callee cannot exceed that of the caller if we can guarantee that the caller coroutine is never destroyed earlier than the callee coroutine. This is not generally true for any C++ programs. However, the library that implements Task types and executors may provide this guarantee to the compiler, providing the user with certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such attribute, the frontend checks that the type of the operand of co_await expressions (not operator co_await). If it's also attributed with [[clang::coro_await_elidable]], the FE emits metadata on the call or invoke instruction as a hint for a later middle end pass to elide the elision.

The original patch version is #94693 and as suggested, the patch is split into frontend and middle end solutions into stacked PRs.

The middle end CoroSplit patch can be found at #99283
The middle end transformation that performs the elide can be found at #99285

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:codegen IR generation bugs: mangling, exceptions, etc. coroutines C++20 coroutines llvm:ir llvm:transforms labels Jul 17, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2024

@llvm/pr-subscribers-coroutines
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-clang-modules

Author: Yuxuan Chen (yuxuanchen1997)

Changes

This patch is the frontend implementation of the coroutine elide improvement project detailed in this discourse post: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute [[clang::coro_inplace_task]]. This notion of inplace task gives developers and library authors a certainty that coroutine heap elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen. Take this as an example:

Task foo();
Task bar() {
  co_await foo();
}

For CoroElide to happen, the ramp function of foo must be inlined into bar. This inlining happens after foo has been split but bar is usually still a presplit coroutine. If foo is indeed a coroutine, the inlined coro.id intrinsics of foo is visible within bar. CoroElide then runs an analysis to figure out whether the SSA value of coro.begin() of foo gets destroyed before bar terminates.

Task types are rarely simple enough for the destroy logic of the task to reference the SSA value from coro.begin() directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally originates from the philosophy that library implementations of Task types has the control over the structured concurrency guarantees we demand for elision to happen. That is, the lifetime for the callee's frame is shorter to that of the caller.

The [[clang::coro_inplace_task]] is a class attribute which can be applied to a coroutine return type.

When a coroutine function that returns such a type calls another coroutine function, the compiler performs heap allocation elision when the following conditions are all met:

  • callee coroutine function returns a type that is annotated with [[clang::coro_inplace_task]].
  • In caller coroutine, the return value of the callee is a prvalue that is immediately co_awaited.

From the C++ perspective, it makes sense because we can ensure the lifetime of elided callee cannot exceed that of the caller if we can guarantee that the caller coroutine is never destroyed earlier than the callee coroutine. This is not generally true for any C++ programs. However, the library that implements Task types and executors may provide this guarantee to the compiler, providing the user with certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such attribute, the frontend checks that the type of the operand of co_await expressions (not operator co_await). If it's also attributed with [[clang::coro_inplace_task]], the FE emits metadata on the call or invoke instruction as a hint for a later middle end pass to elide the elision.

The original patch version is #94693 and as suggested, the patch is split into frontend and middle end solutions into stacked PRs.

The middle end CoroElide patch can be found at <PR>.
The middle end transformation that performs the elide can be found at <PR>.


Patch is 52.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99282.diff

24 Files Affected:

  • (modified) clang/include/clang/AST/ExprCXX.h (+18-8)
  • (modified) clang/include/clang/Basic/Attr.td (+8)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+19)
  • (modified) clang/lib/CodeGen/CGBlocks.cpp (+3-2)
  • (modified) clang/lib/CodeGen/CGCUDARuntime.cpp (+3-2)
  • (modified) clang/lib/CodeGen/CGCUDARuntime.h (+5-3)
  • (modified) clang/lib/CodeGen/CGCXXABI.h (+5-5)
  • (modified) clang/lib/CodeGen/CGClass.cpp (+6-10)
  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+21-9)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+25-16)
  • (modified) clang/lib/CodeGen/CGExprCXX.cpp (+33-27)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+37-27)
  • (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+9-7)
  • (modified) clang/lib/CodeGen/MicrosoftCXXABI.cpp (+10-8)
  • (modified) clang/lib/Sema/SemaCoroutine.cpp (+56-2)
  • (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+8-2)
  • (modified) clang/lib/Serialization/ASTWriterStmt.cpp (+2-1)
  • (added) clang/test/CodeGenCoroutines/Inputs/utility.h (+13)
  • (added) clang/test/CodeGenCoroutines/coro-must-elide.cpp (+87)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+1)
  • (modified) llvm/include/llvm/IR/Attributes.td (+3)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+2)
  • (modified) llvm/lib/Transforms/Utils/CodeExtractor.cpp (+1)
diff --git a/clang/include/clang/AST/ExprCXX.h b/clang/include/clang/AST/ExprCXX.h
index c2feac525c1ea..0cf62aee41b66 100644
--- a/clang/include/clang/AST/ExprCXX.h
+++ b/clang/include/clang/AST/ExprCXX.h
@@ -5082,7 +5082,8 @@ class CoroutineSuspendExpr : public Expr {
   enum SubExpr { Operand, Common, Ready, Suspend, Resume, Count };
 
   Stmt *SubExprs[SubExpr::Count];
-  OpaqueValueExpr *OpaqueValue = nullptr;
+  OpaqueValueExpr *CommonExprOpaqueValue = nullptr;
+  OpaqueValueExpr *InplaceCallOpaqueValue = nullptr;
 
 public:
   // These types correspond to the three C++ 'await_suspend' return variants
@@ -5090,10 +5091,10 @@ class CoroutineSuspendExpr : public Expr {
 
   CoroutineSuspendExpr(StmtClass SC, SourceLocation KeywordLoc, Expr *Operand,
                        Expr *Common, Expr *Ready, Expr *Suspend, Expr *Resume,
-                       OpaqueValueExpr *OpaqueValue)
+                       OpaqueValueExpr *CommonExprOpaqueValue)
       : Expr(SC, Resume->getType(), Resume->getValueKind(),
              Resume->getObjectKind()),
-        KeywordLoc(KeywordLoc), OpaqueValue(OpaqueValue) {
+        KeywordLoc(KeywordLoc), CommonExprOpaqueValue(CommonExprOpaqueValue) {
     SubExprs[SubExpr::Operand] = Operand;
     SubExprs[SubExpr::Common] = Common;
     SubExprs[SubExpr::Ready] = Ready;
@@ -5128,7 +5129,16 @@ class CoroutineSuspendExpr : public Expr {
   }
 
   /// getOpaqueValue - Return the opaque value placeholder.
-  OpaqueValueExpr *getOpaqueValue() const { return OpaqueValue; }
+  OpaqueValueExpr *getCommonExprOpaqueValue() const {
+    return CommonExprOpaqueValue;
+  }
+
+  OpaqueValueExpr *getInplaceCallOpaqueValue() const {
+    return InplaceCallOpaqueValue;
+  }
+  void setInplaceCallOpaqueValue(OpaqueValueExpr *E) {
+    InplaceCallOpaqueValue = E;
+  }
 
   Expr *getReadyExpr() const {
     return static_cast<Expr*>(SubExprs[SubExpr::Ready]);
@@ -5194,9 +5204,9 @@ class CoawaitExpr : public CoroutineSuspendExpr {
 public:
   CoawaitExpr(SourceLocation CoawaitLoc, Expr *Operand, Expr *Common,
               Expr *Ready, Expr *Suspend, Expr *Resume,
-              OpaqueValueExpr *OpaqueValue, bool IsImplicit = false)
+              OpaqueValueExpr *CommonExprOpaqueValue, bool IsImplicit = false)
       : CoroutineSuspendExpr(CoawaitExprClass, CoawaitLoc, Operand, Common,
-                             Ready, Suspend, Resume, OpaqueValue) {
+                             Ready, Suspend, Resume, CommonExprOpaqueValue) {
     CoawaitBits.IsImplicit = IsImplicit;
   }
 
@@ -5275,9 +5285,9 @@ class CoyieldExpr : public CoroutineSuspendExpr {
 public:
   CoyieldExpr(SourceLocation CoyieldLoc, Expr *Operand, Expr *Common,
               Expr *Ready, Expr *Suspend, Expr *Resume,
-              OpaqueValueExpr *OpaqueValue)
+              OpaqueValueExpr *CommonExprOpaqueValue)
       : CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Operand, Common,
-                             Ready, Suspend, Resume, OpaqueValue) {}
+                             Ready, Suspend, Resume, CommonExprOpaqueValue) {}
   CoyieldExpr(SourceLocation CoyieldLoc, QualType Ty, Expr *Operand,
               Expr *Common)
       : CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Ty, Operand,
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 1293d0ddbc117..e482c9daf9fb3 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1217,6 +1217,14 @@ def CoroDisableLifetimeBound : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def CoroInplaceTask : InheritableAttr {
+  let Spellings = [Clang<"coro_inplace_task">];
+  let Subjects = SubjectList<[CXXRecord]>;
+  let LangOpts = [CPlusPlus];
+  let Documentation = [CoroInplaceTaskDoc];
+  let SimpleHandler = 1;
+}
+
 // OSObject-based attributes.
 def OSConsumed : InheritableParamAttr {
   let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 09cf4f80bd999..21d59dedec578 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -8108,6 +8108,25 @@ but do not pass them to the underlying coroutine or pass them by value.
 }];
 }
 
+def CoroInplaceTaskDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+The ``[[clang::coro_inplace_task]]`` is a class attribute which can be applied
+to a coroutine return type.
+
+When a coroutine function that returns such a type calls another coroutine function,
+the compiler performs heap allocation elision when the following conditions are all met:
+- callee coroutine function returns a type that is annotated with ``[[clang::coro_inplace_task]]``.
+- The callee coroutine function is inlined.
+- In caller coroutine, the return value of the callee is a prvalue or an xvalue, and
+- The temporary expression containing the callee coroutine object is immediately co_awaited.
+
+The behavior is undefined if any of the following condition was met:
+- the caller coroutine is destroyed earlier than the callee coroutine.
+
+  }];
+}
+
 def CountedByDocs : Documentation {
   let Category = DocCatField;
   let Content = [{
diff --git a/clang/lib/CodeGen/CGBlocks.cpp b/clang/lib/CodeGen/CGBlocks.cpp
index 066139b1c78c7..684fda7440731 100644
--- a/clang/lib/CodeGen/CGBlocks.cpp
+++ b/clang/lib/CodeGen/CGBlocks.cpp
@@ -1163,7 +1163,8 @@ llvm::Type *CodeGenModule::getGenericBlockLiteralType() {
 }
 
 RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,
-                                          ReturnValueSlot ReturnValue) {
+                                          ReturnValueSlot ReturnValue,
+                                          llvm::CallBase **CallOrInvoke) {
   const auto *BPT = E->getCallee()->getType()->castAs<BlockPointerType>();
   llvm::Value *BlockPtr = EmitScalarExpr(E->getCallee());
   llvm::Type *GenBlockTy = CGM.getGenericBlockLiteralType();
@@ -1220,7 +1221,7 @@ RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,
   CGCallee Callee(CGCalleeInfo(), Func);
 
   // And call the block.
-  return EmitCall(FnInfo, Callee, ReturnValue, Args);
+  return EmitCall(FnInfo, Callee, ReturnValue, Args, CallOrInvoke);
 }
 
 Address CodeGenFunction::GetAddrOfBlockDecl(const VarDecl *variable) {
diff --git a/clang/lib/CodeGen/CGCUDARuntime.cpp b/clang/lib/CodeGen/CGCUDARuntime.cpp
index c14a9d3f2bbbc..1e1da1e2411a7 100644
--- a/clang/lib/CodeGen/CGCUDARuntime.cpp
+++ b/clang/lib/CodeGen/CGCUDARuntime.cpp
@@ -25,7 +25,8 @@ CGCUDARuntime::~CGCUDARuntime() {}
 
 RValue CGCUDARuntime::EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
                                              const CUDAKernelCallExpr *E,
-                                             ReturnValueSlot ReturnValue) {
+                                             ReturnValueSlot ReturnValue,
+                                             llvm::CallBase **CallOrInvoke) {
   llvm::BasicBlock *ConfigOKBlock = CGF.createBasicBlock("kcall.configok");
   llvm::BasicBlock *ContBlock = CGF.createBasicBlock("kcall.end");
 
@@ -35,7 +36,7 @@ RValue CGCUDARuntime::EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
 
   eval.begin(CGF);
   CGF.EmitBlock(ConfigOKBlock);
-  CGF.EmitSimpleCallExpr(E, ReturnValue);
+  CGF.EmitSimpleCallExpr(E, ReturnValue, CallOrInvoke);
   CGF.EmitBranch(ContBlock);
 
   CGF.EmitBlock(ContBlock);
diff --git a/clang/lib/CodeGen/CGCUDARuntime.h b/clang/lib/CodeGen/CGCUDARuntime.h
index 8030d632cc3d2..86f776004ee7c 100644
--- a/clang/lib/CodeGen/CGCUDARuntime.h
+++ b/clang/lib/CodeGen/CGCUDARuntime.h
@@ -21,6 +21,7 @@
 #include "llvm/IR/GlobalValue.h"
 
 namespace llvm {
+class CallBase;
 class Function;
 class GlobalVariable;
 }
@@ -82,9 +83,10 @@ class CGCUDARuntime {
   CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}
   virtual ~CGCUDARuntime();
 
-  virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
-                                        const CUDAKernelCallExpr *E,
-                                        ReturnValueSlot ReturnValue);
+  virtual RValue
+  EmitCUDAKernelCallExpr(CodeGenFunction &CGF, const CUDAKernelCallExpr *E,
+                         ReturnValueSlot ReturnValue,
+                         llvm::CallBase **CallOrInvoke = nullptr);
 
   /// Emits a kernel launch stub.
   virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) = 0;
diff --git a/clang/lib/CodeGen/CGCXXABI.h b/clang/lib/CodeGen/CGCXXABI.h
index 7dcc539111996..687ff7fb84444 100644
--- a/clang/lib/CodeGen/CGCXXABI.h
+++ b/clang/lib/CodeGen/CGCXXABI.h
@@ -485,11 +485,11 @@ class CGCXXABI {
       llvm::PointerUnion<const CXXDeleteExpr *, const CXXMemberCallExpr *>;
 
   /// Emit the ABI-specific virtual destructor call.
-  virtual llvm::Value *EmitVirtualDestructorCall(CodeGenFunction &CGF,
-                                                 const CXXDestructorDecl *Dtor,
-                                                 CXXDtorType DtorType,
-                                                 Address This,
-                                                 DeleteOrMemberCallExpr E) = 0;
+  virtual llvm::Value *
+  EmitVirtualDestructorCall(CodeGenFunction &CGF, const CXXDestructorDecl *Dtor,
+                            CXXDtorType DtorType, Address This,
+                            DeleteOrMemberCallExpr E,
+                            llvm::CallBase **CallOrInvoke) = 0;
 
   virtual void adjustCallArgsForDestructorThunk(CodeGenFunction &CGF,
                                                 GlobalDecl GD,
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 0a595bb998d26..c56716fbd0590 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -2191,15 +2191,11 @@ static bool canEmitDelegateCallArgs(CodeGenFunction &CGF,
   return true;
 }
 
-void CodeGenFunction::EmitCXXConstructorCall(const CXXConstructorDecl *D,
-                                             CXXCtorType Type,
-                                             bool ForVirtualBase,
-                                             bool Delegating,
-                                             Address This,
-                                             CallArgList &Args,
-                                             AggValueSlot::Overlap_t Overlap,
-                                             SourceLocation Loc,
-                                             bool NewPointerIsChecked) {
+void CodeGenFunction::EmitCXXConstructorCall(
+    const CXXConstructorDecl *D, CXXCtorType Type, bool ForVirtualBase,
+    bool Delegating, Address This, CallArgList &Args,
+    AggValueSlot::Overlap_t Overlap, SourceLocation Loc,
+    bool NewPointerIsChecked, llvm::CallBase **CallOrInvoke) {
   const CXXRecordDecl *ClassDecl = D->getParent();
 
   if (!NewPointerIsChecked)
@@ -2247,7 +2243,7 @@ void CodeGenFunction::EmitCXXConstructorCall(const CXXConstructorDecl *D,
   const CGFunctionInfo &Info = CGM.getTypes().arrangeCXXConstructorCall(
       Args, D, Type, ExtraArgs.Prefix, ExtraArgs.Suffix, PassPrototypeArgs);
   CGCallee Callee = CGCallee::forDirect(CalleePtr, GlobalDecl(D, Type));
-  EmitCall(Info, Callee, ReturnValueSlot(), Args, nullptr, false, Loc);
+  EmitCall(Info, Callee, ReturnValueSlot(), Args, CallOrInvoke, false, Loc);
 
   // Generate vtable assumptions if we're constructing a complete object
   // with a vtable.  We don't do this for base subobjects for two reasons:
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index a8a70186c2c5a..f04287b871029 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -12,9 +12,11 @@
 
 #include "CGCleanup.h"
 #include "CodeGenFunction.h"
-#include "llvm/ADT/ScopeExit.h"
+#include "clang/AST/ExprCXX.h"
 #include "clang/AST/StmtCXX.h"
 #include "clang/AST/StmtVisitor.h"
+#include "llvm/ADT/ScopeExit.h"
+#include "llvm/IR/Intrinsics.h"
 
 using namespace clang;
 using namespace CodeGen;
@@ -223,12 +225,22 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
                                     CoroutineSuspendExpr const &S,
                                     AwaitKind Kind, AggValueSlot aggSlot,
                                     bool ignoreResult, bool forLValue) {
-  auto *E = S.getCommonExpr();
+  auto &Builder = CGF.Builder;
 
-  auto CommonBinder =
-      CodeGenFunction::OpaqueValueMappingData::bind(CGF, S.getOpaqueValue(), E);
-  auto UnbindCommonOnExit =
-      llvm::make_scope_exit([&] { CommonBinder.unbind(CGF); });
+  // If S.getInplaceCallOpaqueValue() is null, we don't have a nested opaque
+  // value for common expression.
+  std::optional<CodeGenFunction::OpaqueValueMapping> OperandMapping;
+  if (auto *CallOV = S.getInplaceCallOpaqueValue()) {
+    auto *CE = cast<CallExpr>(CallOV->getSourceExpr());
+    llvm::CallBase *CallOrInvoke = nullptr;
+    LValue CallResult = CGF.EmitCallExprLValue(CE, &CallOrInvoke);
+    if (CallOrInvoke)
+      CallOrInvoke->addFnAttr(llvm::Attribute::CoroMustElide);
+
+    OperandMapping.emplace(CGF, CallOV, CallResult);
+  }
+  CodeGenFunction::OpaqueValueMapping BindCommon(CGF,
+                                                 S.getCommonExprOpaqueValue());
 
   auto Prefix = buildSuspendPrefixStr(Coro, Kind);
   BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
@@ -241,7 +253,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   // Otherwise, emit suspend logic.
   CGF.EmitBlock(SuspendBlock);
 
-  auto &Builder = CGF.Builder;
   llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
   auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
   auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
@@ -256,7 +267,8 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
 
   SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
   SuspendIntrinsicCallArgs.push_back(
-      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
+      CGF.getOrCreateOpaqueLValueMapping(S.getCommonExprOpaqueValue())
+          .getPointer(CGF));
 
   SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin);
   SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
@@ -455,7 +467,7 @@ CodeGenFunction::generateAwaitSuspendWrapper(Twine const &CoroName,
       Builder.CreateLoad(GetAddrOfLocalVar(&FrameDecl));
 
   auto AwaiterBinder = CodeGenFunction::OpaqueValueMappingData::bind(
-      *this, S.getOpaqueValue(), AwaiterLValue);
+      *this, S.getCommonExprOpaqueValue(), AwaiterLValue);
 
   auto *SuspendRet = EmitScalarExpr(S.getSuspendExpr());
 
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index af1e1a25d1d8b..0ca2f5bbe823b 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -5437,16 +5437,17 @@ RValue CodeGenFunction::EmitRValueForField(LValue LV,
 //===--------------------------------------------------------------------===//
 
 RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
-                                     ReturnValueSlot ReturnValue) {
+                                     ReturnValueSlot ReturnValue,
+                                     llvm::CallBase **CallOrInvoke) {
   // Builtins never have block type.
   if (E->getCallee()->getType()->isBlockPointerType())
-    return EmitBlockCallExpr(E, ReturnValue);
+    return EmitBlockCallExpr(E, ReturnValue, CallOrInvoke);
 
   if (const auto *CE = dyn_cast<CXXMemberCallExpr>(E))
-    return EmitCXXMemberCallExpr(CE, ReturnValue);
+    return EmitCXXMemberCallExpr(CE, ReturnValue, CallOrInvoke);
 
   if (const auto *CE = dyn_cast<CUDAKernelCallExpr>(E))
-    return EmitCUDAKernelCallExpr(CE, ReturnValue);
+    return EmitCUDAKernelCallExpr(CE, ReturnValue, CallOrInvoke);
 
   // A CXXOperatorCallExpr is created even for explicit object methods, but
   // these should be treated like static function call.
@@ -5454,7 +5455,7 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
     if (const auto *MD =
             dyn_cast_if_present<CXXMethodDecl>(CE->getCalleeDecl());
         MD && MD->isImplicitObjectMemberFunction())
-      return EmitCXXOperatorMemberCallExpr(CE, MD, ReturnValue);
+      return EmitCXXOperatorMemberCallExpr(CE, MD, ReturnValue, CallOrInvoke);
 
   CGCallee callee = EmitCallee(E->getCallee());
 
@@ -5467,14 +5468,17 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
     return EmitCXXPseudoDestructorExpr(callee.getPseudoDestructorExpr());
   }
 
-  return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue);
+  return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue,
+                  /*Chain=*/nullptr, CallOrInvoke);
 }
 
 /// Emit a CallExpr without considering whether it might be a subclass.
 RValue CodeGenFunction::EmitSimpleCallExpr(const CallExpr *E,
-                                           ReturnValueSlot ReturnValue) {
+                                           ReturnValueSlot ReturnValue,
+                                           llvm::CallBase **CallOrInvoke) {
   CGCallee Callee = EmitCallee(E->getCallee());
-  return EmitCall(E->getCallee()->getType(), Callee, E, ReturnValue);
+  return EmitCall(E->getCallee()->getType(), Callee, E, ReturnValue,
+                  /*Chain=*/nullptr, CallOrInvoke);
 }
 
 // Detect the unusual situation where an inline version is shadowed by a
@@ -5678,8 +5682,9 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) {
   llvm_unreachable("bad evaluation kind");
 }
 
-LValue CodeGenFunction::EmitCallExprLValue(const CallExpr *E) {
-  RValue RV = EmitCallExpr(E);
+LValue CodeGenFunction::EmitCallExprLValue(const CallExpr *E,
+                                           llvm::CallBase **CallOrInvoke) {
+  RValue RV = EmitCallExpr(E, ReturnValueSlot(), CallOrInvoke);
 
   if (!RV.isScalar())
     return MakeAddrLValue(RV.getAggregateAddress(), E->getType(),
@@ -5802,9 +5807,11 @@ LValue CodeGenFunction::EmitStmtExprLValue(const StmtExpr *E) {
                         AlignmentSource::Decl);
 }
 
-RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee,
-                                 const CallExpr *E, ReturnValueSlot ReturnValue,
-                                 llvm::Value *Chain) {
+RValue CodeGenFunction::EmitCall(QualType CalleeType,
+                                 const CGCallee &OrigCallee, const CallExpr *E,
+                                 ReturnValueSlot ReturnValue,
+                                 llvm::Value *Chain,
+                                 llvm::CallBase **CallOrInvoke) {
   // Get the actual function type. The callee type will always be a pointer to
   // function type or a block pointer type.
   assert(CalleeType->isFunctionPointerType() &&
@@ -6015,8 +6022,8 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee
         Address(Handle, Handle->getType(), CGM.getPointerAlign()));
     Callee.setFunctionPointer(Stub);
   }
-  llvm::CallBase *CallOrInvoke = nullptr;
-  RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &CallOrInvoke,
+  llvm::CallBase *LocalCallOrInvoke = nullptr;
+  RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &LocalCallOrInvoke,
                          E == MustTailCall, E->getExprLoc());
 
   // Generate function declaration DISuprogram in order to be used
@@ -6025,11 +6032,13 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee
     if (auto *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
       FunctionArgList Args;
       QualType ResTy = BuildFunctionArgList(CalleeDecl, Args);
-      DI->EmitFuncDeclForCallSite(CallOrInvoke,
+      DI->EmitFuncDeclForCallSite(LocalCallOrInvoke,
                                   DI->getFunctionType(CalleeDecl, ResTy, Args),
                                   CalleeDecl);
     }
   }
+  if (CallOrInvoke)
+    *CallOrInvoke = LocalCallOrInvoke;
 
   return Call;
 }
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index 8eb6ab7381acb..1214bb054fb8d 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -84,23 +84,24 @@ commonEmitCXXMemberOrOperatorCall(CodeGenFunction &CGF, GlobalDecl GD,
 
 RValue CodeGenFunction::EmitCXXMemberOrOperatorCall(
     const CXXMethodDecl *MD, const CGCallee &Callee,
-    Retur...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 17, 2024

@llvm/pr-subscribers-clang

Author: Yuxuan Chen (yuxuanchen1997)

Changes

This patch is the frontend implementation of the coroutine elide improvement project detailed in this discourse post: https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

This patch proposes a C++ struct/class attribute [[clang::coro_inplace_task]]. This notion of inplace task gives developers and library authors a certainty that coroutine heap elision happens in a predictable way.

Originally, after we lower a coroutine to LLVM IR, CoroElide is responsible for analysis of whether an elision can happen. Take this as an example:

Task foo();
Task bar() {
  co_await foo();
}

For CoroElide to happen, the ramp function of foo must be inlined into bar. This inlining happens after foo has been split but bar is usually still a presplit coroutine. If foo is indeed a coroutine, the inlined coro.id intrinsics of foo is visible within bar. CoroElide then runs an analysis to figure out whether the SSA value of coro.begin() of foo gets destroyed before bar terminates.

Task types are rarely simple enough for the destroy logic of the task to reference the SSA value from coro.begin() directly. Hence, the pass is very ineffective for even the most trivial C++ Task types. Improving CoroElide by implementing more powerful analyses is possible, however it doesn't give us the predictability when we expect elision to happen.

The approach we want to take with this language extension generally originates from the philosophy that library implementations of Task types has the control over the structured concurrency guarantees we demand for elision to happen. That is, the lifetime for the callee's frame is shorter to that of the caller.

The [[clang::coro_inplace_task]] is a class attribute which can be applied to a coroutine return type.

When a coroutine function that returns such a type calls another coroutine function, the compiler performs heap allocation elision when the following conditions are all met:

  • callee coroutine function returns a type that is annotated with [[clang::coro_inplace_task]].
  • In caller coroutine, the return value of the callee is a prvalue that is immediately co_awaited.

From the C++ perspective, it makes sense because we can ensure the lifetime of elided callee cannot exceed that of the caller if we can guarantee that the caller coroutine is never destroyed earlier than the callee coroutine. This is not generally true for any C++ programs. However, the library that implements Task types and executors may provide this guarantee to the compiler, providing the user with certainty that HALO will work on their programs.

After this patch, when compiling coroutines that return a type with such attribute, the frontend checks that the type of the operand of co_await expressions (not operator co_await). If it's also attributed with [[clang::coro_inplace_task]], the FE emits metadata on the call or invoke instruction as a hint for a later middle end pass to elide the elision.

The original patch version is #94693 and as suggested, the patch is split into frontend and middle end solutions into stacked PRs.

The middle end CoroElide patch can be found at <PR>.
The middle end transformation that performs the elide can be found at <PR>.


Patch is 52.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/99282.diff

24 Files Affected:

  • (modified) clang/include/clang/AST/ExprCXX.h (+18-8)
  • (modified) clang/include/clang/Basic/Attr.td (+8)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+19)
  • (modified) clang/lib/CodeGen/CGBlocks.cpp (+3-2)
  • (modified) clang/lib/CodeGen/CGCUDARuntime.cpp (+3-2)
  • (modified) clang/lib/CodeGen/CGCUDARuntime.h (+5-3)
  • (modified) clang/lib/CodeGen/CGCXXABI.h (+5-5)
  • (modified) clang/lib/CodeGen/CGClass.cpp (+6-10)
  • (modified) clang/lib/CodeGen/CGCoroutine.cpp (+21-9)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+25-16)
  • (modified) clang/lib/CodeGen/CGExprCXX.cpp (+33-27)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+37-27)
  • (modified) clang/lib/CodeGen/ItaniumCXXABI.cpp (+9-7)
  • (modified) clang/lib/CodeGen/MicrosoftCXXABI.cpp (+10-8)
  • (modified) clang/lib/Sema/SemaCoroutine.cpp (+56-2)
  • (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+8-2)
  • (modified) clang/lib/Serialization/ASTWriterStmt.cpp (+2-1)
  • (added) clang/test/CodeGenCoroutines/Inputs/utility.h (+13)
  • (added) clang/test/CodeGenCoroutines/coro-must-elide.cpp (+87)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
  • (modified) llvm/include/llvm/Bitcode/LLVMBitCodes.h (+1)
  • (modified) llvm/include/llvm/IR/Attributes.td (+3)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+2)
  • (modified) llvm/lib/Transforms/Utils/CodeExtractor.cpp (+1)
diff --git a/clang/include/clang/AST/ExprCXX.h b/clang/include/clang/AST/ExprCXX.h
index c2feac525c1ea..0cf62aee41b66 100644
--- a/clang/include/clang/AST/ExprCXX.h
+++ b/clang/include/clang/AST/ExprCXX.h
@@ -5082,7 +5082,8 @@ class CoroutineSuspendExpr : public Expr {
   enum SubExpr { Operand, Common, Ready, Suspend, Resume, Count };
 
   Stmt *SubExprs[SubExpr::Count];
-  OpaqueValueExpr *OpaqueValue = nullptr;
+  OpaqueValueExpr *CommonExprOpaqueValue = nullptr;
+  OpaqueValueExpr *InplaceCallOpaqueValue = nullptr;
 
 public:
   // These types correspond to the three C++ 'await_suspend' return variants
@@ -5090,10 +5091,10 @@ class CoroutineSuspendExpr : public Expr {
 
   CoroutineSuspendExpr(StmtClass SC, SourceLocation KeywordLoc, Expr *Operand,
                        Expr *Common, Expr *Ready, Expr *Suspend, Expr *Resume,
-                       OpaqueValueExpr *OpaqueValue)
+                       OpaqueValueExpr *CommonExprOpaqueValue)
       : Expr(SC, Resume->getType(), Resume->getValueKind(),
              Resume->getObjectKind()),
-        KeywordLoc(KeywordLoc), OpaqueValue(OpaqueValue) {
+        KeywordLoc(KeywordLoc), CommonExprOpaqueValue(CommonExprOpaqueValue) {
     SubExprs[SubExpr::Operand] = Operand;
     SubExprs[SubExpr::Common] = Common;
     SubExprs[SubExpr::Ready] = Ready;
@@ -5128,7 +5129,16 @@ class CoroutineSuspendExpr : public Expr {
   }
 
   /// getOpaqueValue - Return the opaque value placeholder.
-  OpaqueValueExpr *getOpaqueValue() const { return OpaqueValue; }
+  OpaqueValueExpr *getCommonExprOpaqueValue() const {
+    return CommonExprOpaqueValue;
+  }
+
+  OpaqueValueExpr *getInplaceCallOpaqueValue() const {
+    return InplaceCallOpaqueValue;
+  }
+  void setInplaceCallOpaqueValue(OpaqueValueExpr *E) {
+    InplaceCallOpaqueValue = E;
+  }
 
   Expr *getReadyExpr() const {
     return static_cast<Expr*>(SubExprs[SubExpr::Ready]);
@@ -5194,9 +5204,9 @@ class CoawaitExpr : public CoroutineSuspendExpr {
 public:
   CoawaitExpr(SourceLocation CoawaitLoc, Expr *Operand, Expr *Common,
               Expr *Ready, Expr *Suspend, Expr *Resume,
-              OpaqueValueExpr *OpaqueValue, bool IsImplicit = false)
+              OpaqueValueExpr *CommonExprOpaqueValue, bool IsImplicit = false)
       : CoroutineSuspendExpr(CoawaitExprClass, CoawaitLoc, Operand, Common,
-                             Ready, Suspend, Resume, OpaqueValue) {
+                             Ready, Suspend, Resume, CommonExprOpaqueValue) {
     CoawaitBits.IsImplicit = IsImplicit;
   }
 
@@ -5275,9 +5285,9 @@ class CoyieldExpr : public CoroutineSuspendExpr {
 public:
   CoyieldExpr(SourceLocation CoyieldLoc, Expr *Operand, Expr *Common,
               Expr *Ready, Expr *Suspend, Expr *Resume,
-              OpaqueValueExpr *OpaqueValue)
+              OpaqueValueExpr *CommonExprOpaqueValue)
       : CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Operand, Common,
-                             Ready, Suspend, Resume, OpaqueValue) {}
+                             Ready, Suspend, Resume, CommonExprOpaqueValue) {}
   CoyieldExpr(SourceLocation CoyieldLoc, QualType Ty, Expr *Operand,
               Expr *Common)
       : CoroutineSuspendExpr(CoyieldExprClass, CoyieldLoc, Ty, Operand,
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 1293d0ddbc117..e482c9daf9fb3 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1217,6 +1217,14 @@ def CoroDisableLifetimeBound : InheritableAttr {
   let SimpleHandler = 1;
 }
 
+def CoroInplaceTask : InheritableAttr {
+  let Spellings = [Clang<"coro_inplace_task">];
+  let Subjects = SubjectList<[CXXRecord]>;
+  let LangOpts = [CPlusPlus];
+  let Documentation = [CoroInplaceTaskDoc];
+  let SimpleHandler = 1;
+}
+
 // OSObject-based attributes.
 def OSConsumed : InheritableParamAttr {
   let Spellings = [Clang<"os_consumed">];
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 09cf4f80bd999..21d59dedec578 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -8108,6 +8108,25 @@ but do not pass them to the underlying coroutine or pass them by value.
 }];
 }
 
+def CoroInplaceTaskDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+The ``[[clang::coro_inplace_task]]`` is a class attribute which can be applied
+to a coroutine return type.
+
+When a coroutine function that returns such a type calls another coroutine function,
+the compiler performs heap allocation elision when the following conditions are all met:
+- callee coroutine function returns a type that is annotated with ``[[clang::coro_inplace_task]]``.
+- The callee coroutine function is inlined.
+- In caller coroutine, the return value of the callee is a prvalue or an xvalue, and
+- The temporary expression containing the callee coroutine object is immediately co_awaited.
+
+The behavior is undefined if any of the following condition was met:
+- the caller coroutine is destroyed earlier than the callee coroutine.
+
+  }];
+}
+
 def CountedByDocs : Documentation {
   let Category = DocCatField;
   let Content = [{
diff --git a/clang/lib/CodeGen/CGBlocks.cpp b/clang/lib/CodeGen/CGBlocks.cpp
index 066139b1c78c7..684fda7440731 100644
--- a/clang/lib/CodeGen/CGBlocks.cpp
+++ b/clang/lib/CodeGen/CGBlocks.cpp
@@ -1163,7 +1163,8 @@ llvm::Type *CodeGenModule::getGenericBlockLiteralType() {
 }
 
 RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,
-                                          ReturnValueSlot ReturnValue) {
+                                          ReturnValueSlot ReturnValue,
+                                          llvm::CallBase **CallOrInvoke) {
   const auto *BPT = E->getCallee()->getType()->castAs<BlockPointerType>();
   llvm::Value *BlockPtr = EmitScalarExpr(E->getCallee());
   llvm::Type *GenBlockTy = CGM.getGenericBlockLiteralType();
@@ -1220,7 +1221,7 @@ RValue CodeGenFunction::EmitBlockCallExpr(const CallExpr *E,
   CGCallee Callee(CGCalleeInfo(), Func);
 
   // And call the block.
-  return EmitCall(FnInfo, Callee, ReturnValue, Args);
+  return EmitCall(FnInfo, Callee, ReturnValue, Args, CallOrInvoke);
 }
 
 Address CodeGenFunction::GetAddrOfBlockDecl(const VarDecl *variable) {
diff --git a/clang/lib/CodeGen/CGCUDARuntime.cpp b/clang/lib/CodeGen/CGCUDARuntime.cpp
index c14a9d3f2bbbc..1e1da1e2411a7 100644
--- a/clang/lib/CodeGen/CGCUDARuntime.cpp
+++ b/clang/lib/CodeGen/CGCUDARuntime.cpp
@@ -25,7 +25,8 @@ CGCUDARuntime::~CGCUDARuntime() {}
 
 RValue CGCUDARuntime::EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
                                              const CUDAKernelCallExpr *E,
-                                             ReturnValueSlot ReturnValue) {
+                                             ReturnValueSlot ReturnValue,
+                                             llvm::CallBase **CallOrInvoke) {
   llvm::BasicBlock *ConfigOKBlock = CGF.createBasicBlock("kcall.configok");
   llvm::BasicBlock *ContBlock = CGF.createBasicBlock("kcall.end");
 
@@ -35,7 +36,7 @@ RValue CGCUDARuntime::EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
 
   eval.begin(CGF);
   CGF.EmitBlock(ConfigOKBlock);
-  CGF.EmitSimpleCallExpr(E, ReturnValue);
+  CGF.EmitSimpleCallExpr(E, ReturnValue, CallOrInvoke);
   CGF.EmitBranch(ContBlock);
 
   CGF.EmitBlock(ContBlock);
diff --git a/clang/lib/CodeGen/CGCUDARuntime.h b/clang/lib/CodeGen/CGCUDARuntime.h
index 8030d632cc3d2..86f776004ee7c 100644
--- a/clang/lib/CodeGen/CGCUDARuntime.h
+++ b/clang/lib/CodeGen/CGCUDARuntime.h
@@ -21,6 +21,7 @@
 #include "llvm/IR/GlobalValue.h"
 
 namespace llvm {
+class CallBase;
 class Function;
 class GlobalVariable;
 }
@@ -82,9 +83,10 @@ class CGCUDARuntime {
   CGCUDARuntime(CodeGenModule &CGM) : CGM(CGM) {}
   virtual ~CGCUDARuntime();
 
-  virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF,
-                                        const CUDAKernelCallExpr *E,
-                                        ReturnValueSlot ReturnValue);
+  virtual RValue
+  EmitCUDAKernelCallExpr(CodeGenFunction &CGF, const CUDAKernelCallExpr *E,
+                         ReturnValueSlot ReturnValue,
+                         llvm::CallBase **CallOrInvoke = nullptr);
 
   /// Emits a kernel launch stub.
   virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) = 0;
diff --git a/clang/lib/CodeGen/CGCXXABI.h b/clang/lib/CodeGen/CGCXXABI.h
index 7dcc539111996..687ff7fb84444 100644
--- a/clang/lib/CodeGen/CGCXXABI.h
+++ b/clang/lib/CodeGen/CGCXXABI.h
@@ -485,11 +485,11 @@ class CGCXXABI {
       llvm::PointerUnion<const CXXDeleteExpr *, const CXXMemberCallExpr *>;
 
   /// Emit the ABI-specific virtual destructor call.
-  virtual llvm::Value *EmitVirtualDestructorCall(CodeGenFunction &CGF,
-                                                 const CXXDestructorDecl *Dtor,
-                                                 CXXDtorType DtorType,
-                                                 Address This,
-                                                 DeleteOrMemberCallExpr E) = 0;
+  virtual llvm::Value *
+  EmitVirtualDestructorCall(CodeGenFunction &CGF, const CXXDestructorDecl *Dtor,
+                            CXXDtorType DtorType, Address This,
+                            DeleteOrMemberCallExpr E,
+                            llvm::CallBase **CallOrInvoke) = 0;
 
   virtual void adjustCallArgsForDestructorThunk(CodeGenFunction &CGF,
                                                 GlobalDecl GD,
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 0a595bb998d26..c56716fbd0590 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -2191,15 +2191,11 @@ static bool canEmitDelegateCallArgs(CodeGenFunction &CGF,
   return true;
 }
 
-void CodeGenFunction::EmitCXXConstructorCall(const CXXConstructorDecl *D,
-                                             CXXCtorType Type,
-                                             bool ForVirtualBase,
-                                             bool Delegating,
-                                             Address This,
-                                             CallArgList &Args,
-                                             AggValueSlot::Overlap_t Overlap,
-                                             SourceLocation Loc,
-                                             bool NewPointerIsChecked) {
+void CodeGenFunction::EmitCXXConstructorCall(
+    const CXXConstructorDecl *D, CXXCtorType Type, bool ForVirtualBase,
+    bool Delegating, Address This, CallArgList &Args,
+    AggValueSlot::Overlap_t Overlap, SourceLocation Loc,
+    bool NewPointerIsChecked, llvm::CallBase **CallOrInvoke) {
   const CXXRecordDecl *ClassDecl = D->getParent();
 
   if (!NewPointerIsChecked)
@@ -2247,7 +2243,7 @@ void CodeGenFunction::EmitCXXConstructorCall(const CXXConstructorDecl *D,
   const CGFunctionInfo &Info = CGM.getTypes().arrangeCXXConstructorCall(
       Args, D, Type, ExtraArgs.Prefix, ExtraArgs.Suffix, PassPrototypeArgs);
   CGCallee Callee = CGCallee::forDirect(CalleePtr, GlobalDecl(D, Type));
-  EmitCall(Info, Callee, ReturnValueSlot(), Args, nullptr, false, Loc);
+  EmitCall(Info, Callee, ReturnValueSlot(), Args, CallOrInvoke, false, Loc);
 
   // Generate vtable assumptions if we're constructing a complete object
   // with a vtable.  We don't do this for base subobjects for two reasons:
diff --git a/clang/lib/CodeGen/CGCoroutine.cpp b/clang/lib/CodeGen/CGCoroutine.cpp
index a8a70186c2c5a..f04287b871029 100644
--- a/clang/lib/CodeGen/CGCoroutine.cpp
+++ b/clang/lib/CodeGen/CGCoroutine.cpp
@@ -12,9 +12,11 @@
 
 #include "CGCleanup.h"
 #include "CodeGenFunction.h"
-#include "llvm/ADT/ScopeExit.h"
+#include "clang/AST/ExprCXX.h"
 #include "clang/AST/StmtCXX.h"
 #include "clang/AST/StmtVisitor.h"
+#include "llvm/ADT/ScopeExit.h"
+#include "llvm/IR/Intrinsics.h"
 
 using namespace clang;
 using namespace CodeGen;
@@ -223,12 +225,22 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
                                     CoroutineSuspendExpr const &S,
                                     AwaitKind Kind, AggValueSlot aggSlot,
                                     bool ignoreResult, bool forLValue) {
-  auto *E = S.getCommonExpr();
+  auto &Builder = CGF.Builder;
 
-  auto CommonBinder =
-      CodeGenFunction::OpaqueValueMappingData::bind(CGF, S.getOpaqueValue(), E);
-  auto UnbindCommonOnExit =
-      llvm::make_scope_exit([&] { CommonBinder.unbind(CGF); });
+  // If S.getInplaceCallOpaqueValue() is null, we don't have a nested opaque
+  // value for common expression.
+  std::optional<CodeGenFunction::OpaqueValueMapping> OperandMapping;
+  if (auto *CallOV = S.getInplaceCallOpaqueValue()) {
+    auto *CE = cast<CallExpr>(CallOV->getSourceExpr());
+    llvm::CallBase *CallOrInvoke = nullptr;
+    LValue CallResult = CGF.EmitCallExprLValue(CE, &CallOrInvoke);
+    if (CallOrInvoke)
+      CallOrInvoke->addFnAttr(llvm::Attribute::CoroMustElide);
+
+    OperandMapping.emplace(CGF, CallOV, CallResult);
+  }
+  CodeGenFunction::OpaqueValueMapping BindCommon(CGF,
+                                                 S.getCommonExprOpaqueValue());
 
   auto Prefix = buildSuspendPrefixStr(Coro, Kind);
   BasicBlock *ReadyBlock = CGF.createBasicBlock(Prefix + Twine(".ready"));
@@ -241,7 +253,6 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   // Otherwise, emit suspend logic.
   CGF.EmitBlock(SuspendBlock);
 
-  auto &Builder = CGF.Builder;
   llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
   auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
   auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
@@ -256,7 +267,8 @@ static LValueOrRValue emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
 
   SmallVector<llvm::Value *, 3> SuspendIntrinsicCallArgs;
   SuspendIntrinsicCallArgs.push_back(
-      CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
+      CGF.getOrCreateOpaqueLValueMapping(S.getCommonExprOpaqueValue())
+          .getPointer(CGF));
 
   SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin);
   SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
@@ -455,7 +467,7 @@ CodeGenFunction::generateAwaitSuspendWrapper(Twine const &CoroName,
       Builder.CreateLoad(GetAddrOfLocalVar(&FrameDecl));
 
   auto AwaiterBinder = CodeGenFunction::OpaqueValueMappingData::bind(
-      *this, S.getOpaqueValue(), AwaiterLValue);
+      *this, S.getCommonExprOpaqueValue(), AwaiterLValue);
 
   auto *SuspendRet = EmitScalarExpr(S.getSuspendExpr());
 
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index af1e1a25d1d8b..0ca2f5bbe823b 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -5437,16 +5437,17 @@ RValue CodeGenFunction::EmitRValueForField(LValue LV,
 //===--------------------------------------------------------------------===//
 
 RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
-                                     ReturnValueSlot ReturnValue) {
+                                     ReturnValueSlot ReturnValue,
+                                     llvm::CallBase **CallOrInvoke) {
   // Builtins never have block type.
   if (E->getCallee()->getType()->isBlockPointerType())
-    return EmitBlockCallExpr(E, ReturnValue);
+    return EmitBlockCallExpr(E, ReturnValue, CallOrInvoke);
 
   if (const auto *CE = dyn_cast<CXXMemberCallExpr>(E))
-    return EmitCXXMemberCallExpr(CE, ReturnValue);
+    return EmitCXXMemberCallExpr(CE, ReturnValue, CallOrInvoke);
 
   if (const auto *CE = dyn_cast<CUDAKernelCallExpr>(E))
-    return EmitCUDAKernelCallExpr(CE, ReturnValue);
+    return EmitCUDAKernelCallExpr(CE, ReturnValue, CallOrInvoke);
 
   // A CXXOperatorCallExpr is created even for explicit object methods, but
   // these should be treated like static function call.
@@ -5454,7 +5455,7 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
     if (const auto *MD =
             dyn_cast_if_present<CXXMethodDecl>(CE->getCalleeDecl());
         MD && MD->isImplicitObjectMemberFunction())
-      return EmitCXXOperatorMemberCallExpr(CE, MD, ReturnValue);
+      return EmitCXXOperatorMemberCallExpr(CE, MD, ReturnValue, CallOrInvoke);
 
   CGCallee callee = EmitCallee(E->getCallee());
 
@@ -5467,14 +5468,17 @@ RValue CodeGenFunction::EmitCallExpr(const CallExpr *E,
     return EmitCXXPseudoDestructorExpr(callee.getPseudoDestructorExpr());
   }
 
-  return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue);
+  return EmitCall(E->getCallee()->getType(), callee, E, ReturnValue,
+                  /*Chain=*/nullptr, CallOrInvoke);
 }
 
 /// Emit a CallExpr without considering whether it might be a subclass.
 RValue CodeGenFunction::EmitSimpleCallExpr(const CallExpr *E,
-                                           ReturnValueSlot ReturnValue) {
+                                           ReturnValueSlot ReturnValue,
+                                           llvm::CallBase **CallOrInvoke) {
   CGCallee Callee = EmitCallee(E->getCallee());
-  return EmitCall(E->getCallee()->getType(), Callee, E, ReturnValue);
+  return EmitCall(E->getCallee()->getType(), Callee, E, ReturnValue,
+                  /*Chain=*/nullptr, CallOrInvoke);
 }
 
 // Detect the unusual situation where an inline version is shadowed by a
@@ -5678,8 +5682,9 @@ LValue CodeGenFunction::EmitBinaryOperatorLValue(const BinaryOperator *E) {
   llvm_unreachable("bad evaluation kind");
 }
 
-LValue CodeGenFunction::EmitCallExprLValue(const CallExpr *E) {
-  RValue RV = EmitCallExpr(E);
+LValue CodeGenFunction::EmitCallExprLValue(const CallExpr *E,
+                                           llvm::CallBase **CallOrInvoke) {
+  RValue RV = EmitCallExpr(E, ReturnValueSlot(), CallOrInvoke);
 
   if (!RV.isScalar())
     return MakeAddrLValue(RV.getAggregateAddress(), E->getType(),
@@ -5802,9 +5807,11 @@ LValue CodeGenFunction::EmitStmtExprLValue(const StmtExpr *E) {
                         AlignmentSource::Decl);
 }
 
-RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee,
-                                 const CallExpr *E, ReturnValueSlot ReturnValue,
-                                 llvm::Value *Chain) {
+RValue CodeGenFunction::EmitCall(QualType CalleeType,
+                                 const CGCallee &OrigCallee, const CallExpr *E,
+                                 ReturnValueSlot ReturnValue,
+                                 llvm::Value *Chain,
+                                 llvm::CallBase **CallOrInvoke) {
   // Get the actual function type. The callee type will always be a pointer to
   // function type or a block pointer type.
   assert(CalleeType->isFunctionPointerType() &&
@@ -6015,8 +6022,8 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee
         Address(Handle, Handle->getType(), CGM.getPointerAlign()));
     Callee.setFunctionPointer(Stub);
   }
-  llvm::CallBase *CallOrInvoke = nullptr;
-  RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &CallOrInvoke,
+  llvm::CallBase *LocalCallOrInvoke = nullptr;
+  RValue Call = EmitCall(FnInfo, Callee, ReturnValue, Args, &LocalCallOrInvoke,
                          E == MustTailCall, E->getExprLoc());
 
   // Generate function declaration DISuprogram in order to be used
@@ -6025,11 +6032,13 @@ RValue CodeGenFunction::EmitCall(QualType CalleeType, const CGCallee &OrigCallee
     if (auto *CalleeDecl = dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
       FunctionArgList Args;
       QualType ResTy = BuildFunctionArgList(CalleeDecl, Args);
-      DI->EmitFuncDeclForCallSite(CallOrInvoke,
+      DI->EmitFuncDeclForCallSite(LocalCallOrInvoke,
                                   DI->getFunctionType(CalleeDecl, ResTy, Args),
                                   CalleeDecl);
     }
   }
+  if (CallOrInvoke)
+    *CallOrInvoke = LocalCallOrInvoke;
 
   return Call;
 }
diff --git a/clang/lib/CodeGen/CGExprCXX.cpp b/clang/lib/CodeGen/CGExprCXX.cpp
index 8eb6ab7381acb..1214bb054fb8d 100644
--- a/clang/lib/CodeGen/CGExprCXX.cpp
+++ b/clang/lib/CodeGen/CGExprCXX.cpp
@@ -84,23 +84,24 @@ commonEmitCXXMemberOrOperatorCall(CodeGenFunction &CGF, GlobalDecl GD,
 
 RValue CodeGenFunction::EmitCXXMemberOrOperatorCall(
     const CXXMethodDecl *MD, const CGCallee &Callee,
-    Retur...
[truncated]

@yuxuanchen1997 yuxuanchen1997 force-pushed the users/yuxuanchen1997/coro-inplace-task-fe branch from d38f060 to dfb903c Compare July 17, 2024 20:45
@yuxuanchen1997 yuxuanchen1997 changed the title [Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_inplace_task]] [Clang] C++20 Coroutines: Introduce Frontend Attribute [[clang::coro_await_elidable]] Jul 17, 2024
@yuxuanchen1997 yuxuanchen1997 force-pushed the users/yuxuanchen1997/coro-inplace-task-fe branch 5 times, most recently from 8d9c2cc to 55e27fd Compare August 1, 2024 04:40

When a coroutine function that returns such a type calls another coroutine function,
the compiler performs heap allocation elision when the call to the coroutine function
is immediately co_awaited as a prvalue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the prvalue here is needed. If the call is always immediately co_awaited, the value of the call should always be prvalue, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt a bit uneasy to just say "immediate" because I feel like it's a little ambiguous. Briefly skimming through the standard, the word "immediate" is mostly used to describe parsing behaviours. While the term prvalue has a well-defined meaning in the standard and fits our purpose well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am concerning is, actually we don't care if it is prvalue or not at all. What we really want is to find the CallInst. The prvalue catagory doesn't be involved in the design. It is just, the idea of prvalue can help us to implement this. So I think the concept of prvalue is better to be an implementation details but not the interface.

And on the other hand, in my mind, this should be meant to be related with parsers. So I am fine with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am concerning is, actually we don't care if it is prvalue or not at all.

I am actually using isPRValue() in Sema for this case. The right-hand side of co_await being a prvalue is a crucial requirement of this design. If the right-hand side is not a prvalue, we cannot do the elide safely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about the implementation design. What I am saying is about the mental interface to users.

If the right-hand side is not a prvalue, we cannot do the elide safely.

Then the question is, if the call is immediately co_awaited, in what cases it won't be a prvalue. And if it is, why it is not safe on the high level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the question is, if the call is immediately co_awaited, in what cases it won't be a prvalue.

There probably aren't any. But I was still not confident of what "immediately" means in general.

The term "immediate subexpression" is defined to be a different meaning in the standard: https://eel.is/c++draft/intro.execution#3

But I don't have a strong opinion of this. Probably not ambiguous in this context. This isn't the first time we use an overloaded term and won't be the last either, but I still like interfacing with developer with the term prvalue better. In my experience it's better understood.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the term "immediate subexpression" is not clear in the standard wording. But we're using a different term "immediately co_awaited". So we're not covering something existed already. The reason why I don't like prvalue here is that it didn't interpret the things very clearly. It may make people think they understand it but in fact it is not.

In another word, the semantics or intention of the attribute is that, the coroutine instance generated by the targeted call (no matter how we describe it) before we resume the current coroutine. The interesting point here is that, the coroutine instance is not necessarily connected to the return object of that targeted call. They are two different things. But just in the general implementation, the return object of such targeted call may have a pointer pointed to the coroutine instance. But there are indeed differences.

if (E->isCoroMustElide()) {
auto *I = *CallOrInvoke;
if (I)
I->addFnAttr(llvm::Attribute::CoroMustElide);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like all the changes in CodeGen are served for this. And I am a little concerned about if it is worthy to do such a big change for this. @efriedma-quic could you take a look here?

And I am wondering if all of the Emit*CallExpr function will be converged to one or a few fundamental functions to emit the CallInst like EmitCall then we can do our job there.

Copy link
Member Author

@yuxuanchen1997 yuxuanchen1997 Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if all of the Emit*CallExpr function will be converged to one

This actually sounds like a bigger change to me. If you search for all the usages of Emit*CallExpr, most of them are called from EmitCallExpr at the top level. (with the exception of EmitSimpleCallExpr, whose usage is really limited, maybe questionable here as well.)

a few fundamental functions to emit the CallInst like EmitCall then we can do our job there

EmitCall doesn't currently take a CallExpr, wondering if you'd like me to change to adding a flag to propagate this information. It also has multiple overloads, and perhaps more importantly existing calls from Emit*CallExpr to EmitCall are also using CallOrInvoke from EmitCall to "do something". So I am not sure how this perspective will actually pan out once put into code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the current style is more or less potential to scale. But I'd like to hear from @efriedma-quic since the patch changes a lot of CodeGen's functions signature and I want to know if he will be happy about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gentle ping on this. @efriedma-quic

@yuxuanchen1997 yuxuanchen1997 force-pushed the users/yuxuanchen1997/coro-inplace-task-fe branch from 55e27fd to 5578fde Compare August 8, 2024 22:56
@yuxuanchen1997 yuxuanchen1997 deleted the users/yuxuanchen1997/coro-inplace-task-fe branch August 8, 2024 23:42
@yuxuanchen1997 yuxuanchen1997 restored the users/yuxuanchen1997/coro-inplace-task-fe branch August 8, 2024 23:42
@yuxuanchen1997 yuxuanchen1997 reopened this Aug 8, 2024
@ChuanqiXu9
Copy link
Member

I feel good with the current one except the CodeGen part. I'd like to land this faster than later since I feel this is important and I want to give it more baking time. So I'll try to take a deeper look into the CodeGen part if @efriedma-quic may not be available for another few weeks.

@yuxuanchen1997 yuxuanchen1997 force-pushed the users/yuxuanchen1997/coro-inplace-task-fe branch 2 times, most recently from 92966fb to 9c8163d Compare August 22, 2024 17:28
@yuxuanchen1997
Copy link
Member Author

Hello! Requesting upstream to take a look again as I believe all feedback at the moment has been addressed. Please let me know if I still miss anything.

@yuxuanchen1997 yuxuanchen1997 force-pushed the users/yuxuanchen1997/coro-inplace-task-fe branch from 9c8163d to 7c35074 Compare August 27, 2024 17:20
@yuxuanchen1997 yuxuanchen1997 force-pushed the users/yuxuanchen1997/coro-inplace-task-fe branch from 7c35074 to ef2b868 Compare August 28, 2024 21:34
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except the CodeGen's change. I am still slightly concerned about the maintainability. Let's try to give @efriedma-quic another two weeks.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Let's give it a try : )

@yuxuanchen1997 yuxuanchen1997 force-pushed the users/yuxuanchen1997/coro-inplace-task-fe branch from ef2b868 to c47f921 Compare September 9, 2024 04:47
@yuxuanchen1997 yuxuanchen1997 merged commit e17a39b into main Sep 9, 2024
9 checks passed
@yuxuanchen1997 yuxuanchen1997 deleted the users/yuxuanchen1997/coro-inplace-task-fe branch September 9, 2024 06:09
yuxuanchen1997 added a commit that referenced this pull request Sep 9, 2024
…ramp functions during CoroSplit (#99283)

This patch is episode two of the coroutine HALO improvement project
published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

Previously CoroElide depends on inlining, and its analysis does not work
very well with code generated by the C++ frontend due the existence of
many customization points. There has been issue reported to upstream how
ineffective the original CoroElide was in real world applications.

For C++ users, this set of patches aim to fix this problem by providing
library authors and users deterministic HALO behaviour for some
well-behaved coroutine `Task` types. The stack begins with a library
side attribute on the `Task` class that guarantees no unstructured
concurrency when coroutines are awaited directly with `co_await`ed as a
prvalue. This attribute on Task types gives us lifetime guarantees and
makes C++ FE capable to telling the ME which coroutine calls are
elidable. We convey such information from FE through the attribute
`coro_elide_safe`.

This patch modifies CoroSplit to create a variant of the coroutine ramp
function that 1) does not use heap allocated frame, instead take an
additional parameter as the pointer to the frame. Such parameter is
attributed with `dereferenceble` and `align` to convey size and align
requirements for the frame. 2) always stores cleanup instead of destroy
address for `coro.destroy()` actions.

In a later patch, we will have a new pass that runs right after
CoroSplit to find usages of the callee coroutine attributed
`coro_elide_safe` in presplit coroutine callers, allocates the frame on
its "stack", transform those usages to call the `noalloc` ramp function
variant.

(note I put quotes on the word "stack" here, because for presplit
coroutine, any alloca will be spilled into the frame when it's being
split)

The C++ Frontend attribute implementation that works with this change
can be found at #99282
The pass that makes use of the new `noalloc` split can be found at
#99285
yuxuanchen1997 added a commit that referenced this pull request Sep 9, 2024
…routines to the `noalloc` variant (#99285)

This patch is episode three of the middle end implementation for the
coroutine HALO improvement project published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044

After we attribute the calls to some coroutines as "coro_elide_safe" in
the C++ FE and creating a `noalloc` ramp function, we use a new middle
end pass to move the call to coroutines to the noalloc variant.

This pass should be run after CoroSplit. For each node we process in
CoroSplit, we look for its callers and replace the attributed ones in
presplit coroutines to the noalloc one. The transformed `noalloc` ramp
function will also require a frame pointer to a block of memory it can
use as an activation frame. We allocate this on the caller's frame with
an alloca.

Please note that we cannot safely transform such attributed calls in
post-split coroutines due to memory lifetime reasons. The CoroSplit pass
is responsible for creating the coroutine frame spills for all the
allocas in the coroutine. Therefore it will be unsafe to create new
allocas like this one in post-split coroutines. This happens relatively
rarely because CGSCC performs the passes on the callees before the
caller. However, if multiple coroutines coexist in one SCC, this
situation does happen (and prevents us from having potentially unbound
frame size due to recursion.)

You can find episode 1: Clang FE of this patch series at
#99282
Episode 2: CoroSplit at #99283
facebook-github-bot pushed a commit to facebook/folly that referenced this pull request Oct 14, 2024
Summary:
# Context
Recent upstream LLVM development gives us this opportunity to elide heap allocations for coroutine function calls. Our folly::coro's heap allocation costs us about $4M every year fleet wide. This patch potentially gives us a decent chunk of capacity savings.

This diff adds two attributes to folly:
## [[clang::coro_await_elidable]]
This attribute is introduced by llvm/llvm-project#99282.
This attribute is designed to make the following case always apply HALO
```
class [[clang::coro_await_elidable]] Task { ... };
Task<void> foo();
Task<void> bar() {
  co_await foo();
}
```
This requires 1) bar is a coroutine returns a type attributed as such; and 2) `foo()` to be immediately co_awaited.

Note: if the example above had `bar()` call itself recursively, the elide won't happen because we cannot determine the necessary frame size for potentially unbound recursion. Same goes for mutually recursive functions.

This won't work on the fan out case for the same reason. This optimization is off limits if we can't statically determine the necessary frame size for the outer most coroutine.

## [[clang::coro_await_elidable_argument]]

This attribute is introduced by llvm/llvm-project#108474.
The attribute propagates a "safe elide context" as defined in the summary of the PR.

The motivation of this attribute is to support eliding foo in the following case:
```
class [[clang::coro_await_elidable]] Task { ... };
Task<void> foo();
Task<void> bar() {
  co_await co_nothrow(foo());
}
```
`co_nothrow` itself is not a coroutine. `foo()` in this case is not immediately `co_await`ed. Hence not elidable. However, if we add this "coro_await_elidable_argument" attribute to co_nothrow's parameter (or parameter pack), we allow the compiler to treat its arguments as safe to elide as well as long as both co_nothrow() is awaited as a prvalue, and `foo()` is immediately passed to `co_nothrow`'s param as a prvalue.

This can also be useful for composition of senders in the future.

## Why is this safe to elide?
Generally, we depend on the fact that `folly::coro::Task` does not provide the user a way to destroy the caller while the callee is in a running or leave it in a resumeable state. And `co_nothrow`'s awaitable object doesn't try to destroy caller either.

## How many more cases are elided after this change?
IG is the top coroutine user according to this [strobelight](https://fburl.com/strobelight/dgx7cyzv). With this change, we found 3009 out of 4292 coroutine calls are elidable. Among the 1283 non-elidable cases, only 2 had a profile hotness greater than zero.

Reviewed By: yfeldblum

Differential Revision: D58956574

fbshipit-source-id: da8be360a03a34dec7111004a1af44e20d163839
facebook-github-bot pushed a commit to facebook/hhvm that referenced this pull request Oct 14, 2024
Summary:
# Context
Recent upstream LLVM development gives us this opportunity to elide heap allocations for coroutine function calls. Our folly::coro's heap allocation costs us about $4M every year fleet wide. This patch potentially gives us a decent chunk of capacity savings.

This diff adds two attributes to folly:
## [[clang::coro_await_elidable]]
This attribute is introduced by llvm/llvm-project#99282.
This attribute is designed to make the following case always apply HALO
```
class [[clang::coro_await_elidable]] Task { ... };
Task<void> foo();
Task<void> bar() {
  co_await foo();
}
```
This requires 1) bar is a coroutine returns a type attributed as such; and 2) `foo()` to be immediately co_awaited.

Note: if the example above had `bar()` call itself recursively, the elide won't happen because we cannot determine the necessary frame size for potentially unbound recursion. Same goes for mutually recursive functions.

This won't work on the fan out case for the same reason. This optimization is off limits if we can't statically determine the necessary frame size for the outer most coroutine.

## [[clang::coro_await_elidable_argument]]

This attribute is introduced by llvm/llvm-project#108474.
The attribute propagates a "safe elide context" as defined in the summary of the PR.

The motivation of this attribute is to support eliding foo in the following case:
```
class [[clang::coro_await_elidable]] Task { ... };
Task<void> foo();
Task<void> bar() {
  co_await co_nothrow(foo());
}
```
`co_nothrow` itself is not a coroutine. `foo()` in this case is not immediately `co_await`ed. Hence not elidable. However, if we add this "coro_await_elidable_argument" attribute to co_nothrow's parameter (or parameter pack), we allow the compiler to treat its arguments as safe to elide as well as long as both co_nothrow() is awaited as a prvalue, and `foo()` is immediately passed to `co_nothrow`'s param as a prvalue.

This can also be useful for composition of senders in the future.

## Why is this safe to elide?
Generally, we depend on the fact that `folly::coro::Task` does not provide the user a way to destroy the caller while the callee is in a running or leave it in a resumeable state. And `co_nothrow`'s awaitable object doesn't try to destroy caller either.

## How many more cases are elided after this change?
IG is the top coroutine user according to this [strobelight](https://fburl.com/strobelight/dgx7cyzv). With this change, we found 3009 out of 4292 coroutine calls are elidable. Among the 1283 non-elidable cases, only 2 had a profile hotness greater than zero.

Reviewed By: yfeldblum

Differential Revision: D58956574

fbshipit-source-id: da8be360a03a34dec7111004a1af44e20d163839
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category coroutines C++20 coroutines llvm:ir llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants