[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists #133426

cor3ntin · 2025-03-28T11:12:46Z

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (sections 1-2, not 3)

If, during overload resolution, a non-template candidate is always picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist.

The goal is to be able to merge #122423 without being too disruptive.

This change means that the selection of the best viable candidate and template argument deduction become interleaved.

To avoid rewriting half of Clang, we store in OverloadCandidateSet enough information to deduce template candidates from OverloadCandidateSet::BestViableFunction. This means the lifetime of any object used by the template argument must outlive a call to Add*Template*Candidate.

This two-phase resolution is not performed for some initialization as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times
https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au .

Fixes #62096
Fixes #74581
Fixes #53454

cor3ntin · 2025-03-28T11:13:51Z

Release note
More Tests
More comments

clang/lib/Sema/SemaOverload.cpp

zyn0217 · 2025-03-28T11:46:32Z

clang/lib/Sema/SemaOverload.cpp

+static void AddTemplateOverloadCandidate(
+    Sema &S, OverloadCandidateSet &CandidateSet,
+    NonDeducedConversionTemplateOverloadCandidate &&C) {
+  return S.AddTemplateConversionCandidateImmediately(


Can we reuse InjectNonDeducedTemplateCandidates for *Immediately functions or vice versa?

zyn0217 · 2025-03-28T11:48:07Z

clang/lib/Sema/SemaOverload.cpp

+  llvm::SmallVector<OverloadCandidate *, 4> PendingBest;
  llvm::SmallVector<const NamedDecl *, 4> EquivalentCands;
-
-  llvm::SmallVector<OverloadCandidate*, 4> PendingBest;


unrelated changes (sorry for nitpicks at this point)

github-actions · 2025-03-29T11:37:21Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2025-03-31T15:33:04Z

@llvm/pr-subscribers-clang

Author: cor3ntin (cor3ntin)

Changes

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (sections 1-2, not 3)

If, during overload resolution, a non-template candidate is always picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist.

The goal is to be able to merge #122423 without being too disruptive.

This change means that the selection of the best viable candidate and template argument deduction become interleaved.

To avoid rewriting half of Clang, we store in OverloadCandidateSet enough information to deduce template candidates from OverloadCandidateSet::BestViableFunction. This means the lifetime of any object used by the template argument must outlive a call to Add*Template*Candidate.

This two-phase resolution is not performed for some initialization as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times
https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au .

Fixes #62096
Fixes #74581

Patch is 60.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133426.diff

9 Files Affected:

(modified) clang/include/clang/Sema/Overload.h (+150-7)
(modified) clang/include/clang/Sema/Sema.h (+2)
(modified) clang/lib/Sema/SemaCodeComplete.cpp (+4-2)
(modified) clang/lib/Sema/SemaInit.cpp (+11-4)
(modified) clang/lib/Sema/SemaOverload.cpp (+344-101)
(modified) clang/test/CXX/temp/temp.constr/temp.constr.atomic/constrant-satisfaction-conversions.cpp (+4-4)
(modified) clang/test/SemaCXX/implicit-member-functions.cpp (+7-14)
(modified) clang/test/SemaTemplate/instantiate-function-params.cpp (+3-4)
(modified) clang/test/Templight/templight-empty-entries-fix.cpp (+51-75)

diff --git a/clang/include/clang/Sema/Overload.h b/clang/include/clang/Sema/Overload.h
index 6e08762dcc6d7..547845c01db11 100644
--- a/clang/include/clang/Sema/Overload.h
+++ b/clang/include/clang/Sema/Overload.h
@@ -38,6 +38,7 @@
 #include <cstddef>
 #include <cstdint>
 #include <utility>
+#include <variant>
 
 namespace clang {
 
@@ -407,6 +408,11 @@ class Sema;
              Third == ICK_Identity;
     }
 
+    bool isPerfect(const ASTContext &C) const {
+      return isIdentityConversion() &&
+             (!ReferenceBinding || C.hasSameType(getFromType(), getToType(2)));
+    }
+
     ImplicitConversionRank getRank() const;
     NarrowingKind
     getNarrowingKind(ASTContext &Context, const Expr *Converted,
@@ -743,6 +749,10 @@ class Sema;
       Standard.setAllToTypes(T);
     }
 
+    bool isPerfect(const ASTContext &C) const {
+      return isStandard() && Standard.isPerfect(C);
+    }
+
     // True iff this is a conversion sequence from an initializer list to an
     // array or std::initializer.
     bool hasInitializerListContainerType() const {
@@ -979,6 +989,20 @@ class Sema;
       return false;
     }
 
+    bool isPerfectMatch(const ASTContext &Ctx) const {
+      if (!Viable)
+        return false;
+      for (const auto &C : Conversions) {
+        if (!C.isInitialized())
+          return false;
+        if (!C.isPerfect(Ctx))
+          return false;
+      }
+      if (isa_and_nonnull<CXXConversionDecl>(Function))
+        return FinalConversion.isPerfect(Ctx);
+      return true;
+    }
+
     bool TryToFixBadConversion(unsigned Idx, Sema &S) {
       bool CanFix = Fix.tryToFixConversion(
                       Conversions[Idx].Bad.FromExpr,
@@ -1015,6 +1039,61 @@ class Sema;
           RewriteKind(CRK_None) {}
   };
 
+  struct DeferredConversionTemplateOverloadCandidate {
+    FunctionTemplateDecl *FunctionTemplate;
+    DeclAccessPair FoundDecl;
+    CXXRecordDecl *ActingContext;
+    Expr *From;
+    QualType ToType;
+
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowObjCConversionOnExplicit : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowExplicit : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowResultConversion : 1;
+  };
+
+  struct DeferredMethodTemplateOverloadCandidate {
+    FunctionTemplateDecl *FunctionTemplate;
+    DeclAccessPair FoundDecl;
+    ArrayRef<Expr *> Args;
+    CXXRecordDecl *ActingContext;
+    Expr::Classification ObjectClassification;
+    QualType ObjectType;
+
+    OverloadCandidateParamOrder PO;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned SuppressUserConversions : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned PartialOverloading : 1;
+  };
+
+  struct DeferredFunctionTemplateOverloadCandidate {
+    FunctionTemplateDecl *FunctionTemplate;
+    DeclAccessPair FoundDecl;
+    ArrayRef<Expr *> Args;
+
+    CallExpr::ADLCallKind IsADLCandidate;
+    OverloadCandidateParamOrder PO;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned SuppressUserConversions : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned PartialOverloading : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowExplicit : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AggregateCandidateDeduction : 1;
+  };
+
+  using DeferredTemplateOverloadCandidate =
+      std::variant<DeferredConversionTemplateOverloadCandidate,
+                   DeferredMethodTemplateOverloadCandidate,
+                   DeferredFunctionTemplateOverloadCandidate>;
+
+  static_assert(
+      std::is_trivially_destructible_v<DeferredTemplateOverloadCandidate>);
+
   /// OverloadCandidateSet - A set of overload candidates, used in C++
   /// overload resolution (C++ 13.3).
   class OverloadCandidateSet {
@@ -1043,6 +1122,8 @@ class Sema;
       /// C++ [over.match.call.general]
       /// Resolve a call through the address of an overload set.
       CSK_AddressOfOverloadSet,
+
+      CSK_CodeCompletion,
     };
 
     /// Information about operator rewrites to consider when adding operator
@@ -1116,6 +1197,7 @@ class Sema;
   private:
     SmallVector<OverloadCandidate, 16> Candidates;
     llvm::SmallPtrSet<uintptr_t, 16> Functions;
+    SmallVector<DeferredTemplateOverloadCandidate, 8> DeferredCandidates;
 
     // Allocator for ConversionSequenceLists. We store the first few of these
     // inline to avoid allocation for small sets.
@@ -1126,7 +1208,8 @@ class Sema;
     OperatorRewriteInfo RewriteInfo;
 
     constexpr static unsigned NumInlineBytes =
-        24 * sizeof(ImplicitConversionSequence);
+        32 * sizeof(ImplicitConversionSequence);
+
     unsigned NumInlineBytesUsed = 0;
     alignas(void *) char InlineSpace[NumInlineBytes];
 
@@ -1137,15 +1220,13 @@ class Sema;
     /// from the slab allocator.
     /// FIXME: It would probably be nice to have a SmallBumpPtrAllocator
     /// instead.
-    /// FIXME: Now that this only allocates ImplicitConversionSequences, do we
-    /// want to un-generalize this?
     template <typename T>
     T *slabAllocate(unsigned N) {
       // It's simpler if this doesn't need to consider alignment.
       static_assert(alignof(T) == alignof(void *),
                     "Only works for pointer-aligned types.");
-      static_assert(std::is_trivial<T>::value ||
-                        std::is_same<ImplicitConversionSequence, T>::value,
+      static_assert(std::is_trivially_destructible_v<T> ||
+                        (std::is_same_v<ImplicitConversionSequence, T>),
                     "Add destruction logic to OverloadCandidateSet::clear().");
 
       unsigned NBytes = sizeof(T) * N;
@@ -1176,6 +1257,9 @@ class Sema;
     /// Whether diagnostics should be deferred.
     bool shouldDeferDiags(Sema &S, ArrayRef<Expr *> Args, SourceLocation OpLoc);
 
+    // Whether the resolution of template candidates should be defered
+    bool shouldDeferTemplateArgumentDeduction(const LangOptions &Opts) const;
+
     /// Determine when this overload candidate will be new to the
     /// overload set.
     bool isNewCandidate(Decl *F, OverloadCandidateParamOrder PO =
@@ -1199,8 +1283,12 @@ class Sema;
     iterator begin() { return Candidates.begin(); }
     iterator end() { return Candidates.end(); }
 
-    size_t size() const { return Candidates.size(); }
-    bool empty() const { return Candidates.empty(); }
+    size_t size() const {
+      return Candidates.size() + DeferredCandidates.size();
+    }
+    bool empty() const {
+      return Candidates.empty() && DeferredCandidates.empty();
+    }
 
     /// Allocate storage for conversion sequences for NumConversions
     /// conversions.
@@ -1216,6 +1304,19 @@ class Sema;
       return ConversionSequenceList(Conversions, NumConversions);
     }
 
+    llvm::MutableArrayRef<Expr *> getPersistentArgsArray(unsigned N) {
+      Expr **Exprs = slabAllocate<Expr *>(N);
+      return llvm::MutableArrayRef<Expr *>(Exprs, N);
+    }
+
+    template <typename... T>
+    llvm::MutableArrayRef<Expr *> getPersistentArgsArray(T *...Exprs) {
+      llvm::MutableArrayRef<Expr *> Arr =
+          getPersistentArgsArray(sizeof...(Exprs));
+      llvm::copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());
+      return Arr;
+    }
+
     /// Add a new candidate with NumConversions conversion sequence slots
     /// to the overload set.
     OverloadCandidate &addCandidate(unsigned NumConversions = 0,
@@ -1231,6 +1332,28 @@ class Sema;
       return C;
     }
 
+    void AddDeferredTemplateCandidate(
+        FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
+        ArrayRef<Expr *> Args, bool SuppressUserConversions,
+        bool PartialOverloading, bool AllowExplicit,
+        CallExpr::ADLCallKind IsADLCandidate, OverloadCandidateParamOrder PO,
+        bool AggregateCandidateDeduction);
+
+    void AddDeferredMethodTemplateCandidate(
+        FunctionTemplateDecl *MethodTmpl, DeclAccessPair FoundDecl,
+        CXXRecordDecl *ActingContext, QualType ObjectType,
+        Expr::Classification ObjectClassification, ArrayRef<Expr *> Args,
+        bool SuppressUserConversions, bool PartialOverloading,
+        OverloadCandidateParamOrder PO);
+
+    void AddDeferredConversionTemplateCandidate(
+        FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
+        CXXRecordDecl *ActingContext, Expr *From, QualType ToType,
+        bool AllowObjCConversionOnExplicit, bool AllowExplicit,
+        bool AllowResultConversion);
+
+    void InjectNonDeducedTemplateCandidates(Sema &S);
+
     /// Find the best viable function on this overload set, if it exists.
     OverloadingResult BestViableFunction(Sema &S, SourceLocation Loc,
                                          OverloadCandidateSet::iterator& Best);
@@ -1263,6 +1386,14 @@ class Sema;
       DestAS = AS;
     }
 
+  private:
+    OverloadingResult ResultForBestCandidate(const iterator &Best);
+    void CudaExcludeWrongSideCandidates(Sema &S);
+    OverloadingResult
+    BestViableFunctionImpl(Sema &S, SourceLocation Loc,
+                           OverloadCandidateSet::iterator &Best);
+    void PerfectViableFunction(Sema &S, SourceLocation Loc,
+                               OverloadCandidateSet::iterator &Best);
   };
 
   bool isBetterOverloadCandidate(Sema &S, const OverloadCandidate &Cand1,
@@ -1311,6 +1442,18 @@ class Sema;
   // parameter.
   bool shouldEnforceArgLimit(bool PartialOverloading, FunctionDecl *Function);
 
+  inline bool OverloadCandidateSet::shouldDeferTemplateArgumentDeduction(
+      const LangOptions &Opts) const {
+    return
+        // For user defined conversion we need to check against different
+        // combination of CV qualifiers and look at any expicit specifier, so
+        // always deduce template candidate.
+        Kind != CSK_InitByUserDefinedConversion
+        && Kind != CSK_InitByConstructor
+        && Kind != CSK_CodeCompletion &&
+        Opts.CPlusPlus && (!Opts.CUDA || Opts.GPUExcludeWrongSideOverloads);
+  }
+
 } // namespace clang
 
 #endif // LLVM_CLANG_SEMA_OVERLOAD_H
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 066bce61c74c1..ab2a94584d46c 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -60,6 +60,7 @@
 #include "clang/Sema/DeclSpec.h"
 #include "clang/Sema/ExternalSemaSource.h"
 #include "clang/Sema/IdentifierResolver.h"
+#include "clang/Sema/Overload.h"
 #include "clang/Sema/Ownership.h"
 #include "clang/Sema/ParsedAttr.h"
 #include "clang/Sema/Redeclaration.h"
@@ -10345,6 +10346,7 @@ class Sema final : public SemaBase {
   /// Add a C++ function template specialization as a candidate
   /// in the candidate set, using template argument deduction to produce
   /// an appropriate function template specialization.
+
   void AddTemplateOverloadCandidate(
       FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
       TemplateArgumentListInfo *ExplicitTemplateArgs, ArrayRef<Expr *> Args,
diff --git a/clang/lib/Sema/SemaCodeComplete.cpp b/clang/lib/Sema/SemaCodeComplete.cpp
index 2003701b65654..e314f6859d71c 100644
--- a/clang/lib/Sema/SemaCodeComplete.cpp
+++ b/clang/lib/Sema/SemaCodeComplete.cpp
@@ -6364,7 +6364,8 @@ SemaCodeCompletion::ProduceCallSignatureHelp(Expr *Fn, ArrayRef<Expr *> Args,
   Expr *NakedFn = Fn->IgnoreParenCasts();
   // Build an overload candidate set based on the functions we find.
   SourceLocation Loc = Fn->getExprLoc();
-  OverloadCandidateSet CandidateSet(Loc, OverloadCandidateSet::CSK_Normal);
+  OverloadCandidateSet CandidateSet(Loc,
+                                    OverloadCandidateSet::CSK_CodeCompletion);
 
   if (auto ULE = dyn_cast<UnresolvedLookupExpr>(NakedFn)) {
     SemaRef.AddOverloadedCallCandidates(ULE, ArgsWithoutDependentTypes,
@@ -6567,7 +6568,8 @@ QualType SemaCodeCompletion::ProduceConstructorSignatureHelp(
   // FIXME: Provide support for variadic template constructors.
 
   if (CRD) {
-    OverloadCandidateSet CandidateSet(Loc, OverloadCandidateSet::CSK_Normal);
+    OverloadCandidateSet CandidateSet(Loc,
+                                      OverloadCandidateSet::CSK_CodeCompletion);
     for (NamedDecl *C : SemaRef.LookupConstructors(CRD)) {
       if (auto *FD = dyn_cast<FunctionDecl>(C)) {
         // FIXME: we can't yet provide correct signature help for initializer
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index 9814c3f456f0d..a8ab43776a6de 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -10043,12 +10043,19 @@ QualType Sema::DeduceTemplateSpecializationFromInitializer(
     //   When [...] the constructor [...] is a candidate by
     //    - [over.match.copy] (in all cases)
     if (TD) {
-      SmallVector<Expr *, 8> TmpInits;
-      for (Expr *E : Inits)
+
+      // As template candidates are not deduced immediately,
+      // persist the arry in the overload set.
+      MutableArrayRef<Expr *> TmpInits =
+          Candidates.getPersistentArgsArray(Inits.size());
+
+      for (auto [I, E] : llvm::enumerate(Inits)) {
         if (auto *DI = dyn_cast<DesignatedInitExpr>(E))
-          TmpInits.push_back(DI->getInit());
+          TmpInits[I] = DI->getInit();
         else
-          TmpInits.push_back(E);
+          TmpInits[I] = E;
+      }
+
       AddTemplateOverloadCandidate(
           TD, FoundDecl, /*ExplicitArgs=*/nullptr, TmpInits, Candidates,
           /*SuppressUserConversions=*/false,
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 6d8006b35dcf4..faf357b010580 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -31,6 +31,7 @@
 #include "clang/Sema/Lookup.h"
 #include "clang/Sema/Overload.h"
 #include "clang/Sema/SemaCUDA.h"
+#include "clang/Sema/SemaCodeCompletion.h"
 #include "clang/Sema/SemaObjC.h"
 #include "clang/Sema/Template.h"
 #include "clang/Sema/TemplateDeduction.h"
@@ -45,6 +46,7 @@
 #include <cstddef>
 #include <cstdlib>
 #include <optional>
+#include <variant>
 
 using namespace clang;
 using namespace sema;
@@ -7787,15 +7789,14 @@ void Sema::AddMethodCandidate(
   }
 }
 
-void Sema::AddMethodTemplateCandidate(
+static void AddMethodTemplateCandidateImmediately(
+    Sema &S, OverloadCandidateSet &CandidateSet,
     FunctionTemplateDecl *MethodTmpl, DeclAccessPair FoundDecl,
     CXXRecordDecl *ActingContext,
     TemplateArgumentListInfo *ExplicitTemplateArgs, QualType ObjectType,
     Expr::Classification ObjectClassification, ArrayRef<Expr *> Args,
-    OverloadCandidateSet &CandidateSet, bool SuppressUserConversions,
-    bool PartialOverloading, OverloadCandidateParamOrder PO) {
-  if (!CandidateSet.isNewCandidate(MethodTmpl, PO))
-    return;
+    bool SuppressUserConversions, bool PartialOverloading,
+    OverloadCandidateParamOrder PO) {
 
   // C++ [over.match.funcs]p7:
   //   In each case where a candidate is a function template, candidate
@@ -7809,12 +7810,12 @@ void Sema::AddMethodTemplateCandidate(
   TemplateDeductionInfo Info(CandidateSet.getLocation());
   FunctionDecl *Specialization = nullptr;
   ConversionSequenceList Conversions;
-  if (TemplateDeductionResult Result = DeduceTemplateArguments(
+  if (TemplateDeductionResult Result = S.DeduceTemplateArguments(
           MethodTmpl, ExplicitTemplateArgs, Args, Specialization, Info,
           PartialOverloading, /*AggregateDeductionCandidate=*/false,
           /*PartialOrdering=*/false, ObjectType, ObjectClassification,
           [&](ArrayRef<QualType> ParamTypes) {
-            return CheckNonDependentConversions(
+            return S.CheckNonDependentConversions(
                 MethodTmpl, ParamTypes, Args, CandidateSet, Conversions,
                 SuppressUserConversions, ActingContext, ObjectType,
                 ObjectClassification, PO);
@@ -7826,7 +7827,7 @@ void Sema::AddMethodTemplateCandidate(
     Candidate.Function = MethodTmpl->getTemplatedDecl();
     Candidate.Viable = false;
     Candidate.RewriteKind =
-      CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
+        CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
     Candidate.IsSurrogate = false;
     Candidate.IgnoreObjectArgument =
         cast<CXXMethodDecl>(Candidate.Function)->isStatic() ||
@@ -7836,8 +7837,8 @@ void Sema::AddMethodTemplateCandidate(
       Candidate.FailureKind = ovl_fail_bad_conversion;
     else {
       Candidate.FailureKind = ovl_fail_bad_deduction;
-      Candidate.DeductionFailure = MakeDeductionFailureInfo(Context, Result,
-                                                            Info);
+      Candidate.DeductionFailure =
+          MakeDeductionFailureInfo(S.Context, Result, Info);
     }
     return;
   }
@@ -7847,10 +7848,34 @@ void Sema::AddMethodTemplateCandidate(
   assert(Specialization && "Missing member function template specialization?");
   assert(isa<CXXMethodDecl>(Specialization) &&
          "Specialization is not a member function?");
-  AddMethodCandidate(cast<CXXMethodDecl>(Specialization), FoundDecl,
-                     ActingContext, ObjectType, ObjectClassification, Args,
-                     CandidateSet, SuppressUserConversions, PartialOverloading,
-                     Conversions, PO, Info.hasStrictPackMatch());
+  S.AddMethodCandidate(
+      cast<CXXMethodDecl>(Specialization), FoundDecl, ActingContext, ObjectType,
+      ObjectClassification, Args, CandidateSet, SuppressUserConversions,
+      PartialOverloading, Conversions, PO, Info.hasStrictPackMatch());
+}
+
+void Sema::AddMethodTemplateCandidate(
+    FunctionTemplateDecl *MethodTmpl, DeclAccessPair FoundDecl,
+    CXXRecordDecl *ActingContext,
+    TemplateArgumentListInfo *ExplicitTemplateArgs, QualType ObjectType,
+    Expr::Classification ObjectClassification, ArrayRef<Expr *> Args,
+    OverloadCandidateSet &CandidateSet, bool SuppressUserConversions,
+    bool PartialOverloading, OverloadCandidateParamOrder PO) {
+  if (!CandidateSet.isNewCandidate(MethodTmpl, PO))
+    return;
+
+  if (ExplicitTemplateArgs ||
+      !CandidateSet.shouldDeferTemplateArgumentDeduction(getLangOpts())) {
+    AddMethodTemplateCandidateImmediately(
+        *this, CandidateSet, MethodTmpl, FoundDecl, ActingContext,
+        ExplicitTemplateArgs, ObjectType, ObjectClassification, Args,
+        SuppressUserConversions, PartialOverloading, PO);
+    return;
+  }
+
+  CandidateSet.AddDeferredMethodTemplateCandidate(
+      MethodTmpl, FoundDecl, ActingContext, ObjectType, ObjectClassification,
+      Args, SuppressUserConversions, PartialOverloading, PO);
 }
 
 /// Determine whether a given function template has a simple explicit specifier
@@ -7859,14 +7884,13 @@ static bool isNonDependentlyExplicit(FunctionTemplateDecl *FTD) {
   return ExplicitSpecifier::getFromDecl(FTD->getTemplatedDecl()).isExplicit();
 }
 
-void Sema::AddTemplateOverloadCandidate(
+static void AddTemplateOverloadCandidateImmediately(
+    Sema &S, OverloadCandidateSet &CandidateSet,
     FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
     TemplateArgumentListInfo *ExplicitTemplateArgs, ArrayRef<Expr *> Args,
-    OverloadCandidateSet &CandidateSet, bool SuppressUserConversions,
-    bool PartialOverloading, bool AllowExplicit, ADLCallKind IsADLCandidate,
-    OverloadCandidateParamOrder PO, bool AggregateCandidateDeduction) {
-  if (!CandidateSet.isNewCandidate(FunctionTemplate, PO))
-    return;
+    bool SuppressUserConversions, bool PartialOverloading, bool AllowExplicit,
+    Sema::ADLCallKind IsADLCandidate, OverloadCandidateParamOrder PO,
+    bool AggregateCandidateDeduction) {
 
   // If the function template has a non-dependent explicit specification,
   // exclude it now if appropriate; we are not permitted to perform deduction
@@ -7893,14 +7917,14 @@ void Sema::AddTemplateOverloadCandidate(
                              FunctionTemplate->getTemplateDepth());
   FunctionDecl *Specialization = nullptr;
   ConversionSequenceList Conversions;
-  if (TemplateDeductionResult Result = DeduceTemplateArguments(
+  if (TemplateDeductionResult Result = S.DeduceTemplateArguments(
           FunctionTemplate, ExplicitTemplateArgs, Args, Specialization, Info,
           PartialOverloading, AggregateCandidateDeduction,
           /*PartialOrdering=*/false,
           /*ObjectType=*/QualType(),
           /*ObjectClassification=*/Expr::Classification(),
           [&](ArrayRef<QualType> ParamTypes) {
-            return CheckNonDependentConversions(
+            return S.CheckNonDependentConver...
[truncated]

Sirraide

Just a few typos I noticed

clang/lib/Sema/SemaInit.cpp

clang/lib/Sema/SemaOverload.cpp

zygoloid

I think we should do this. Even if WG21 decides that they only want to allow this and not require it, it seems like a good change that we stand a decent chance of reaching implementation consensus on.

zyn0217

Thanks!

zyn0217 · 2025-04-01T05:41:01Z

clang/include/clang/Sema/Overload.h

-        24 * sizeof(ImplicitConversionSequence);
+        32 * sizeof(ImplicitConversionSequence);


Can we add some comments? what causes us to change it from 24 -> 32

I added some. We may be storing additional data in the slab allocator ( When we need to keep the argument array expression around for deferred candidates), so I made the storage a tad bigger to accomodate. There is no reason for this thing to not be quite large frankly. it's short lived

clang/lib/Sema/SemaOverload.cpp

erichkeane

I agree completely with Richard, we should just do this. We have prior art (GCC), the standard currently permits it as an implementation strategy, and it is an improvement in basically EVERY way with mild/barely-justifiable downsides.

clang/include/clang/Sema/Overload.h

erichkeane

A few mean comments, but the approach still seems solid here. I think this is a solid patch

erichkeane · 2025-04-14T13:42:18Z

clang/include/clang/Sema/Overload.h

@@ -407,6 +407,18 @@ class Sema;
             Third == ICK_Identity;
    }

+    bool isPerfect(const ASTContext &C) const {


I would LOVE a comment here, the implementation isn't particularly clear what is going on here.

Still would like comments on these functions that say what 'perfect' means here.

clang/include/clang/Sema/Overload.h

erichkeane · 2025-04-14T13:47:42Z

clang/include/clang/Sema/Overload.h

@@ -1015,6 +1045,62 @@ class Sema;
          RewriteKind(CRK_None) {}
  };

+  struct DeferredTemplateOverloadCandidate {
+    DeferredTemplateOverloadCandidate *Next = nullptr;


Comment here + on the slab allocator function as to WHY we are doing this instead of a std::vector/etc would probably be helpful. I understand it, but I probably won't next time I see this code ;)

clang/include/clang/Sema/Overload.h

erichkeane

I have a handful of comments that are all nits/minor enough I trust you on, but overall this looks fine.

Lets ship it!

erichkeane · 2025-04-15T14:29:59Z

clang/docs/ReleaseNotes.rst

@@ -96,6 +96,12 @@ C++ Language Changes
      asm((std::string_view("nop")) ::: (std::string_view("memory")));
    }

+- Clang now implements the changes to overload resolution proposed by `P3606 <https://wg21.link/P3606R0>`_.


Can we clarify just parts 1 and 2?

erichkeane · 2025-04-15T14:30:38Z

clang/include/clang/Sema/Overload.h

@@ -407,6 +407,18 @@ class Sema;
             Third == ICK_Identity;
    }

+    bool isPerfect(const ASTContext &C) const {


Still would like comments on these functions that say what 'perfect' means here.

erichkeane · 2025-04-15T14:32:10Z

clang/include/clang/Sema/Overload.h

@@ -743,6 +759,13 @@ class Sema;
      Standard.setAllToTypes(T);
    }

+    /// A conversion sequence is perfect if


this comment is good here, add to 410 would be nice as well.

clang/include/clang/Sema/Overload.h

erichkeane · 2025-04-15T14:35:17Z

clang/include/clang/Sema/Overload.h

@@ -1043,6 +1141,8 @@ class Sema;
      /// C++ [over.match.call.general]
      /// Resolve a call through the address of an overload set.
      CSK_AddressOfOverloadSet,
+
+      CSK_CodeCompletion,


What is happening here? I'm a touch confused.

When doing code completion, we do overload resolution ( see ProduceCallSignatureHelp) - In those cases, we want to always find all candidates, even if there is a candidate that would be a perfect match for the somewhat fake type or set of types that are used to produce the overload set so we can display them all.

Or at least that's my best understanding of it - I felt that this patch should not affect the behavior of code completion in any way.

Ah, I see. Can you clarify that in a comment above this as to why this is a separate category? It was pretty jarring to see in general, let alone here.

clang/include/clang/Sema/Overload.h

erichkeane · 2025-04-15T14:38:43Z

clang/include/clang/Sema/Overload.h

+    llvm::MutableArrayRef<Expr *> getPersistentArgsArray(T *...Exprs) {
+      llvm::MutableArrayRef<Expr *> Arr =
+          getPersistentArgsArray(sizeof...(Exprs));
+      llvm::copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());


Suggested change

llvm::copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());

llvm::uninitialized_copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());

llvm::uninitialized_copy does not exist, and std::uninitialized_copy takes iterators, so I would have to materialize the list. Given that we are copying pointers I don't think it matters, so I have not applied that piece of feedback.

I'm happy to if you feel it's important

ah, hrmph, thats unfortunate. There is A good reason that was brought up on me in the past, but everything I can remember/think of is sort of a "well, i guess it is 'more' correct'" when it comes to pointers. So don't change it here, as the init-list stuff makes it more error prone.

clang/include/clang/Sema/Sema.h

zyn0217

Some nits, and thanks for helping us in this area!

zyn0217 · 2025-04-16T06:54:40Z

clang/lib/Sema/SemaOverload.cpp

@@ -7919,7 +7951,7 @@ void Sema::AddTemplateOverloadCandidate(
    Candidate.Function = FunctionTemplate->getTemplatedDecl();
    Candidate.Viable = false;
    Candidate.RewriteKind =
-      CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
+        CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);


zyn0217 · 2025-04-16T06:56:22Z

clang/lib/Sema/SemaOverload.cpp

@@ -7834,7 +7838,7 @@ void Sema::AddMethodTemplateCandidate(
    Candidate.Function = MethodTmpl->getTemplatedDecl();
    Candidate.Viable = false;
    Candidate.RewriteKind =
-      CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
+        CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);


zyn0217 · 2025-04-16T06:56:54Z

clang/lib/Sema/SemaOverload.cpp

@@ -45,6 +45,7 @@
 #include <cstddef>
 #include <cstdlib>
 #include <optional>
+#include <variant>


We no longer use variant, right?

…match exists This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (section 1-2, not 3) If during overload resolution, there is a non-template candidate that would be always be picked - because each of the argument is a perfect match (ie the source and target types are the same), we do not perform deduction for any template candidate that might exists. The goal is to be able to merge llvm#122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang we store in `OverloadCandidateSet` enough information to be able to deduce template candidates from `OverloadCandidateSet::BestViableFunction`. Which means the lifetime of any object used by template argument must outlive a call to `Add*Template*Candidate`. This two phase resolution is not performed for some initialization as there are cases where template candidate are better match in these cases per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes llvm#62096 Fixes llvm#74581

* Prefer a constructor template over a non-template conversion function

- We find a template conversion candidate - A constructor with a dependent explicit specifier

…ct match exists" (#135993) Reverts #133426 This is failing on some bots https://lab.llvm.org/buildbot/#/builders/163/builds/17265

… if a perfect match exists" (#135993) Reverts llvm/llvm-project#133426 This is failing on some bots https://lab.llvm.org/buildbot/#/builders/163/builds/17265

…ts (#136018) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (section 1-2, not 3) If during overload resolution, there is a non-template candidate that would be always be picked - because each of the argument is a perfect match (ie the source and target types are the same), we do not perform deduction for any template candidate that might exists. The goal is to be able to merge #122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang we store in `OverloadCandidateSet` enough information to be able to deduce template candidates from `OverloadCandidateSet::BestViableFunction`. Which means the lifetime of any object used by template argument must outlive a call to `Add*Template*Candidate`. This two phase resolution is not performed for some initialization as there are cases where template candidate are better match in these cases per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes #62096 Fixes #74581 Reapplies #133426

… exists (llvm#133426) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (sections 1-2, not 3) If, during overload resolution, a non-template candidate is always picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist. The goal is to be able to merge llvm#122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang, we store in `OverloadCandidateSet` enough information to deduce template candidates from `OverloadCandidateSet::BestViableFunction`. This means the lifetime of any object used by the template argument must outlive a call to `Add*Template*Candidate`. This two-phase resolution is not performed for some initialization as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=edc22c64e527171041876f26a491bb1d03d905d5&to=8170b860bd4b70917005796c05a9be013a95abb2&stat=instructions%3Au Fixes llvm#62096 Fixes llvm#74581 Fixes llvm#53454

…ct match exists" (llvm#135993) Reverts llvm#133426 This is failing on some bots https://lab.llvm.org/buildbot/#/builders/163/builds/17265

…ts (llvm#136018) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (section 1-2, not 3) If during overload resolution, there is a non-template candidate that would be always be picked - because each of the argument is a perfect match (ie the source and target types are the same), we do not perform deduction for any template candidate that might exists. The goal is to be able to merge llvm#122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang we store in `OverloadCandidateSet` enough information to be able to deduce template candidates from `OverloadCandidateSet::BestViableFunction`. Which means the lifetime of any object used by template argument must outlive a call to `Add*Template*Candidate`. This two phase resolution is not performed for some initialization as there are cases where template candidate are better match in these cases per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes llvm#62096 Fixes llvm#74581 Reapplies llvm#133426

…ts (#136203) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (section 1-2, not 3) If during overload resolution, there is a non-template candidate that would be always be picked - because each of the argument is a perfect match (ie the source and target types are the same), we do not perform deduction for any template candidate that might exists. The goal is to be able to merge #122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang we store in OverloadCandidateSet enough information to be able to deduce template candidates from OverloadCandidateSet::BestViableFunction. Which means the lifetime of any object used by template argument must outlive a call to Add*Template*Candidate. This two phase resolution is not performed for some initialization as there are cases where template candidate are better match in these cases per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes #62096 Fixes #74581 Reapplies #133426

… match exists (#136203) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (section 1-2, not 3) If during overload resolution, there is a non-template candidate that would be always be picked - because each of the argument is a perfect match (ie the source and target types are the same), we do not perform deduction for any template candidate that might exists. The goal is to be able to merge llvm/llvm-project#122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang we store in OverloadCandidateSet enough information to be able to deduce template candidates from OverloadCandidateSet::BestViableFunction. Which means the lifetime of any object used by template argument must outlive a call to Add*Template*Candidate. This two phase resolution is not performed for some initialization as there are cases where template candidate are better match in these cases per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes llvm/llvm-project#62096 Fixes llvm/llvm-project#74581 Reapplies llvm/llvm-project#133426

nikic · 2025-04-18T08:48:31Z

The final version had a 4% improvement on clang bootstrap: https://llvm-compile-time-tracker.com/compare.php?from=23324b8b109aed1f77cb20cef476b795f33b6835&to=8c5a307bd8d406e6167a5cd3ce3c74e2e3bfb2a6&stat=instructions:u

Wow.

…ts (llvm#136203) This implements the same overload resolution behavior as GCC, as described in https://wg21.link/p3606 (section 1-2, not 3) If during overload resolution, there is a non-template candidate that would be always be picked - because each of the argument is a perfect match (ie the source and target types are the same), we do not perform deduction for any template candidate that might exists. The goal is to be able to merge llvm#122423 without being too disruptive. This change means that the selection of the best viable candidate and template argument deduction become interleaved. To avoid rewriting half of Clang we store in OverloadCandidateSet enough information to be able to deduce template candidates from OverloadCandidateSet::BestViableFunction. Which means the lifetime of any object used by template argument must outlive a call to Add*Template*Candidate. This two phase resolution is not performed for some initialization as there are cases where template candidate are better match in these cases per the standard. It's also bypassed for code completion. The change has a nice impact on compile times https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au Fixes llvm#62096 Fixes llvm#74581 Reapplies llvm#133426

cor3ntin requested review from AaronBallman, erichkeane, Sirraide, zyn0217 and zygoloid March 28, 2025 11:14

zyn0217 reviewed Mar 28, 2025

View reviewed changes

cor3ntin changed the title ~~[Clang][WIP][RFC] Bypass TAD during overload resolution if a perfect match exists~~ [Clang][RFC] Bypass TAD during overload resolution if a perfect match exists Mar 31, 2025

cor3ntin marked this pull request as ready for review March 31, 2025 15:32

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Mar 31, 2025

Sirraide reviewed Mar 31, 2025

View reviewed changes

clang/lib/Sema/SemaInit.cpp Outdated Show resolved Hide resolved

clang/lib/Sema/SemaOverload.cpp Outdated Show resolved Hide resolved

zygoloid reviewed Mar 31, 2025

View reviewed changes

clang/lib/Sema/SemaOverload.cpp Outdated Show resolved Hide resolved

clang/lib/Sema/SemaOverload.cpp Outdated Show resolved Hide resolved

zygoloid reviewed Mar 31, 2025

View reviewed changes

zyn0217 reviewed Apr 1, 2025

View reviewed changes

erichkeane reviewed Apr 1, 2025

View reviewed changes

clang/include/clang/Sema/Overload.h Outdated Show resolved Hide resolved

cor3ntin force-pushed the corentin/delay_overload branch from b8b812d to 8170b86 Compare April 3, 2025 13:54

cor3ntin force-pushed the corentin/delay_overload branch from 37ef323 to 9967287 Compare April 14, 2025 08:14

erichkeane reviewed Apr 14, 2025

View reviewed changes

erichkeane approved these changes Apr 15, 2025

View reviewed changes

cor3ntin force-pushed the corentin/delay_overload branch 2 times, most recently from ddcad96 to 4c1ddfa Compare April 16, 2025 06:22

zyn0217 approved these changes Apr 16, 2025

View reviewed changes

cor3ntin force-pushed the corentin/delay_overload branch from 4c1ddfa to f5e899d Compare April 16, 2025 12:00

cor3ntin added 4 commits April 16, 2025 17:03

avoid comparing types

d35cfbb

Fix logic

297109c

optimize

f1b0454

cor3ntin added 14 commits April 16, 2025 17:03

Fix typos and cuda tests

a21bc4d

remove redundant check

b205be1

More code simplification

af22b19

fix cuda, add comments

42e8795

add tests

03c157a

* Fix handling of explicit specifiers

0709939

* Prefer a constructor template over a non-template conversion function

slab allocate template candidates

a02299e

fix rvalue to lvalue binding

8ee938c

Disable resolution by perfect match when

bcf08f5

- We find a template conversion candidate - A constructor with a dependent explicit specifier

address Erich's, format, add release notes

8aee255

address more of Erich's feedback

6edcb65

Clarify changelog

7b0c9c6

fix assertion

c621548

cleanups

cfecf5a

cor3ntin force-pushed the corentin/delay_overload branch from f5e899d to cfecf5a Compare April 16, 2025 15:03

cor3ntin merged commit facc57f into llvm:main Apr 16, 2025
10 of 12 checks passed

cor3ntin mentioned this pull request Apr 16, 2025

Revert "[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists" #135993

Merged

cor3ntin added a commit that referenced this pull request Apr 16, 2025

Revert "[Clang][RFC] Bypass TAD during overload resolution if a perfe…

6ccc928

…ct match exists" (#135993) Reverts #133426 This is failing on some bots https://lab.llvm.org/buildbot/#/builders/163/builds/17265

cor3ntin mentioned this pull request Apr 16, 2025

[Clang] Bypass TAD during overload resolution if a perfect match exists #136018

Merged

cor3ntin mentioned this pull request Apr 17, 2025

[Clang] Bypass TAD during overload resolution if a perfect match exists #136203

Merged

		24 * sizeof(ImplicitConversionSequence);
		32 * sizeof(ImplicitConversionSequence);

	llvm::copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());
	llvm::uninitialized_copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());

[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists #133426

[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists #133426

Uh oh!

Conversation

cor3ntin commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cor3ntin commented Mar 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 31, 2025

Uh oh!

Sirraide left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zygoloid left a comment

Choose a reason for hiding this comment

Uh oh!

zyn0217 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

erichkeane left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

erichkeane left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

erichkeane left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

cor3ntin commented Mar 28, 2025 •

edited

Loading

cor3ntin commented Mar 28, 2025 •

edited

Loading

github-actions bot commented Mar 29, 2025 •

edited

Loading