Skip to content

[Clang][RFC] Bypass TAD during overload resolution if a perfect match exists #133426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Apr 16, 2025

Conversation

cor3ntin
Copy link
Contributor

@cor3ntin cor3ntin commented Mar 28, 2025

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (sections 1-2, not 3)

If, during overload resolution, a non-template candidate is always picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist.

The goal is to be able to merge #122423 without being too disruptive.

This change means that the selection of the best viable candidate and template argument deduction become interleaved.

To avoid rewriting half of Clang, we store in OverloadCandidateSet enough information to deduce template candidates from OverloadCandidateSet::BestViableFunction. This means the lifetime of any object used by the template argument must outlive a call to Add*Template*Candidate.

This two-phase resolution is not performed for some initialization as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times
https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au .

Fixes #62096
Fixes #74581
Fixes #53454

@cor3ntin
Copy link
Contributor Author

cor3ntin commented Mar 28, 2025

  • Release note
  • More Tests
  • More comments

static void AddTemplateOverloadCandidate(
Sema &S, OverloadCandidateSet &CandidateSet,
NonDeducedConversionTemplateOverloadCandidate &&C) {
return S.AddTemplateConversionCandidateImmediately(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse InjectNonDeducedTemplateCandidates for *Immediately functions or vice versa?

Comment on lines +11179 to -10988
llvm::SmallVector<OverloadCandidate *, 4> PendingBest;
llvm::SmallVector<const NamedDecl *, 4> EquivalentCands;

llvm::SmallVector<OverloadCandidate*, 4> PendingBest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated changes (sorry for nitpicks at this point)

Copy link

github-actions bot commented Mar 29, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@cor3ntin cor3ntin changed the title [Clang][WIP][RFC] Bypass TAD during overload resolution if a perfect match exists [Clang][RFC] Bypass TAD during overload resolution if a perfect match exists Mar 31, 2025
@cor3ntin cor3ntin marked this pull request as ready for review March 31, 2025 15:32
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Mar 31, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 31, 2025

@llvm/pr-subscribers-clang

Author: cor3ntin (cor3ntin)

Changes

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (sections 1-2, not 3)

If, during overload resolution, a non-template candidate is always picked because each argument is a perfect match (i.e., the source and target types are the same), we do not perform deduction for any template candidate that might exist.

The goal is to be able to merge #122423 without being too disruptive.

This change means that the selection of the best viable candidate and template argument deduction become interleaved.

To avoid rewriting half of Clang, we store in OverloadCandidateSet enough information to deduce template candidates from OverloadCandidateSet::BestViableFunction. This means the lifetime of any object used by the template argument must outlive a call to Add*Template*Candidate.

This two-phase resolution is not performed for some initialization as there are cases where template candidates are a better match per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times
https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&amp;to=bf7041045c9408490c395230047c5461de72fc39&amp;stat=instructions%3Au .

Fixes #62096
Fixes #74581


Patch is 60.82 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133426.diff

9 Files Affected:

  • (modified) clang/include/clang/Sema/Overload.h (+150-7)
  • (modified) clang/include/clang/Sema/Sema.h (+2)
  • (modified) clang/lib/Sema/SemaCodeComplete.cpp (+4-2)
  • (modified) clang/lib/Sema/SemaInit.cpp (+11-4)
  • (modified) clang/lib/Sema/SemaOverload.cpp (+344-101)
  • (modified) clang/test/CXX/temp/temp.constr/temp.constr.atomic/constrant-satisfaction-conversions.cpp (+4-4)
  • (modified) clang/test/SemaCXX/implicit-member-functions.cpp (+7-14)
  • (modified) clang/test/SemaTemplate/instantiate-function-params.cpp (+3-4)
  • (modified) clang/test/Templight/templight-empty-entries-fix.cpp (+51-75)
diff --git a/clang/include/clang/Sema/Overload.h b/clang/include/clang/Sema/Overload.h
index 6e08762dcc6d7..547845c01db11 100644
--- a/clang/include/clang/Sema/Overload.h
+++ b/clang/include/clang/Sema/Overload.h
@@ -38,6 +38,7 @@
 #include <cstddef>
 #include <cstdint>
 #include <utility>
+#include <variant>
 
 namespace clang {
 
@@ -407,6 +408,11 @@ class Sema;
              Third == ICK_Identity;
     }
 
+    bool isPerfect(const ASTContext &C) const {
+      return isIdentityConversion() &&
+             (!ReferenceBinding || C.hasSameType(getFromType(), getToType(2)));
+    }
+
     ImplicitConversionRank getRank() const;
     NarrowingKind
     getNarrowingKind(ASTContext &Context, const Expr *Converted,
@@ -743,6 +749,10 @@ class Sema;
       Standard.setAllToTypes(T);
     }
 
+    bool isPerfect(const ASTContext &C) const {
+      return isStandard() && Standard.isPerfect(C);
+    }
+
     // True iff this is a conversion sequence from an initializer list to an
     // array or std::initializer.
     bool hasInitializerListContainerType() const {
@@ -979,6 +989,20 @@ class Sema;
       return false;
     }
 
+    bool isPerfectMatch(const ASTContext &Ctx) const {
+      if (!Viable)
+        return false;
+      for (const auto &C : Conversions) {
+        if (!C.isInitialized())
+          return false;
+        if (!C.isPerfect(Ctx))
+          return false;
+      }
+      if (isa_and_nonnull<CXXConversionDecl>(Function))
+        return FinalConversion.isPerfect(Ctx);
+      return true;
+    }
+
     bool TryToFixBadConversion(unsigned Idx, Sema &S) {
       bool CanFix = Fix.tryToFixConversion(
                       Conversions[Idx].Bad.FromExpr,
@@ -1015,6 +1039,61 @@ class Sema;
           RewriteKind(CRK_None) {}
   };
 
+  struct DeferredConversionTemplateOverloadCandidate {
+    FunctionTemplateDecl *FunctionTemplate;
+    DeclAccessPair FoundDecl;
+    CXXRecordDecl *ActingContext;
+    Expr *From;
+    QualType ToType;
+
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowObjCConversionOnExplicit : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowExplicit : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowResultConversion : 1;
+  };
+
+  struct DeferredMethodTemplateOverloadCandidate {
+    FunctionTemplateDecl *FunctionTemplate;
+    DeclAccessPair FoundDecl;
+    ArrayRef<Expr *> Args;
+    CXXRecordDecl *ActingContext;
+    Expr::Classification ObjectClassification;
+    QualType ObjectType;
+
+    OverloadCandidateParamOrder PO;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned SuppressUserConversions : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned PartialOverloading : 1;
+  };
+
+  struct DeferredFunctionTemplateOverloadCandidate {
+    FunctionTemplateDecl *FunctionTemplate;
+    DeclAccessPair FoundDecl;
+    ArrayRef<Expr *> Args;
+
+    CallExpr::ADLCallKind IsADLCandidate;
+    OverloadCandidateParamOrder PO;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned SuppressUserConversions : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned PartialOverloading : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AllowExplicit : 1;
+    LLVM_PREFERRED_TYPE(bool)
+    unsigned AggregateCandidateDeduction : 1;
+  };
+
+  using DeferredTemplateOverloadCandidate =
+      std::variant<DeferredConversionTemplateOverloadCandidate,
+                   DeferredMethodTemplateOverloadCandidate,
+                   DeferredFunctionTemplateOverloadCandidate>;
+
+  static_assert(
+      std::is_trivially_destructible_v<DeferredTemplateOverloadCandidate>);
+
   /// OverloadCandidateSet - A set of overload candidates, used in C++
   /// overload resolution (C++ 13.3).
   class OverloadCandidateSet {
@@ -1043,6 +1122,8 @@ class Sema;
       /// C++ [over.match.call.general]
       /// Resolve a call through the address of an overload set.
       CSK_AddressOfOverloadSet,
+
+      CSK_CodeCompletion,
     };
 
     /// Information about operator rewrites to consider when adding operator
@@ -1116,6 +1197,7 @@ class Sema;
   private:
     SmallVector<OverloadCandidate, 16> Candidates;
     llvm::SmallPtrSet<uintptr_t, 16> Functions;
+    SmallVector<DeferredTemplateOverloadCandidate, 8> DeferredCandidates;
 
     // Allocator for ConversionSequenceLists. We store the first few of these
     // inline to avoid allocation for small sets.
@@ -1126,7 +1208,8 @@ class Sema;
     OperatorRewriteInfo RewriteInfo;
 
     constexpr static unsigned NumInlineBytes =
-        24 * sizeof(ImplicitConversionSequence);
+        32 * sizeof(ImplicitConversionSequence);
+
     unsigned NumInlineBytesUsed = 0;
     alignas(void *) char InlineSpace[NumInlineBytes];
 
@@ -1137,15 +1220,13 @@ class Sema;
     /// from the slab allocator.
     /// FIXME: It would probably be nice to have a SmallBumpPtrAllocator
     /// instead.
-    /// FIXME: Now that this only allocates ImplicitConversionSequences, do we
-    /// want to un-generalize this?
     template <typename T>
     T *slabAllocate(unsigned N) {
       // It's simpler if this doesn't need to consider alignment.
       static_assert(alignof(T) == alignof(void *),
                     "Only works for pointer-aligned types.");
-      static_assert(std::is_trivial<T>::value ||
-                        std::is_same<ImplicitConversionSequence, T>::value,
+      static_assert(std::is_trivially_destructible_v<T> ||
+                        (std::is_same_v<ImplicitConversionSequence, T>),
                     "Add destruction logic to OverloadCandidateSet::clear().");
 
       unsigned NBytes = sizeof(T) * N;
@@ -1176,6 +1257,9 @@ class Sema;
     /// Whether diagnostics should be deferred.
     bool shouldDeferDiags(Sema &S, ArrayRef<Expr *> Args, SourceLocation OpLoc);
 
+    // Whether the resolution of template candidates should be defered
+    bool shouldDeferTemplateArgumentDeduction(const LangOptions &Opts) const;
+
     /// Determine when this overload candidate will be new to the
     /// overload set.
     bool isNewCandidate(Decl *F, OverloadCandidateParamOrder PO =
@@ -1199,8 +1283,12 @@ class Sema;
     iterator begin() { return Candidates.begin(); }
     iterator end() { return Candidates.end(); }
 
-    size_t size() const { return Candidates.size(); }
-    bool empty() const { return Candidates.empty(); }
+    size_t size() const {
+      return Candidates.size() + DeferredCandidates.size();
+    }
+    bool empty() const {
+      return Candidates.empty() && DeferredCandidates.empty();
+    }
 
     /// Allocate storage for conversion sequences for NumConversions
     /// conversions.
@@ -1216,6 +1304,19 @@ class Sema;
       return ConversionSequenceList(Conversions, NumConversions);
     }
 
+    llvm::MutableArrayRef<Expr *> getPersistentArgsArray(unsigned N) {
+      Expr **Exprs = slabAllocate<Expr *>(N);
+      return llvm::MutableArrayRef<Expr *>(Exprs, N);
+    }
+
+    template <typename... T>
+    llvm::MutableArrayRef<Expr *> getPersistentArgsArray(T *...Exprs) {
+      llvm::MutableArrayRef<Expr *> Arr =
+          getPersistentArgsArray(sizeof...(Exprs));
+      llvm::copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());
+      return Arr;
+    }
+
     /// Add a new candidate with NumConversions conversion sequence slots
     /// to the overload set.
     OverloadCandidate &addCandidate(unsigned NumConversions = 0,
@@ -1231,6 +1332,28 @@ class Sema;
       return C;
     }
 
+    void AddDeferredTemplateCandidate(
+        FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
+        ArrayRef<Expr *> Args, bool SuppressUserConversions,
+        bool PartialOverloading, bool AllowExplicit,
+        CallExpr::ADLCallKind IsADLCandidate, OverloadCandidateParamOrder PO,
+        bool AggregateCandidateDeduction);
+
+    void AddDeferredMethodTemplateCandidate(
+        FunctionTemplateDecl *MethodTmpl, DeclAccessPair FoundDecl,
+        CXXRecordDecl *ActingContext, QualType ObjectType,
+        Expr::Classification ObjectClassification, ArrayRef<Expr *> Args,
+        bool SuppressUserConversions, bool PartialOverloading,
+        OverloadCandidateParamOrder PO);
+
+    void AddDeferredConversionTemplateCandidate(
+        FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
+        CXXRecordDecl *ActingContext, Expr *From, QualType ToType,
+        bool AllowObjCConversionOnExplicit, bool AllowExplicit,
+        bool AllowResultConversion);
+
+    void InjectNonDeducedTemplateCandidates(Sema &S);
+
     /// Find the best viable function on this overload set, if it exists.
     OverloadingResult BestViableFunction(Sema &S, SourceLocation Loc,
                                          OverloadCandidateSet::iterator& Best);
@@ -1263,6 +1386,14 @@ class Sema;
       DestAS = AS;
     }
 
+  private:
+    OverloadingResult ResultForBestCandidate(const iterator &Best);
+    void CudaExcludeWrongSideCandidates(Sema &S);
+    OverloadingResult
+    BestViableFunctionImpl(Sema &S, SourceLocation Loc,
+                           OverloadCandidateSet::iterator &Best);
+    void PerfectViableFunction(Sema &S, SourceLocation Loc,
+                               OverloadCandidateSet::iterator &Best);
   };
 
   bool isBetterOverloadCandidate(Sema &S, const OverloadCandidate &Cand1,
@@ -1311,6 +1442,18 @@ class Sema;
   // parameter.
   bool shouldEnforceArgLimit(bool PartialOverloading, FunctionDecl *Function);
 
+  inline bool OverloadCandidateSet::shouldDeferTemplateArgumentDeduction(
+      const LangOptions &Opts) const {
+    return
+        // For user defined conversion we need to check against different
+        // combination of CV qualifiers and look at any expicit specifier, so
+        // always deduce template candidate.
+        Kind != CSK_InitByUserDefinedConversion
+        && Kind != CSK_InitByConstructor
+        && Kind != CSK_CodeCompletion &&
+        Opts.CPlusPlus && (!Opts.CUDA || Opts.GPUExcludeWrongSideOverloads);
+  }
+
 } // namespace clang
 
 #endif // LLVM_CLANG_SEMA_OVERLOAD_H
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 066bce61c74c1..ab2a94584d46c 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -60,6 +60,7 @@
 #include "clang/Sema/DeclSpec.h"
 #include "clang/Sema/ExternalSemaSource.h"
 #include "clang/Sema/IdentifierResolver.h"
+#include "clang/Sema/Overload.h"
 #include "clang/Sema/Ownership.h"
 #include "clang/Sema/ParsedAttr.h"
 #include "clang/Sema/Redeclaration.h"
@@ -10345,6 +10346,7 @@ class Sema final : public SemaBase {
   /// Add a C++ function template specialization as a candidate
   /// in the candidate set, using template argument deduction to produce
   /// an appropriate function template specialization.
+
   void AddTemplateOverloadCandidate(
       FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
       TemplateArgumentListInfo *ExplicitTemplateArgs, ArrayRef<Expr *> Args,
diff --git a/clang/lib/Sema/SemaCodeComplete.cpp b/clang/lib/Sema/SemaCodeComplete.cpp
index 2003701b65654..e314f6859d71c 100644
--- a/clang/lib/Sema/SemaCodeComplete.cpp
+++ b/clang/lib/Sema/SemaCodeComplete.cpp
@@ -6364,7 +6364,8 @@ SemaCodeCompletion::ProduceCallSignatureHelp(Expr *Fn, ArrayRef<Expr *> Args,
   Expr *NakedFn = Fn->IgnoreParenCasts();
   // Build an overload candidate set based on the functions we find.
   SourceLocation Loc = Fn->getExprLoc();
-  OverloadCandidateSet CandidateSet(Loc, OverloadCandidateSet::CSK_Normal);
+  OverloadCandidateSet CandidateSet(Loc,
+                                    OverloadCandidateSet::CSK_CodeCompletion);
 
   if (auto ULE = dyn_cast<UnresolvedLookupExpr>(NakedFn)) {
     SemaRef.AddOverloadedCallCandidates(ULE, ArgsWithoutDependentTypes,
@@ -6567,7 +6568,8 @@ QualType SemaCodeCompletion::ProduceConstructorSignatureHelp(
   // FIXME: Provide support for variadic template constructors.
 
   if (CRD) {
-    OverloadCandidateSet CandidateSet(Loc, OverloadCandidateSet::CSK_Normal);
+    OverloadCandidateSet CandidateSet(Loc,
+                                      OverloadCandidateSet::CSK_CodeCompletion);
     for (NamedDecl *C : SemaRef.LookupConstructors(CRD)) {
       if (auto *FD = dyn_cast<FunctionDecl>(C)) {
         // FIXME: we can't yet provide correct signature help for initializer
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index 9814c3f456f0d..a8ab43776a6de 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -10043,12 +10043,19 @@ QualType Sema::DeduceTemplateSpecializationFromInitializer(
     //   When [...] the constructor [...] is a candidate by
     //    - [over.match.copy] (in all cases)
     if (TD) {
-      SmallVector<Expr *, 8> TmpInits;
-      for (Expr *E : Inits)
+
+      // As template candidates are not deduced immediately,
+      // persist the arry in the overload set.
+      MutableArrayRef<Expr *> TmpInits =
+          Candidates.getPersistentArgsArray(Inits.size());
+
+      for (auto [I, E] : llvm::enumerate(Inits)) {
         if (auto *DI = dyn_cast<DesignatedInitExpr>(E))
-          TmpInits.push_back(DI->getInit());
+          TmpInits[I] = DI->getInit();
         else
-          TmpInits.push_back(E);
+          TmpInits[I] = E;
+      }
+
       AddTemplateOverloadCandidate(
           TD, FoundDecl, /*ExplicitArgs=*/nullptr, TmpInits, Candidates,
           /*SuppressUserConversions=*/false,
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 6d8006b35dcf4..faf357b010580 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -31,6 +31,7 @@
 #include "clang/Sema/Lookup.h"
 #include "clang/Sema/Overload.h"
 #include "clang/Sema/SemaCUDA.h"
+#include "clang/Sema/SemaCodeCompletion.h"
 #include "clang/Sema/SemaObjC.h"
 #include "clang/Sema/Template.h"
 #include "clang/Sema/TemplateDeduction.h"
@@ -45,6 +46,7 @@
 #include <cstddef>
 #include <cstdlib>
 #include <optional>
+#include <variant>
 
 using namespace clang;
 using namespace sema;
@@ -7787,15 +7789,14 @@ void Sema::AddMethodCandidate(
   }
 }
 
-void Sema::AddMethodTemplateCandidate(
+static void AddMethodTemplateCandidateImmediately(
+    Sema &S, OverloadCandidateSet &CandidateSet,
     FunctionTemplateDecl *MethodTmpl, DeclAccessPair FoundDecl,
     CXXRecordDecl *ActingContext,
     TemplateArgumentListInfo *ExplicitTemplateArgs, QualType ObjectType,
     Expr::Classification ObjectClassification, ArrayRef<Expr *> Args,
-    OverloadCandidateSet &CandidateSet, bool SuppressUserConversions,
-    bool PartialOverloading, OverloadCandidateParamOrder PO) {
-  if (!CandidateSet.isNewCandidate(MethodTmpl, PO))
-    return;
+    bool SuppressUserConversions, bool PartialOverloading,
+    OverloadCandidateParamOrder PO) {
 
   // C++ [over.match.funcs]p7:
   //   In each case where a candidate is a function template, candidate
@@ -7809,12 +7810,12 @@ void Sema::AddMethodTemplateCandidate(
   TemplateDeductionInfo Info(CandidateSet.getLocation());
   FunctionDecl *Specialization = nullptr;
   ConversionSequenceList Conversions;
-  if (TemplateDeductionResult Result = DeduceTemplateArguments(
+  if (TemplateDeductionResult Result = S.DeduceTemplateArguments(
           MethodTmpl, ExplicitTemplateArgs, Args, Specialization, Info,
           PartialOverloading, /*AggregateDeductionCandidate=*/false,
           /*PartialOrdering=*/false, ObjectType, ObjectClassification,
           [&](ArrayRef<QualType> ParamTypes) {
-            return CheckNonDependentConversions(
+            return S.CheckNonDependentConversions(
                 MethodTmpl, ParamTypes, Args, CandidateSet, Conversions,
                 SuppressUserConversions, ActingContext, ObjectType,
                 ObjectClassification, PO);
@@ -7826,7 +7827,7 @@ void Sema::AddMethodTemplateCandidate(
     Candidate.Function = MethodTmpl->getTemplatedDecl();
     Candidate.Viable = false;
     Candidate.RewriteKind =
-      CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
+        CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
     Candidate.IsSurrogate = false;
     Candidate.IgnoreObjectArgument =
         cast<CXXMethodDecl>(Candidate.Function)->isStatic() ||
@@ -7836,8 +7837,8 @@ void Sema::AddMethodTemplateCandidate(
       Candidate.FailureKind = ovl_fail_bad_conversion;
     else {
       Candidate.FailureKind = ovl_fail_bad_deduction;
-      Candidate.DeductionFailure = MakeDeductionFailureInfo(Context, Result,
-                                                            Info);
+      Candidate.DeductionFailure =
+          MakeDeductionFailureInfo(S.Context, Result, Info);
     }
     return;
   }
@@ -7847,10 +7848,34 @@ void Sema::AddMethodTemplateCandidate(
   assert(Specialization && "Missing member function template specialization?");
   assert(isa<CXXMethodDecl>(Specialization) &&
          "Specialization is not a member function?");
-  AddMethodCandidate(cast<CXXMethodDecl>(Specialization), FoundDecl,
-                     ActingContext, ObjectType, ObjectClassification, Args,
-                     CandidateSet, SuppressUserConversions, PartialOverloading,
-                     Conversions, PO, Info.hasStrictPackMatch());
+  S.AddMethodCandidate(
+      cast<CXXMethodDecl>(Specialization), FoundDecl, ActingContext, ObjectType,
+      ObjectClassification, Args, CandidateSet, SuppressUserConversions,
+      PartialOverloading, Conversions, PO, Info.hasStrictPackMatch());
+}
+
+void Sema::AddMethodTemplateCandidate(
+    FunctionTemplateDecl *MethodTmpl, DeclAccessPair FoundDecl,
+    CXXRecordDecl *ActingContext,
+    TemplateArgumentListInfo *ExplicitTemplateArgs, QualType ObjectType,
+    Expr::Classification ObjectClassification, ArrayRef<Expr *> Args,
+    OverloadCandidateSet &CandidateSet, bool SuppressUserConversions,
+    bool PartialOverloading, OverloadCandidateParamOrder PO) {
+  if (!CandidateSet.isNewCandidate(MethodTmpl, PO))
+    return;
+
+  if (ExplicitTemplateArgs ||
+      !CandidateSet.shouldDeferTemplateArgumentDeduction(getLangOpts())) {
+    AddMethodTemplateCandidateImmediately(
+        *this, CandidateSet, MethodTmpl, FoundDecl, ActingContext,
+        ExplicitTemplateArgs, ObjectType, ObjectClassification, Args,
+        SuppressUserConversions, PartialOverloading, PO);
+    return;
+  }
+
+  CandidateSet.AddDeferredMethodTemplateCandidate(
+      MethodTmpl, FoundDecl, ActingContext, ObjectType, ObjectClassification,
+      Args, SuppressUserConversions, PartialOverloading, PO);
 }
 
 /// Determine whether a given function template has a simple explicit specifier
@@ -7859,14 +7884,13 @@ static bool isNonDependentlyExplicit(FunctionTemplateDecl *FTD) {
   return ExplicitSpecifier::getFromDecl(FTD->getTemplatedDecl()).isExplicit();
 }
 
-void Sema::AddTemplateOverloadCandidate(
+static void AddTemplateOverloadCandidateImmediately(
+    Sema &S, OverloadCandidateSet &CandidateSet,
     FunctionTemplateDecl *FunctionTemplate, DeclAccessPair FoundDecl,
     TemplateArgumentListInfo *ExplicitTemplateArgs, ArrayRef<Expr *> Args,
-    OverloadCandidateSet &CandidateSet, bool SuppressUserConversions,
-    bool PartialOverloading, bool AllowExplicit, ADLCallKind IsADLCandidate,
-    OverloadCandidateParamOrder PO, bool AggregateCandidateDeduction) {
-  if (!CandidateSet.isNewCandidate(FunctionTemplate, PO))
-    return;
+    bool SuppressUserConversions, bool PartialOverloading, bool AllowExplicit,
+    Sema::ADLCallKind IsADLCandidate, OverloadCandidateParamOrder PO,
+    bool AggregateCandidateDeduction) {
 
   // If the function template has a non-dependent explicit specification,
   // exclude it now if appropriate; we are not permitted to perform deduction
@@ -7893,14 +7917,14 @@ void Sema::AddTemplateOverloadCandidate(
                              FunctionTemplate->getTemplateDepth());
   FunctionDecl *Specialization = nullptr;
   ConversionSequenceList Conversions;
-  if (TemplateDeductionResult Result = DeduceTemplateArguments(
+  if (TemplateDeductionResult Result = S.DeduceTemplateArguments(
           FunctionTemplate, ExplicitTemplateArgs, Args, Specialization, Info,
           PartialOverloading, AggregateCandidateDeduction,
           /*PartialOrdering=*/false,
           /*ObjectType=*/QualType(),
           /*ObjectClassification=*/Expr::Classification(),
           [&](ArrayRef<QualType> ParamTypes) {
-            return CheckNonDependentConversions(
+            return S.CheckNonDependentConver...
[truncated]

Copy link
Member

@Sirraide Sirraide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few typos I noticed

Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do this. Even if WG21 decides that they only want to allow this and not require it, it seems like a good change that we stand a decent chance of reaching implementation consensus on.

Copy link
Contributor

@zyn0217 zyn0217 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines -1129 to +1241
24 * sizeof(ImplicitConversionSequence);
32 * sizeof(ImplicitConversionSequence);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some comments? what causes us to change it from 24 -> 32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some. We may be storing additional data in the slab allocator ( When we need to keep the argument array expression around for deferred candidates), so I made the storage a tad bigger to accomodate. There is no reason for this thing to not be quite large frankly. it's short lived

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree completely with Richard, we should just do this. We have prior art (GCC), the standard currently permits it as an implementation strategy, and it is an improvement in basically EVERY way with mild/barely-justifiable downsides.

@cor3ntin cor3ntin force-pushed the corentin/delay_overload branch from b8b812d to 8170b86 Compare April 3, 2025 13:54
@cor3ntin cor3ntin force-pushed the corentin/delay_overload branch from 37ef323 to 9967287 Compare April 14, 2025 08:14
Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few mean comments, but the approach still seems solid here. I think this is a solid patch

@@ -407,6 +407,18 @@ class Sema;
Third == ICK_Identity;
}

bool isPerfect(const ASTContext &C) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would LOVE a comment here, the implementation isn't particularly clear what is going on here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still would like comments on these functions that say what 'perfect' means here.

@@ -1015,6 +1045,62 @@ class Sema;
RewriteKind(CRK_None) {}
};

struct DeferredTemplateOverloadCandidate {
DeferredTemplateOverloadCandidate *Next = nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment here + on the slab allocator function as to WHY we are doing this instead of a std::vector/etc would probably be helpful. I understand it, but I probably won't next time I see this code ;)

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a handful of comments that are all nits/minor enough I trust you on, but overall this looks fine.

Lets ship it!

@@ -96,6 +96,12 @@ C++ Language Changes
asm((std::string_view("nop")) ::: (std::string_view("memory")));
}

- Clang now implements the changes to overload resolution proposed by `P3606 <https://wg21.link/P3606R0>`_.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify just parts 1 and 2?

@@ -407,6 +407,18 @@ class Sema;
Third == ICK_Identity;
}

bool isPerfect(const ASTContext &C) const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still would like comments on these functions that say what 'perfect' means here.

@@ -743,6 +759,13 @@ class Sema;
Standard.setAllToTypes(T);
}

/// A conversion sequence is perfect if
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is good here, add to 410 would be nice as well.

@@ -1043,6 +1141,8 @@ class Sema;
/// C++ [over.match.call.general]
/// Resolve a call through the address of an overload set.
CSK_AddressOfOverloadSet,

CSK_CodeCompletion,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is happening here? I'm a touch confused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When doing code completion, we do overload resolution ( see ProduceCallSignatureHelp) - In those cases, we want to always find all candidates, even if there is a candidate that would be a perfect match for the somewhat fake type or set of types that are used to produce the overload set so we can display them all.

Or at least that's my best understanding of it - I felt that this patch should not affect the behavior of code completion in any way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Can you clarify that in a comment above this as to why this is a separate category? It was pretty jarring to see in general, let alone here.

llvm::MutableArrayRef<Expr *> getPersistentArgsArray(T *...Exprs) {
llvm::MutableArrayRef<Expr *> Arr =
getPersistentArgsArray(sizeof...(Exprs));
llvm::copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
llvm::copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());
llvm::uninitialized_copy(std::initializer_list<Expr *>{Exprs...}, Arr.data());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm::uninitialized_copy does not exist, and std::uninitialized_copy takes iterators, so I would have to materialize the list. Given that we are copying pointers I don't think it matters, so I have not applied that piece of feedback.

I'm happy to if you feel it's important

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, hrmph, thats unfortunate. There is A good reason that was brought up on me in the past, but everything I can remember/think of is sort of a "well, i guess it is 'more' correct'" when it comes to pointers. So don't change it here, as the init-list stuff makes it more error prone.

@cor3ntin cor3ntin force-pushed the corentin/delay_overload branch 2 times, most recently from ddcad96 to 4c1ddfa Compare April 16, 2025 06:22
Copy link
Contributor

@zyn0217 zyn0217 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, and thanks for helping us in this area!

@@ -7919,7 +7951,7 @@ void Sema::AddTemplateOverloadCandidate(
Candidate.Function = FunctionTemplate->getTemplatedDecl();
Candidate.Viable = false;
Candidate.RewriteKind =
CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated?

@@ -7834,7 +7838,7 @@ void Sema::AddMethodTemplateCandidate(
Candidate.Function = MethodTmpl->getTemplatedDecl();
Candidate.Viable = false;
Candidate.RewriteKind =
CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
CandidateSet.getRewriteInfo().getRewriteKind(Candidate.Function, PO);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -45,6 +45,7 @@
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <variant>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer use variant, right?

@cor3ntin cor3ntin force-pushed the corentin/delay_overload branch from 4c1ddfa to f5e899d Compare April 16, 2025 12:00
…match exists

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge llvm#122423 without being too disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in `OverloadCandidateSet`
enough information to be able to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. Which means
the lifetime of any object used by template argument must outlive
a call to `Add*Template*Candidate`.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times
https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes llvm#62096
Fixes llvm#74581
@cor3ntin cor3ntin force-pushed the corentin/delay_overload branch from f5e899d to cfecf5a Compare April 16, 2025 15:03
@cor3ntin cor3ntin merged commit facc57f into llvm:main Apr 16, 2025
10 of 12 checks passed
cor3ntin added a commit that referenced this pull request Apr 16, 2025
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 16, 2025
cor3ntin added a commit that referenced this pull request Apr 17, 2025
…ts (#136018)

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
#122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in `OverloadCandidateSet`
enough information to be able to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. Which means
the lifetime of any object used by template argument must outlive
a call to `Add*Template*Candidate`.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes #62096
Fixes #74581

Reapplies #133426
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
… exists (llvm#133426)

This implements the same overload resolution behavior as GCC, 
as described in https://wg21.link/p3606 (sections 1-2, not 3)

If, during overload resolution, a non-template candidate is always
picked because each argument is a perfect match (i.e., the source and
target types are the same), we do not perform deduction for any template
candidate that might exist.

The goal is to be able to merge llvm#122423 without being too disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang, we store in `OverloadCandidateSet`
enough information to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. This means the lifetime of
any object used by the template argument must outlive a call to
`Add*Template*Candidate`.

This two-phase resolution is not performed for some initialization as
there are cases where template candidates are a better match per the
standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=edc22c64e527171041876f26a491bb1d03d905d5&to=8170b860bd4b70917005796c05a9be013a95abb2&stat=instructions%3Au

Fixes llvm#62096
Fixes llvm#74581
Fixes llvm#53454
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…ts (llvm#136018)

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
llvm#122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in `OverloadCandidateSet`
enough information to be able to deduce template candidates from
`OverloadCandidateSet::BestViableFunction`. Which means
the lifetime of any object used by template argument must outlive
a call to `Add*Template*Candidate`.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes llvm#62096
Fixes llvm#74581

Reapplies llvm#133426
cor3ntin added a commit that referenced this pull request Apr 18, 2025
…ts (#136203)

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
#122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in OverloadCandidateSet
enough information to be able to deduce template candidates from
OverloadCandidateSet::BestViableFunction. Which means
the lifetime of any object used by template argument must outlive
a call to Add*Template*Candidate.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes #62096
Fixes #74581

Reapplies #133426
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 18, 2025
… match exists (#136203)

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
llvm/llvm-project#122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in OverloadCandidateSet
enough information to be able to deduce template candidates from
OverloadCandidateSet::BestViableFunction. Which means
the lifetime of any object used by template argument must outlive
a call to Add*Template*Candidate.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes llvm/llvm-project#62096
Fixes llvm/llvm-project#74581

Reapplies llvm/llvm-project#133426
@nikic
Copy link
Contributor

nikic commented Apr 18, 2025

The final version had a 4% improvement on clang bootstrap: https://llvm-compile-time-tracker.com/compare.php?from=23324b8b109aed1f77cb20cef476b795f33b6835&to=8c5a307bd8d406e6167a5cd3ce3c74e2e3bfb2a6&stat=instructions:u

Wow.

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…ts (llvm#136203)

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
llvm#122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in OverloadCandidateSet
enough information to be able to deduce template candidates from
OverloadCandidateSet::BestViableFunction. Which means
the lifetime of any object used by template argument must outlive
a call to Add*Template*Candidate.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes llvm#62096
Fixes llvm#74581

Reapplies llvm#133426
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…ts (llvm#136203)

This implements the same overload resolution behavior as GCC,
as described in https://wg21.link/p3606 (section 1-2, not 3)

If during overload resolution, there is a non-template candidate
that would be always be picked - because each of the argument
is a perfect match (ie the source and target types are the same),
we do not perform deduction for any template candidate
that might exists.

The goal is to be able to merge
llvm#122423 without being too
disruptive.

This change means that the selection of the best viable candidate and
template argument deduction become interleaved.

To avoid rewriting half of Clang we store in OverloadCandidateSet
enough information to be able to deduce template candidates from
OverloadCandidateSet::BestViableFunction. Which means
the lifetime of any object used by template argument must outlive
a call to Add*Template*Candidate.

This two phase resolution is not performed for some initialization
as there are cases where template candidate are better match
in these cases per the standard. It's also bypassed for code completion.

The change has a nice impact on compile times

https://llvm-compile-time-tracker.com/compare.php?from=719b029c16eeb1035da522fd641dfcc4cee6be74&to=bf7041045c9408490c395230047c5461de72fc39&stat=instructions%3Au

Fixes llvm#62096
Fixes llvm#74581

Reapplies llvm#133426
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
7 participants