Skip to content

[libc++] Reduce std::conjunction overhead #124259

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 25, 2025

Conversation

philnik777
Copy link
Contributor

The old and new implementation of _And are very close in terms of performance according to my testing, but the new implementation can also be used to implement conjunction, which make that ~50% faster.

@philnik777 philnik777 force-pushed the optimize_conjunction branch from 63da627 to a8cb701 Compare January 24, 2025 23:00
@philnik777 philnik777 marked this pull request as ready for review January 25, 2025 09:36
@philnik777 philnik777 requested a review from a team as a code owner January 25, 2025 09:36
@philnik777 philnik777 merged commit 2696e4f into llvm:main Jan 25, 2025
78 of 79 checks passed
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jan 25, 2025
@philnik777 philnik777 deleted the optimize_conjunction branch January 25, 2025 09:36
@llvmbot
Copy link
Member

llvmbot commented Jan 25, 2025

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes

The old and new implementation of _And are very close in terms of performance according to my testing, but the new implementation can also be used to implement conjunction, which make that ~50% faster.


Full diff: https://github.com/llvm/llvm-project/pull/124259.diff

1 Files Affected:

  • (modified) libcxx/include/__type_traits/conjunction.h (+18-24)
diff --git a/libcxx/include/__type_traits/conjunction.h b/libcxx/include/__type_traits/conjunction.h
index ad9656acd47ecc..6b6717a50a468a 100644
--- a/libcxx/include/__type_traits/conjunction.h
+++ b/libcxx/include/__type_traits/conjunction.h
@@ -10,8 +10,6 @@
 #define _LIBCPP___TYPE_TRAITS_CONJUNCTION_H
 
 #include <__config>
-#include <__type_traits/conditional.h>
-#include <__type_traits/enable_if.h>
 #include <__type_traits/integral_constant.h>
 #include <__type_traits/is_same.h>
 
@@ -21,22 +19,29 @@
 
 _LIBCPP_BEGIN_NAMESPACE_STD
 
-template <class...>
-using __expand_to_true _LIBCPP_NODEBUG = true_type;
+template <bool>
+struct _AndImpl;
 
-template <class... _Pred>
-__expand_to_true<__enable_if_t<_Pred::value>...> __and_helper(int);
+template <>
+struct _AndImpl<true> {
+  template <class _Res, class _First, class... _Rest>
+  using _Result _LIBCPP_NODEBUG =
+      typename _AndImpl<bool(_First::value) && sizeof...(_Rest) != 0>::template _Result<_First, _Rest...>;
+};
 
-template <class...>
-false_type __and_helper(...);
+template <>
+struct _AndImpl<false> {
+  template <class _Res, class...>
+  using _Result _LIBCPP_NODEBUG = _Res;
+};
 
 // _And always performs lazy evaluation of its arguments.
 //
 // However, `_And<_Pred...>` itself will evaluate its result immediately (without having to
 // be instantiated) since it is an alias, unlike `conjunction<_Pred...>`, which is a struct.
 // If you want to defer the evaluation of `_And<_Pred...>` itself, use `_Lazy<_And, _Pred...>`.
-template <class... _Pred>
-using _And _LIBCPP_NODEBUG = decltype(std::__and_helper<_Pred...>(0));
+template <class... _Args>
+using _And _LIBCPP_NODEBUG = typename _AndImpl<sizeof...(_Args) != 0>::template _Result<true_type, _Args...>;
 
 template <bool... _Preds>
 struct __all_dummy;
@@ -46,22 +51,11 @@ struct __all : _IsSame<__all_dummy<_Pred...>, __all_dummy<((void)_Pred, true)...
 
 #if _LIBCPP_STD_VER >= 17
 
-template <class...>
-struct _LIBCPP_NO_SPECIALIZATIONS conjunction : true_type {};
-
-_LIBCPP_DIAGNOSTIC_PUSH
-#  if __has_warning("-Winvalid-specialization")
-_LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Winvalid-specialization")
-#  endif
-template <class _Arg>
-struct conjunction<_Arg> : _Arg {};
-
-template <class _Arg, class... _Args>
-struct conjunction<_Arg, _Args...> : conditional_t<!bool(_Arg::value), _Arg, conjunction<_Args...>> {};
-_LIBCPP_DIAGNOSTIC_POP
+template <class... _Args>
+struct _LIBCPP_NO_SPECIALIZATIONS conjunction : _And<_Args...> {};
 
 template <class... _Args>
-_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = conjunction<_Args...>::value;
+_LIBCPP_NO_SPECIALIZATIONS inline constexpr bool conjunction_v = _And<_Args...>::value;
 
 #endif // _LIBCPP_STD_VER >= 17
 

@asmok-g
Copy link

asmok-g commented Jan 31, 2025

Headsup: this is causing a clang seg fault. Working on a repro ..

@asmok-g
Copy link

asmok-g commented Jan 31, 2025

Probably the seg fault is due to going out of memory

@asmok-g
Copy link

asmok-g commented Feb 3, 2025

Our internal assessment of the situation is that this exposes a bug in the compiler but it will need time to get to the bottom of it. In the meantime we suggest to revert the patch till the problem is known and fixed @philnik777 what do you think ?

@philnik777
Copy link
Contributor Author

@asmok-g I'd really like to have a reproducer before reverting.

@ldionne
Copy link
Member

ldionne commented Feb 3, 2025

Indeed, please provide a reproducer. We're fine with reverting temporarily when we have a reproducer and a bug report that is being worked on, but without that we're essentially reverting because someone said so, which is not really workable for the project.

@asmok-g
Copy link

asmok-g commented Feb 3, 2025

Indeed. cvise is running so we'll see where it is tomorrow

@eaeltsin
Copy link
Contributor

eaeltsin commented Feb 5, 2025

We might have overreduced the test, but ended up with the following code that makes the compiler exhaust the stack and SIGSEGV:

template <int> struct _AndImpl {
  template <class _First, class... _Rest>
  using _Result = _AndImpl<sizeof...(_Rest)>::template _Result<_First>;
};

template <class... _Args>
using _And = _AndImpl<sizeof...(_Args)>::template _Result<int>;

_And<>;

not yet sure which usage of std::conjunction leads to this.

@philnik777
Copy link
Contributor Author

That certainly looks overreduced. There is no recursion anchor (is it called that in english?), so we have an infinite loop. Interesting that there seems to be no depth limit on aliases. But that's most certainly not the actual problem, since we do have a recursion anchor in the actual implementation.

I'd also be happy with a non-reduced case if you are able to shared it. (Assuming of course I can actually reproduce and verify that the problem is the new conjunction implementation)

@asmok-g
Copy link

asmok-g commented Feb 5, 2025

There's something we forgot to mention;

Observation

  • There are different instances of crashes we see.
  • The one I'm looking in is going out of memory and crashing.
  • When the preprocessed file is generated by the compiler at this revision, then no matter what clang we use to compile (before this revision or after), we get memory problem. (so problem is preexisting in clang ?)

Assessment of the internal clang team is

The problem is not in this PR. The PR just exposes the issue in clang. However, it's gonna take a long time to investigate. This is why we hope to revert in the meantime. Whatever repro (preprocessed file) we're going to provide won't prove this PR wrong.

My reduction is very slow and is at 126k lines now, and we have another one at 50k lines. In any case, the problem will be unrelated to this PR.

@ilya-biryukov
Copy link
Contributor

@philnik777 we have investigated further and found that this seems to boil down to Clang requiring more stack (likely deeper recursions) when instantiating std::tuple with long argument lists.

In our code, we end up crashing with stack overflow when instantiating std::tuple<1000 x size_t>. The upstream compiler takes a little longer (we seem to have more aggressive inlining when compiling Clang internally), but also seems to crash eventually. Here's a simple example with Clang from trunk and Clang 19 (they should be using the corresponding versions of libc++ that match the git commit). A tuple with 2k element either crashes it or times out, not sure about godbolt's output here:
https://gcc.godbolt.org/z/KTq98rfhr

If you play around with the tuple sizes, you'll see that smaller numbers that don't fail result in much higher compile times with the new approach.

The caveat is that there might've been more changes between Clang 19 and now, but we are pretty certain this commit is the root cause based on our rolling testing.

Could you please take a look if compiling std::tuple with long argument lists got slower / more crashy and investigate if that's expected?

@ilya-biryukov
Copy link
Contributor

Note that I don't have any idea how we get from std::tuple to conjunction, so I'm rolling with the assumption that does happen at some point in the code, I'm not very familiar with tuple's implementation.
Let me know if this makes sense.

philnik777 added a commit that referenced this pull request Feb 7, 2025
It turns out that the new implementation takes significantly more stack
memory for some reason.

This reverts commit 2696e4f.
@ldionne
Copy link
Member

ldionne commented Feb 7, 2025

@philnik777 Thanks for handling it. Note that this patch was also in release/20.x, so we should probably revert it on the release branch. Can you handle that?

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
It turns out that the new implementation takes significantly more stack
memory for some reason.

This reverts commit 2696e4f.
swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Feb 19, 2025
It turns out that the new implementation takes significantly more stack
memory for some reason.

This reverts commit 2696e4f.

(cherry picked from commit 0227396)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants