Skip to content

[libc++] Overhaul the PSTL dispatching mechanism #88131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 12, 2024

Conversation

ldionne
Copy link
Member

@ldionne ldionne commented Apr 9, 2024

The experimental PSTL's current dispatching mechanism was designed with
flexibility in mind. However, while reviewing the in-progress OpenMP
backend, I realized that the dispatching mechanism based on ADL and
default definitions in the frontend had several downsides. To name a
few:

  1. The dispatching of an algorithm to the back-end and its default
    implementation is bundled together via _LIBCPP_PSTL_CUSTOMIZATION_POINT.
    This makes the dispatching really confusing and leads to annoyances
    such as variable shadowing and weird lambda captures in the front-end.
  2. The distinction between back-end functions and front-end algorithms
    is not as clear as it could be, which led us to call one where we meant
    the other in a few cases. This is bad due to the exception requirements
    of the PSTL: calling a front-end algorithm inside the implementation of
    a back-end is incorrect for exception-safety.
  3. There are two levels of back-end dispatching in the PSTL, which treat
    CPU backends as a special case. This was confusing and not as flexible
    as we'd like. For example, there was no straightforward way to dispatch
    all uses of unseq to a specific back-end from the OpenMP backend,
    or for CPU backends to fall back on each other.

This patch rewrites the backend dispatching mechanism to solve these
problems, but doesn't touch any of the actual implementation of
algorithms. Specifically, this rewrite has the following characteristics:

  • All back-ends are full top-level backends defining all the basis operations
    required by the PSTL. This is made realistic for CPU backends by providing
    the CPU-based basis operations as simple helpers that can easily be reused
    when defining the PSTL basis operations.

  • The default definitions for algorithms are separated from their dispatching
    logic and grouped in families instead, based on the basis operation they
    require for their default implementation.

  • The front-end is thus simplified a whole lot and made very consistent
    for all algorithms, which makes it easier to audit the front-end for
    things like exception-correctness, appropriate forwarding, etc.

Fixes #70718

@ldionne
Copy link
Member Author

ldionne commented Apr 9, 2024

This is a draft patch. I'm going to be chipping away at it by extracting smaller reviews like #88127.

ldionne added a commit to ldionne/llvm-project that referenced this pull request Apr 9, 2024
Currently, CPU backends in the PSTL are created by defining functions
in the __par_backend namespace. Then, the PSTL includes the CPU backend
that gets configured via CMake and gets those definitions.

This prevents CPU backends from easily co-existing and is a bit confusing.
To solve this problem, this patch introduces the notion of __cpu_traits,
which is a cheap encapsulation of the basis operations required to
implement a CPU-based PSTL. Different backends can now define their own
tag and coexist, and the CPU-based PSTL will simply use __cpu_traits to
dispatch to the right implementation of e.g. __for_each.

Note that this patch doesn't change the actual implementation of the
backends in any way, it only modifies how that implementation is accessed
to implement PSTL algorithms.

This patch is a step towards llvm#88131.
@ldionne ldionne force-pushed the wip/pstl-reorganization branch from f70f001 to 7ef7d1b Compare April 12, 2024 20:54
Copy link

github-actions bot commented Apr 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ldionne ldionne force-pushed the wip/pstl-reorganization branch from 7ef7d1b to a324ff9 Compare April 12, 2024 21:02
@ldionne ldionne added libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. pstl Issues related to the C++17 Parallel STL labels Apr 12, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 12, 2024

@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

Changes

The experimental PSTL's current dispatching mechanism was designed with
flexibility in mind. However, while reviewing the in-progress OpenMP
backend, I realized that the dispatching mechanism based on ADL and
default definitions in the frontend had some downsides. To name a few:

  1. The dispatching of an algorithm to the back-end and its default
    implementation is bundled together via _LIBCPP_PSTL_CUSTOMIZATION_POINT.
    This makes the dispatching really confusing and leads to annoyances
    such as variable shadowing and weird lambda captures in the front-end.
  2. The distinction between back-end functions and front-end algorithms
    is not as clear as it could be, which led us to call one where we meant
    the other in a few cases. This is bad due to the exception requirements
    of the PSTL.
  3. There are two levels of back-end dispatching in the PSTL, which treat
    CPU backends as a special case. This was confusing and not as flexible
    as we'd like. For example, there was no straightforward way to dispatch
    all uses of unseq to a specific back-end from the OpenMP backend,
    or for CPU backends to fall back on each other.

This patch rewrites the backend dispatching mechanism to solve these
problems, but doesn't touch any of the actual implementation of
algorithms. Specifically, this rewrite has the following characteristics:

  • All back-ends are full top-level backends defining all the basis operations
    required by the PSTL. This is made realistic for CPU backends by providing
    the CPU-based basis operations as simple helpers that can easily be reused
    when defining the PSTL basis operations.

  • The default definitions for algorithms are separated from their dispatching
    logic and grouped in families instead, based on the basis operation they
    require for their default implementation.

  • The front-end is thus simplified a whole lot and made very consistent
    for all algorithms, which makes it easier to audit the front-end for
    things like exception-correctness.

Fixes #70718


Patch is 313.42 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88131.diff

65 Files Affected:

  • (modified) libcxx/CMakeLists.txt (+9-9)
  • (modified) libcxx/cmake/caches/Apple.cmake (+1-1)
  • (modified) libcxx/include/CMakeLists.txt (+20-33)
  • (added) libcxx/include/__algorithm/pstl.h (+627)
  • (removed) libcxx/include/__algorithm/pstl_any_all_none_of.h (-152)
  • (removed) libcxx/include/__algorithm/pstl_backend.h (-232)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backend.h (-68)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/backend.h (-41)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/fill.h (-62)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/for_each.h (-62)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/libdispatch.h (-347)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/merge.h (-85)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/serial.h (-83)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/stable_sort.h (-45)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/thread.h (-84)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/transform.h (-138)
  • (removed) libcxx/include/__algorithm/pstl_backends/cpu_backends/transform_reduce.h (-202)
  • (removed) libcxx/include/__algorithm/pstl_copy.h (-121)
  • (removed) libcxx/include/__algorithm/pstl_count.h (-121)
  • (removed) libcxx/include/__algorithm/pstl_equal.h (-175)
  • (removed) libcxx/include/__algorithm/pstl_fill.h (-116)
  • (removed) libcxx/include/__algorithm/pstl_find.h (-141)
  • (removed) libcxx/include/__algorithm/pstl_for_each.h (-108)
  • (removed) libcxx/include/__algorithm/pstl_frontend_dispatch.h (-44)
  • (removed) libcxx/include/__algorithm/pstl_generate.h (-114)
  • (removed) libcxx/include/__algorithm/pstl_is_partitioned.h (-77)
  • (removed) libcxx/include/__algorithm/pstl_merge.h (-92)
  • (removed) libcxx/include/__algorithm/pstl_move.h (-84)
  • (removed) libcxx/include/__algorithm/pstl_replace.h (-247)
  • (removed) libcxx/include/__algorithm/pstl_rotate_copy.h (-85)
  • (removed) libcxx/include/__algorithm/pstl_sort.h (-82)
  • (removed) libcxx/include/__algorithm/pstl_stable_sort.h (-61)
  • (removed) libcxx/include/__algorithm/pstl_transform.h (-120)
  • (modified) libcxx/include/__config_site.in (+3-3)
  • (renamed) libcxx/include/__numeric/pstl.h (+79-63)
  • (removed) libcxx/include/__numeric/pstl_reduce.h (-109)
  • (added) libcxx/include/__pstl/README.md (+171)
  • (added) libcxx/include/__pstl/backend_fwd.h (+119)
  • (added) libcxx/include/__pstl/backends/default.h (+505)
  • (added) libcxx/include/__pstl/backends/libdispatch.h (+388)
  • (added) libcxx/include/__pstl/backends/serial.h (+169)
  • (added) libcxx/include/__pstl/backends/std_thread.h (+123)
  • (added) libcxx/include/__pstl/configuration.h (+26)
  • (added) libcxx/include/__pstl/configuration_fwd.h (+41)
  • (renamed) libcxx/include/__pstl/cpu_algos/any_of.h (+34-35)
  • (added) libcxx/include/__pstl/cpu_algos/cpu_traits.h (+85)
  • (added) libcxx/include/__pstl/cpu_algos/fill.h (+65)
  • (renamed) libcxx/include/__pstl/cpu_algos/find_if.h (+43-40)
  • (added) libcxx/include/__pstl/cpu_algos/for_each.h (+65)
  • (added) libcxx/include/__pstl/cpu_algos/merge.h (+79)
  • (added) libcxx/include/__pstl/cpu_algos/stable_sort.h (+46)
  • (added) libcxx/include/__pstl/cpu_algos/transform.h (+152)
  • (added) libcxx/include/__pstl/cpu_algos/transform_reduce.h (+214)
  • (added) libcxx/include/__pstl/dispatch.h (+51)
  • (added) libcxx/include/__pstl/run_backend.h (+48)
  • (modified) libcxx/include/algorithm (+1-16)
  • (modified) libcxx/include/libcxx.imp (+2-33)
  • (modified) libcxx/include/module.modulemap (-4)
  • (modified) libcxx/include/numeric (+1-2)
  • (modified) libcxx/src/CMakeLists.txt (+1-1)
  • (modified) libcxx/src/pstl/libdispatch.cpp (+3-6)
  • (modified) libcxx/test/std/algorithms/numeric.ops/reduce/pstl.exception_handling.pass.cpp (+1)
  • (modified) libcxx/test/std/algorithms/numeric.ops/transform.reduce/pstl.exception_handling.pass.cpp (+24-14)
  • (modified) libcxx/test/support/test_iterators.h (+16-2)
  • (modified) libcxx/utils/generate_iwyu_mapping.py (+1)
diff --git a/libcxx/CMakeLists.txt b/libcxx/CMakeLists.txt
index e565c47c76687a..36232a814bfdff 100644
--- a/libcxx/CMakeLists.txt
+++ b/libcxx/CMakeLists.txt
@@ -301,9 +301,9 @@ option(LIBCXX_HAS_EXTERNAL_THREAD_API
    This option may only be set to ON when LIBCXX_ENABLE_THREADS=ON." OFF)
 
 if (LIBCXX_ENABLE_THREADS)
-  set(LIBCXX_PSTL_CPU_BACKEND "std_thread" CACHE STRING "Which PSTL CPU backend to use")
+  set(LIBCXX_PSTL_BACKEND "std_thread" CACHE STRING "Which PSTL backend to use")
 else()
-  set(LIBCXX_PSTL_CPU_BACKEND "serial" CACHE STRING "Which PSTL CPU backend to use")
+  set(LIBCXX_PSTL_BACKEND "serial" CACHE STRING "Which PSTL backend to use")
 endif()
 
 # Misc options ----------------------------------------------------------------
@@ -793,14 +793,14 @@ elseif (LIBCXX_HARDENING_MODE STREQUAL "debug")
   config_define(8 _LIBCPP_HARDENING_MODE_DEFAULT)
 endif()
 
-if (LIBCXX_PSTL_CPU_BACKEND STREQUAL "serial")
-  config_define(1 _LIBCPP_PSTL_CPU_BACKEND_SERIAL)
-elseif(LIBCXX_PSTL_CPU_BACKEND STREQUAL "std_thread")
-  config_define(1 _LIBCPP_PSTL_CPU_BACKEND_THREAD)
-elseif(LIBCXX_PSTL_CPU_BACKEND STREQUAL "libdispatch")
-  config_define(1 _LIBCPP_PSTL_CPU_BACKEND_LIBDISPATCH)
+if (LIBCXX_PSTL_BACKEND STREQUAL "serial")
+  config_define(1 _LIBCPP_PSTL_BACKEND_SERIAL)
+elseif(LIBCXX_PSTL_BACKEND STREQUAL "std_thread")
+  config_define(1 _LIBCPP_PSTL_BACKEND_STD_THREAD)
+elseif(LIBCXX_PSTL_BACKEND STREQUAL "libdispatch")
+  config_define(1 _LIBCPP_PSTL_BACKEND_LIBDISPATCH)
 else()
-  message(FATAL_ERROR "LIBCXX_PSTL_CPU_BACKEND is set to ${LIBCXX_PSTL_CPU_BACKEND}, which is not a valid backend.
+  message(FATAL_ERROR "LIBCXX_PSTL_BACKEND is set to ${LIBCXX_PSTL_BACKEND}, which is not a valid backend.
                        Valid backends are: serial, std_thread and libdispatch")
 endif()
 
diff --git a/libcxx/cmake/caches/Apple.cmake b/libcxx/cmake/caches/Apple.cmake
index cec13c08acf107..8768653e620add 100644
--- a/libcxx/cmake/caches/Apple.cmake
+++ b/libcxx/cmake/caches/Apple.cmake
@@ -7,7 +7,7 @@ set(LIBCXX_ENABLE_STATIC ON CACHE BOOL "")
 set(LIBCXX_ENABLE_SHARED ON CACHE BOOL "")
 set(LIBCXX_CXX_ABI libcxxabi CACHE STRING "")
 set(LIBCXX_ENABLE_VENDOR_AVAILABILITY_ANNOTATIONS ON CACHE BOOL "")
-set(LIBCXX_PSTL_CPU_BACKEND libdispatch CACHE STRING "")
+set(LIBCXX_PSTL_BACKEND libdispatch CACHE STRING "")
 
 set(LIBCXX_HERMETIC_STATIC_LIBRARY ON CACHE BOOL "")
 set(LIBCXXABI_HERMETIC_STATIC_LIBRARY ON CACHE BOOL "")
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index 097a41d4c41740..b801ae8eaa0077 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -72,37 +72,7 @@ set(files
   __algorithm/partition_point.h
   __algorithm/pop_heap.h
   __algorithm/prev_permutation.h
-  __algorithm/pstl_any_all_none_of.h
-  __algorithm/pstl_backend.h
-  __algorithm/pstl_backends/cpu_backend.h
-  __algorithm/pstl_backends/cpu_backends/any_of.h
-  __algorithm/pstl_backends/cpu_backends/backend.h
-  __algorithm/pstl_backends/cpu_backends/fill.h
-  __algorithm/pstl_backends/cpu_backends/find_if.h
-  __algorithm/pstl_backends/cpu_backends/for_each.h
-  __algorithm/pstl_backends/cpu_backends/libdispatch.h
-  __algorithm/pstl_backends/cpu_backends/merge.h
-  __algorithm/pstl_backends/cpu_backends/serial.h
-  __algorithm/pstl_backends/cpu_backends/stable_sort.h
-  __algorithm/pstl_backends/cpu_backends/thread.h
-  __algorithm/pstl_backends/cpu_backends/transform.h
-  __algorithm/pstl_backends/cpu_backends/transform_reduce.h
-  __algorithm/pstl_copy.h
-  __algorithm/pstl_count.h
-  __algorithm/pstl_equal.h
-  __algorithm/pstl_fill.h
-  __algorithm/pstl_find.h
-  __algorithm/pstl_for_each.h
-  __algorithm/pstl_frontend_dispatch.h
-  __algorithm/pstl_generate.h
-  __algorithm/pstl_is_partitioned.h
-  __algorithm/pstl_merge.h
-  __algorithm/pstl_move.h
-  __algorithm/pstl_replace.h
-  __algorithm/pstl_rotate_copy.h
-  __algorithm/pstl_sort.h
-  __algorithm/pstl_stable_sort.h
-  __algorithm/pstl_transform.h
+  __algorithm/pstl.h
   __algorithm/push_heap.h
   __algorithm/ranges_adjacent_find.h
   __algorithm/ranges_all_of.h
@@ -584,13 +554,30 @@ set(files
   __numeric/iota.h
   __numeric/midpoint.h
   __numeric/partial_sum.h
-  __numeric/pstl_reduce.h
-  __numeric/pstl_transform_reduce.h
+  __numeric/pstl.h
   __numeric/reduce.h
   __numeric/saturation_arithmetic.h
   __numeric/transform_exclusive_scan.h
   __numeric/transform_inclusive_scan.h
   __numeric/transform_reduce.h
+  __pstl/backend_fwd.h
+  __pstl/backends/default.h
+  __pstl/backends/libdispatch.h
+  __pstl/backends/serial.h
+  __pstl/backends/std_thread.h
+  __pstl/configuration.h
+  __pstl/configuration_fwd.h
+  __pstl/cpu_algos/any_of.h
+  __pstl/cpu_algos/cpu_traits.h
+  __pstl/cpu_algos/fill.h
+  __pstl/cpu_algos/find_if.h
+  __pstl/cpu_algos/for_each.h
+  __pstl/cpu_algos/merge.h
+  __pstl/cpu_algos/stable_sort.h
+  __pstl/cpu_algos/transform_reduce.h
+  __pstl/cpu_algos/transform.h
+  __pstl/dispatch.h
+  __pstl/run_backend.h
   __random/bernoulli_distribution.h
   __random/binomial_distribution.h
   __random/cauchy_distribution.h
diff --git a/libcxx/include/__algorithm/pstl.h b/libcxx/include/__algorithm/pstl.h
new file mode 100644
index 00000000000000..c678b12258f105
--- /dev/null
+++ b/libcxx/include/__algorithm/pstl.h
@@ -0,0 +1,627 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef _LIBCPP___ALGORITHM_PSTL_H
+#define _LIBCPP___ALGORITHM_PSTL_H
+
+#include <__config>
+#include <__functional/operations.h>
+#include <__iterator/cpp17_iterator_concepts.h>
+#include <__pstl/backend_fwd.h>
+#include <__pstl/configuration.h>
+#include <__pstl/dispatch.h>
+#include <__pstl/run_backend.h>
+#include <__type_traits/enable_if.h>
+#include <__type_traits/is_execution_policy.h>
+#include <__type_traits/remove_cvref.h>
+#include <__utility/forward.h>
+#include <__utility/move.h>
+
+#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
+#  pragma GCC system_header
+#endif
+
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
+#if !defined(_LIBCPP_HAS_NO_INCOMPLETE_PSTL) && _LIBCPP_STD_VER >= 17
+
+_LIBCPP_BEGIN_NAMESPACE_STD
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Predicate,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI bool
+any_of(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__any_of, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Pred,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI bool
+all_of(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__all_of, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Pred,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_NODISCARD_EXT _LIBCPP_HIDE_FROM_ABI bool
+none_of(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Pred __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__none_of, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _ForwardOutIterator,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _ForwardOutIterator
+copy(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _ForwardOutIterator __out) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator);
+  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, decltype(*__first));
+  using _Implementation = __pstl::__dispatch<__pstl::__copy, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__out));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _ForwardOutIterator,
+          class _Size,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _ForwardOutIterator
+copy_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __n, _ForwardOutIterator __out) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardOutIterator);
+  _LIBCPP_REQUIRE_CPP17_OUTPUT_ITERATOR(_ForwardOutIterator, decltype(*__first));
+  using _Implementation = __pstl::__dispatch<__pstl::__copy_n, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__n), std::move(__out));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Predicate,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI __iter_diff_t<_ForwardIterator>
+count_if(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__count_if, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Tp,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI __iter_diff_t<_ForwardIterator>
+count(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__count, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), __value);
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator1,
+          class _ForwardIterator2,
+          class _Pred,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI bool
+equal(_ExecutionPolicy&& __policy,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _Pred __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1);
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2);
+  using _Implementation = __pstl::__dispatch<__pstl::__equal_3leg, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator1,
+          class _ForwardIterator2,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI bool
+equal(_ExecutionPolicy&& __policy, _ForwardIterator1 __first1, _ForwardIterator1 __last1, _ForwardIterator2 __first2) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1);
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2);
+  return std::equal(__policy, std::move(__first1), std::move(__last1), std::move(__first2), equal_to{});
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator1,
+          class _ForwardIterator2,
+          class _Pred,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI bool
+equal(_ExecutionPolicy&& __policy,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _ForwardIterator2 __last2,
+      _Pred __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1);
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2);
+  using _Implementation = __pstl::__dispatch<__pstl::__equal, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy),
+      std::move(__first1),
+      std::move(__last1),
+      std::move(__first2),
+      std::move(__last2),
+      std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator1,
+          class _ForwardIterator2,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI bool
+equal(_ExecutionPolicy&& __policy,
+      _ForwardIterator1 __first1,
+      _ForwardIterator1 __last1,
+      _ForwardIterator2 __first2,
+      _ForwardIterator2 __last2) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator1);
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator2);
+  return std::equal(
+      __policy, std::move(__first1), std::move(__last1), std::move(__first2), std::move(__last2), equal_to{});
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Tp,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI void
+fill(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__fill, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), __value);
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Size,
+          class _Tp,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI void
+fill_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __n, const _Tp& __value) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__fill_n, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__n), __value);
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Predicate,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _ForwardIterator
+find_if(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__find_if, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(__policy, std::move(__first), std::move(__last), std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Predicate,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _ForwardIterator
+find_if_not(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Predicate __pred) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__find_if_not, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__pred));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Tp,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI _ForwardIterator
+find(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__find, __pstl::__current_configuration, _RawPolicy>;
+  return __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), __value);
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Function,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI void
+for_each(_ExecutionPolicy&& __policy, _ForwardIterator __first, _ForwardIterator __last, _Function __func) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__for_each, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_ExecutionPolicy>(__policy), std::move(__first), std::move(__last), std::move(__func));
+}
+
+template <class _ExecutionPolicy,
+          class _ForwardIterator,
+          class _Size,
+          class _Function,
+          class _RawPolicy                                    = __remove_cvref_t<_ExecutionPolicy>,
+          enable_if_t<is_execution_policy_v<_RawPolicy>, int> = 0>
+_LIBCPP_HIDE_FROM_ABI void
+for_each_n(_ExecutionPolicy&& __policy, _ForwardIterator __first, _Size __size, _Function __func) {
+  _LIBCPP_REQUIRE_CPP17_FORWARD_ITERATOR(_ForwardIterator);
+  using _Implementation = __pstl::__dispatch<__pstl::__for_each_n, __pstl::__current_configuration, _RawPolicy>;
+  __pstl::__run_backend<_Implementation>(
+      std::forward<_Execut...
[truncated]

ldionne added a commit that referenced this pull request Apr 15, 2024
Currently, CPU backends in the PSTL are created by defining functions
in the __par_backend namespace. Then, the PSTL includes the CPU backend
that gets configured via CMake and gets those definitions.

This prevents CPU backends from easily co-existing and is a bit
confusing.
To solve this problem, this patch introduces the notion of __cpu_traits,
which is a cheap encapsulation of the basis operations required to
implement a CPU-based PSTL. Different backends can now define their own
tag and coexist, and the CPU-based PSTL will simply use __cpu_traits to
dispatch to the right implementation of e.g. __for_each.

Note that this patch doesn't change the actual implementation of the
backends in any way, it only modifies how that implementation is
accessed
to implement PSTL algorithms.

This patch is a step towards #88131.
bazuzi pushed a commit to bazuzi/llvm-project that referenced this pull request Apr 15, 2024
Currently, CPU backends in the PSTL are created by defining functions
in the __par_backend namespace. Then, the PSTL includes the CPU backend
that gets configured via CMake and gets those definitions.

This prevents CPU backends from easily co-existing and is a bit
confusing.
To solve this problem, this patch introduces the notion of __cpu_traits,
which is a cheap encapsulation of the basis operations required to
implement a CPU-based PSTL. Different backends can now define their own
tag and coexist, and the CPU-based PSTL will simply use __cpu_traits to
dispatch to the right implementation of e.g. __for_each.

Note that this patch doesn't change the actual implementation of the
backends in any way, it only modifies how that implementation is
accessed
to implement PSTL algorithms.

This patch is a step towards llvm#88131.
aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 15, 2024
Currently, CPU backends in the PSTL are created by defining functions
in the __par_backend namespace. Then, the PSTL includes the CPU backend
that gets configured via CMake and gets those definitions.

This prevents CPU backends from easily co-existing and is a bit
confusing.
To solve this problem, this patch introduces the notion of __cpu_traits,
which is a cheap encapsulation of the basis operations required to
implement a CPU-based PSTL. Different backends can now define their own
tag and coexist, and the CPU-based PSTL will simply use __cpu_traits to
dispatch to the right implementation of e.g. __for_each.

Note that this patch doesn't change the actual implementation of the
backends in any way, it only modifies how that implementation is
accessed
to implement PSTL algorithms.

This patch is a step towards llvm#88131.
@ldionne ldionne force-pushed the wip/pstl-reorganization branch 3 times, most recently from 78d2d20 to 6351b51 Compare May 28, 2024 08:23
@ldionne ldionne marked this pull request as ready for review May 28, 2024 08:23
@ldionne ldionne requested a review from a team as a code owner May 28, 2024 08:23
@ldionne ldionne force-pushed the wip/pstl-reorganization branch 4 times, most recently from c30418f to addd103 Compare May 28, 2024 17:50
The experimental PSTL's current dispatching mechanism was designed with
flexibility in mind. However, while reviewing the in-progress OpenMP
backend, I realized that the dispatching mechanism based on ADL and
default definitions in the frontend had several downsides. To name a
few:

1. The dispatching of an algorithm to the back-end and its default
  implementation is bundled together via `_LIBCPP_PSTL_CUSTOMIZATION_POINT`.
  This makes the dispatching really confusing and leads to annoyances
  such as variable shadowing and weird lambda captures in the front-end.
2. The distinction between back-end functions and front-end algorithms
  is not as clear as it could be, which led us to call one where we meant
  the other in a few cases. This is bad due to the exception requirements
  of the PSTL: calling a front-end algorithm inside the implementation of
  a back-end is incorrect for exception-safety.
3. There are two levels of back-end dispatching in the PSTL, which treat
  CPU backends as a special case. This was confusing and not as flexible
  as we'd like. For example, there was no straightforward way to dispatch
  all uses of `unseq` to a specific back-end from the OpenMP backend,
  or for CPU backends to fall back on each other.

This patch rewrites the backend dispatching mechanism to solve these
problems, but doesn't touch any of the actual implementation of
algorithms. Specifically, this rewrite has the following characteristics:

- All back-ends are full top-level backends defining all the basis operations
required by the PSTL. This is made realistic for CPU backends by providing
the CPU-based basis operations as simple helpers that can easily be reused
when defining the PSTL basis operations.

- The default definitions for algorithms are separated from their dispatching
logic and grouped in families instead, based on the basis operation they
require for their default implementation.

- The front-end is thus simplified a whole lot and made very consistent
for all algorithms, which makes it easier to audit the front-end for
things like exception-correctness, appropriate forwarding, etc.

Fixes llvm#70718
@ldionne ldionne force-pushed the wip/pstl-reorganization branch from addd103 to 1de13f4 Compare May 28, 2024 19:03
Copy link
Contributor

@philnik777 philnik777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, just some minor nits.

std::move(__g_pred));
}
},
using _Implementation = __pstl::__dispatch<__pstl::__equal_3leg, __pstl::__current_configuration, _RawPolicy>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does 2leg mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variant of std::equal is often referred to as the "3 leg" version because it takes 3 iterators [first1, last1) and first2 as opposed to "4 legs" which would be [first1, last1) and [first2, last2). It's not an official term or anything, but I needed something to differentiate between the two.

namespace __pstl {

template <template <class, class> class _Algorithm, class _Backend, class _ExecutionPolicy, class = void>
struct __is_implemented : false_type {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well make it a variable template?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +48 to +49
static_assert(__cant_find_backend_for<_Algorithm, _ExecutionPolicy>,
"Could not find a PSTL backend for the given algorithm and execution policy");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static_assert(__cant_find_backend_for<_Algorithm, _ExecutionPolicy>,
"Could not find a PSTL backend for the given algorithm and execution policy");
static_assert(false, "Could not find a PSTL backend for the given algorithm and execution policy");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the comment above says:

// Helpful to provide better error messages. This will show the algorithm and the execution policy
// in the compiler diagnostic.

This makes a good difference when hitting a compiler error due to a misconfigured backend.

namespace __pstl {

template <class _BackendFunction, class... _Args>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_ALWAYS_INLINE auto __handle_exception_impl(_Args&&... __args) noexcept {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the always_inline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, removed.

#include <__utility/forward.h>
#include <__utility/move.h>
#include <new> // __throw_bad_alloc
#include <optional>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this include required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For nullopt. It's kind of annoying to add optional everywhere, but we really need it for propagating errors.

@ldionne ldionne merged commit 9540950 into llvm:main Jun 12, 2024
49 of 54 checks passed
@ldionne ldionne deleted the wip/pstl-reorganization branch June 12, 2024 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. pstl Issues related to the C++17 Parallel STL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[libc++][PSTL] Dispatching to default algorithms is broken in some cases (at least std::any_of)
3 participants