[libc++] Optimize bitset shift operations #106225


Open: wants to merge 1 commit into main (base: main)
Conversation

@philnik777 (Contributor) commented Aug 27, 2024

--------------------------------------
Benchmark               old        new
--------------------------------------
bm_left_shift<32>   6.19 ns   0.210 ns
bm_left_shift<64>   6.16 ns    6.20 ns
bm_right_shift<32>  6.23 ns   0.421 ns
bm_right_shift<64>  6.26 ns    6.31 ns
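The big wins in the `<32>` rows come from the patch's single-word fast path. The sketch below is illustrative only (the helper name and signature are not libc++ internals): when the source and destination of the bit copy live in the same storage word and the destination starts at bit 0, the whole operation collapses to one word shift, which is what the patched `copy()` does with `*__result.__seg_ >>= __first.__ctz_`.

```cpp
#include <cassert>
#include <cstdint>

// Minimal sketch of the fast path: copying a bit range down to bit 0 of the
// same word is just a right shift of that word.
std::uint64_t in_word_right_shift(std::uint64_t word, unsigned first_ctz) {
  return word >> first_ctz;
}
```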

@philnik777 philnik777 changed the title [libc++] Opzimize bitset shift operations [libc++] Optimize bitset shift operations Aug 27, 2024
@philnik777 philnik777 marked this pull request as ready for review August 29, 2024 15:07
@philnik777 philnik777 requested a review from a team as a code owner August 29, 2024 15:07
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Aug 29, 2024
@llvmbot (Member) commented Aug 29, 2024

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/106225.diff

3 Files Affected:

  • (modified) libcxx/include/__bit_reference (+18)
  • (modified) libcxx/test/benchmarks/CMakeLists.txt (+1)
  • (added) libcxx/test/benchmarks/bitset.bench.cpp (+37)
diff --git a/libcxx/include/__bit_reference b/libcxx/include/__bit_reference
index 22637d43974123..599e87d3e6fc3e 100644
--- a/libcxx/include/__bit_reference
+++ b/libcxx/include/__bit_reference
@@ -290,6 +290,15 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> _
 template <class _Cp, bool _IsConst>
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __bit_iterator<_Cp, false>
 copy(__bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
+  if (__builtin_constant_p(
+          __result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) &&
+      __result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
+    if (__first == __last)
+      return __result;
+    *__result.__seg_ >>= __first.__ctz_;
+    return __result + (__last - __first);
+  }
+
   if (__first.__ctz_ == __result.__ctz_)
     return std::__copy_aligned(__first, __last, __result);
   return std::__copy_unaligned(__first, __last, __result);
@@ -418,6 +427,15 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> _
 template <class _Cp, bool _IsConst>
 inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __bit_iterator<_Cp, false> copy_backward(
     __bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
+  if (__builtin_constant_p(
+          __first.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) &&
+      __first.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
+    if (__first == __last)
+      return __result;
+    *__result.__seg_ <<= __result.__ctz_ - __last.__ctz_;
+    return __result - (__last - __first);
+  }
+
   if (__last.__ctz_ == __result.__ctz_)
     return std::__copy_backward_aligned(__first, __last, __result);
   return std::__copy_backward_unaligned(__first, __last, __result);
diff --git a/libcxx/test/benchmarks/CMakeLists.txt b/libcxx/test/benchmarks/CMakeLists.txt
index 616cf0ff8d2374..d88ca038bc39f6 100644
--- a/libcxx/test/benchmarks/CMakeLists.txt
+++ b/libcxx/test/benchmarks/CMakeLists.txt
@@ -140,6 +140,7 @@ set(BENCHMARK_TESTS
     algorithms/stable_sort.bench.cpp
     atomic_wait.bench.cpp
     atomic_wait_vs_mutex_lock.bench.cpp
+    bitset.bench.cpp
     libcxxabi/dynamic_cast.bench.cpp
     libcxxabi/dynamic_cast_old_stress.bench.cpp
     allocation.bench.cpp
diff --git a/libcxx/test/benchmarks/bitset.bench.cpp b/libcxx/test/benchmarks/bitset.bench.cpp
new file mode 100644
index 00000000000000..fbe3dee5089e7e
--- /dev/null
+++ b/libcxx/test/benchmarks/bitset.bench.cpp
@@ -0,0 +1,37 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include <bitset>
+
+#include "benchmark/benchmark.h"
+
+template <std::size_t N>
+static void bm_left_shift(benchmark::State& state) {
+  std::bitset<N> b;
+
+  for (auto _ : state) {
+    b <<= 4;
+    benchmark::DoNotOptimize(b);
+  }
+}
+BENCHMARK(bm_left_shift<32>);
+BENCHMARK(bm_left_shift<64>);
+
+template <std::size_t N>
+static void bm_right_shift(benchmark::State& state) {
+  std::bitset<N> b;
+
+  for (auto _ : state) {
+    b >>= 4;
+    benchmark::DoNotOptimize(b);
+  }
+}
+BENCHMARK(bm_right_shift<32>);
+BENCHMARK(bm_right_shift<64>);
+
+BENCHMARK_MAIN();

Member:
Please add a short description of what the optimization is in the PR description.

Comment on lines +296 to +297
if (__first == __last)
return __result;
Member:
We could move this outside the __builtin_constant_p block and then assume __n != 0 in __copy_aligned and __copy_unaligned. I'd add _LIBCPP_ASSERT_INTERNAL inside those functions just to bake that assumption in safely.
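A hypothetical restructuring along these lines (the helper names and signatures below are illustrative, not libc++ internals): the caller handles the empty range once, up front, and the helper asserts the assumption, with a plain assert standing in for _LIBCPP_ASSERT_INTERNAL.

```cpp
#include <cassert>
#include <cstdint>

// Helper that bakes in the non-empty assumption; assert() stands in for
// _LIBCPP_ASSERT_INTERNAL.
static std::uint64_t copy_nonempty(std::uint64_t word, unsigned first_ctz, unsigned n_bits) {
  assert(n_bits != 0); // caller guarantees a non-empty range
  (void)n_bits;
  return word >> first_ctz;
}

// Caller: the empty-range check is hoisted out of the fast path so it is
// done exactly once.
static std::uint64_t fast_copy(std::uint64_t word, unsigned first_ctz, unsigned n_bits) {
  if (n_bits == 0)
    return word;
  return copy_nonempty(word, first_ctz, n_bits);
}
```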

__result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
if (__first == __last)
return __result;
*__result.__seg_ >>= __first.__ctz_;
Member:
Based on our discussion just now, I think what you want to do is something like:

Starting point
====================================
xxxxxxxxxyyyyyyyyyyyyyxxxxxxxxxx
         ^           ^         ^
                             result
       last        first

Desired result (where Y is y's that were not moved around)
====================================
xxxxxxxxxYYYYYYYYYYyyyyyyyyyyyyy
         ^           ^         ^
                             result
       last        first



xxxxxxxxxyyyyyyyyyy0000000000000   (A)
bitor
0000000000000000000yyyyyyyyyyyyy   (B)
=
xxxxxxxxxYYYYYYYYYYyyyyyyyyyyyyy
         ^           ^         ^
                             result
       last        first

Now, we have:

mask = ~0 << (last - first)
(A) = mask & original
(B) = ~mask & (original >> first.ctz)

I think this should work. Obviously we need some pretty good testing for that (which we don't seem to have right now). We should also consider all the degenerate cases that might exist.

It would also be nice to re-run the benchmark in light of these changes.
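The mask-and-or idea above can be sketched as follows (the helper name and parameters are illustrative, not libc++ internals): copy `n` bits starting at bit `first_ctz` of `word` down to bit 0, preserving the bits from position `n` upward.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the suggested in-word copy:
//   mask = ~0 << n                      bits to keep in place
//   (A)  = word & mask                  untouched upper bits
//   (B)  = (word >> first_ctz) & ~mask  relocated bits
std::uint64_t masked_in_word_copy(std::uint64_t word, unsigned first_ctz, unsigned n) {
  std::uint64_t mask = n >= 64 ? 0 : ~std::uint64_t{0} << n;
  std::uint64_t a = word & mask;                 // (A)
  std::uint64_t b = (word >> first_ctz) & ~mask; // (B)
  return a | b;
}
```

Degenerate cases (n == 0, n == word width, first_ctz == 0) would need the testing called out above before anything like this lands.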

Labels: libc++ (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.), performance
3 participants