[libc++] Optimize bitset shift operations #106225
Conversation
philnik777 commented Aug 27, 2024 • edited
@llvm/pr-subscribers-libcxx
Author: Nikolas Klauser (philnik777)
Changes
Full diff: https://github.com/llvm/llvm-project/pull/106225.diff 3 Files Affected:
diff --git a/libcxx/include/__bit_reference b/libcxx/include/__bit_reference
index 22637d43974123..599e87d3e6fc3e 100644
--- a/libcxx/include/__bit_reference
+++ b/libcxx/include/__bit_reference
@@ -290,6 +290,15 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> _
template <class _Cp, bool _IsConst>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __bit_iterator<_Cp, false>
copy(__bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
+ if (__builtin_constant_p(
+ __result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) &&
+ __result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
+ if (__first == __last)
+ return __result;
+ *__result.__seg_ >>= __first.__ctz_;
+ return __result + (__last - __first);
+ }
+
if (__first.__ctz_ == __result.__ctz_)
return std::__copy_aligned(__first, __last, __result);
return std::__copy_unaligned(__first, __last, __result);
@@ -418,6 +427,15 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> _
template <class _Cp, bool _IsConst>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __bit_iterator<_Cp, false> copy_backward(
__bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
+ if (__builtin_constant_p(
+ __first.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) &&
+ __first.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
+ if (__first == __last)
+ return __result;
+ *__result.__seg_ <<= __result.__ctz_ - __last.__ctz_;
+ return __result - (__last - __first);
+ }
+
if (__last.__ctz_ == __result.__ctz_)
return std::__copy_backward_aligned(__first, __last, __result);
return std::__copy_backward_unaligned(__first, __last, __result);
diff --git a/libcxx/test/benchmarks/CMakeLists.txt b/libcxx/test/benchmarks/CMakeLists.txt
index 616cf0ff8d2374..d88ca038bc39f6 100644
--- a/libcxx/test/benchmarks/CMakeLists.txt
+++ b/libcxx/test/benchmarks/CMakeLists.txt
@@ -140,6 +140,7 @@ set(BENCHMARK_TESTS
algorithms/stable_sort.bench.cpp
atomic_wait.bench.cpp
atomic_wait_vs_mutex_lock.bench.cpp
+ bitset.bench.cpp
libcxxabi/dynamic_cast.bench.cpp
libcxxabi/dynamic_cast_old_stress.bench.cpp
allocation.bench.cpp
diff --git a/libcxx/test/benchmarks/bitset.bench.cpp b/libcxx/test/benchmarks/bitset.bench.cpp
new file mode 100644
index 00000000000000..fbe3dee5089e7e
--- /dev/null
+++ b/libcxx/test/benchmarks/bitset.bench.cpp
@@ -0,0 +1,37 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include <bitset>
+
+#include "benchmark/benchmark.h"
+
+template <std::size_t N>
+static void bm_left_shift(benchmark::State& state) {
+ std::bitset<N> b;
+
+ for (auto _ : state) {
+ b <<= 4;
+ benchmark::DoNotOptimize(b);
+ }
+}
+BENCHMARK(bm_left_shift<32>);
+BENCHMARK(bm_left_shift<64>);
+
+template <std::size_t N>
+static void bm_right_shift(benchmark::State& state) {
+ std::bitset<N> b;
+
+ for (auto _ : state) {
+ b >>= 4;
+ benchmark::DoNotOptimize(b);
+ }
+}
+BENCHMARK(bm_right_shift<32>);
+BENCHMARK(bm_right_shift<64>);
+
+BENCHMARK_MAIN();
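An aside on the idiom both hunks rely on: __builtin_constant_p(cond) && cond first asks the compiler whether cond folds to a compile-time constant and, only if it does, actually tests it. The single-word fast path is therefore taken only when the optimizer can prove that the entire copy stays within one word with the alignment each hunk requires (as it can for a single-word std::bitset shift); in every other case the branch folds away and the generic paths are used. A stripped-down illustration of the same pattern (hypothetical function, not part of the patch):

#include <cstddef>
#include <cstdint>

// If the optimizer can see that n == 1 at a call site, the branch below folds
// to the single-word path; otherwise __builtin_constant_p(...) evaluates to 0
// and only the generic loop remains.
inline std::uint64_t sum_words(const std::uint64_t* words, std::size_t n) {
  if (__builtin_constant_p(n == 1) && n == 1)
    return words[0];
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += words[i];
  return total;
}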
@@ -290,6 +290,15 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> _
template <class _Cp, bool _IsConst>
Please add a short description of what the optimization is in the PR description.
if (__first == __last)
  return __result;
We could move this outside the __builtin_constant_p block and then assume that __n != 0 in __copy_aligned & __copy_unaligned. I'd add _LIBCPP_ASSERT_INTERNAL inside those functions just to bake in that assumption safely.
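A rough sketch of that restructuring (adapted from the hunk above; the assert placement and message are illustrative, assuming __n is the bit count those helpers already compute):

// In copy(): hoist the empty-range check so every path below can assume a
// non-empty range, then keep the constant-folded single-word fast path.
if (__first == __last)
  return __result;

if (__builtin_constant_p(
        __result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) &&
    __result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
  *__result.__seg_ >>= __first.__ctz_;
  return __result + (__last - __first);
}

if (__first.__ctz_ == __result.__ctz_)
  return std::__copy_aligned(__first, __last, __result);
return std::__copy_unaligned(__first, __last, __result);

// ...and inside __copy_aligned / __copy_unaligned, once __n is computed:
_LIBCPP_ASSERT_INTERNAL(__n != 0, "the caller handles empty ranges");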
__result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
  if (__first == __last)
    return __result;
  *__result.__seg_ >>= __first.__ctz_;
Based on our discussion just now, I think what you want to do is something like:
Starting point
====================================
xxxxxxxxxyyyyyyyyyyyyyxxxxxxxxxx
        ^            ^         ^
        last         first     result

Desired result (where Y is y's that were not moved around)
====================================
xxxxxxxxxYYYYYYYYYYyyyyyyyyyyyyy
        ^            ^         ^
        last         first     result

xxxxxxxxxyyyyyyyyyy0000000000000 (A)
bitor
0000000000000000000yyyyyyyyyyyyy (B)
=
xxxxxxxxxYYYYYYYYYYyyyyyyyyyyyyy
        ^            ^         ^
        last         first     result
Now, we have:
mask = (~0 << (last - first))
(A) = mask & originalthing
(B) = ~mask & (originalthing >> first.ctz)
I think this should work. Obviously we need some pretty good testing for that (which we don't seem to have right now). We should also consider all the degenerate cases that might exist.
It would also be nice to re-run the benchmark in light of these changes.
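A self-contained sketch of that masking idea for a single 32-bit word (illustrative only; the function name and word width are assumptions, not libc++ code):

#include <cassert>
#include <cstdint>
#include <cstdio>

// Copy the bit range [first, last) of `word` down to bit 0, leaving every bit
// at or above (last - first) untouched, as in the diagram above.
std::uint32_t copy_bits_down(std::uint32_t word, unsigned first, unsigned last) {
  assert(first <= last && last <= 32);
  unsigned n = last - first;                      // number of bits being copied
  if (n == 0)
    return word;                                  // degenerate case: empty range
  if (n == 32)
    return word >> first;                         // degenerate case: whole word (first == 0)
  std::uint32_t mask = ~std::uint32_t{0} << n;    // zeros in the low n bits
  std::uint32_t a = mask & word;                  // (A): everything above the destination range
  std::uint32_t b = ~mask & (word >> first);      // (B): the y-range, shifted down to bit 0
  return a | b;
}

int main() {
  std::uint32_t w = 0x007FFC00;                   // y-range: 13 set bits at positions 10..22
  // Bits 0..12 become the copied range; bits 13..22 keep their stale 'Y' values.
  std::printf("%#010x\n", (unsigned)copy_bits_down(w, 10, 23)); // prints 0x007fffff
}

Note that the degenerate cases (empty range, whole-word copy) need separate handling, since shifting a 32-bit value by 32 is undefined behavior.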