[libc++] Optimize bitset shift operations #106225

Open: wants to merge 1 commit into main

18 changes: 18 additions & 0 deletions libcxx/include/__bit_reference
@@ -290,6 +290,15 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> _
template <class _Cp, bool _IsConst>
Review comment (Member), not attached to a specific line:

Please add a short description of what the optimization is in the PR description.

inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __bit_iterator<_Cp, false>
copy(__bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
if (__builtin_constant_p(
__result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) &&
__result.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
if (__first == __last)
return __result;
Review comment (Member) on lines +296 to +297:

We could move this check outside the __builtin_constant_p block and then assume that __n != 0 in __copy_aligned and __copy_unaligned. I'd add _LIBCPP_ASSERT_INTERNAL inside those functions just to bake in that assumption safely.
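A rough sketch of what that restructuring could look like, reusing the names already in this diff (illustrative only, not the actual patch):

template <class _Cp, bool _IsConst>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __bit_iterator<_Cp, false>
copy(__bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
  // Handle the empty range once, before any fast path.
  if (__first == __last)
    return __result;
  // ... single-word fast path guarded by __builtin_constant_p goes here ...
  if (__first.__ctz_ == __result.__ctz_)
    return std::__copy_aligned(__first, __last, __result);   // may now assume a non-empty range
  return std::__copy_unaligned(__first, __last, __result);
}

// And inside __copy_aligned / __copy_unaligned, bake in the precondition:
//   _LIBCPP_ASSERT_INTERNAL(__first != __last, "expected a non-empty range");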

*__result.__seg_ >>= __first.__ctz_;
Review comment (Member):

Based on our discussion just now, I think what you want to do is something like:

Starting point
====================================
xxxxxxxxxyyyyyyyyyyyyyxxxxxxxxxx
         ^           ^         ^
       last        first  result

Desired result (where Y is y's that were not moved around)
====================================
xxxxxxxxxYYYYYYYYYYyyyyyyyyyyyyy
         ^           ^         ^
       last        first  result



xxxxxxxxxyyyyyyyyyy0000000000000   (A)
bitor
0000000000000000000yyyyyyyyyyyyy   (B)
=
xxxxxxxxxYYYYYYYYYYyyyyyyyyyyyyy
         ^           ^         ^
       last        first  result

Now, we have:

mask = (~0 << (last - first))

(A) = mask & original_word
(B) = ~mask & (original_word >> first.ctz)

I think this should work. Obviously we need some pretty good testing for that (which we don't seem to have right now). We should also consider all the degenerate cases that might exist.

It would also be nice to re-run the benchmark in light of these changes.
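
To make the masking proposal concrete, here is a small self-contained sketch of the single-word, __result.__ctz_ == 0 case on a plain 32-bit word. The word_t alias, the function name, and the test values are illustrative, not part of the patch; x bits are drawn as 0 and y bits as 1, matching the diagram above.

#include <cassert>
#include <cstdint>

using word_t = std::uint32_t;

// Copy the n source bits [first_ctz, first_ctz + n) of `word` down to bit 0,
// leaving bits [n, 32) of the word unchanged.
inline word_t copy_bits_to_front(word_t word, unsigned first_ctz, unsigned n) {
  assert(n > 0 && n < 32 && first_ctz + n <= 32);
  word_t mask = ~word_t(0) << n;            // 1s everywhere outside the destination [0, n)
  word_t a = word & mask;                   // (A): bits above the destination, kept as-is
  word_t b = (word >> first_ctz) & ~mask;   // (B): source bits shifted into [0, n)
  return a | b;
}

int main() {
  // 9 x bits, 13 y bits, 10 x bits, as in the diagram (first = 10, last = 23, result = 0).
  word_t start = 0b00000000'01111111'11111100'00000000u;
  word_t out   = copy_bits_to_front(start, /*first_ctz=*/10, /*n=*/13);
  // Bits [0, 13) now hold the y's; bits [13, 32) are unchanged.
  assert(out == 0b00000000'01111111'11111111'11111111u);
  return 0;
}

The point of the mask is that only the destination bits [0, n) are rewritten and everything above them is reassembled from the original word, whereas the bare >>= in the current fast path zero-fills the high bits.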

return __result + (__last - __first);
}

if (__first.__ctz_ == __result.__ctz_)
return std::__copy_aligned(__first, __last, __result);
return std::__copy_unaligned(__first, __last, __result);
@@ -418,6 +427,15 @@ _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_HIDE_FROM_ABI __bit_iterator<_Cp, false> _
template <class _Cp, bool _IsConst>
inline _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 __bit_iterator<_Cp, false> copy_backward(
__bit_iterator<_Cp, _IsConst> __first, __bit_iterator<_Cp, _IsConst> __last, __bit_iterator<_Cp, false> __result) {
if (__builtin_constant_p(
__first.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) &&
__first.__ctz_ == 0 && __first.__seg_ == __last.__seg_ && __last.__seg_ == __result.__seg_) {
if (__first == __last)
return __result;
*__result.__seg_ <<= __result.__ctz_ - __last.__ctz_;
return __result - (__last - __first);
}

if (__last.__ctz_ == __result.__ctz_)
return std::__copy_backward_aligned(__first, __last, __result);
return std::__copy_backward_unaligned(__first, __last, __result);
1 change: 1 addition & 0 deletions libcxx/test/benchmarks/CMakeLists.txt
@@ -140,6 +140,7 @@ set(BENCHMARK_TESTS
algorithms/stable_sort.bench.cpp
atomic_wait.bench.cpp
atomic_wait_vs_mutex_lock.bench.cpp
bitset.bench.cpp
libcxxabi/dynamic_cast.bench.cpp
libcxxabi/dynamic_cast_old_stress.bench.cpp
allocation.bench.cpp
37 changes: 37 additions & 0 deletions libcxx/test/benchmarks/bitset.bench.cpp
@@ -0,0 +1,37 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include <bitset>

#include "benchmark/benchmark.h"

template <std::size_t N>
static void bm_left_shift(benchmark::State& state) {
std::bitset<N> b;

for (auto _ : state) {
b <<= 4;
benchmark::DoNotOptimize(b);
}
}
BENCHMARK(bm_left_shift<32>);
BENCHMARK(bm_left_shift<64>);

template <std::size_t N>
static void bm_right_shift(benchmark::State& state) {
std::bitset<N> b;

for (auto _ : state) {
b >>= 4;
benchmark::DoNotOptimize(b);
}
}
BENCHMARK(bm_right_shift<32>);
BENCHMARK(bm_right_shift<64>);

BENCHMARK_MAIN();