Skip to content

[libc++][hardening] Check bounds on arithmetic in __bounded_iter #78876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 12, 2024
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 46 additions & 25 deletions libcxx/include/__iterator/bounded_iter.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,13 +31,20 @@ _LIBCPP_BEGIN_NAMESPACE_STD
// Iterator wrapper that carries the valid range it is allowed to access.
//
// This is a simple iterator wrapper for contiguous iterators that points
// within a [begin, end) range and carries these bounds with it. The iterator
// ensures that it is pointing within that [begin, end) range when it is
// dereferenced.
// within a [begin, end] range and carries these bounds with it. The iterator
// ensures that it is pointing within [begin, end) range when it is
// dereferenced. It also ensures that it is never iterated outside of
// [begin, end]. This is important for two reasons:
//
// Arithmetic operations are allowed and the bounds of the resulting iterator
// are not checked. Hence, it is possible to create an iterator pointing outside
// its range, but it is not possible to dereference it.
// 1. It allows `operator*` and `operator++` bounds checks to be `iter != end`.
// This is both less for the optimizer to prove, and aligns with how callers
// typically use iterators.
//
// 2. Advancing an iterator out of bounds is undefined behavior (see the table
// in [input.iterators]). In particular, when the underlying iterator is a
// pointer, it is undefined at the language level (see [expr.add]). If
// bounded iterators exhibited this undefined behavior, we risk compiler
// optimizations deleting non-redundant bounds checks.
template <class _Iterator, class = __enable_if_t< __libcpp_is_contiguous_iterator<_Iterator>::value > >
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this patch: can we do it for any random access iterator? Is there any reason why we need a contiguous iterator?

Copy link
Contributor Author

@davidben davidben Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would work with every random access iterator, in the sense of producing coherent behavior, but there are a couple of subtleties:

  1. A general random access iterator (I'm thinking std::map) may have slightly different performance properties. In particular, I don't think we'd want to write begin + n in <algorithm> with std::map because that'll end up walking the iterator a second time, and I'm not positive the first walk can be easily deleted. And since we don't unwrap the map iterators in std::copy, it's not necessary to check ahead of time. We can just lean on the wrapped operator++ check.

  2. Replacing iter with iter2 = bounded(begin, iter, end) means that iter2 becomes invalidated whenever end gets invalidated, not just iter. That's fine for contiguous ranges because the end iterator is just another pointer into the allocation. But the more complex containers, irritatingly, have different invalidation rules for end and general iterators! See Hardening checks should detect dereferencing map.end() and set.end() #80212 (comment) where I considered and then sadly had to reject basically this strategy for std::map.

  3. We may not need to store both begin and end for a general random access iterator. For example, if we ignored the iterator invalidation problem above, std::map only needs (iter, end). begin is only used to stop an errant operator--, and we can already detect that from the tree structure itself. (The only reason we can't detect end is that libc++ is extra clever and doesn't even have parent and right pointers on the end node at all. So attempting to read them will immediately go OOB.)

So my inclination would be to wait to see how we decide to solve the problem for non-contiguous random access and then make a decision about whether to generalize __bounded_iter then. My guess would be that we wouldn't end up wanting to reuse it anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random access iterators are required to have O(1) complexity for distance and other operations like it + n, so std::map::iterator is bidirectional, but not random-access. I get your point about maybe not wanting to implement __bounded_iter the same all the time, but I do think that the properties we use that hold for contiguous iterators also hold for random access iterators. Needs to be confirmed though.

Copy link
Contributor Author

@davidben davidben Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I hadn't realized random access iterators were not the std::map ones! (I don't actually know the STL very well at the level of iterator categories.). Yeah, maybe it would be useful to generalize it? I guess we can see when we get to hardening some random-access, non-contiguous iterator. Since __bounded_iter isn't public API or anything, we only need to care about what libc++'s own containers do.

Either way, as you say, unrelated to this patch. 😁

struct __bounded_iter {
using value_type = typename iterator_traits<_Iterator>::value_type;
Expand All @@ -51,8 +58,8 @@ struct __bounded_iter {

// Create a singular iterator.
//
// Such an iterator does not point to any object and is conceptually out of bounds, so it is
// not dereferenceable. Observing operations like comparison and assignment are valid.
// Such an iterator point past the end of an empty span, so it is not dereferenceable.
// Observing operations like comparison and assignment are valid.
_LIBCPP_HIDE_FROM_ABI __bounded_iter() = default;

_LIBCPP_HIDE_FROM_ABI __bounded_iter(__bounded_iter const&) = default;
Expand All @@ -70,18 +77,20 @@ struct __bounded_iter {

private:
// Create an iterator wrapping the given iterator, and whose bounds are described
// by the provided [begin, end) range.
// by the provided [begin, end] range.
//
// This constructor does not check whether the resulting iterator is within its bounds.
// However, it does check that the provided [begin, end) range is a valid range (that
// is, begin <= end).
// The constructor does not check whether the resulting iterator is within its bounds. It is a
// responsibility of the container to ensure that the given bounds are valid.
//
// Since it is non-standard for iterators to have this constructor, __bounded_iter must
// be created via `std::__make_bounded_iter`.
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 explicit __bounded_iter(
_Iterator __current, _Iterator __begin, _Iterator __end)
: __current_(__current), __begin_(__begin), __end_(__end) {
_LIBCPP_ASSERT_INTERNAL(__begin <= __end, "__bounded_iter(current, begin, end): [begin, end) is not a valid range");
_LIBCPP_ASSERT_INTERNAL(
__begin <= __current, "__bounded_iter(current, begin, end): current and begin are inconsistent");
_LIBCPP_ASSERT_INTERNAL(
__current <= __end, "__bounded_iter(current, begin, end): current and end are inconsistent");
}

template <class _It>
Expand All @@ -90,30 +99,37 @@ struct __bounded_iter {
public:
// Dereference and indexing operations.
//
// These operations check that the iterator is dereferenceable, that is within [begin, end).
// These operations check that the iterator is dereferenceable. Since the class invariant is
// that the iterator is always within `[begin, end]`, we only need to check it's not pointing to
// `end`. This is easier for the optimizer because it aligns with the `iter != container.end()`
// checks that typical callers already use (see
// https://github.com/llvm/llvm-project/issues/78829).
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 reference operator*() const _NOEXCEPT {
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__in_bounds(__current_), "__bounded_iter::operator*: Attempt to dereference an out-of-range iterator");
__current_ != __end_, "__bounded_iter::operator*: Attempt to dereference an iterator at the end");
return *__current_;
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 pointer operator->() const _NOEXCEPT {
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__in_bounds(__current_), "__bounded_iter::operator->: Attempt to dereference an out-of-range iterator");
__current_ != __end_, "__bounded_iter::operator->: Attempt to dereference an iterator at the end");
return std::__to_address(__current_);
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 reference operator[](difference_type __n) const _NOEXCEPT {
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__in_bounds(__current_ + __n), "__bounded_iter::operator[]: Attempt to index an iterator out-of-range");
__n >= __begin_ - __current_, "__bounded_iter::operator[]: Attempt to index an iterator past the start");
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__n < __end_ - __current_, "__bounded_iter::operator[]: Attempt to index an iterator at or past the end");
return __current_[__n];
}

// Arithmetic operations.
//
// These operations do not check that the resulting iterator is within the bounds, since that
// would make it impossible to create a past-the-end iterator.
// These operations check the iterator remains within `[begin, end]`.
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator++() _NOEXCEPT {
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__current_ != __end_, "__bounded_iter::operator++: Attempt to advance an iterator past the end");
++__current_;
return *this;
}
Expand All @@ -124,6 +140,8 @@ struct __bounded_iter {
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator--() _NOEXCEPT {
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__current_ != __begin_, "__bounded_iter::operator--: Attempt to rewind an iterator past the start");
--__current_;
return *this;
}
Expand All @@ -134,6 +152,10 @@ struct __bounded_iter {
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator+=(difference_type __n) _NOEXCEPT {
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__n >= __begin_ - __current_, "__bounded_iter::operator+=: Attempt to rewind an iterator past the start");
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__n <= __end_ - __current_, "__bounded_iter::operator+=: Attempt to advance an iterator past the end");
__current_ += __n;
return *this;
}
Expand All @@ -151,6 +173,10 @@ struct __bounded_iter {
}

_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX14 __bounded_iter& operator-=(difference_type __n) _NOEXCEPT {
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be any value in implementing this in terms of operator+=? Or would that make the call stack in case of a failed assertion too confusing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that requires negating __n, which may hit UB in itself. :-(

__n <= __current_ - __begin_, "__bounded_iter::operator-=: Attempt to rewind an iterator past the start");
_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(
__n >= __current_ - __end_, "__bounded_iter::operator-=: Attempt to advance an iterator past the end");
__current_ -= __n;
return *this;
}
Expand Down Expand Up @@ -197,15 +223,10 @@ struct __bounded_iter {
}

private:
// Return whether the given iterator is in the bounds of this __bounded_iter.
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bool __in_bounds(_Iterator const& __iter) const {
return __iter >= __begin_ && __iter < __end_;
}

template <class>
friend struct pointer_traits;
_Iterator __current_; // current iterator
_Iterator __begin_, __end_; // valid range represented as [begin, end)
_Iterator __begin_, __end_; // valid range represented as [begin, end]
};

template <class _It>
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
// UNSUPPORTED: c++03, c++11, c++14, c++17

// Make sure that std::span's iterators check for OOB accesses when the debug mode is enabled.

// REQUIRES: has-unix-headers, libcpp-has-abi-bounded-iterators
// UNSUPPORTED: libcpp-hardening-mode=none

#include <span>

#include "check_assertion.h"

struct Foo {
int x;
};

template <typename Iter>
void test_iterator(Iter begin, Iter end, bool reverse) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The errors produced by a reverse iterator can be a bit confusing for the user. @ldionne Do you think this is worth the trouble to try to fix? I suppose we could implement a separate checked reverse iterator type, but perhaps that's overkill.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think the test is fine as-is, those are implementation specific anyway, so this is good enough (IMO).

std::ptrdiff_t distance = std::distance(begin, end);

// Dereferencing an iterator at the end.
{
TEST_LIBCPP_ASSERT_FAILURE(
*end,
reverse ? "__bounded_iter::operator--: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator*: Attempt to dereference an iterator at the end");
#if _LIBCPP_STD_VER >= 20
// In C++20 mode, std::reverse_iterator implements operator->, but not operator*, with
// std::prev instead of operator--. std::prev ultimately calls operator+
TEST_LIBCPP_ASSERT_FAILURE(
end->x,
reverse ? "__bounded_iter::operator+=: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator->: Attempt to dereference an iterator at the end");
#else
TEST_LIBCPP_ASSERT_FAILURE(
end->x,
reverse ? "__bounded_iter::operator--: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator->: Attempt to dereference an iterator at the end");
#endif
}

// Incrementing an iterator past the end.
{
[[maybe_unused]] const char* msg =
reverse ? "__bounded_iter::operator--: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator++: Attempt to advance an iterator past the end";
auto it = end;
TEST_LIBCPP_ASSERT_FAILURE(it++, msg);
TEST_LIBCPP_ASSERT_FAILURE(++it, msg);
}

// Decrementing an iterator past the start.
{
[[maybe_unused]] const char* msg =
reverse ? "__bounded_iter::operator++: Attempt to advance an iterator past the end"
: "__bounded_iter::operator--: Attempt to rewind an iterator past the start";
auto it = begin;
TEST_LIBCPP_ASSERT_FAILURE(it--, msg);
TEST_LIBCPP_ASSERT_FAILURE(--it, msg);
}

// Advancing past the end with operator+= and operator+.
{
[[maybe_unused]] const char* msg =
reverse ? "__bounded_iter::operator-=: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator+=: Attempt to advance an iterator past the end";
auto it = end;
TEST_LIBCPP_ASSERT_FAILURE(it += 1, msg);
TEST_LIBCPP_ASSERT_FAILURE(end + 1, msg);
it = begin;
TEST_LIBCPP_ASSERT_FAILURE(it += (distance + 1), msg);
TEST_LIBCPP_ASSERT_FAILURE(begin + (distance + 1), msg);
}

// Advancing past the end with operator-= and operator-.
{
[[maybe_unused]] const char* msg =
reverse ? "__bounded_iter::operator+=: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator-=: Attempt to advance an iterator past the end";
auto it = end;
TEST_LIBCPP_ASSERT_FAILURE(it -= (-1), msg);
TEST_LIBCPP_ASSERT_FAILURE(end - (-1), msg);
it = begin;
TEST_LIBCPP_ASSERT_FAILURE(it -= (-distance - 1), msg);
TEST_LIBCPP_ASSERT_FAILURE(begin - (-distance - 1), msg);
}

// Rewinding past the start with operator+= and operator+.
{
[[maybe_unused]] const char* msg =
reverse ? "__bounded_iter::operator-=: Attempt to advance an iterator past the end"
: "__bounded_iter::operator+=: Attempt to rewind an iterator past the start";
auto it = begin;
TEST_LIBCPP_ASSERT_FAILURE(it += (-1), msg);
TEST_LIBCPP_ASSERT_FAILURE(begin + (-1), msg);
it = end;
TEST_LIBCPP_ASSERT_FAILURE(it += (-distance - 1), msg);
TEST_LIBCPP_ASSERT_FAILURE(end + (-distance - 1), msg);
}

// Rewinding past the start with operator-= and operator-.
{
[[maybe_unused]] const char* msg =
reverse ? "__bounded_iter::operator+=: Attempt to advance an iterator past the end"
: "__bounded_iter::operator-=: Attempt to rewind an iterator past the start";
auto it = begin;
TEST_LIBCPP_ASSERT_FAILURE(it -= 1, msg);
TEST_LIBCPP_ASSERT_FAILURE(begin - 1, msg);
it = end;
TEST_LIBCPP_ASSERT_FAILURE(it -= (distance + 1), msg);
TEST_LIBCPP_ASSERT_FAILURE(end - (distance + 1), msg);
}

// Out-of-bounds operator[].
{
[[maybe_unused]] const char* end_msg =
reverse ? "__bounded_iter::operator--: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator[]: Attempt to index an iterator at or past the end";
[[maybe_unused]] const char* past_end_msg =
reverse ? "__bounded_iter::operator-=: Attempt to rewind an iterator past the start"
: "__bounded_iter::operator[]: Attempt to index an iterator at or past the end";
[[maybe_unused]] const char* past_start_msg =
reverse ? "__bounded_iter::operator-=: Attempt to advance an iterator past the end"
: "__bounded_iter::operator[]: Attempt to index an iterator past the start";
TEST_LIBCPP_ASSERT_FAILURE(begin[distance], end_msg);
TEST_LIBCPP_ASSERT_FAILURE(begin[distance + 1], past_end_msg);
TEST_LIBCPP_ASSERT_FAILURE(begin[-1], past_start_msg);
TEST_LIBCPP_ASSERT_FAILURE(begin[-99], past_start_msg);

auto it = begin + 1;
TEST_LIBCPP_ASSERT_FAILURE(it[distance - 1], end_msg);
TEST_LIBCPP_ASSERT_FAILURE(it[distance], past_end_msg);
TEST_LIBCPP_ASSERT_FAILURE(it[-2], past_start_msg);
TEST_LIBCPP_ASSERT_FAILURE(it[-99], past_start_msg);
}
}

int main(int, char**) {
// span<T>::iterator
{
Foo array[] = {{0}, {1}, {2}};
std::span<Foo> const span(array, 3);
test_iterator(span.begin(), span.end(), /*reverse=*/false);
}

// span<T, N>::iterator
{
Foo array[] = {{0}, {1}, {2}};
std::span<Foo, 3> const span(array, 3);
test_iterator(span.begin(), span.end(), /*reverse=*/false);
}

// span<T>::reverse_iterator
{
Foo array[] = {{0}, {1}, {2}};
std::span<Foo> const span(array, 3);
test_iterator(span.rbegin(), span.rend(), /*reverse=*/true);
}

// span<T, N>::reverse_iterator
{
Foo array[] = {{0}, {1}, {2}};
std::span<Foo, 3> const span(array, 3);
test_iterator(span.rbegin(), span.rend(), /*reverse=*/true);
}

return 0;
}
Loading