Ranged-for loop with _LIBCPP_ABI_BOUNDED_ITERATORS does not optimize runtime checks

When enabling bounded iterators, it seems Clang cannot optimize them out of a ranged-for loop. Here's a simple function that just sums over a span with a ranged-for loop, `sum1`, as well as a corresponding hand-written iterator loop, `sum2`.
https://godbolt.org/z/WqsPKdEoa

The left output, without `_LIBCPP_ABI_BOUNDED_ITERATORS` has no runtime checks, and Clang is even able to vectorize the loop. The right output, with `_LIBCPP_ABI_BOUNDED_ITERATORS` contains runtime checks with, in turn, impede the optimization.

**Cause**

I teased apart what `__bounded_iter` was doing. This is the assertion in question:
```
  // Return whether the given iterator is in the bounds of this __bounded_iter.
  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bool __in_bounds(_Iterator const& __iter) const {
    return __iter >= __begin_ && __iter < __end_;
  }
```
I've extracted that into a minimal `sum3` function (same godbolt link), and that seems to reveal the problem. While Clang can prove that `iter >= begin`, it can't prove that `iter < end`. I suspect this is because iterator loops look like `iter != end`, not `iter < end`. In order to prove that `iter < end`, it must also know that `begin <= end`. Otherwise it's possible that `iter` was ahead of `end` to start with.

Indeed, if I add that assertion to the top, in `sum4` (same godbolt again), Clang pays for a *different* undesirable runtime check outside the loop, but then figures it out! So fixing https://github.com/llvm/llvm-project/issues/78784 may have further benefits, if the analysis is extended further.

**Alternative fix**

Another way Clang could have figured it out is if we used `iter != end` instead of `iter < end` for the safety assertion. That's actually what Chromium's checked iterator does. However, libc++ cannot do this because it doesn't check `operator+`, etc., only `operator*`. It is possible for `__current_` to overflow well past `__end_` before we do the check.

I think libc++ should bounds-check `operator+` too. First, once `__current_` has advanced past `__end_`, we may have already hit undefined behavior because pointer arithmetic is locked to the allocation. So the check in `operator*` may be too late anyway. Second, #78771 describes a place where libc++'s `memmove` specialization breaks its own hardening checks! Since `memmove` specialization is pretty valuable, I think it's best to fix #78771 by bounds-checking `operator+` and making sure the specialization takes triggers it early enough.

Now, once `operator+` is bounds-checked, `__bounded_iter` can rely on `__begin_ <= __current_ <= __end_`. That means the safety check need only be `__current_ != __end_`, which should be more easily optimized out.

(CC @danakj)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Ranged-for loop with _LIBCPP_ABI_BOUNDED_ITERATORS does not optimize runtime checks #78829

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Ranged-for loop with _LIBCPP_ABI_BOUNDED_ITERATORS does not optimize runtime checks #78829

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions