Description
When enabling bounded iterators, it seems Clang cannot optimize them out of a ranged-for loop. Here's a simple function that just sums over a span with a ranged-for loop, sum1
, as well as a corresponding hand-written iterator loop, sum2
.
https://godbolt.org/z/WqsPKdEoa
The left output, without _LIBCPP_ABI_BOUNDED_ITERATORS
has no runtime checks, and Clang is even able to vectorize the loop. The right output, with _LIBCPP_ABI_BOUNDED_ITERATORS
contains runtime checks with, in turn, impede the optimization.
Cause
I teased apart what __bounded_iter
was doing. This is the assertion in question:
// Return whether the given iterator is in the bounds of this __bounded_iter.
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bool __in_bounds(_Iterator const& __iter) const {
return __iter >= __begin_ && __iter < __end_;
}
I've extracted that into a minimal sum3
function (same godbolt link), and that seems to reveal the problem. While Clang can prove that iter >= begin
, it can't prove that iter < end
. I suspect this is because iterator loops look like iter != end
, not iter < end
. In order to prove that iter < end
, it must also know that begin <= end
. Otherwise it's possible that iter
was ahead of end
to start with.
Indeed, if I add that assertion to the top, in sum4
(same godbolt again), Clang pays for a different undesirable runtime check outside the loop, but then figures it out! So fixing #78784 may have further benefits, if the analysis is extended further.
Alternative fix
Another way Clang could have figured it out is if we used iter != end
instead of iter < end
for the safety assertion. That's actually what Chromium's checked iterator does. However, libc++ cannot do this because it doesn't check operator+
, etc., only operator*
. It is possible for __current_
to overflow well past __end_
before we do the check.
I think libc++ should bounds-check operator+
too. First, once __current_
has advanced past __end_
, we may have already hit undefined behavior because pointer arithmetic is locked to the allocation. So the check in operator*
may be too late anyway. Second, #78771 describes a place where libc++'s memmove
specialization breaks its own hardening checks! Since memmove
specialization is pretty valuable, I think it's best to fix #78771 by bounds-checking operator+
and making sure the specialization takes triggers it early enough.
Now, once operator+
is bounds-checked, __bounded_iter
can rely on __begin_ <= __current_ <= __end_
. That means the safety check need only be __current_ != __end_
, which should be more easily optimized out.
(CC @danakj)