pointer addition semantics are unclear

### Location

[core::ptr::add](https://doc.rust-lang.org/std/primitive.pointer.html#method.add)
https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L810-L854

[core::ptr::offset](https://doc.rust-lang.org/std/primitive.pointer.html#method.offset-1)
https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L349-L393

### Summary

The documentation for `core::ptr::add` currently states that it:

https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L810

Except it's not a convenience for just doing `.offset(count as isize)`, both according to Miri and the constant interpreter:

```rust
#[no_mangle]
pub const fn mul() -> i32 {
    const {
        let x = &[0i32; 2];
        let x = x.as_ptr().wrapping_add(1);
        unsafe {
            // has UB according to MIRI and fails compilation on nightly if you comment this out
            // mul nuw i64 -1, 4 is poison
            // x.add(!0).read();
            
            // DB and passes compilation
            // mul nsw i64 -1, 4 is defined
            x.offset(-1).read();
        }
        x[0]
    }
}
```

So one might infer that the following requirement:

https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L819

means that the `count * size_of::<T>()` must not overflow in the unsigned sense for `add` and signed sense for `offset`.

However, it's unclear if `overflowing isize` means the unsigned multiplication result has to be positive, or if the unsigned multiplication also has to not wrap in the signed sense. Both Miri and the constant interpreter allow the following:

```rust
#[no_mangle]
pub const fn mulnsw() -> i32 {
    const {
        let x = &[0i32; 2];
        let x = x.as_ptr().wrapping_add(1);
        let offset = !0usize >> 2; // will be -4 when multiplied
        let mut ret = 0;
        unsafe {
            // has no UB according to MIRI and does not fail compilation on nightly
            // does not wrap in the unsigned sense
            // however wraps in the signed sense
            // has UB in our codegen
            ret = x.add(offset).read();
            
            // UB in codegen due to signed overflow
            // both MIRI and constant interpreter fail
            // ret = x.offset(offset as isize).read();
        }
        ret
    }
}
```

However, `x.add(offset)` will also get lowered to `getelementptr inbounds i32, %x, i64 %offset`, which has UB according to the LLVM [langref](https://llvm.org/docs/LangRef.html#getelementptr-instruction).

> The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (mul nsw).

If the `core::ptr::add` does have defined behavior in the above case, then we can't directly use `getelementptr inbounds` for types larger than a byte, without doing a `mul nuw` beforehand, and then doing `getelementptr inbounds i8`. However, if it's UB, then the docs should make it clear.


So `core::ptr::add` still carries the responsibility of preventing signed overflow in the multiplication(at least according to our codegen semantics). However, it's unclear if "must not overflow isize", means that the unsigned result has to fit in `[0, isize::MAX]`, as the following passes:

```rust
#[no_mangle]
pub const fn addaddr() -> i32 {
    const {
        let x = &[0i32; 2];
        let x = x.as_ptr().wrapping_add(1).cast::<u8>();
        unsafe {
            // -1 * 1 doesn't overflow in both the signed and unsigned sense
            // however the result doesn't fit in [0, isize::MAX]
            // passes MIRI and the constant interpreter
            x.byte_add(!0).read();
        }
        x[0]
    }
}
```

To summarize:

1. Does the multiplication of `count * size_of::<T>()` need to not overflow in both the signed and unsigned sense for `core::ptr::add`?
2. In `core::ptr::add`, is the resulting offset treated as a signed integer or unsigned integer for the following requirement: https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L821-L824

Rationale for inquiry:

LLVM recently added a `nuw` attribute to the `getelementptr` instruction in order to inform the optimizer that the offset is non-negative. I was checking the `core::ptr::add` documentation to see if we could use it for that, however, I found some contradictory information w.r.t the docs and the current behavior of the constant interpreter and Miri.

	/// Adds an offset to a pointer (convenience for `.offset(count as isize)`).
	///
	/// `count` is in units of T; e.g., a `count` of 3 represents a pointer
	/// offset of `3 * size_of::<T>()` bytes.
	///
	/// # Safety
	///
	/// If any of the following conditions are violated, the result is Undefined Behavior:
	///
	/// * The computed offset, `count * size_of::<T>()` bytes, must not overflow `isize`.
	///
	/// * If the computed offset is non-zero, then `self` must be derived from a pointer to some
	/// [allocated object], and the entire memory range between `self` and the result must be in
	/// bounds of that allocated object. In particular, this range must not "wrap around" the edge
	/// of the address space.
	///
	/// Allocated objects can never be larger than `isize::MAX` bytes, so if the computed offset
	/// stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement.
	/// This implies, for instance, that `vec.as_ptr().add(vec.len())` (for `vec: Vec<T>`) is always
	/// safe.
	///
	/// Consider using [`wrapping_add`] instead if these constraints are
	/// difficult to satisfy. The only advantage of this method is that it
	/// enables more aggressive compiler optimizations.
	///
	/// [`wrapping_add`]: #method.wrapping_add
	/// [allocated object]: crate::ptr#allocated-object
	///
	/// # Examples
	///
	/// ```
	/// let s: &str = "123";
	/// let ptr: *const u8 = s.as_ptr();
	///
	/// unsafe {
	/// assert_eq!(*ptr.add(1), b'2');
	/// assert_eq!(*ptr.add(2), b'3');
	/// }
	/// ```
	#[stable(feature = "pointer_methods", since = "1.26.0")]
	#[must_use = "returns a new pointer rather than modifying its argument"]
	#[rustc_const_stable(feature = "const_ptr_offset", since = "1.61.0")]
	#[inline(always)]
	#[cfg_attr(miri, track_caller)] // even without panics, this helps for Miri backtraces
	pub const unsafe fn add(self, count: usize) -> Self

	/// Adds an offset to a pointer.
	///
	/// `count` is in units of T; e.g., a `count` of 3 represents a pointer
	/// offset of `3 * size_of::<T>()` bytes.
	///
	/// # Safety
	///
	/// If any of the following conditions are violated, the result is Undefined Behavior:
	///
	/// * The computed offset, `count * size_of::<T>()` bytes, must not overflow `isize`.
	///
	/// * If the computed offset is non-zero, then `self` must be derived from a pointer to some
	/// [allocated object], and the entire memory range between `self` and the result must be in
	/// bounds of that allocated object. In particular, this range must not "wrap around" the edge
	/// of the address space.
	///
	/// Allocated objects can never be larger than `isize::MAX` bytes, so if the computed offset
	/// stays in bounds of the allocated object, it is guaranteed to satisfy the first requirement.
	/// This implies, for instance, that `vec.as_ptr().add(vec.len())` (for `vec: Vec<T>`) is always
	/// safe.
	///
	/// Consider using [`wrapping_offset`] instead if these constraints are
	/// difficult to satisfy. The only advantage of this method is that it
	/// enables more aggressive compiler optimizations.
	///
	/// [`wrapping_offset`]: #method.wrapping_offset
	/// [allocated object]: crate::ptr#allocated-object
	///
	/// # Examples
	///
	/// ```
	/// let s: &str = "123";
	/// let ptr: *const u8 = s.as_ptr();
	///
	/// unsafe {
	/// assert_eq!(*ptr.offset(1) as char, '2');
	/// assert_eq!(*ptr.offset(2) as char, '3');
	/// }
	/// ```
	#[stable(feature = "rust1", since = "1.0.0")]
	#[must_use = "returns a new pointer rather than modifying its argument"]
	#[rustc_const_stable(feature = "const_ptr_offset", since = "1.61.0")]
	#[inline(always)]
	#[cfg_attr(miri, track_caller)] // even without panics, this helps for Miri backtraces
	pub const unsafe fn offset(self, count: isize) -> *const T

	/// * If the computed offset is non-zero, then `self` must be derived from a pointer to some
	/// [allocated object], and the entire memory range between `self` and the result must be in
	/// bounds of that allocated object. In particular, this range must not "wrap around" the edge
	/// of the address space.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pointer addition semantics are unclear #130211

Location

Summary

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

pointer addition semantics are unclear #130211

Description

Location

Summary

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions