Description
Write-up source: https://hackmd.io/It7MysqPRzeuQZ2RZevz9w?edit
Summary
This document proposes to stabilize refs-to-static-in-constants. This feature permits one to create a constant expression that references a static:
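struct Vtable { ptr: *const u32 }
// Raw pointers are not `Sync`, so opt in explicitly to allow a `static` of this type.
unsafe impl Sync for Vtable {}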
static VT: Vtable = Vtable { ptr: std::ptr::null() };
const C: &Vtable = &VT;
On stable Rust, this stabilization does not introduce any surprising behavior. The resulting constant C will be equal to the address of VT at runtime. Note that this value is not knowable at compilation time (certainly not early on in the compilation) and so must be represented abstractly: the compiler thinks of the value of C as "the address of VT", whereas std::ptr::null() is an example of a constant pointer whose value is known to be 0.
Given the limited surface area, this stabilization has no interactions with stable const generics. However, it does have some implications for future const generics; those are discussed in the Future interactions section. The conclusion is that supporting refs-to-statics does not introduce new challenges for const generics that were not already present in some other form.
Procedural note
Const-refs-to-statics never had an RFC. This stabilization report could be made into an RFC if we think that makes sense. Niko's general opinion is that incremental extensions to const generics do not all seem worthy of RFCs, and yet there is some value in establishing principles like 'statics have significant addresses that ought to be preserved'.
What is being proposed for stabilization
Background: const evaluation and const values
The term const evaluation refers to evaluating a constant expression at compilation time. The result of const evaluation is a const value. Const values can contain abstract pointers (e.g., the result of &VT is "the address of the static VT") that are not truly known. We cannot always know whether two const values should be considered equal or whether they would compare as equal at runtime. Const values are used to store the initializer for a named static (static S: T = /* initializer */), the values of named constants (const C: T = ...), and the values of associated constants (<T as Something>::SIZE).
What const_refs_to_static allows
Currently, const evaluation does not allow a const value to reference a static. A program like this one therefore requires a feature gate:
#![feature(const_refs_to_static)]
static S: u32 = 66;
const C: u32 = S;
fn main() {
    println!("{C}"); // prints 66
}
The feature gate allows not only reading the value of a static but also taking references (and even dereferencing them):
#![feature(const_refs_to_static)]
static S: u32 = 66;
const C: &u32 = &S;
const D: u32 = *C;
fn main() {
    println!("{D}"); // prints 66
}
The "significant address" property
The key distinguishing feature of a static versus any other form of variable is that it has a significant address. In short, &S for some static S is often expected to be the same pointer everywhere in the program whenever it occurs (but see the caveat below). This is distinct from a local variable, say, which may have a different address on each invocation of the function [1]; it is also distinct from a constant like const { &22 }, which can also refer to different memory locations (though they will always hold the value 22).
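A minimal sketch of this distinction, using only stable Rust. The names LIMIT, static_addr, and local_addr are illustrative assumptions and do not appear in the original write-up. The static is observed at one address on every call, whereas nothing is promised about the address of a local.

```rust
// Illustrative static; the name is an assumption made for this sketch.
static LIMIT: usize = 44;

/// Returns the address of the static. Every call observes the same location.
fn static_addr() -> *const usize {
    &LIMIT
}

/// Returns the address of a local as an integer. The result may or may not
/// differ between calls, and it must not be turned back into a pointer.
fn local_addr() -> usize {
    let local: usize = 44;
    &local as *const usize as usize
}

fn main() {
    // A static has one significant address.
    assert!(std::ptr::eq(static_addr(), static_addr()));
    assert!(std::ptr::eq(static_addr(), &LIMIT));
    // No guarantee either way for locals, so only print what was observed.
    println!("{:#x} {:#x}", local_addr(), local_addr());
}
```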
Const evaluation and constants preserve this property (playground):
#![feature(const_refs_to_static)]
static S: usize = 44;
const S_X: &usize = &S;
const S_Y: &usize = &S;
static T: usize = 44;
const T_X: &usize = &T;
const T_Y: &usize = &T;
fn main() {
    // These assertions are guaranteed to be true.
    assert!(std::ptr::eq(&S, S_X));
    assert!(std::ptr::eq(S_X, S_Y));
    assert!(std::ptr::eq(&T, T_X));
    assert!(std::ptr::eq(T_X, T_Y));
    assert!(!std::ptr::eq(&S, &T));
}
In contrast, the pointer values of constants are not guaranteed to be equal, and hence equivalent assertions would not be guaranteed to be true for these declarations (playground):
const S: usize = 44;
const S_X: &usize = &S;
const S_Y: &usize = &S;
const T: usize = 44;
const T_X: &usize = &T;
const T_Y: &usize = &T;
fn main() {
    // These assertions hold in practice because LLVM coalesces
    // pointers to constants, but that is a "best effort" optimization
    // and they are not guaranteed to hold:
    assert!(std::ptr::eq(&S, S_X));
    assert!(std::ptr::eq(S_X, S_Y));
    assert!(std::ptr::eq(&T, T_X));
    assert!(std::ptr::eq(T_X, T_Y));
    // As above, `S` and `T` are distinct constants, but they are coalesced
    // in practice (not guaranteed):
    assert!(std::ptr::eq(&S, &T));
}
Caveat (generic statics): Statics are currently forbidden from having generic parameters, in large part because it is not clear if and how the significant address property could be maintained given monomorphization. Future extensions of statics to support generics may revise the precise guarantee being offered here (e.g., to say that generic statics instantiated in distinct compilation units may sometimes have distinct addresses) and they would have to address how that interacts with constants.
Extern statics
Extern statics are treated conservatively. It is possible to get their address as a raw pointer, but it is not possible to read from them (what would the value be?) or to include a safe reference to them in your final value (playground):
#![feature(const_refs_to_static)]
extern {
    static S: u32;
}
// ERROR cannot access extern static
const C: u32 = unsafe { S };
// ERROR encountered reference to `extern` static in `const`
const D: &u32 = unsafe { &S };
// OK
const E: *const u32 = unsafe { std::ptr::addr_of!(S) };
Freeze requirement
Const evaluation is not allowed to
- access the contents of any mutable static (whether that is via interior mutability or static mut).
- result in values that safely reference anything mutable (whether that is via interior mutability or &mut).
"Safely reference" here refers to recursively traversing the value in the same way safe code could (but ignoring visibility), i.e. recursing through references but not through raw pointers or unions.
It is possible to create static values with UnsafeCell contents, but they cannot typically be used from constants except in very narrow ways. For example, creating a constant whose value includes an UnsafeCell (or a reference to memory contained in an unsafe cell) triggers an error that "it is undefined behavior to use this value":
#![feature(const_refs_to_static)]
#![feature(sync_unsafe_cell)] // required to use `SyncUnsafeCell`, trivial to do on stable
use std::cell::SyncUnsafeCell;
static S: SyncUnsafeCell<u32> = SyncUnsafeCell::new(66);
const C: &SyncUnsafeCell<u32> = &S; // ERROR: undefined behavior to use this value
Similarly, attempting to access the contents of an unsafe cell results in "constant accesses mutable global memory":
#![feature(const_refs_to_static)]
#![feature(const_mut_refs)] // required to deref the raw pointer
#![feature(sync_unsafe_cell)] // required to use `SyncUnsafeCell`, trivial to do on stable
use std::cell::SyncUnsafeCell;
static S: SyncUnsafeCell<u32> = SyncUnsafeCell::new(66);
const C: u32 = unsafe { *S.get() }; // ERROR: constant accesses mutable global memory
It is however possible to use statics that have UnsafeCell in other ways, e.g. returning a raw pointer to their contents:
#![feature(const_refs_to_static)]
#![feature(sync_unsafe_cell)]
use std::cell::SyncUnsafeCell;
static S: SyncUnsafeCell<u32> = SyncUnsafeCell::new(66);
const C: *mut u32 = S.get(); // OK
Static mut
Statics declared as static mut generally behave "as if" they were enclosed in an unsafe cell (playground):
#![feature(const_refs_to_static)]
#![feature(const_mut_refs)]
static mut S: u32 = 0;
// ERROR constant accesses mutable global memory
const C: u32 = unsafe { S };
// ERROR it is undefined behavior to use this value
const D: &u32 = unsafe { &S };
// OK, requires feature(const_mut_refs)
const E: *mut u32 = unsafe { std::ptr::addr_of_mut!(S) };
The same is true of external statics (playground):
#![feature(const_refs_to_static)]
#![feature(const_mut_refs)]
extern {
    static mut S: u32;
}
// ERROR constant accesses mutable global memory
const C: u32 = unsafe { S };
// ERROR it is undefined behavior to use this value
const D: &u32 = unsafe { &S };
// OK, requires feature(const_mut_refs)
const E: *mut u32 = unsafe { std::ptr::addr_of_mut!(S) };
Future interactions
Const generics refers to Rust items with generic parameters of kind const, such as fn foo<const C: usize>(). Stable Rust requires that const generic parameters have simple scalar types like usize or i32. This limitation means that there is no real interaction between the stable surface area of const generics and const_refs_to_static.
So long as we do not extend const generics to permit values of &-type, there are no problems at all (but of course we limit what users can do, and in particular don't support &str values). If however we wish to extend const generics to permit parameters of &-type (e.g., fn foo<const C: &usize>()), then we will need to extend the current implementation to preserve the "significant address" property. This section dives into detail as to why that property is not currently preserved, the various options to fix that, and some related challenges.
Background: Const generics and monomorphization
Given a function fn foo<const C: SomeType>(), Rust's type system must be able to decide whether foo::<X> and foo::<Y> represent two different instances of the same generic function (or, equivalently, given struct Foo<const C: SomeType>, whether Foo<X> and Foo<Y> are the same type). This requires being able to determine whether X and Y are equal (i.e., the same value). This equality comparison cannot be done for all const values since some of them lack a well-defined notion of equality (e.g., two values of type fn()). Stable Rust sidesteps this issue by only permitting const generics where the type is a scalar value (e.g., u32) and the constant expression can be evaluated to a fixed constant (in particular, the expression is not allowed to reference generic types).
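As a small illustration on stable Rust, the sketch below checks that two const arguments that evaluate to the same scalar name the same type, and that different scalars name different types. Buffer and same_type are hypothetical names introduced only for this example.

```rust
use std::any::TypeId;

// Hypothetical type parameterized by a scalar const generic.
struct Buffer<const N: u32>;

impl<const N: u32> Buffer<N> {
    fn len(&self) -> u32 {
        N
    }
}

/// Returns true when `Buffer<A>` and `Buffer<B>` are the same type.
fn same_type<const A: u32, const B: u32>() -> bool {
    TypeId::of::<Buffer<A>>() == TypeId::of::<Buffer<B>>()
}

fn main() {
    // `{ 22 + 44 }` and `66` evaluate to the same value, so this assignment
    // type-checks: both sides are one and the same type.
    let b: Buffer<{ 22 + 44 }> = Buffer::<66>;
    assert_eq!(b.len(), 66);
    assert!(same_type::<{ 22 + 44 }, 66>());
    // Different values yield different instantiations.
    assert!(!same_type::<66, 67>());
}
```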
Introducing valtrees
To support a richer set of values in const generics, nightly Rust makes use of valtrees. A valtree ("value tree") is a simplified form of const value consisting of "branch nodes" and "leaf nodes", which carry simple scalar values. The "value" of a const generic parameter is always a valtree, not an arbitrary const value.
For the simple types supported in const generics today, valtree conversion is infallible -- simply convert the scalar value to a leaf node. The same is true for ADTs composed of those simple types. Converting a (u32, u32) tuple like (22, 44), for example, simply means you get a valtree like (I32LeafNode(22_i32), I32LeafNode(44_i32)).
Valtrees do not carry type information. The same valtree (I32LeafNode(22_i32), I32LeafNode(44_i32)) that represents a tuple would also represent a fixed-length array like [22, 44] or a value of struct Point { x: u32, y: u32 }. At monomorphization time, generic constants have both a type and an associated valtree suitable for that type, and that type can be used to instantiate the valtree into an actual value.
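The following toy model may help make this concrete. It is not the compiler's actual data structure: the ValTree enum, the conversion functions, and the choice of u128 for leaves are all assumptions made for illustration. It shows a tuple, an array, and a struct mapping to the same untyped tree, and a type being used to instantiate the tree back into a value.

```rust
// Toy model only; the real valtree lives inside rustc and differs in detail.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ValTree {
    Leaf(u128),
    Branch(Vec<ValTree>),
}

#[derive(Debug, PartialEq, Eq)]
struct Point {
    x: u32,
    y: u32,
}

fn leaf(v: u32) -> ValTree {
    ValTree::Leaf(v as u128)
}

fn tuple_to_valtree(t: (u32, u32)) -> ValTree {
    ValTree::Branch(vec![leaf(t.0), leaf(t.1)])
}

fn array_to_valtree(a: [u32; 2]) -> ValTree {
    ValTree::Branch(a.iter().map(|&v| leaf(v)).collect())
}

fn point_to_valtree(p: &Point) -> ValTree {
    ValTree::Branch(vec![leaf(p.x), leaf(p.y)])
}

/// Type-directed instantiation: the caller supplies the type (`Point`),
/// because the tree itself does not record one.
fn valtree_to_point(tree: &ValTree) -> Option<Point> {
    match tree {
        ValTree::Branch(fields) => match fields.as_slice() {
            [ValTree::Leaf(x), ValTree::Leaf(y)] => Some(Point {
                x: u32::try_from(*x).ok()?,
                y: u32::try_from(*y).ok()?,
            }),
            _ => None,
        },
        ValTree::Leaf(_) => None,
    }
}

fn main() {
    let from_tuple = tuple_to_valtree((22, 44));
    // Three differently typed values share one tree.
    assert_eq!(from_tuple, array_to_valtree([22, 44]));
    assert_eq!(from_tuple, point_to_valtree(&Point { x: 22, y: 44 }));
    // The type chosen at instantiation decides what the tree becomes.
    assert_eq!(valtree_to_point(&from_tuple), Some(Point { x: 22, y: 44 }));
}
```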
Values of more complex types may not have a well-defined valtree. For example, there is no way to represent a fn() value as a valtree. In the nightly version of const generics, whenever a const value is given as the value for a const generic, the compiler internally attempts to convert that const value to a valtree. This process can fail, in which case an error results. But if it succeeds, then the const generic can be compiled. Whenever the const generic argument is referenced, the valtree will be converted into a const value which can in turn be converted into a real value at runtime.
Example. Let's walk through an example supported on stable today:
fn test<const C: u32>() {
    let x = C;
    println!("{x}");
}
fn main() {
    test::<{22 + 44}>();
}
- In main, the expression 22 + 44 is const evaluated into a const value ConstVal(66).
- ConstVal(66) is then converted into a valtree I32LeafNode(66).
- During codegen time, the function test::<I32LeafNode(66)> is compiled.
- When let x = C is compiled, I32LeafNode(66) is converted back to ConstVal(66) and from there the code is compiled to load a constant. Execution proceeds as expected.
Supporting references in valtrees
As currently implemented, references are ignored when creating a valtree, so the valtrees for 22 and &22 and even &&22 are all the same (just I32LeafNode(22)). This preserves the property that, given two values X and Y, if valtree(X) == valtree(Y) then X == Y. For references, this means that pointer equality ought not be considered part of identity, since the == operator for &T says that two references are equal if their referents are equal (and it doesn't consider the pointer address). Put another way, the Eq trait doesn't respect "significant addresses", and valtrees are currently defined to align with Eq, so they do not either.
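The gap between == and address identity can be seen on stable Rust without any const machinery. In this sketch, the helper names value_eq and addr_eq are illustrative assumptions; the two statics hold the same value but occupy distinct addresses.

```rust
static S: usize = 44;
static T: usize = 44;

/// `==` on references compares the referents, not the addresses.
fn value_eq(a: &usize, b: &usize) -> bool {
    a == b
}

/// `std::ptr::eq` compares the addresses only.
fn addr_eq(a: &usize, b: &usize) -> bool {
    std::ptr::eq(a, b)
}

fn main() {
    // Equal under `Eq`, because both referents are 44.
    assert!(value_eq(&S, &T));
    // Not the same pointer, because `S` and `T` are distinct statics.
    assert!(!addr_eq(&S, &T));
    // A static is always pointer-equal to itself.
    assert!(addr_eq(&S, &S));
}
```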
The current definition of valtrees implies that const generics of type &usize (or any reference) will preserve the value of the referent but not its address (as that is not part of the valtree). This can create observable behavior on nightly. Consider this example from #120961:
#![feature(const_refs_to_static)]
#![feature(adt_const_params)]
static FOO: usize = 42;
const BAR: &usize = &FOO;
fn foo<const X: &'static usize>() {
    if std::ptr::eq(X, &FOO) {
        // Never prints! But isn't `X == BAR == &FOO`??
        println!("activating special mode");
    }
}
fn main() {
    foo::<BAR>();
}
When executed, this example does NOT print anything, even though you might expect that it would. What is happening?
- The value of BAR is ConstVal(&FOO), which tracks that it is the address of the static FOO.
- The value of BAR is converted into a valtree, which results in just 42 (the value of the static is used to create the valtree).
- When foo::<Leaf(42)> is compiled, the valtree must be converted into a &usize. A new temporary value is synthesized. The std::ptr::eq call (which observes the physical pointer address) compares the address of this temporary to FOO and they have different addresses.
  - In practice, an anonymous constant like const BAZ: &'static usize = &42 would typically be equal to X, but that is because LLVM deduplicates such constants into a single allocation; such deduplication is also not guaranteed to occur, particularly across codegen units.
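The steps above can be sketched as a lossy round trip. This is a toy model under stated assumptions, not rustc's implementation: ConstVal, ValTree, to_valtree, and from_valtree_as_ref are hypothetical names, and the model tracks only whether a reference points at a named static or at anonymous memory.

```rust
// Toy model of a const value; rustc's real representation is different.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ConstVal {
    Scalar(u64),
    /// Abstract pointer: "the address of static `name`", plus its contents.
    RefToStatic { name: &'static str, value: u64 },
    /// Pointer to freshly synthesized anonymous memory.
    RefToAnon { value: u64 },
}

// Toy valtree with a single leaf; it has no place to record an address.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ValTree(u64);

/// References are looked through, so only the referent's value survives.
fn to_valtree(c: &ConstVal) -> ValTree {
    match c {
        ConstVal::Scalar(v)
        | ConstVal::RefToStatic { value: v, .. }
        | ConstVal::RefToAnon { value: v } => ValTree(*v),
    }
}

/// Instantiating the tree at a reference type has to invent an allocation.
fn from_valtree_as_ref(t: &ValTree) -> ConstVal {
    ConstVal::RefToAnon { value: t.0 }
}

fn main() {
    let bar = ConstVal::RefToStatic { name: "FOO", value: 42 };
    let tree = to_valtree(&bar);
    let x = from_valtree_as_ref(&tree);
    // The value is preserved across the round trip...
    assert_eq!(to_valtree(&x), tree);
    // ...but the identity "address of FOO" is gone.
    assert_ne!(x, bar);
    println!("{bar:?} -> {tree:?} -> {x:?}");
}
```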
There is general agreement that this behavior is surprising and not desirable. But note that it requires multiple feature gates -- const_refs_to_static AND adt_const_params (and as of very recently, unsized_const_params). Stabilizing just const_refs_to_static does not really change anything. In other words, the problem with the above example is not due to permitting references to statics in constants, it's due to valtrees encoding references in a surprising way (though if you didn't have references to statics, you couldn't observe it).
Options to support references in const generics
So, what are the options for supporting reference types in const generics, while avoiding surprising examples like the one from #120961 above?
Option A: Disallow creating valtrees from references to statics
We could make valtree construction fail if it encounters a reference to a static (but succeed for references to anonymous constants). This would avoid the issue, but only by preventing users from doing something they likely want to do. This program would not compile, for example, since it invokes foo with the constant &S:
fn foo<const C: &usize>() { }
static S: usize = 22;
fn main() {
    foo::<&S>(); // People will want to do this!
}
This option is not very appealing, because users likely want to create valtrees that reference statics.
Option B: Extend valtree to represent ref-to-static
A more appealing option is to extend valtrees so that "ref-to-static" is something they can directly encode, and thus sacrifice the current alignment between valtree equality and ==: for two distinct statics S and T holding the same value, &S == &T is true, yet valtree(&S) and valtree(&T) would differ. This recognizes the fact that there are additional properties of values that we may wish to preserve beyond what is compared by the Eq trait. Significant addresses are not the only example of such properties; many others arise when const functions use unsafe code, such as the value of padding bits, provenance, and potentially things like which NaN is in use (if we wished to support f64). We will have to decide which of them we wish to make observable in const evaluation.
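Floating point offers a stable-Rust analogy for "properties beyond what == compares". In the sketch below (the helper names same_under_eq and same_bits are illustrative assumptions), positive and negative zero compare equal yet have different bit patterns, and two NaN payloads can be constructed with different bits. How NaN payloads behave in general is platform-dependent, so the NaN bits are only printed, not asserted.

```rust
/// Equality as seen by the `==` operator.
fn same_under_eq(a: f64, b: f64) -> bool {
    a == b
}

/// Equality of the underlying bit patterns.
fn same_bits(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

fn main() {
    // Equal under `==`, yet distinguishable by inspecting the bits.
    assert!(same_under_eq(0.0, -0.0));
    assert!(!same_bits(0.0, -0.0));

    // Two quiet NaNs built from different payloads.
    let a = f64::from_bits(0x7ff8_0000_0000_0001);
    let b = f64::from_bits(0x7ff8_0000_0000_0002);
    assert!(a.is_nan() && b.is_nan());
    // NaN never compares equal under `==`, not even to itself.
    assert!(!same_under_eq(a, a));
    println!("{:#x} {:#x}", a.to_bits(), b.to_bits());
}
```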
The upshot: Stuff to figure out, but refs-to-statics doesn't make it harder
Notably, all of the solutions for making refs to statics behave sensibly in const generics wind up being strongly related to existing problems in const generics that already need to be solved. So while there are open questions here, they don't really make anything worse (in my opinion).
I wouldn't want anyone to read this and come away wondering whether the feature should be blocked for a while until const generics stuff is figured out.
Footnotes
1. And potentially even within a single function call, if the value is moved or becomes dead -- though arguably that is a separate variable. Precise limitations here still TBD.