Commit d91f004

active discussion: validity invariants
1 parent fb2ef0c commit d91f004

1 file changed: +102 -0 lines changed

active_discussion/validity.md

@@ -0,0 +1,102 @@
# Data type validity requirements

This discussion is meant to focus on the question: Which invariants derived from
types does the compiler expect to be *always* maintained, and which
(equivalently) must unsafe code *always* uphold? This is what is called the
"validity invariant" in
[Ralf's blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html),
but we might also decide to change that name.

### Interactions and constraints

Choices of invariants interact, in particular, with layout optimizations. For
example, the fact that `Option<&T>` is pointer-sized relies on the fact that the
validity invariant for `&T` rules out `0x0`, and hence we can use that value to
signal the `None` case.

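
The layout claim above can be checked directly. The following is a minimal sketch; the helper name `option_ref_is_pointer_sized` is made up for illustration, and the comparison with `u64` is only there to show what happens when a type has no forbidden bit pattern to reuse.

```rust
use std::mem::size_of;

/// Hypothetical helper: true if `Option<&T>` takes exactly as much space as `&T`.
fn option_ref_is_pointer_sized<T>() -> bool {
    size_of::<Option<&T>>() == size_of::<&T>()
}

fn main() {
    // `0x0` is never a valid `&T`, so `None` can be stored in that slot.
    assert!(option_ref_is_pointer_sized::<u8>());
    assert!(option_ref_is_pointer_sized::<[u64; 4]>());

    // Every bit pattern is a valid `u64`, so `Option<u64>` needs extra room
    // for the discriminant.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());

    println!("Option<&u8>: {} bytes", size_of::<Option<&u8>>());
    println!("Option<u64>: {} bytes", size_of::<Option<u64>>());
}
```
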
Moreover, the invariants are constrained by attributes that we emit when
generating LLVM IR. For example, we emit `align` attributes pretty much any
time we can, which means it is probably a good idea to say that valid references
must be aligned.

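
As a small illustration of what "aligned" means for a reference, the sketch below checks that the address behind a `&T` is a multiple of `align_of::<T>()`. The helper `is_aligned` is a hypothetical name, not an existing API. The check only observes a property that references created by safe code already have; it does not settle whether alignment should be part of the validity invariant.

```rust
use std::mem::align_of;

/// Hypothetical helper: true if the address behind `r` is a multiple of
/// the alignment required for `T`.
fn is_aligned<T>(r: &T) -> bool {
    (r as *const T as usize) % align_of::<T>() == 0
}

fn main() {
    let x: u64 = 7;
    let values = [1u32, 2, 3];

    // References produced by safe code are always suitably aligned.
    assert!(is_aligned(&x));
    assert!(values.iter().all(|v| is_aligned(v)));

    println!("align_of::<u64>() = {}", align_of::<u64>());
}
```
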
### Extent of "always"

One point we will have to figure out is what exactly "always" means. Thinking
in terms of a semantics for MIR, data most probably needs to be valid any time
it is copied, which primarily happens when executing assignment statements (the
other cases are passing of function arguments and return values). However, it
is less clear whether merely creating a place without accessing the data inside
(such as in `&*x`) should require the data to be valid.

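
While that question is open, one conservative pattern is to avoid forming a reference to possibly-invalid data at all and to go through raw pointers instead. The sketch below assumes this pattern; the function name `init_via_raw_pointer` is made up for illustration. It never creates a `&bool` or `&mut bool` before the memory holds a valid `bool`, so it does not depend on how the `&*x` question is answered.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: initializes a `bool` through a raw pointer and
/// only treats the memory as a `bool` after the write.
fn init_via_raw_pointer(value: bool) -> bool {
    let mut slot = MaybeUninit::<bool>::uninit();

    // A raw pointer to the slot; no reference to uninitialized data is created.
    let p: *mut bool = slot.as_mut_ptr();

    // SAFETY: `p` is non-null, aligned, and points to memory owned by `slot`.
    unsafe { p.write(value) };

    // SAFETY: the slot was fully initialized by the write above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert!(init_via_raw_pointer(true));
    assert!(!init_via_raw_pointer(false));
}
```
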
### Possible bit patterns

The validity invariant of a type is, basically, the set of bit patterns that are
allowed to occur at that type. ("Basically" because the invariant may also be
allowed to depend on memory.) To discuss this properly, we need to first agree
on what "bit patterns" even are. It is certainly not enough to just consider
sequences of 0 and 1, because we also need to take uninitialized data into
account. For the purpose of this discussion, I think it is sufficient to
consider every bit as being either 0, 1 or uninitialized.
[That is not always sufficient](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html),
but I think we can mostly ignore the extra complications introduced by pointer
values.

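
To make the three-valued view of bits concrete, here is a toy model. All names in it (`Bit`, `Byte`, `valid_bool`, `valid_u8_strict`, `valid_u8_permissive`) are invented for this sketch, and the predicates are only candidate invariants used to show the shape of the design space, not decisions. The `bool` predicate reflects the common assumption that a `bool` is exactly `0x00` or `0x01`; the two `u8` predicates correspond to the two possible answers to "may integers contain uninitialized bits?".

```rust
/// One bit in the abstract model: 0, 1, or uninitialized.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Bit {
    Zero,
    One,
    Uninit,
}

/// Eight abstract bits, least significant bit first.
type Byte = [Bit; 8];

/// Builds a fully initialized abstract byte from a concrete `u8`.
fn byte_from_u8(v: u8) -> Byte {
    let mut bits = [Bit::Zero; 8];
    for (i, bit) in bits.iter_mut().enumerate() {
        if (v >> i) & 1 == 1 {
            *bit = Bit::One;
        }
    }
    bits
}

/// Candidate invariant for `bool`: fully initialized, and 0x00 or 0x01.
fn valid_bool(b: Byte) -> bool {
    b == byte_from_u8(0) || b == byte_from_u8(1)
}

/// Candidate invariant for `u8` if uninitialized bits are disallowed.
fn valid_u8_strict(b: Byte) -> bool {
    b.iter().all(|bit| *bit != Bit::Uninit)
}

/// Candidate invariant for `u8` if uninitialized bits are allowed.
fn valid_u8_permissive(_b: Byte) -> bool {
    true
}

fn main() {
    let uninit: Byte = [Bit::Uninit; 8];
    assert!(valid_bool(byte_from_u8(1)));
    assert!(!valid_bool(byte_from_u8(2)));
    assert!(!valid_bool(uninit));
    assert!(!valid_u8_strict(uninit));
    assert!(valid_u8_permissive(uninit));
}
```
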
## Goals

* For every primitive type, determine which assumptions (if any) the compiler
  makes about values *not* occurring at that type (serving as a lower bound for
  what to declare invalid), and determine which popular patterns in unsafe code
  might create "interesting" values of this type that safe code cannot create on
  its own (serving as an upper bound for how much we want to declare invalid).
  Both of these bounds are soft, but informative.
* Based on that, map out a design space of invariants that seem reasonable.
* Determine when exactly the validity invariant is assumed to hold.

## Active threads

To start, we will create threads for each major category of types.

* Integers and floating point types
  * Do we allow values that contain uninitialized bits? If yes, what are the
    rules for arithmetic and logical operations involving uninitialized bits,
    e.g. in cases like `x * 0`?

* Raw pointers
  * Do we allow values that contain uninitialized bits?
  * Are there any requirements on the metadata?

* References
  * Presumably, references must be non-NULL.
  * They probably also must be aligned, but is that required every time a
    reference is taken? Also see the [ongoing discussion in RFC 2582][RFC2582].
  * Can there ever be uninitialized bits in a reference?
  * Do they have to be dereferenceable? What exactly does that even mean?
  * Does `&[mut] T` have to point to data that is valid at `T`? This interacts
    with the question of whether `&*x` is allowed when `x` is a well-aligned
    non-null dereferenceable pointer that points to invalid data.
  * Out of scope: aliasing rules

* Function pointers
  * Presumably, these must be non-NULL. Anything else? Can there ever be
    uninitialized bits?

* Unions
  * Do we impose any restrictions here, or are unions just "bags of bits" that
    may contain anything? The latter would mean we can do no layout
    optimizations.

* Enums
  * Is there anything to say besides: The discriminant must be valid, and all
    fields of the active variant must be valid at their respective types?
  * The padding between fields can be anything, including uninitialized.

* Structs, tuples, arrays and all other aggregates (closures, ...)
  * Is there anything to say besides: All fields must be valid at their
    respective types?
  * The padding between fields can be anything, including uninitialized. It was
    [recently determined][generators-maybe-uninit] that generators behave
    differently from other aggregates here. Are we okay with that? Should we
    push for generator fields to reflect this in their types?

[RFC2582]: https://github.com/rust-lang/rfcs/pull/2582
[generators-maybe-uninit]: https://github.com/rust-lang/rust/pull/56100

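
Two of the layout facts that the threads above lean on can be observed with a short program. This is a sketch under stated assumptions: the struct `Padded` and the helper `padding_bytes` are invented for illustration, and `#[repr(C)]` is used only so that the field order and padding are predictable. The first check shows that the non-NULL requirement on function pointers lets `Option<fn()>` reuse the null value, just like `Option<&T>`. The second counts the padding bytes inside a struct, which are the bytes the "padding can be anything" bullets refer to.

```rust
use std::mem::{align_of, size_of};

/// Illustrative struct: with `repr(C)`, `value` must start at an offset that
/// is a multiple of `align_of::<u32>()`, so padding follows `tag`.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    tag: u8,
    value: u32,
}

/// Hypothetical helper: bytes in `Padded` that belong to no field.
fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}

fn main() {
    // Function pointers are never null, so `None` fits in the same space.
    assert_eq!(size_of::<Option<fn()>>(), size_of::<fn()>());

    // `tag` occupies one byte; the rest of the first alignment unit is padding.
    assert_eq!(padding_bytes(), align_of::<u32>() - 1);

    println!("Padded has {} padding byte(s)", padding_bytes());
}
```
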
