reference/src/glossary.md
+111-111
Original file line number
Diff line number
Diff line change
@@ -55,7 +55,77 @@ somewhat differently from this definition. However, that's considered a low
55
55
level detail of a particular Rust implementation. When programming Rust, the
56
56
Abstract Rust Machine is intended to operate according to the definition here.
57
57
58
-
### (Pointer) Provenance
58
+
### Interior mutability
59
+
60
+
*Interior Mutation* means mutating memory where there also exists a live shared reference pointing to the same memory; or mutating memory through a pointer derived from a shared reference.
61
+
"Live" here means that the reference will be "used again" later.
62
+
"derived from" means that the pointer was obtained by casting a shared reference and potentially adding an offset.
63
+
This is not yet precisely defined, which will be fixed as part of developing a precise aliasing model.
64
+
65
+
Finding live shared references propagates recursively through references, but not through raw pointers.
66
+
So, for example, if data immediately pointed to by a `&T` or `& &mut T` is mutated, that's interior mutation.
67
+
If data immediately pointed to by a `*const T` or `&*const T` is mutated, that's *not* interior mutation.
68
+
69
+
*Interior mutability* refers to the ability to perform interior mutation without causing UB.
70
+
All interior mutation in Rust has to happen inside an [`UnsafeCell`](https://doc.rust-lang.org/core/cell/struct.UnsafeCell.html), so all data structures that have interior mutability must (directly or indirectly) use `UnsafeCell` for this purpose.
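
As a rough illustration of the definition above, the following sketch mutates a counter through a shared reference by going through `UnsafeCell`. The `Counter` type and its methods are hypothetical names made up for this example; real code would usually use `Cell` or `RefCell`, which wrap `UnsafeCell` in a safe API.

```rust
use std::cell::UnsafeCell;

/// Hypothetical counter that can be incremented through `&self`.
pub struct Counter {
    value: UnsafeCell<u32>,
}

impl Counter {
    pub fn new() -> Self {
        Counter { value: UnsafeCell::new(0) }
    }

    /// Interior mutation: writes through a pointer derived from `&self`.
    pub fn increment(&self) {
        // SAFETY: `Counter` is `!Sync` because of `UnsafeCell`, so no other
        // thread can access `value`, and no reference into the cell is held
        // across this write.
        unsafe { *self.value.get() += 1 };
    }

    pub fn get(&self) -> u32 {
        // SAFETY: same reasoning as in `increment`.
        unsafe { *self.value.get() }
    }
}

fn main() {
    let c = Counter::new();
    let shared = &c; // a live shared reference
    shared.increment();
    shared.increment();
    assert_eq!(c.get(), 2);
}
```

Without the `UnsafeCell` wrapper, the same write through a pointer derived from `&self` would be Undefined Behavior.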
71
+
72
+
### Layout
73
+
[layout]: #layout
74
+
75
+
The *layout* of a type defines its size and alignment as well as the offsets of its subobjects (e.g. fields of structs/unions/enum/... or elements of arrays).
76
+
Moreover, the layout of a type records its *function call ABI* (or just *ABI* for short): how the type is passed *by value* across a function boundary.
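
The size, alignment, and offset parts of a layout can be observed from within Rust. The sketch below uses a hypothetical `#[repr(C)]` type named `Header` and assumes a typical target where `u32` has size and alignment 4; it does not illustrate the function call ABI part of the layout.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical example type; `#[repr(C)]` makes the field offsets predictable.
#[repr(C)]
pub struct Header {
    pub tag: u8,
    pub len: u32,
}

/// Byte offset of `len` within `Header`, computed from field addresses.
pub fn len_offset() -> usize {
    let h = Header { tag: 0, len: 0 };
    let base = &h as *const Header as usize;
    let field = &h.len as *const u32 as usize;
    field - base
}

fn main() {
    // Assumption: `u32` has size 4 and alignment 4 on this target.
    assert_eq!(size_of::<Header>(), 8);
    assert_eq!(align_of::<Header>(), 4);
    assert_eq!(len_offset(), 4);
    // Array elements sit at multiples of the element size.
    assert_eq!(size_of::<[Header; 3]>(), 3 * size_of::<Header>());
}
```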
77
+
78
+
Note: Originally, *layout* and *representation* were treated as synonyms, and Rust language features like the `#[repr]` attribute reflect this.
79
+
In this document, *layout* and *representation* are not synonyms.
80
+
81
+
### Niche
82
+
83
+
The *niche* of a type determines invalid bit-patterns that will be used by layout optimizations.
84
+
85
+
For example, `&mut T` has at least one niche, the "all zeros" bit-pattern. This
86
+
niche is used by layout optimizations like ["`enum` discriminant
87
+
elision"](layout/enums.html#discriminant-elision-on-option-like-enums) to
88
+
guarantee that `Option<&mut T>` has the same size as `&mut T`.
89
+
90
+
While all niches are invalid bit-patterns, not all invalid bit-patterns are
91
+
niches. For example, "all bits uninitialized" is an invalid bit-pattern for
92
+
`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a
93
+
niche.
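
The effect of a niche can be checked with `size_of`. This is a small sketch; the helper name `option_ref_is_free` is made up for the example.

```rust
use std::mem::size_of;

/// Returns true if wrapping `&mut T` in `Option` costs no extra space.
pub fn option_ref_is_free<T>() -> bool {
    size_of::<Option<&mut T>>() == size_of::<&mut T>()
}

fn main() {
    // The "all zeros" niche of `&mut T` encodes `None`.
    assert!(option_ref_is_free::<u64>());
    // Every bit-pattern of `u32` is valid, so there is no niche and
    // `Option<u32>` needs extra space for the discriminant.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());
}
```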
94
+
95
+
### Padding
96
+
[padding]: #padding
97
+
98
+
*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized.
99
+
100
+
Padding can be thought of as the type containing secret fields of type `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties:
101
+
* `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit<u8>`.
102
+
* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized.
103
+
104
+
Note that padding is a property of the *type* and not the memory: reading from the padding of an `&Foo` (by casting to a byte reference) may produce initialized values if the `&Foo` is pointing to memory that was initialized (for example, if it was originally a byte buffer initialized to `0`), but the moment you perform a typed copy out of that reference you will have uninitialized padding bytes in the copy.
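
To make the amount of padding concrete, the sketch below counts the padding bytes of a `#[repr(C)] Pair(u8, u16)` type as the difference between the size of the type and the sizes of its fields. It assumes a typical target where `u16` has alignment 2, and `padding_bytes` is a hypothetical helper. It deliberately does not read the padding byte itself, since after a typed copy that byte is uninitialized.

```rust
use std::mem::size_of;

#[repr(C)]
pub struct Pair(pub u8, pub u16);

/// Number of bytes in `Pair` that belong to no field.
pub fn padding_bytes() -> usize {
    size_of::<Pair>() - (size_of::<u8>() + size_of::<u16>())
}

fn main() {
    // Assumption: `u16` is 2-aligned, so one byte is left between the fields.
    assert_eq!(size_of::<Pair>(), 4);
    assert_eq!(padding_bytes(), 1);
}
```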
105
+
106
+
107
+
We can also define padding in terms of the [representation relation]:
108
+
A byte at index `i` is a padding byte for type `T` if,
109
+
for all values `v` and lists of bytes `b` such that `v` and `b` are related at `T` (let's write this `Vrel_T(v, b)`),
110
+
changing `b` at index `i` to any other byte yields a `b'` such that `v` and `b'` are related (`Vrel_T(v, b')`).
111
+
In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any.
112
+
113
+
This definition works fine for product types (structs, tuples, arrays, ...).
114
+
The desired notion of "padding byte" for enums and unions is still unclear.
115
+
116
+
### Place
117
+
118
+
A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr].
119
+
A place is basically a pointer (pointing to some location in memory, potentially carrying [provenance](#pointer-provenance)), but might contain more information such as size or alignment (the details will have to be determined as the Rust Abstract Machine gets specified more precisely).
120
+
A place has a type, indicating the type of [values](#value) that it stores.
121
+
122
+
The key operations on a place are:
123
+
* Storing a [value](#value) of the same type in it (when it is used on the left-hand side of an assignment).
124
+
* Loading a [value](#value) of the same type from it (through the place-to-value coercion).
125
+
* Converting between a place (of type `T`) and a pointer value (of type `&T`, `&mut T`, `*const T` or `*mut T`) using the `&` and `*` operators.
126
+
This is also the only way a place can be "stored": by converting it to a value first.
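
The operations listed above can be seen in a few lines of ordinary Rust. This is only an informal sketch with a made-up function name, `place_demo`; the comments map each statement to one of the operations.

```rust
pub fn place_demo() -> (i32, i32) {
    let mut x = 1; // `x` is a place expression denoting a place of type `i32`
    let loaded = x; // place-to-value coercion: load a value from the place
    let p = &mut x; // `&mut` converts the place into a pointer value
    *p = 2; // `*p` turns the pointer value back into a place, then a value is stored in it
    (loaded, x)
}

fn main() {
    assert_eq!(place_demo(), (1, 2));
}
```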
127
+
128
+
### Pointer Provenance
59
129
60
130
The *provenance* of a pointer is used to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).
61
131
Provenance is extra state that only exists in the Rust Abstract Machine; it is needed to specify program behavior but not present any more when the program runs on real hardware.
@@ -95,19 +165,43 @@ For some more information, see [this document proposing a more precise definitio
95
165
Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows].
96
166
For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html).
*Interior Mutation* means mutating memory where there also exists a live shared reference pointing to the same memory; or mutating memory through a pointer derived from a shared reference.
101
-
"live" here means a value that will be "used again" later.
102
-
"derived from" means that the pointer was obtained by casting a shared reference and potentially adding an offset.
103
-
This is not yet precisely defined, which will be fixed as part of developing a precise aliasing model.
171
+
A *representation* of a [value](#value) is a list of bytes that is used to store or "represent" that value in memory.
104
172
105
-
Finding live shared references propagates recursively through references, but not through raw pointers.
106
-
So, for example, if data immediately pointed to by a `&T` or `& &mut T` is mutated, that's interior mutability.
107
-
If data immediately pointed to by a `*const T` or `&*const T` is mutated, that's *not* interior mutability.
173
+
We also sometimes speak of the *representation of a type*; this should more correctly be called the *representation relation* as it relates values of this type to lists of bytes that represent this value.
174
+
The term "relation" here is used in the mathematical sense: the representation relation is a predicate that, given a value and a list of bytes, says whether this value is represented by that list of bytes (`val -> list byte -> Prop`).
108
175
109
-
*Interior mutability* refers to the ability to perform interior mutation without causing UB.
110
-
All interior mutation in Rust has to happen inside an [`UnsafeCell`](https://doc.rust-lang.org/core/cell/struct.UnsafeCell.html), so all data structures that have interior mutability must (directly or indirectly) use `UnsafeCell` for this purpose.
176
+
The relation should be functional for a fixed list of bytes (i.e., every list of bytes has at most one associated value).
177
+
It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`).
178
+
For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes).
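
The partiality in both directions can be sketched with two small functions. Both names (`bool_from_bytes`, `u8_to_bytes`) are hypothetical and only model the relation for `bool` and `u8`; they are not how the Abstract Machine is specified.

```rust
/// Bytes-to-value direction: decodes a `bool`, if the bytes represent one.
pub fn bool_from_bytes(bytes: &[u8]) -> Option<bool> {
    match bytes {
        [0x00] => Some(false),
        [0x01] => Some(true),
        _ => None, // wrong length, or an invalid byte such as 0x10
    }
}

/// Value-to-bytes direction: encodes a mathematical integer at type `u8`,
/// if it has a representation at that type.
pub fn u8_to_bytes(value: i64) -> Option<[u8; 1]> {
    if (0..=255).contains(&value) {
        Some([value as u8])
    } else {
        None
    }
}

fn main() {
    assert_eq!(bool_from_bytes(&[0x01]), Some(true));
    assert_eq!(bool_from_bytes(&[0x10]), None);
    assert_eq!(bool_from_bytes(&[0x00, 0x00]), None);
    assert_eq!(u8_to_bytes(200), Some([200]));
    assert_eq!(u8_to_bytes(300), None);
}
```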
179
+
180
+
See the [value domain][value-domain] for an example of how values and representation relations can be made more precise.
181
+
182
+
### Soundness (of code / of a library)
183
+
[soundness]: #soundness-of-code--of-a-library
184
+
185
+
*Soundness* is a type system concept (actually originating from the study of logics) and means that the type system is "correct" in the sense that well-typed programs actually have the desired properties.
186
+
For Rust, this means well-typed programs cannot cause [Undefined Behavior][ub].
187
+
This promise only extends to safe code however; for `unsafe` code, it is up to the programmer to uphold this contract.
188
+
189
+
Accordingly, we say that a library (or an individual function) is *sound* if it is impossible for safe code to cause Undefined Behavior using its public API.
190
+
Conversely, the library/function is *unsound* if safe code *can* cause Undefined Behavior.
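
A minimal sketch of the distinction, using two hypothetical functions: `first_unsound` has a safe signature but lets safe callers trigger Undefined Behavior by passing an empty slice, whereas `first_sound` checks the precondition before entering the `unsafe` block.

```rust
/// Unsound: callers need no `unsafe`, yet an empty slice causes UB.
pub fn first_unsound(v: &[u8]) -> u8 {
    unsafe { *v.get_unchecked(0) }
}

/// Sound: no safe caller can reach the unchecked access with an empty slice.
pub fn first_sound(v: &[u8]) -> Option<u8> {
    if v.is_empty() {
        None
    } else {
        // SAFETY: `v` is non-empty, so index 0 is in bounds.
        Some(unsafe { *v.get_unchecked(0) })
    }
}

fn main() {
    assert_eq!(first_sound(&[]), None);
    assert_eq!(first_sound(&[7, 8]), Some(7));
    // Only called with a non-empty slice here; `first_unsound(&[])` would be UB.
    assert_eq!(first_unsound(&[7, 8]), 7);
}
```

One way to repair `first_unsound` without changing its body would be to mark it `unsafe fn` and document the non-empty precondition, which moves the proof obligation to the caller.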
191
+
192
+
### Undefined Behavior
193
+
[ub]: #undefined-behavior
194
+
195
+
*Undefined Behavior* is a concept of the contract between the Rust programmer and the compiler:
196
+
The programmer promises that the code exhibits no undefined behavior.
197
+
In return, the compiler promises to compile the code in a way that the final program does on the real hardware what the source program does according to the Rust Abstract Machine.
198
+
If it turns out the program *does* have undefined behavior, the contract is void, and the program produced by the compiler is essentially garbage (in particular, it is not bound by any specification; the program does not even have to be well-formed executable code).
199
+
200
+
In Rust, the [Nomicon](https://doc.rust-lang.org/nomicon/what-unsafe-does.html) and the [Reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) both have a list of behavior that the language considers undefined.
201
+
Rust promises that safe code cannot cause Undefined Behavior---the compiler and authors of unsafe code take the burden of this contract on themselves.
202
+
For unsafe code, however, the burden is still on the programmer.
203
+
204
+
Also see: [Soundness][soundness].
111
205
112
206
### Validity and safety invariant
113
207
@@ -146,95 +240,6 @@ Moreover, such unsafe code must not return a non-UTF-8 string to the "outside" o
146
240
To summarize: *Data must always be valid, but it only must be safe in safe code.*
147
241
For some more information, see [this blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html).
148
242
149
-
### Undefined Behavior
150
-
[ub]: #undefined-behavior
151
-
152
-
*Undefined Behavior* is a concept of the contract between the Rust programmer and the compiler:
153
-
The programmer promises that the code exhibits no undefined behavior.
154
-
In return, the compiler promises to compile the code in a way that the final program does on the real hardware what the source program does according to the Rust Abstract Machine.
155
-
If it turns out the program *does* have undefined behavior, the contract is void, and the program produced by the compiler is essentially garbage (in particular, it is not bound by any specification; the program does not even have to be well-formed executable code).
156
-
157
-
In Rust, the [Nomicon](https://doc.rust-lang.org/nomicon/what-unsafe-does.html) and the [Reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) both have a list of behavior that the language considers undefined.
158
-
Rust promises that safe code cannot cause Undefined Behavior---the compiler and authors of unsafe code takes the burden of this contract on themselves.
159
-
For unsafe code, however, the burden is still on the programmer.
160
-
161
-
Also see: [Soundness][soundness].
162
-
163
-
### Soundness (of code / of a library)
164
-
[soundness]: #soundness-of-code--of-a-library
165
-
166
-
*Soundness* is a type system concept (actually originating from the study of logics) and means that the type system is "correct" in the sense that well-typed programs actually have the desired properties.
167
-
For Rust, this means well-typed programs cannot cause [Undefined Behavior][ub].
168
-
This promise only extends to safe code however; for `unsafe` code, it is up to the programmer to uphold this contract.
169
-
170
-
Accordingly, we say that a library (or an individual function) is *sound* if it is impossible for safe code to cause Undefined Behavior using its public API.
171
-
Conversely, the library/function is *unsound* if safe code *can* cause Undefined Behavior.
172
-
173
-
### Layout
174
-
[layout]: #layout
175
-
176
-
The *layout* of a type defines its size and alignment as well as the offsets of its subobjects (e.g. fields of structs/unions/enum/... or elements of arrays).
177
-
Moreover, the layout of a type records its *function call ABI* (or just *ABI* for short): how the type is passed *by value* across a function boundary.
178
-
179
-
Note: Originally, *layout* and *representation* were treated as synonyms, and Rust language features like the `#[repr]` attribute reflect this.
180
-
In this document, *layout* and *representation* are not synonyms.
181
-
182
-
### Niche
183
-
184
-
The *niche* of a type determines invalid bit-patterns that will be used by layout optimizations.
185
-
186
-
For example, `&mut T` has at least one niche, the "all zeros" bit-pattern. This
187
-
niche is used by layout optimizations like ["`enum` discriminant
188
-
elision"](layout/enums.html#discriminant-elision-on-option-like-enums) to
189
-
guarantee that `Option<&mut T>` has the same size as `&mut T`.
190
-
191
-
While all niches are invalid bit-patterns, not all invalid bit-patterns are
192
-
niches. For example, the "all bits uninitialized" is an invalid bit-pattern for
193
-
`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a
194
-
niche.
195
-
196
-
### Zero-sized type / ZST
197
-
198
-
Types with zero size are called zero-sized types, which is abbreviated as "ZST".
199
-
This document also uses the "1-ZST" abbreviation, which stands for "one-aligned
200
-
zero-sized type", to refer to zero-sized types with an alignment requirement of 1.
201
-
202
-
For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment
203
-
requirement of 2.
204
-
205
-
### Padding
206
-
[padding]: #padding
207
-
208
-
*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized.
209
-
210
-
Padding can be thought of as the type containing secret fields of type `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties:
211
-
*`Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit<u8>`.
212
-
* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized.
213
-
214
-
Note that padding is a property of the *type* and not the memory: reading from the padding of an `&Foo` (by casting to a byte reference) may produce initialized values if the `&Foo` is pointing to memory that was initialized (for example, if it was originally a byte buffer initialized to `0`), but the moment you perform a typed copy out of that reference you will have uninitialized padding bytes in the copy.
215
-
216
-
217
-
We can also define padding in terms of the [representation relation]:
218
-
A byte at index `i` is a padding byte for type `T` if,
219
-
for all values `v` and lists of bytes `b` such that `v` and `b` are related at `T` (let's write this `Vrel_T(v, b)`),
220
-
changing `b` at index `i` to any other byte yields a `b'` such `v` and `b'` are related (`Vrel_T(v, b')`).
221
-
In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any.
222
-
223
-
This definition works fine for product types (structs, tuples, arrays, ...).
224
-
The desired notion of "padding byte" for enums and unions is still unclear.
225
-
226
-
### Place
227
-
228
-
A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr].
229
-
A place is basically a pointer (pointing to some location in memory, potentially carrying [provenance](#pointer-provenance)), but might contain more information such as size or alignment (the details will have to be determined as the Rust Abstract Machine gets specified more precisely).
230
-
A place has a type, indicating the type of [values](#value) that it stores.
231
-
232
-
The key operations on a place are:
233
-
* Storing a [value](#value) of the same type in it (when it is used on the left-hand side of an assignment).
234
-
* Loading a [value](#value) of the same type from it (through the place-to-value coercion).
235
-
* Converting between a place (of type `T`) and a pointer value (of type `&T`, `&mut T`, `*const T` or `*mut T`) using the `&` and `*` operators.
236
-
This is also the only way a place can be "stored": by converting it to a value first.
237
-
238
243
### Value
239
244
240
245
A *value* (called "value of the expression" or "rvalue" in C and "prvalue" in C++) is what gets stored in a [place](#place), and also the result of computing a [*value expression*][place-value-expr].
@@ -245,19 +250,14 @@ Values can be (according to their type) turned into a list of bytes, which is ca
245
250
Values are ephemeral; they arise during the computation of an instruction but are only ever persisted in memory through their representation.
246
251
(This is comparable to how run-time data in a program is ephemeral and is only ever persisted in serialized form.)
A *representation* of a [value](#value) is a list of bytes that is used to store or "represent" that value in memory.
252
-
253
-
We also sometimes speak of the *representation of a type*; this should more correctly be called the *representation relation* as it relates values of this type to lists of bytes that represent this value.
254
-
The term "relation" here is used in the mathematical sense: the representation relation is a predicate that, given a value and a list of bytes, says whether this value is represented by that list of bytes (`val -> list byte -> Prop`).
253
+
### Zero-sized type / ZST
255
254
256
-
The relation should be functional for a fixed list of bytes (i.e., every list of bytes has at most one associated representation).
257
-
It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`).
258
-
For a fixed value, there can be many representations (e.g., when considering type`#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes).
255
+
Types with zero size are called zero-sized types, which is abbreviated as "ZST".
256
+
This document also uses the "1-ZST" abbreviation, which stands for "one-aligned
257
+
zero-sized type", to refer to zero-sized types with an alignment requirement of 1.
259
258
260
-
See the [value domain][value-domain] for an example how values and representation relations can be made more precise.
259
+
For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment
260
+
requirement of 2.