Commit e7111e7

Refocus unsafe code chapter on unsafe itself.
1 parent 5910dc0 commit e7111e7

1 file changed

+113
-168
lines changed

src/doc/trpl/unsafe-code.md

+113-168
@@ -1,183 +1,128 @@
11
% Unsafe Code
22

3-
# Introduction
4-
5-
Rust aims to provide safe abstractions over the low-level details of
6-
the CPU and operating system, but sometimes one needs to drop down and
7-
write code at that level. This guide aims to provide an overview of
8-
the dangers and power one gets with Rust's unsafe subset.
9-
10-
Rust provides an escape hatch in the form of the `unsafe { ... }`
11-
block which allows the programmer to dodge some of the compiler's
12-
checks and do a wide range of operations, such as:
13-
14-
- dereferencing [raw pointers](#raw-pointers)
15-
- calling a function via FFI ([covered by the FFI guide](ffi.html))
16-
- casting between types bitwise (`transmute`, aka "reinterpret cast")
17-
- [inline assembly](#inline-assembly)
18-
19-
Note that an `unsafe` block does not relax the rules about lifetimes
20-
of `&` and the freezing of borrowed data.
21-
22-
Any use of `unsafe` is the programmer saying "I know more than you" to
23-
the compiler, and, as such, the programmer should be very sure that
24-
they actually do know more about why that piece of code is valid. In
25-
general, one should try to minimize the amount of unsafe code in a
26-
code base; preferably by using the bare minimum `unsafe` blocks to
27-
build safe interfaces.
28-
29-
> **Note**: the low-level details of the Rust language are still in
30-
> flux, and there is no guarantee of stability or backwards
31-
> compatibility. In particular, there may be changes that do not cause
32-
> compilation errors, but do cause semantic changes (such as invoking
33-
> undefined behaviour). As such, extreme care is required.
34-
35-
# Pointers
36-
37-
## References
38-
39-
One of Rust's biggest features is memory safety. This is achieved in
40-
part via [the ownership system](ownership.html), which is how the
41-
compiler can guarantee that every `&` reference is always valid, and,
42-
for example, never pointing to freed memory.
43-
44-
These restrictions on `&` have huge advantages. However, they also
45-
constrain how we can use them. For example, `&` doesn't behave
46-
identically to C's pointers, and so cannot be used for pointers in
47-
foreign function interfaces (FFI). Additionally, both immutable (`&`)
48-
and mutable (`&mut`) references have some aliasing and freezing
49-
guarantees, required for memory safety.
50-
51-
In particular, if you have an `&T` reference, then the `T` must not be
52-
modified through that reference or any other reference. There are some
53-
standard library types, e.g. `Cell` and `RefCell`, that provide inner
54-
mutability by replacing compile time guarantees with dynamic checks at
55-
runtime.
56-
57-
An `&mut` reference has a different constraint: when an object has an
58-
`&mut T` pointing into it, then that `&mut` reference must be the only
59-
such usable path to that object in the whole program. That is, an
60-
`&mut` cannot alias with any other references.
61-
62-
Using `unsafe` code to incorrectly circumvent and violate these
63-
restrictions is undefined behaviour. For example, the following
64-
creates two aliasing `&mut` pointers, and is invalid.
65-
66-
```
67-
use std::mem;
68-
let mut x: u8 = 1;
69-
70-
let ref_1: &mut u8 = &mut x;
71-
let ref_2: &mut u8 = unsafe { mem::transmute(&mut *ref_1) };
72-
73-
// oops, ref_1 and ref_2 point to the same piece of data (x) and are
74-
// both usable
75-
*ref_1 = 10;
76-
*ref_2 = 20;
3+
Rust’s main draw is its powerful static guarantees about behavior. But safety
4+
checks are conservative by nature: there are some programs that are actually
5+
safe, but the compiler is not able to verify this is true. To write these kinds
6+
of programs, we need to tell the compiler to relax its restrictions a bit. For
7+
this, Rust has a keyword, `unsafe`. Code using `unsafe` has fewer restrictions
8+
than normal code does.
9+
10+
Let’s go over the syntax, and then we’ll talk semantics. `unsafe` is used in
11+
two contexts. The first one is to mark a function as unsafe:
12+
13+
```rust
14+
unsafe fn danger_will_robinson() {
15+
// scary stuff
16+
}
7717
```
7818

79-
## Raw pointers
80-
81-
Rust offers two additional pointer types (*raw pointers*), written as
82-
`*const T` and `*mut T`. They're an approximation of C's `const T*` and `T*`
83-
respectively; indeed, one of their most common uses is for FFI,
84-
interfacing with external C libraries.
85-
86-
Raw pointers have much fewer guarantees than other pointer types
87-
offered by the Rust language and libraries. For example, they
88-
89-
- are not guaranteed to point to valid memory and are not even
90-
guaranteed to be non-null (unlike both `Box` and `&`);
91-
- do not have any automatic clean-up, unlike `Box`, and so require
92-
manual resource management;
93-
- are plain-old-data, that is, they don't move ownership, again unlike
94-
`Box`, hence the Rust compiler cannot protect against bugs like
95-
use-after-free;
96-
- lack any form of lifetimes, unlike `&`, and so the compiler cannot
97-
reason about dangling pointers; and
98-
- have no guarantees about aliasing or mutability other than mutation
99-
not being allowed directly through a `*const T`.
100-
101-
Fortunately, they come with a redeeming feature: the weaker guarantees
102-
mean weaker restrictions. The missing restrictions make raw pointers
103-
appropriate as a building block for implementing things like smart
104-
pointers and vectors inside libraries. For example, `*` pointers are
105-
allowed to alias, allowing them to be used to write shared-ownership
106-
types like reference counted and garbage collected pointers, and even
107-
thread-safe shared memory types (`Rc` and the `Arc` types are both
108-
implemented entirely in Rust).
109-
110-
There are two things that you are required to be careful about
111-
(i.e. require an `unsafe { ... }` block) with raw pointers:
112-
113-
- dereferencing: they can have any value: so possible results include
114-
a crash, a read of uninitialised memory, a use-after-free, or
115-
reading data as normal.
116-
- pointer arithmetic via the `offset` [intrinsic](#intrinsics) (or
117-
`.offset` method): this intrinsic uses so-called "in-bounds"
118-
arithmetic, that is, it is only defined behaviour if the result is
119-
inside (or one-byte-past-the-end) of the object from which the
120-
original pointer came.
121-
122-
The latter assumption allows the compiler to optimize more
123-
effectively. As can be seen, actually *creating* a raw pointer is not
124-
unsafe, and neither is converting to an integer.
125-
126-
### References and raw pointers
127-
128-
At runtime, a raw pointer `*` and a reference pointing to the same
129-
piece of data have an identical representation. In fact, an `&T`
130-
reference will implicitly coerce to an `*const T` raw pointer in safe code
131-
and similarly for the `mut` variants (both coercions can be performed
132-
explicitly with, respectively, `value as *const T` and `value as *mut T`).
133-
134-
Going the opposite direction, from `*const` to a reference `&`, is not
135-
safe. A `&T` is always valid, and so, at a minimum, the raw pointer
136-
`*const T` has to point to a valid instance of type `T`. Furthermore,
137-
the resulting pointer must satisfy the aliasing and mutability laws of
138-
references. The compiler assumes these properties are true for any
139-
references, no matter how they are created, and so any conversion from
140-
raw pointers is asserting that they hold. The programmer *must*
141-
guarantee this.
142-
143-
The recommended method for the conversion is
19+
All functions called from [FFI][ffi] must be marked as `unsafe`, for example.
20+
The second use of `unsafe` is an unsafe block:
14421

145-
```
146-
let i: u32 = 1;
147-
// explicit cast
148-
let p_imm: *const u32 = &i as *const u32;
149-
let mut m: u32 = 2;
150-
// implicit coercion
151-
let p_mut: *mut u32 = &mut m;
22+
[ffi]: ffi.html
15223

24+
```rust
15325
unsafe {
154-
let ref_imm: &u32 = &*p_imm;
155-
let ref_mut: &mut u32 = &mut *p_mut;
26+
// scary stuff
15627
}
15728
```
15829

159-
The `&*x` dereferencing style is preferred to using a `transmute`.
160-
The latter is far more powerful than necessary, and the more
161-
restricted operation is harder to use incorrectly; for example, it
162-
requires that `x` is a pointer (unlike `transmute`).
30+
It’s important to be able to explicitly delineate code that may have bugs that
31+
cause big problems. If a Rust program segfaults, you can be sure it’s somewhere
32+
in the sections marked `unsafe`.
33+
34+
# What does ‘safe’ mean?
35+
36+
Safe, in the context of Rust, means “doesn’t do anything unsafe.” Easy!
37+
38+
Okay, let’s try again: what is not safe to do? Here’s a list:
39+
40+
* Data races
41+
* Dereferencing a null/dangling raw pointer
42+
* Reads of [undef][undef] (uninitialized) memory
43+
* Breaking the [pointer aliasing rules][aliasing] with raw pointers.
44+
* `&mut T` and `&T` follow LLVM’s scoped [noalias][noalias] model, except if
45+
the `&T` contains an `UnsafeCell<U>`. Unsafe code must not violate these
46+
aliasing guarantees.
47+
* Mutating an immutable value/reference without `UnsafeCell<U>`
48+
* Invoking undefined behavior via compiler intrinsics:
49+
* Indexing outside of the bounds of an object with `std::ptr::offset`
50+
(`offset` intrinsic), with
51+
the exception of one byte past the end which is permitted.
52+
* Using `std::ptr::copy_nonoverlapping_memory` (`memcpy32`/`memcpy64`
53+
intrinsics) on overlapping buffers
54+
* Invalid values in primitive types, even in private fields/locals:
55+
* Null/dangling references or boxes
56+
* A value other than `false` (0) or `true` (1) in a `bool`
57+
* A discriminant in an `enum` not included in its type definition
58+
* A value in a `char` which is a surrogate or above `char::MAX`
59+
* Non-UTF-8 byte sequences in a `str`
60+
* Unwinding into Rust from foreign code or unwinding from Rust into foreign
61+
code.
62+
63+
[noalias]: http://llvm.org/docs/LangRef.html#noalias
64+
[undef]: http://llvm.org/docs/LangRef.html#undefined-values
65+
[aliasing]: http://llvm.org/docs/LangRef.html#pointer-aliasing-rules
66+
67+
Whew! That’s a bunch of stuff. It’s also important to notice all kinds of
68+
behaviors that are certainly bad, but are expressly _not_ unsafe:
69+
70+
* Deadlocks
71+
* Reading data from private fields
72+
* Leaks due to reference count cycles
73+
* Exiting without calling destructors
74+
* Sending signals
75+
* Accessing/modifying the file system
76+
* Integer overflow
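To make the list above concrete, here is a minimal sketch, assuming a recent stable Rust toolchain, of three of these "bad but not unsafe" behaviors written without a single `unsafe` block. The `Node` type and the function names are hypothetical and exist only for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical node type, used only for this illustration.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds a two-node reference cycle in safe code and returns the strong
/// count of the first node: one local handle plus one held by the second node.
fn make_cycle() -> usize {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    // When `a` and `b` go out of scope, each node still keeps the other
    // alive, so neither allocation is freed: a leak, but not unsafe.
    Rc::strong_count(&a)
}

/// Adds two bytes, reporting overflow as `None` instead of wrapping.
fn add_checked(x: u8, y: u8) -> Option<u8> {
    x.checked_add(y)
}

fn main() {
    println!("strong count inside the cycle: {}", make_cycle());
    println!("250 + 10 as u8: {:?}", add_checked(250, 10));

    // `std::mem::forget` is a safe function: the destructor never runs.
    let s = String::from("never dropped");
    std::mem::forget(s);
}
```

None of this requires `unsafe`, which is the point: leaks, skipped destructors, and overflow are bugs worth avoiding, but they do not break memory safety.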
77+
78+
Rust cannot prevent all kinds of software problems. Buggy code can and will be
79+
written in Rust. These things aren’t great, but they don’t qualify as `unsafe`
80+
specifically.
81+
82+
# Unsafe Superpowers
83+
84+
In both unsafe functions and unsafe blocks, Rust will let you do three things
85+
that you normally cannot do. Just three. Here they are:
86+
87+
1. Access or update a [static mutable variable][static].
88+
2. Dereference a raw pointer.
89+
3. Call unsafe functions. This is the most powerful ability.
90+
91+
That’s it. It’s important that `unsafe` does not, for example, ‘turn off the
92+
borrow checker’. Adding `unsafe` to some random Rust code doesn’t change its
93+
semantics, it won’t just start accepting anything.
94+
95+
But it will let you write things that _do_ break some of the rules. Let’s go
96+
over these three abilities in order.
97+
98+
## Access or update a `static mut`
99+
100+
Rust has a feature called ‘`static mut`’ which allows for mutable global state.
101+
Doing so can cause a data race, and as such is inherently not safe. For more
102+
details, see the [static][static] section of the book.
103+
104+
[static]: static.html
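As a rough illustration of the first ability, the sketch below reads and updates a `static mut` inside an `unsafe` block. It assumes a recent stable toolchain; `COUNTER` and `increment` are hypothetical names, and the code is only sound because this example never touches the counter from more than one thread.

```rust
// Hypothetical global counter, used only for this illustration.
static mut COUNTER: u32 = 0;

/// Increments the global counter and returns its new value.
fn increment() -> u32 {
    // SAFETY (assumption of this sketch): `increment` is only ever called
    // from a single thread, so no data race on `COUNTER` is possible.
    unsafe {
        COUNTER += 1;
        COUNTER
    }
}

fn main() {
    increment();
    let n = increment();
    println!("counter is now {}", n);
}
```

Without the `unsafe` block, both the read and the write of `COUNTER` are rejected by the compiler. In multi-threaded code, an atomic type or a lock is usually the better choice.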
105+
106+
## Dereference a raw pointer
107+
108+
Raw pointers let you do arbitrary pointer arithmetic, and can cause a number of
109+
different memory safety and security issues. In some senses, the ability to
110+
dereference an arbitrary pointer is one of the most dangerous things you can
111+
do. For more on raw pointers, see [their section of the book][rawpointers].
112+
113+
[rawpointers]: raw-pointers.html
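A minimal sketch of the second ability, assuming a recent stable toolchain: creating a raw pointer is allowed in safe code, but reading or writing through it needs an `unsafe` block. The function name `read_and_bump` is hypothetical.

```rust
/// Increments the integer behind `value` through a raw pointer and returns
/// the new value.
fn read_and_bump(value: &mut i32) -> i32 {
    // Creating the raw pointer is safe: this is an ordinary coercion.
    let p: *mut i32 = value;

    // SAFETY: `p` was just derived from a live `&mut i32`, so it is non-null,
    // aligned, and points to a valid `i32` that nothing else is accessing.
    unsafe {
        *p += 1;
        *p
    }
}

fn main() {
    let mut x = 41;
    let y = read_and_bump(&mut x);
    println!("x = {}, y = {}", x, y);
}
```

The dereference is sound here only because the pointer comes straight from a valid reference. A pointer obtained from an arbitrary integer or from freed memory would compile just the same, and that is exactly the danger.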
163114

115+
## Call unsafe functions
164116

117+
This last ability works with both aspects of `unsafe`: you can only call
118+
functions marked `unsafe` from inside an unsafe block.
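The sketch below shows one way the two forms of `unsafe` can fit together, assuming a recent stable toolchain. `first_unchecked` and `first` are hypothetical helpers: the first is an unsafe function with a documented precondition, and the second is a safe wrapper that checks that precondition before calling it from an `unsafe` block.

```rust
/// Returns the first element without a bounds check.
///
/// # Safety
///
/// The caller must guarantee that `items` is not empty.
unsafe fn first_unchecked(items: &[i32]) -> i32 {
    // SAFETY: the caller promises `items` has at least one element.
    unsafe { *items.get_unchecked(0) }
}

/// Safe wrapper: checks the precondition, then calls the unsafe function.
fn first(items: &[i32]) -> Option<i32> {
    if items.is_empty() {
        None
    } else {
        // SAFETY: we just verified that `items` is non-empty.
        Some(unsafe { first_unchecked(items) })
    }
}

fn main() {
    println!("{:?}", first(&[7, 8, 9]));
    println!("{:?}", first(&[]));
}
```

Callers of `first` never write `unsafe` themselves. The obligation to uphold the precondition stays inside the wrapper, where it can be reviewed in one place.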
165119

166-
## Making the unsafe safe(r)
120+
This ability is powerful and varied. Rust exposes some [compiler
121+
intrinsics][intrinsics] as unsafe functions, and some unsafe functions bypass
122+
safety checks, trading safety for speed.
167123

168-
There are various ways to expose a safe interface around some unsafe
169-
code:
124+
I’ll repeat: just because you _can_ do arbitrary things in unsafe blocks
125+
and functions doesn’t mean you should. The compiler will act as though you’re
126+
upholding its invariants, so be careful!
170127

171-
- store pointers privately (i.e. not in public fields of public
172-
structs), so that you can see and control all reads and writes to
173-
the pointer in one place.
174-
- use `assert!()` a lot: since you can't rely on the protection of the
175-
compiler & type-system to ensure that your `unsafe` code is correct
176-
at compile-time, use `assert!()` to verify that it is doing the
177-
right thing at run-time.
178-
- implement the `Drop` for resource clean-up via a destructor, and use
179-
RAII (Resource Acquisition Is Initialization). This reduces the need
180-
for any manual memory management by users, and automatically ensures
181-
that clean-up is always run, even when the thread panics.
182-
- ensure that any data stored behind a raw pointer is destroyed at the
183-
appropriate time.
128+
[intrinsics]: intrinsics.html
