|
1 | 1 | % Unsafe Code
|
2 | 2 |
|
3 |
| -# Introduction |
4 |
| - |
5 |
| -Rust aims to provide safe abstractions over the low-level details of |
6 |
| -the CPU and operating system, but sometimes one needs to drop down and |
7 |
| -write code at that level. This guide aims to provide an overview of |
8 |
| -the dangers and power one gets with Rust's unsafe subset. |
9 |
| - |
10 |
| -Rust provides an escape hatch in the form of the `unsafe { ... }` |
11 |
| -block which allows the programmer to dodge some of the compiler's |
12 |
| -checks and do a wide range of operations, such as: |
13 |
| - |
14 |
| -- dereferencing [raw pointers](#raw-pointers) |
15 |
| -- calling a function via FFI ([covered by the FFI guide](ffi.html)) |
16 |
| -- casting between types bitwise (`transmute`, aka "reinterpret cast") |
17 |
| -- [inline assembly](#inline-assembly) |
18 |
| - |
19 |
| -Note that an `unsafe` block does not relax the rules about lifetimes |
20 |
| -of `&` and the freezing of borrowed data. |
21 |
| - |
22 |
| -Any use of `unsafe` is the programmer saying "I know more than you" to |
23 |
| -the compiler, and, as such, the programmer should be very sure that |
24 |
| -they actually do know more about why that piece of code is valid. In |
25 |
| -general, one should try to minimize the amount of unsafe code in a |
26 |
| -code base; preferably by using the bare minimum `unsafe` blocks to |
27 |
| -build safe interfaces. |
28 |
| - |
29 |
| -> **Note**: the low-level details of the Rust language are still in |
30 |
| -> flux, and there is no guarantee of stability or backwards |
31 |
| -> compatibility. In particular, there may be changes that do not cause |
32 |
| -> compilation errors, but do cause semantic changes (such as invoking |
33 |
| -> undefined behaviour). As such, extreme care is required. |
34 |
| -
|
35 |
| -# Pointers |
36 |
| - |
37 |
| -## References |
38 |
| - |
39 |
| -One of Rust's biggest features is memory safety. This is achieved in |
40 |
| -part via [the ownership system](ownership.html), which is how the |
41 |
| -compiler can guarantee that every `&` reference is always valid, and, |
42 |
| -for example, never pointing to freed memory. |
43 |
| - |
44 |
| -These restrictions on `&` have huge advantages. However, they also |
45 |
| -constrain how we can use them. For example, `&` doesn't behave |
46 |
| -identically to C's pointers, and so cannot be used for pointers in |
47 |
| -foreign function interfaces (FFI). Additionally, both immutable (`&`) |
48 |
| -and mutable (`&mut`) references have some aliasing and freezing |
49 |
| -guarantees, required for memory safety. |
50 |
| - |
51 |
| -In particular, if you have an `&T` reference, then the `T` must not be |
52 |
| -modified through that reference or any other reference. There are some |
53 |
| -standard library types, e.g. `Cell` and `RefCell`, that provide inner |
54 |
| -mutability by replacing compile time guarantees with dynamic checks at |
55 |
| -runtime. |
56 |
| - |
57 |
| -An `&mut` reference has a different constraint: when an object has an |
58 |
| -`&mut T` pointing into it, then that `&mut` reference must be the only |
59 |
| -such usable path to that object in the whole program. That is, an |
60 |
| -`&mut` cannot alias with any other references. |
61 |
| - |
62 |
| -Using `unsafe` code to incorrectly circumvent and violate these |
63 |
| -restrictions is undefined behaviour. For example, the following |
64 |
| -creates two aliasing `&mut` pointers, and is invalid. |
65 |
| - |
66 |
| -``` |
67 |
| -use std::mem; |
68 |
| -let mut x: u8 = 1; |
69 |
| -
|
70 |
| -let ref_1: &mut u8 = &mut x; |
71 |
| -let ref_2: &mut u8 = unsafe { mem::transmute(&mut *ref_1) }; |
72 |
| -
|
73 |
| -// oops, ref_1 and ref_2 point to the same piece of data (x) and are |
74 |
| -// both usable |
75 |
| -*ref_1 = 10; |
76 |
| -*ref_2 = 20; |
| 3 | +Rust’s main draw is its powerful static guarantees about behavior. But safety |
| 4 | +checks are conservative by nature: there are some programs that are actually |
| 5 | +safe, but the compiler is not able to verify this is true. To write these kinds |
| 6 | +of programs, we need to tell the compiler to relax its restrictions a bit. For |
| 7 | +this, Rust has a keyword, `unsafe`. Code using `unsafe` has fewer restrictions |
| 8 | +than normal code does. |
| 9 | + |
| 10 | +Let’s go over the syntax, and then we’ll talk semantics. `unsafe` is used in |
| 11 | +two contexts. The first one is to mark a function as unsafe: |
| 12 | + |
| 13 | +```rust |
| 14 | +unsafe fn danger_will_robinson() { |
| 15 | + // scary stuff |
| 16 | +} |
77 | 17 | ```
|
78 | 18 |
|
79 |
| -## Raw pointers |
80 |
| - |
81 |
| -Rust offers two additional pointer types (*raw pointers*), written as |
82 |
| -`*const T` and `*mut T`. They're an approximation of C's `const T*` and `T*` |
83 |
| -respectively; indeed, one of their most common uses is for FFI, |
84 |
| -interfacing with external C libraries. |
85 |
| - |
86 |
| -Raw pointers have much fewer guarantees than other pointer types |
87 |
| -offered by the Rust language and libraries. For example, they |
88 |
| - |
89 |
| -- are not guaranteed to point to valid memory and are not even |
90 |
| - guaranteed to be non-null (unlike both `Box` and `&`); |
91 |
| -- do not have any automatic clean-up, unlike `Box`, and so require |
92 |
| - manual resource management; |
93 |
| -- are plain-old-data, that is, they don't move ownership, again unlike |
94 |
| - `Box`, hence the Rust compiler cannot protect against bugs like |
95 |
| - use-after-free; |
96 |
| -- lack any form of lifetimes, unlike `&`, and so the compiler cannot |
97 |
| - reason about dangling pointers; and |
98 |
| -- have no guarantees about aliasing or mutability other than mutation |
99 |
| - not being allowed directly through a `*const T`. |
100 |
| - |
101 |
| -Fortunately, they come with a redeeming feature: the weaker guarantees |
102 |
| -mean weaker restrictions. The missing restrictions make raw pointers |
103 |
| -appropriate as a building block for implementing things like smart |
104 |
| -pointers and vectors inside libraries. For example, `*` pointers are |
105 |
| -allowed to alias, allowing them to be used to write shared-ownership |
106 |
| -types like reference counted and garbage collected pointers, and even |
107 |
| -thread-safe shared memory types (`Rc` and the `Arc` types are both |
108 |
| -implemented entirely in Rust). |
109 |
| - |
110 |
| -There are two things that you are required to be careful about |
111 |
| -(i.e. require an `unsafe { ... }` block) with raw pointers: |
112 |
| - |
113 |
| -- dereferencing: they can have any value: so possible results include |
114 |
| - a crash, a read of uninitialised memory, a use-after-free, or |
115 |
| - reading data as normal. |
116 |
| -- pointer arithmetic via the `offset` [intrinsic](#intrinsics) (or |
117 |
| - `.offset` method): this intrinsic uses so-called "in-bounds" |
118 |
| - arithmetic, that is, it is only defined behaviour if the result is |
119 |
| - inside (or one-byte-past-the-end) of the object from which the |
120 |
| - original pointer came. |
121 |
| - |
122 |
| -The latter assumption allows the compiler to optimize more |
123 |
| -effectively. As can be seen, actually *creating* a raw pointer is not |
124 |
| -unsafe, and neither is converting to an integer. |
125 |
| - |
126 |
| -### References and raw pointers |
127 |
| - |
128 |
| -At runtime, a raw pointer `*` and a reference pointing to the same |
129 |
| -piece of data have an identical representation. In fact, an `&T` |
130 |
| -reference will implicitly coerce to an `*const T` raw pointer in safe code |
131 |
| -and similarly for the `mut` variants (both coercions can be performed |
132 |
| -explicitly with, respectively, `value as *const T` and `value as *mut T`). |
133 |
| - |
134 |
| -Going the opposite direction, from `*const` to a reference `&`, is not |
135 |
| -safe. A `&T` is always valid, and so, at a minimum, the raw pointer |
136 |
| -`*const T` has to point to a valid instance of type `T`. Furthermore, |
137 |
| -the resulting pointer must satisfy the aliasing and mutability laws of |
138 |
| -references. The compiler assumes these properties are true for any |
139 |
| -references, no matter how they are created, and so any conversion from |
140 |
| -raw pointers is asserting that they hold. The programmer *must* |
141 |
| -guarantee this. |
142 |
| - |
143 |
| -The recommended method for the conversion is |
| 19 | +All functions called from [FFI][ffi] must be marked as `unsafe`, for example. |
| 20 | +The second use of `unsafe` is an unsafe block: |
144 | 21 |
|
145 |
| -``` |
146 |
| -let i: u32 = 1; |
147 |
| -// explicit cast |
148 |
| -let p_imm: *const u32 = &i as *const u32; |
149 |
| -let mut m: u32 = 2; |
150 |
| -// implicit coercion |
151 |
| -let p_mut: *mut u32 = &mut m; |
| 22 | +[ffi]: ffi.html |
152 | 23 |
|
| 24 | +```rust |
153 | 25 | unsafe {
|
154 |
| - let ref_imm: &u32 = &*p_imm; |
155 |
| - let ref_mut: &mut u32 = &mut *p_mut; |
| 26 | + // scary stuff |
156 | 27 | }
|
157 | 28 | ```
|
158 | 29 |
|
159 |
| -The `&*x` dereferencing style is preferred to using a `transmute`. |
160 |
| -The latter is far more powerful than necessary, and the more |
161 |
| -restricted operation is harder to use incorrectly; for example, it |
162 |
| -requires that `x` is a pointer (unlike `transmute`). |
| 30 | +It’s important to be able to explicitly delineate code that may have bugs that |
| 31 | +cause big problems. If a Rust program segfaults, you can be sure the cause is somewhere |
| 32 | +in the sections marked `unsafe`. |
| 33 | + |
| 34 | +# What does ‘safe’ mean? |
| 35 | + |
| 36 | +Safe, in the context of Rust, means “doesn’t do anything unsafe.” Easy! |
| 37 | + |
| 38 | +Okay, let’s try again: what is not safe to do? Here’s a list: |
| 39 | + |
| 40 | +* Data races |
| 41 | +* Dereferencing a null/dangling raw pointer |
| 42 | +* Reads of [undef][undef] (uninitialized) memory |
| 43 | +* Breaking the [pointer aliasing rules][aliasing] with raw pointers. |
| 44 | +* `&mut T` and `&T` follow LLVM’s scoped [noalias][noalias] model, except if |
| 45 | + the `&T` contains an `UnsafeCell<U>`. Unsafe code must not violate these |
| 46 | + aliasing guarantees. |
| 47 | +* Mutating an immutable value/reference without `UnsafeCell<U>` |
| 48 | +* Invoking undefined behavior via compiler intrinsics: |
| 49 | + * Indexing outside of the bounds of an object with `std::ptr::offset` |
| 50 | + (`offset` intrinsic), with |
| 51 | + the exception of one byte past the end which is permitted. |
| 52 | + * Using `std::ptr::copy_nonoverlapping_memory` (`memcpy32`/`memcpy64` |
| 53 | + intrinsics) on overlapping buffers |
| 54 | +* Invalid values in primitive types, even in private fields/locals: |
| 55 | + * Null/dangling references or boxes |
| 56 | + * A value other than `false` (0) or `true` (1) in a `bool` |
| 57 | + * A discriminant in an `enum` not included in its type definition |
| 58 | + * A value in a `char` which is a surrogate or above `char::MAX` |
| 59 | + * Non-UTF-8 byte sequences in a `str` |
| 60 | +* Unwinding into Rust from foreign code or unwinding from Rust into foreign |
| 61 | + code. |
| 62 | + |
| 63 | +[noalias]: http://llvm.org/docs/LangRef.html#noalias |
| 64 | +[undef]: http://llvm.org/docs/LangRef.html#undefined-values |
| 65 | +[aliasing]: http://llvm.org/docs/LangRef.html#pointer-aliasing-rules |
| 66 | + |
| 67 | +Whew! That’s a bunch of stuff. It’s also important to notice all kinds of |
| 68 | +behaviors that are certainly bad, but are expressly _not_ unsafe: |
| 69 | + |
| 70 | +* Deadlocks |
| 71 | +* Reading data from private fields |
| 72 | +* Leaks due to reference count cycles |
| 73 | +* Exiting without calling destructors |
| 74 | +* Sending signals |
| 75 | +* Accessing/modifying the file system |
| 76 | +* Integer overflow |
| 77 | + |
| 78 | +Rust cannot prevent all kinds of software problems. Buggy code can and will be |
| 79 | +written in Rust. These things aren’t great, but they don’t qualify as `unsafe` |
| 80 | +specifically. |
| 81 | + |
| 82 | +# Unsafe Superpowers |
| 83 | + |
| 84 | +In both unsafe functions and unsafe blocks, Rust will let you do three things |
| 85 | +that you normally cannot do. Just three. Here they are: |
| 86 | + |
| 87 | +1. Access or update a [static mutable variable][static]. |
| 88 | +2. Dereference a raw pointer. |
| 89 | +3. Call unsafe functions. This is the most powerful ability. |
| 90 | + |
| 91 | +That’s it. Importantly, `unsafe` does not, for example, ‘turn off the |
| 92 | +borrow checker’. Adding `unsafe` to some random Rust code doesn’t change its |
| 93 | +semantics; it won’t just start accepting anything. |
| 94 | + |
| 95 | +But it will let you write things that _do_ break some of the rules. Let’s go |
| 96 | +over these three abilities in order. |
| 97 | + |
| 98 | +## Access or update a `static mut` |
| 99 | + |
| 100 | +Rust has a feature called ‘`static mut`’ which allows for mutable global state. |
| 101 | +Doing so can cause a data race, and as such is inherently not safe. For more |
| 102 | +details, see the [static][static] section of the book. |
| 103 | + |
| 104 | +[static]: static.html |
| 105 | + |
| 106 | +## Dereference a raw pointer |
| 107 | + |
| 108 | +Raw pointers let you do arbitrary pointer arithmetic, and can cause a number of |
| 109 | +different memory safety and security issues. In some senses, the ability to |
| 110 | +dereference an arbitrary pointer is one of the most dangerous things you can |
| 111 | +do. For more on raw pointers, see [their section of the book][rawpointers]. |
| 112 | + |
| 113 | +[rawpointers]: raw-pointers.html |
163 | 114 |
|
| 115 | +## Call unsafe functions |
164 | 116 |
|
| 117 | +This last ability works with both aspects of `unsafe`: you can only call |
| 118 | +functions marked `unsafe` from inside an unsafe block. |
165 | 119 |
|
166 |
| -## Making the unsafe safe(r) |
| 120 | +This ability is powerful and varied. Rust exposes some [compiler |
| 121 | +intrinsics][intrinsics] as unsafe functions, and some unsafe functions bypass |
| 122 | +safety checks, trading safety for speed. |
167 | 123 |
|
168 |
| -There are various ways to expose a safe interface around some unsafe |
169 |
| -code: |
| 124 | +I’ll say it again: just because you _can_ do arbitrary things in unsafe blocks |
| 125 | +and functions doesn’t mean you should. The compiler will act as though you’re |
| 126 | +upholding its invariants, so be careful! |
170 | 127 |
|
171 |
| -- store pointers privately (i.e. not in public fields of public |
172 |
| - structs), so that you can see and control all reads and writes to |
173 |
| - the pointer in one place. |
174 |
| -- use `assert!()` a lot: since you can't rely on the protection of the |
175 |
| - compiler & type-system to ensure that your `unsafe` code is correct |
176 |
| - at compile-time, use `assert!()` to verify that it is doing the |
177 |
| - right thing at run-time. |
178 |
| -- implement the `Drop` for resource clean-up via a destructor, and use |
179 |
| - RAII (Resource Acquisition Is Initialization). This reduces the need |
180 |
| - for any manual memory management by users, and automatically ensures |
181 |
| - that clean-up is always run, even when the thread panics. |
182 |
| -- ensure that any data stored behind a raw pointer is destroyed at the |
183 |
| - appropriate time. |
| 128 | +[intrinsics]: intrinsics.html |
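
The three abilities listed under "Unsafe Superpowers" can be sketched in a few lines. This is only an illustration, not code from the chapter: every name here (`COUNTER`, `bump_counter`, `read_through_raw`, `get_unchecked_demo`, `get_or_zero`) is hypothetical, and the `static mut` part assumes single-threaded use.

```rust
// Hypothetical names throughout; none of them come from the chapter.
static mut COUNTER: u32 = 0;

/// Ability 1: reading or writing a `static mut` requires `unsafe`.
/// Assumption: no other thread touches COUNTER at the same time,
/// otherwise this would be a data race.
fn bump_counter() -> u32 {
    unsafe {
        let next = COUNTER + 1;
        COUNTER = next;
        next
    }
}

/// Ability 2: creating a raw pointer is safe; dereferencing it is not.
fn read_through_raw(value: &u32) -> u32 {
    let p: *const u32 = value; // safe coercion from a reference
    // Valid here because `p` was just derived from a live reference.
    unsafe { *p }
}

/// An unsafe function: the caller must guarantee `index < slice.len()`.
unsafe fn get_unchecked_demo(slice: &[u32], index: usize) -> u32 {
    unsafe { *slice.as_ptr().add(index) }
}

/// Ability 3: calling an unsafe function, wrapped in a safe interface
/// that checks the precondition before entering the unsafe block.
fn get_or_zero(slice: &[u32], index: usize) -> u32 {
    if index < slice.len() {
        unsafe { get_unchecked_demo(slice, index) }
    } else {
        0
    }
}
```

Callers of `get_or_zero` never write `unsafe` themselves; the bounds check sits next to the one block that relies on it, which keeps the amount of code to audit small.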