clarify what is UB #149
Changes from 19 commits
dfb93b2
1dcafde
280761a
86d9e2c
3241c00
909b14c
f59eca2
738a338
0f51082
efb5086
df5ff63
c41d492
1f613d8
c73730b
bd6215e
7386b5c
864625f
64bf0a5
86a89ae
c5778a1
7703c18
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,18 +16,41 @@ to your program. You definitely *should not* invoke Undefined Behavior. | |
Unlike C, Undefined Behavior is pretty limited in scope in Rust. All the core | ||
language cares about is preventing the following things: | ||
|
||
* Dereferencing null, dangling, or unaligned pointers | ||
* Reading [uninitialized memory][] | ||
* Dereferencing (using the `*` operator on) dangling or unaligned pointers, or | ||
wide pointers with invalid metadata (see below) | ||
* Breaking the [pointer aliasing rules][] | ||
* Producing invalid primitive values: | ||
* dangling/null references | ||
* null `fn` pointers | ||
* Unwinding into another language | ||
Is it really strictly defined to unwind between two Rust libraries that are directly calling each other via

Specifically, it might be that we just need to enumerate that you can only "exit a function via unwinding" if it has one of these 3 Rust-specific calling conventions:

I think "rust-intrinsic" isn't supposed to unwind, but I might be wrong on that (perhaps the builtin operator impls have this ABI, somewhere?). But also "rust-intrinsic" kinda doesn't matter since those implementations are only provided by rustc. It only slightly matters if we care about helping devs assume raw re-exported intrinsics never unwind. But making an intrinsic a "normal" function would hardly be considered a breaking change!

Also, I'm guessing incompatible compilation flags, like mixing panic=abort, are subsumed by the "target features" bullet below?

extern "platform-intrinsic" is also a thing, but I think it is not relevant. (I'm guessing it has something to do with libc functions that LLVM is allowed to make implementation assumptions about, like malloc?)

Also, I thought we had automatic guards against unwinding from an extern fn. Is that not the case? (I only work on panic=abort software so I never worry about this issue...)

It is currently de facto undefined to unwind out of a Rust function defined as |
||
* Causing a [data race][race] | ||
* Executing code compiled with target features that the current thread of execution does | ||
not support (see [`target_feature`][]) | ||
RalfJung marked this conversation as resolved.
|
||
* Producing invalid primitive values (either alone or as a field of a compound | ||
RalfJung marked this conversation as resolved.
|
||
type such as `enum`/`struct`/array/tuple): | ||
All of our bickering about unions does actually wrap back around to this point: is (we can probably gloss over this, but it is something to make clearer when we have a better answer)

The answer is "we don't know yet, see rust-lang/unsafe-code-guidelines#73". So yes, this is a good question, and one that I would prefer we could skip over for now. |
||
* a `bool` that isn't 0 or 1 | ||
* an undefined `enum` discriminant | ||
RalfJung marked this conversation as resolved.
|
||
* null `fn` pointers | ||
RalfJung marked this conversation as resolved.
|
||
* a `char` outside the ranges [0x0, 0xD7FF] and [0xE000, 0x10FFFF] | ||
* A non-utf8 `str` | ||
* Unwinding into another language | ||
* Causing a [data race][race] | ||
* a `!` (all values are invalid for this type) | ||
* dangling/unaligned references, references that themselves point to | ||
invalid values, or wide references (to a dynamically sized type) with | ||
invalid metadata | ||
RalfJung marked this conversation as resolved.
|
||
* slice metadata is invalid if the slice has a total size larger than | ||
`isize::MAX` bytes in memory | ||
* `dyn Trait` metadata is invalid if it is not a pointer to a vtable for | ||
`Trait` that matches the actual dynamic trait the reference points to | ||
* a non-utf8 `str` | ||
reword for consistency:

Or maybe we want to skip this entirely because this is just a library invariant? |
||
* an integer (`i*`/`u*`), floating point value (`f*`), or raw pointer read from | ||
[uninitialized memory][] | ||
* a library type with custom invalid values that holds one of those values, | ||
such as a `NonNull` that is null or a `NonZero` family type that is 0 | ||
reword:

I have to say "a type with custom invalid values that is one of those values" sounds rather awkward. I don't have a better proposal either, though. |
||
|
||
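The validity ranges in the list above can be illustrated with checked conversions, which return `None` instead of ever producing an invalid value. This is only a sketch: the helper names (`bool_from_byte`, `char_from_scalar`, `nonzero_from_u32`, `str_from_bytes`) are hypothetical and not part of the PR; the standard library calls they wrap (`char::from_u32`, `NonZeroU32::new`, `std::str::from_utf8`) are real.

```rust
use std::num::NonZeroU32;

/// Hypothetical helper: only the bytes 0 and 1 are valid `bool` values.
fn bool_from_byte(b: u8) -> Option<bool> {
    match b {
        0 => Some(false),
        1 => Some(true),
        // Transmuting any other byte to `bool` would produce an invalid value.
        _ => None,
    }
}

/// Hypothetical helper: `char::from_u32` rejects surrogates (0xD800..=0xDFFF)
/// and anything above 0x10FFFF.
fn char_from_scalar(n: u32) -> Option<char> {
    char::from_u32(n)
}

/// Hypothetical helper: 0 is the custom invalid value of `NonZeroU32`.
fn nonzero_from_u32(n: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(n)
}

/// Hypothetical helper: byte sequences that are not UTF-8 are rejected.
fn str_from_bytes(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

The unchecked counterparts of these conversions (`transmute`, `char::from_u32_unchecked`, `NonZeroU32::new_unchecked`, `str::from_utf8_unchecked`) are `unsafe` because calling them with a rejected input is what the list describes as producing an invalid value.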
"Producing" a value happens any time a value is assigned, passed to a | ||
function/primitive operation or returned from a function/primitive operation. | ||
suggested reword and massive clarification:

Many have trouble accepting the consequences of invalid values, so they merit some extra discussion. The claim being made here is a very strong one, so read carefully. A value is produced whenever it is assigned, passed to something, or returned from something. Keep in mind references get to assume their referents are valid, so you can't even create a reference to an invalid value.

Additionally, uninitialized memory is always invalid, so you can't assign it to anything, pass it to anything, return it from anything, or take a reference to it. (Padding bytes are not technically part of a value's memory, and so may be left uninitialized.)

In simple and blunt terms: you cannot ever even suggest the existence of an invalid value. No, it's not ok if you "don't use" or "don't read" the value. Invalid values are instant Undefined Behaviour. The only correct way to manipulate memory that could be invalid is with raw pointers using methods like

I applied most of your suggestions, but this one is big enough that it is probably easier to hand the PR off to you. ;) I'd love to do a pass over what you got when you are done, if you don't mind. I like this new text, as usual in your very pointed style! One comment though:

That's not true for |
||
|
||
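As a rough illustration of writing through raw pointers so that no invalid value is ever produced, the sketch below fills a `[bool; 4]` element by element. The choice of `MaybeUninit` and the raw-pointer `write` method is an assumption made here for illustration (the specific methods are not named in the text above), and `init_flags` is a hypothetical helper.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper: builds a `[bool; 4]` without ever producing an
/// uninitialized (and therefore invalid) `bool`.
fn init_flags() -> [bool; 4] {
    // `MaybeUninit<T>` is allowed to hold uninitialized bytes, so no
    // `bool` value has been produced at this point.
    let mut slot = MaybeUninit::<[bool; 4]>::uninit();
    let base = slot.as_mut_ptr() as *mut bool;
    for i in 0..4 {
        // SAFETY: `base.add(i)` stays inside the 4-element array, and
        // `write` stores the new value without reading the old bytes.
        unsafe { base.add(i).write(i % 2 == 0) };
    }
    // SAFETY: all four elements were initialized by the loop above.
    unsafe { slot.assume_init() }
}
```

Calling `init_flags()` yields `[true, false, true, false]`. Note that the code never creates a `&mut bool` or a `&mut [bool; 4]` to the uninitialized storage; it only ever holds raw pointers until every element has been written.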
A reference/pointer is "dangling" if it is null or not all of the bytes it | ||
points to are part of the same allocation (so in particular they all have to be | ||
part of *some* allocation). The span of bytes it points to is determined by the | ||
pointer value and the size of the pointee type. As a consequence, if the span is | ||
empty, "dangling" is the same as "null". | ||
RalfJung marked this conversation as resolved.
|
||
|
||
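The empty-span case can be sketched with a zero-length slice. Under the definition above, a pointer whose span is empty has no bytes that could fall outside an allocation, so it can only be dangling by being null. The helper name `empty_slice` is hypothetical; `NonNull::dangling` and `slice::from_raw_parts` are real standard library functions, and the latter documents this use for zero-length slices.

```rust
use std::ptr::NonNull;
use std::slice;

/// Hypothetical helper: returns an empty slice backed by no allocation.
fn empty_slice<'a>() -> &'a [u32] {
    // Despite its name, `NonNull::dangling()` returns a non-null,
    // well-aligned pointer; it simply does not point into any allocation.
    let p = NonNull::<u32>::dangling().as_ptr();
    // SAFETY: the length is 0, so the span of bytes is empty, and `p` is
    // non-null and aligned for `u32`.
    unsafe { slice::from_raw_parts(p, 0) }
}
```

The same pointer with a length of 1 would be dangling in the sense defined above, because the 4 bytes it then spans are not part of any allocation.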
That's it. That's all the causes of Undefined Behavior baked into Rust. Of | ||
course, unsafe functions and traits are free to declare arbitrary other | ||
|
@@ -58,3 +81,4 @@ these problems are considered impractical to categorically prevent. | |
[pointer aliasing rules]: references.html | ||
[uninitialized memory]: uninitialized.html | ||
[race]: races.html | ||
[`target_feature`]: ../reference/attributes/codegen.html#the-target_feature-attribute |