Should we have a language concept of erroneous behavior?

Define "erroneous behavior" as an operation that has a defined result (it does not cause UB) but is still considered incorrect for a program to perform. Recognizing this category would endorse sanitizing environments (such as Miri, Valgrind, or CHERI) diagnosing the presence of erroneous behavior and halting program execution.

I would expect the cases of erroneous behavior to be fairly limited, but the concept seems beneficial for cases where an operation is defined not because we endorse doing it, but because making it undefined would be worse. Potential cases include:

- Modifying a `let` bound value without `mut` and lacking any internal/shared mutability.[^1]
- Writing through a shared reference to bytes *next to* but not *covered by* `UnsafeCell`.[^2]
- Relying on exposed provenance instead of using a strict provenance compatible API.[^3]
- Retagging bytes as a type that they aren't a valid instance of (e.g. reference to uninit).[^4]
- Cases of transmute by function call ABI that control flow integrity doesn't like allowing.[^5]
- Diagnosable library UB that isn't immediately elevated to language level UB.[^6]
- Runtime conditions like `Arc` reference count exceeding its maximum and aborting.[^7]
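To make the exposed provenance case in the list above concrete, here is a minimal sketch of the kind of code in question. The function names are hypothetical and only for illustration. The `as` casts are the exposed-provenance operations: the program has a defined result, yet a sanitizing environment might reasonably want to flag the integer-to-pointer cast.

```rust
/// Round-trips a pointer through `usize` using `as` casts.
/// (Hypothetical helper, only for illustration.)
fn round_trip_via_usize(ptr: *const u32) -> *const u32 {
    // Pointer-to-integer cast: this exposes the pointer's provenance.
    let addr = ptr as usize;
    // Integer-to-pointer cast: this relies on previously exposed provenance.
    addr as *const u32
}

/// Reads a value through a pointer that went through the integer round trip.
fn read_after_round_trip(value: &u32) -> u32 {
    let ptr: *const u32 = value;
    let round_tripped = round_trip_via_usize(ptr);
    // SAFETY: `round_tripped` has the same address as `ptr`, the provenance of
    // `ptr` was exposed by the cast above, and `value` is still borrowed here.
    unsafe { *round_tripped }
}

fn main() {
    let value = 42u32;
    assert_eq!(read_after_round_trip(&value), 42);
    println!("read {} through a round-tripped pointer", read_after_round_trip(&value));
}
```

A strict provenance compatible version would keep the original pointer around and derive new pointers from it (for example with the `addr`/`with_addr`/`map_addr` pointer methods) instead of reconstructing a pointer from a bare integer.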

[^1]: Miri could in theory diagnose this similarly to how the immutability of `static` places is enforced. However, the optimization potential is questionable, and delayed initialization of `let` bindings makes it less straightforward, since the place is actually just mutable until it isn't.

[^2]: Stacked Borrows prohibits this for structs/tuples/arrays, but Tree Borrows [tags the full reference range uniformly based on any use of `UnsafeCell`.](https://www.ralfj.de/blog/2023/06/02/tree-borrows.html#fnref:1)

[^3]: Miri already warns when this occurs. IIUC CHERI deterministically segfaults the process when trying to read/write through a spoofed pointer. This could also apply to other fun code crimes like abusing non-pointer-layout carriers of provenance in a way that breaks on CHERI.

[^4]: Making reference retagging depend on the contents of the retagged memory appears to have no optimization benefits and would still require the concept of shallow validity to exist. However, it could be beneficial to assign blame to the creator of an unsafe reference rather than only diagnosing the symptom once a read occurs.

[^5]: I don't know many details here, but IIRC CFI checks want to catch pointer type mismatch and Rust would prefer pointer ABI to only care about the unsized tail kind.

[^6]: This would imply that `cfg(ub_checks)` is a kind of lightly sanitizing environment.

[^7]: Okay, this one is definitely a stretch and is just the runtime behavior of the code as written, but it provides a framework for interpreting the choice between causing an abort or saturating and leaking like the Linux kernel would prefer.
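The abort-versus-saturate choice from the last footnote can be sketched with a toy counter. This is not how `Arc` is implemented; `RefCount`, `OverflowPolicy`, and the `MAX_REFCOUNT` threshold are hypothetical names and values chosen for illustration, and the memory orderings are simplified.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical overflow policies for a reference count.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OverflowPolicy {
    /// Abort the process once the count reaches the limit.
    Abort,
    /// Pin the count at the limit forever, leaking the allocation.
    SaturateAndLeak,
}

/// Assumed limit for this sketch.
const MAX_REFCOUNT: usize = isize::MAX as usize;

struct RefCount {
    count: AtomicUsize,
    policy: OverflowPolicy,
}

impl RefCount {
    fn new(initial: usize, policy: OverflowPolicy) -> Self {
        RefCount { count: AtomicUsize::new(initial), policy }
    }

    /// Increments the count and returns the new value.
    fn increment(&self) -> usize {
        match self.policy {
            OverflowPolicy::Abort => {
                let old = self.count.fetch_add(1, Ordering::Relaxed);
                if old >= MAX_REFCOUNT {
                    std::process::abort();
                }
                old + 1
            }
            OverflowPolicy::SaturateAndLeak => {
                // The closure always returns `Some`, so the update cannot fail.
                let old = self
                    .count
                    .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |c| {
                        Some(if c >= MAX_REFCOUNT { MAX_REFCOUNT } else { c + 1 })
                    })
                    .unwrap();
                if old >= MAX_REFCOUNT { MAX_REFCOUNT } else { old + 1 }
            }
        }
    }

    /// Decrements the count; returns `true` if the caller should free the value.
    fn decrement(&self) -> bool {
        let saturating = self.policy == OverflowPolicy::SaturateAndLeak;
        let result = self
            .count
            .fetch_update(Ordering::Release, Ordering::Relaxed, |c| {
                // A saturated count is never decremented again (the leak).
                if saturating && c >= MAX_REFCOUNT { None } else { c.checked_sub(1) }
            });
        // `Ok(1)` means the previous value was 1, so the count just reached 0.
        matches!(result, Ok(1))
    }
}

fn main() {
    let rc = RefCount::new(MAX_REFCOUNT, OverflowPolicy::SaturateAndLeak);
    assert_eq!(rc.increment(), MAX_REFCOUNT); // stays pinned
    assert!(!rc.decrement()); // never freed once saturated
    println!("saturated count stays at {}", MAX_REFCOUNT);
}
```

Under the framing in this issue, hitting either branch would be "defined but erroneous": the program's behavior is fully specified, but a sanitizing environment could still report it.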

(Please don't discuss whether these examples should be allowed or not here; use the issue for each case for that. This issue should focus on whether this is a class of behavior we want to officially recognize as defined but erroneous.)

If UB is an Abstract Machine error, you could think of erroneous behavior as an AM warning. Sanitizers could always choose to diagnose even without any "permission" from the spec, but this would still be a "false positive," and people tend to write code that relies on doing a thing when you tell them that doing the thing is allowed, even if it's discouraged to do so.
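As a rough illustration of the "AM error" versus "AM warning" framing, the three classes can be modeled as a small decision table. Everything here (`Classification`, `Environment`, `Outcome`, `outcome`) is a hypothetical toy model, not part of any spec or tool, and it assumes an idealized sanitizer that detects every case.

```rust
/// How the (hypothetical) spec classifies an operation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Classification {
    WellDefined,
    Erroneous,
    Undefined,
}

/// Where the program runs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Environment {
    Plain,
    Sanitizing,
}

/// What an observer can rely on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    ContinuesWithDefinedResult,
    HaltsWithDiagnostic,
    NoGuarantees,
}

fn outcome(env: Environment, class: Classification) -> Outcome {
    match (class, env) {
        // Well-defined operations behave the same everywhere.
        (Classification::WellDefined, _) => Outcome::ContinuesWithDefinedResult,
        // Erroneous behavior still has a defined result in a plain environment...
        (Classification::Erroneous, Environment::Plain) => Outcome::ContinuesWithDefinedResult,
        // ...but a sanitizing environment is endorsed to diagnose it and halt.
        (Classification::Erroneous, Environment::Sanitizing) => Outcome::HaltsWithDiagnostic,
        // Assumption: an idealized sanitizer catches all UB; real tools are best effort.
        (Classification::Undefined, Environment::Sanitizing) => Outcome::HaltsWithDiagnostic,
        // UB outside a sanitizer: the Abstract Machine promises nothing.
        (Classification::Undefined, Environment::Plain) => Outcome::NoGuarantees,
    }
}

fn main() {
    for class in [
        Classification::WellDefined,
        Classification::Erroneous,
        Classification::Undefined,
    ] {
        for env in [Environment::Plain, Environment::Sanitizing] {
            println!("{:?} in {:?} => {:?}", class, env, outcome(env, class));
        }
    }
}
```

The only row that differs from today's two-class model is `Erroneous`: it behaves like well-defined code in a plain environment and like UB under a sanitizer.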

Should we have a language concept of erroneous behavior? #512