Should we have a language concept of erroneous behavior? #512

Open
@CAD97

The idea is to define "erroneous behavior" as an operation that has a defined result (it does not cause UB) but is still considered incorrect for a program to perform. Recognizing this class would endorse sanitizing environments (such as Miri, Valgrind, or CHERI) diagnosing the presence of erroneous behavior and halting program execution.

I would expect the cases of erroneous behavior to be fairly limited, but it seems like it could be beneficial for those cases where an operation is defined not because we're endorsing doing it but because making it undefined instead would be worse. Potential cases include:

  • Modifying a let bound value without mut and lacking any internal/shared mutability. [1]
  • Writing through a shared reference to bytes next to but not covered by UnsafeCell. [2]
  • Relying on exposed provenance instead of using a strict provenance compatible API. [3]
  • Retagging bytes as a type that they aren't a valid instance of (e.g. reference to uninit). [4]
  • Cases of transmute by function call ABI that control flow integrity doesn't like allowing. [5]
  • Diagnosable library UB that isn't immediately elevated to language level UB. [6]
  • Runtime conditions like Arc reference count exceeding its maximum and aborting. [7]

(Please don't discuss whether these examples should be allowed or not here; use the issue for each case for that. This issue should focus on whether this is a class of behavior we want to officially recognize as defined but erroneous.)
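To make the exposed-provenance case from the list concrete, here is a minimal sketch. The function names are hypothetical and chosen for illustration only. Both functions read the same element and both have a defined result today. The first goes through an integer round trip (the pattern Miri warns about), while the second stays in pointer space and so never relies on exposed provenance.

```rust
/// Reads the second element by round-tripping the pointer through an integer.
/// The pointer-to-integer cast exposes the provenance, and the
/// integer-to-pointer cast relies on that exposed provenance.
fn second_via_integer(pair: &[u32; 2]) -> u32 {
    let base = pair.as_ptr();
    let addr = base as usize + std::mem::size_of::<u32>();
    let second = addr as *const u32;
    // SAFETY: `second` points at `pair[1]`, which is in bounds and aligned.
    unsafe { *second }
}

/// Reads the same element using pointer arithmetic only, so the pointer
/// keeps the provenance it was derived with.
fn second_via_pointer(pair: &[u32; 2]) -> u32 {
    let second = pair.as_ptr().wrapping_add(1);
    // SAFETY: `second` points at `pair[1]`, which is in bounds and aligned.
    unsafe { *second }
}
```

Under the proposal, a sanitizing environment would be entitled to flag the first function while leaving its result defined everywhere else.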

If UB is an Abstract Machine error, you could think of erroneous behavior as an AM warning. Sanitizers could always choose to diagnose even without any "permission" from the spec, but this would still be a "false positive," and people tend to write code that relies on doing a thing when you tell them that doing the thing is allowed, even if it's discouraged to do so.
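The error/warning analogy can be sketched as a toy model. Everything here (`Outcome`, `Mode`, `execute`) is a hypothetical name invented for illustration, not part of Miri or any spec. The model only shows the intended three-way split: endorsed operations, erroneous operations that still yield a value, and undefined ones.

```rust
/// How a toy abstract machine classifies a single operation.
#[derive(Debug)]
enum Outcome<T> {
    /// Fully endorsed: defined result, nothing to report.
    Ok(T),
    /// Defined result, but the program should not have done this.
    Erroneous { value: T, reason: &'static str },
    /// No defined result at all.
    Undefined { reason: &'static str },
}

/// Whether the execution environment acts on "AM warnings".
#[derive(Clone, Copy)]
enum Mode {
    Production,
    Sanitizing,
}

/// Runs one classified operation under the given mode.
fn execute<T>(outcome: Outcome<T>, mode: Mode) -> Result<T, String> {
    match (outcome, mode) {
        (Outcome::Ok(value), _) => Ok(value),
        // Production keeps going with the defined result.
        (Outcome::Erroneous { value, .. }, Mode::Production) => Ok(value),
        // A sanitizer is allowed to diagnose and halt instead.
        (Outcome::Erroneous { reason, .. }, Mode::Sanitizing) => {
            Err(format!("erroneous behavior: {reason}"))
        }
        // A real implementation promises nothing here; the model returns an
        // error only so that the sketch itself stays well defined.
        (Outcome::Undefined { reason }, _) => Err(format!("undefined behavior: {reason}")),
    }
}
```

The point of the middle variant is that halting in `Mode::Sanitizing` would no longer count as a false positive, while `Mode::Production` still gets a defined value.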

Footnotes

  1. Miri could in theory diagnose this similarly to how the immutability of static places is enforced. However, the optimization potential is questionable, and delayed initialization of let bindings makes it less straightforward, since the place is actually just mutable until it isn't.

  2. Stacked Borrows prohibits this for structs/tuples/arrays, but Tree Borrows tags the full reference range uniformly based on any use of UnsafeCell.

  3. Miri already warns when this occurs. IIUC CHERI deterministically segfaults the process when trying to read/write through a spoofed pointer. This could also apply to other fun code crimes like abusing non-pointer-layout carriers of provenance in a way that breaks on CHERI.

  4. Making reference retagging depend on the contents of the retagged memory appears to have no optimization benefits and would still require the concept of shallow validity to exist. However, it could be beneficial to assign blame to the creator of an unsafe reference rather than only diagnosing the symptom once a read occurs.

  5. I don't know many details here, but IIRC CFI checks want to catch pointer type mismatch and Rust would prefer pointer ABI to only care about the unsized tail kind.

  6. This would imply that cfg(ub_checks) is a kind of lightly sanitizing environment.

  7. Okay, this one is definitely a stretch and is just the runtime behavior of the code as written, but it provides a framework for interpreting the choice between causing an abort or saturating and leaking like the Linux kernel would prefer.
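The abort-versus-saturate choice in footnote 7 can be sketched with a toy, single-threaded counter. This is not how std implements Arc. `OverflowPolicy`, `Step`, and `increment` are hypothetical names, and the maximum is a parameter so that the overflow path is easy to reach.

```rust
/// What to do when the count is already at its maximum.
#[derive(Clone, Copy)]
enum OverflowPolicy {
    /// Treat overflow as fatal, as an aborting reference count does.
    Abort,
    /// Pin the count at the maximum; the allocation is then never freed.
    Saturate,
}

/// Result of trying to take one more reference.
#[derive(Debug, PartialEq)]
enum Step {
    /// The increment succeeded; holds the new count.
    Counted(usize),
    /// The count stays pinned at the maximum and the value leaks.
    Saturated,
    /// A real implementation would abort the process here.
    WouldAbort,
}

/// Attempts to increment `count` without exceeding `max`.
fn increment(count: usize, max: usize, policy: OverflowPolicy) -> Step {
    if count < max {
        return Step::Counted(count + 1);
    }
    match policy {
        OverflowPolicy::Abort => Step::WouldAbort,
        OverflowPolicy::Saturate => Step::Saturated,
    }
}
```

Either policy gives the overflow a defined outcome; the framing in this issue is about whether reaching that branch should also count as something a sanitizer may report.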
