remove language-level UB for non-UTF-8 str #71033

Closed
rust-lang/reference
#792
@RalfJung

Description

@RalfJung

This is the Rust-side issue for rust-lang/reference#792, opened just so that we can use fcpbot. The change description follows.

Ever since Rust 1.0, the reference has said that a non-UTF-8 str causes immediate UB. In today's terminology, that means str has a validity invariant of being valid UTF-8.

However, that seems unnecessary: the compiler does not actually exploit this, nor is there any clear way it could exploit this. Making UTF-8 a library-level safety invariant is more than enough for everything str does. Most likely, it was made a validity invariant because we had not yet properly teased apart those two concepts when the document was initially written.

This is also the conclusion that the UCG WG arrived at in rust-lang/unsafe-code-guidelines#78.

I therefore propose we remove the UTF-8 clause from the language spec, so that str will have the same validity invariant as [u8].
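To make the distinction concrete, here is a minimal sketch of what "UTF-8 as a library-level safety invariant" looks like from the caller's side. The helpers `checked_view`, `unchecked_view`, and `same_reference_size` are hypothetical names made up for illustration; only `std::str::from_utf8`, `std::str::from_utf8_unchecked`, and `std::mem::size_of` are real standard-library items. The size comparison only shows that the two reference types are the same size; it does not by itself prove anything about validity invariants.

```rust
use std::mem::size_of;
use std::str;

/// Hypothetical helper: checks the UTF-8 invariant at the library boundary
/// and only hands out a `&str` if the bytes are valid UTF-8.
pub fn checked_view(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

/// Hypothetical helper: skips the check entirely.
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8. This is the
/// safety invariant that `str` methods rely on; they may misbehave
/// (including causing UB) if it is violated.
pub unsafe fn unchecked_view(bytes: &[u8]) -> &str {
    // SAFETY: forwarded to the caller, see the function-level contract.
    unsafe { str::from_utf8_unchecked(bytes) }
}

/// `&str` and `&[u8]` are both pointer-plus-length references,
/// so they have the same size.
pub fn same_reference_size() -> bool {
    size_of::<&str>() == size_of::<&[u8]>()
}

/// Small demonstration that can be called from `main`.
pub fn demo() {
    let valid: &[u8] = b"hello";
    let invalid: &[u8] = &[0xff, 0xfe];

    // The checked path rejects bytes that are not valid UTF-8.
    assert_eq!(checked_view(valid), Some("hello"));
    assert_eq!(checked_view(invalid), None);

    // The unchecked path is only used here with bytes known to be valid.
    let s = unsafe { unchecked_view(valid) };
    assert_eq!(s.len(), 5);

    assert!(same_reference_size());
}
```

In this sketch the UTF-8 check lives entirely in library code: safe callers go through `checked_view`, and the `unsafe` function documents the obligation instead of relying on the language to enforce it.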

    Labels

    A-Unicode (Area: Unicode)
    C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
    T-lang (Relevant to the language team, which will review and decide on the PR/issue.)
    disposition-merge (This issue / PR is in PFCP or FCP with a disposition to merge it.)
    finished-final-comment-period (The final comment period is finished for this PR / Issue.)
