char::encode_utf8 crash on nightly #82833

Closed
@davidhewitt

On nightly 2021-03-05, with RUSTFLAGS="-Ccodegen-units=1 -Cinline-threshold=0 -Clink-dead-code", I am seeing what looks like undefined behaviour involving char::encode_utf8.

I ran this code:

fn make_string(ch: char) -> String {
    let mut bytes = [0u8; 4];
    ch.encode_utf8(&mut bytes).into()
}

fn main() {
    let ch = '😃';
    dbg!(ch);
    let string = make_string(ch);
    dbg!(string);
}

I expected to see the following output:

[src/bin/string_crash.rs:8] ch = '😃'
[src/bin/string_crash.rs:10] string = "😃"

I get the above output on stable 1.50.0, or on the same nightly version if I remove at least one of the three RUSTFLAGS listed above.

With all three flags present, on nightly 2021-03-05 I see the following output:

[src/bin/string_crash.rs:8] ch = '😃'
memory allocation of 140730017967032 bytes failed
Aborted

The exact byte count varies from run to run, so this looks like UB to me.
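One way to narrow down where the bogus length comes from is to check the length of the encoded slice before it is converted into a String. The following is only a sketch: `make_string_checked` is a hypothetical helper name, and under a miscompilation the assertions themselves may not behave reliably. It relies on `char::len_utf8`, which should equal the length of the `&mut str` returned by `encode_utf8` (4 bytes for '😃').

```rust
// Hypothetical variant of make_string that checks the encoded length
// before the slice is converted into a String.
fn make_string_checked(ch: char) -> String {
    let mut bytes = [0u8; 4];
    let encoded: &mut str = ch.encode_utf8(&mut bytes);
    // The returned slice should cover exactly len_utf8() bytes of the buffer.
    assert_eq!(encoded.len(), ch.len_utf8());
    String::from(&*encoded)
}

fn main() {
    let ch = '😃';
    let string = make_string_checked(ch);
    // U+1F603 encodes to four bytes in UTF-8.
    assert_eq!(string.len(), 4);
    assert_eq!(string.as_bytes(), &[0xF0, 0x9F, 0x98, 0x83][..]);
    // Cross-check against a path that does not use the stack buffer.
    assert_eq!(string, ch.to_string());
    dbg!(string);
}
```

If the first assertion fails under the three flags, the bad length is already present in the slice returned by `encode_utf8`; if only the later ones fail, the conversion into String is the more likely place to look.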

Meta

rustc +nightly --version --verbose:

rustc 1.52.0-nightly (caca2121f 2021-03-05)
binary: rustc
commit-hash: caca2121ffe4cb47d8ea2d9469c493995f57e0b5
commit-date: 2021-03-05
host: x86_64-unknown-linux-gnu
release: 1.52.0-nightly
LLVM version: 12.0.0

Labels

A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-bug: Category: This is a bug.
I-unsound: Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness
ICEBreaker-LLVM: Bugs identified for the LLVM ICE-breaker group
P-critical: Critical priority
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
regression-from-stable-to-nightly: Performance or correctness regression from stable to nightly.
