Incorrect size_hint() on EncodeUtf16 #113897

Closed
@ajtribick

Description

I tried this code:

println!("{:?}", "12345678901234".encode_utf16().size_hint());

let mut it = "\u{101234}".encode_utf16();
it.next().unwrap();
println!("{:?}", it.size_hint());

I expected to see this happen:

(5, Some(14))
(1, Some(1))

Instead, this happened:

(4, Some(28))
(0, Some(0))
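For reference, the number of code units that actually remain in each case can be counted directly. This is a small sketch using only `str::encode_utf16` and `Iterator::count`; the two function names are made up for illustration.

```rust
/// Hypothetical helper: code units produced by the 14-character ASCII string.
fn units_in_ascii_example() -> usize {
    "12345678901234".encode_utf16().count()
}

/// Hypothetical helper: code units left after taking the high surrogate
/// of U+101234, which encodes as a surrogate pair in UTF-16.
fn units_left_after_high_surrogate() -> usize {
    let mut it = "\u{101234}".encode_utf16();
    it.next(); // consume the high surrogate
    it.count() // only the low surrogate remains
}
```

The first function returns 14 and the second returns 1. The reported upper bound of `Some(0)` in the second case is therefore below the true remaining length, so it is not just loose but incorrect.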

Meta

rustc --version --verbose:

rustc 1.73.0-nightly (39f42ad9e 2023-07-19)
binary: rustc
commit-hash: 39f42ad9e8430a8abb06c262346e89593278c515
commit-date: 2023-07-19
host: x86_64-pc-windows-msvc
release: 1.73.0-nightly
LLVM version: 16.0.5

The cause is that EncodeUtf16 derives its size hint from the size hint of the Chars iterator it wraps, on the assumption that each remaining character produces either 1 or 2 code units.

When the iterator is NOT in the middle of a surrogate pair, this gives a lower bound that is too low and an upper bound that is too high. The lower bound treats every character as 4 bytes long yet producing only 1 code unit, and the upper bound treats every character as 1 byte long yet producing 2 code units. Neither combination can occur.
When the iterator IS in the middle of a surrogate pair, the pending low surrogate is not counted, because the underlying Chars iterator has already advanced past that character.
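The calculation described above can be sketched as follows. This is a reconstruction from the observed output, not a copy of the library source: `chars_based_hint` is a hypothetical name, and the sketch assumes the Chars size hint is passed through with only the upper bound doubled.

```rust
/// Hypothetical helper (not a std API): builds a UTF-16 size hint from the
/// `Chars` size hint, keeping the lower bound (1 code unit per character)
/// and doubling the upper bound (2 code units per character).
fn chars_based_hint(remaining: &str) -> (usize, Option<usize>) {
    let (low, high) = remaining.chars().size_hint();
    (low, high.and_then(|n| n.checked_mul(2)))
}
```

For the 14-byte string from the report this yields `(4, Some(28))`. For an empty remainder, which is what Chars holds once the only character has been consumed, it yields `(0, Some(0))` regardless of any pending low surrogate.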

The actual calculation should be done in terms of the remaining bytes:

  • The lower bound is reached when the remaining bytes consist of as many 3-byte sequences as possible (1 code unit each), optionally followed by a single 1- or 2-byte sequence. A 4-byte sequence cannot do better, since it produces 2 code units. This gives a lower bound of (bytes_remaining + 2) / 3.
  • The upper bound is reached when the remaining bytes consist entirely of 1-byte sequences, giving an upper bound of bytes_remaining.

If the iterator is positioned in the middle of a surrogate pair, both bounds should be increased by 1 to account for the pending low surrogate.
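A minimal sketch of the byte-based calculation proposed above. The function name and its two parameters are hypothetical, chosen for illustration; a real fix would read both values from the iterator's internal state.

```rust
/// Hypothetical helper (not the std implementation): size hint computed from
/// the number of UTF-8 bytes left, plus one if a low surrogate is pending.
/// Assumes `bytes_remaining` is a real string length (at most `isize::MAX`),
/// so `bytes_remaining + 2` cannot overflow.
fn byte_based_hint(
    bytes_remaining: usize,
    pending_low_surrogate: bool,
) -> (usize, Option<usize>) {
    let extra = usize::from(pending_low_surrogate);
    // Fewest code units: pack the bytes into 3-byte sequences, rounding up.
    let low = (bytes_remaining + 2) / 3;
    // Most code units: every byte is its own 1-byte sequence.
    let high = bytes_remaining;
    (low + extra, Some(high + extra))
}
```

With 14 bytes remaining and no pending surrogate this gives `(5, Some(14))`, and with 0 bytes remaining and a pending low surrogate it gives `(1, Some(1))`, matching the expected output in the report.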

Metadata

Labels

  • A-iterators (Area: Iterators)
  • C-bug (Category: This is a bug.)
  • T-libs (Relevant to the library team, which will review and decide on the PR/issue.)