Incorrect `size_hint()` on `EncodeUtf16`

I tried this code:

```rust
println!("{:?}", "12345678901234".encode_utf16().size_hint());

let mut it = "\u{101234}".encode_utf16();
it.next().unwrap();
println!("{:?}", it.size_hint());
```

I expected to see this happen:
```none
(5, Some(14))
(1, Some(1))
```

Instead, this happened:
```none
(4, Some(28))
(0, Some(0))
```

### Meta


`rustc --version --verbose`:
```
rustc 1.73.0-nightly (39f42ad9e 2023-07-19)
binary: rustc
commit-hash: 39f42ad9e8430a8abb06c262346e89593278c515
commit-date: 2023-07-19
host: x86_64-pc-windows-msvc
release: 1.73.0-nightly
LLVM version: 16.0.5
```

The cause is that `EncodeUtf16` derives its size hint from the size hint of the contained `Chars` iterator, assuming that each character encodes to either 1 or 2 code units. The `Chars` lower bound is therefore passed through unchanged and its upper bound is doubled.

When the iterator is NOT in the middle of a surrogate pair, this yields a lower bound that is too low and an upper bound that is too high.
When the iterator IS in the middle of a surrogate pair, the remaining code unit is not counted at all, because the `Chars` iterator has already advanced past the character that produced it.
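
To make the two failure modes concrete, here is a minimal sketch of the calculation as described above. `hint_from_chars` is a hypothetical stand-in written for this report, not the standard library source, and `rest` stands for whatever the inner `Chars` iterator still holds.

```rust
/// Hypothetical stand-in for the hint described above: take the `Chars`
/// size hint and assume 1 to 2 code units per character.
fn hint_from_chars(rest: &str) -> (usize, Option<usize>) {
    let (low, high) = rest.chars().size_hint();
    // Lower bound: at least one code unit per character.
    // Upper bound: at most two code units per character.
    (low, high.and_then(|n| n.checked_mul(2)))
}
```

For `"12345678901234"` this gives `(4, Some(28))`. After the first code unit of `"\u{101234}"` has been returned, `Chars` is empty, so the same calculation on `""` gives `(0, Some(0))` even though one code unit is still pending.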

The actual calculation should be done in terms of the remaining bytes:

- The lower bound is reached when the remaining bytes consist of as many 3-byte sequences as possible, optionally followed by a single 1- or 2-byte sequence. Each of these sequences encodes to one code unit, giving a lower bound of `(bytes_remaining + 2) / 3` (integer division).
- The upper bound is reached when the remaining bytes consist only of 1-byte sequences, giving an upper bound of `bytes_remaining`.

If the iterator is positioned in the middle of a surrogate pair, both of these values should be increased by 1 to account for the pending second code unit.
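
A minimal sketch of the byte-based calculation, assuming the iterator can see the not-yet-decoded part of the string and knows whether the second half of a surrogate pair is still pending. The function name `proposed_hint` and both parameters are hypothetical and only illustrate the arithmetic; this is not a patch against the standard library.

```rust
/// Hypothetical helper: bounds on the number of UTF-16 code units still to
/// be produced, computed from the remaining UTF-8 bytes.
fn proposed_hint(rest: &str, pending_low_surrogate: bool) -> (usize, Option<usize>) {
    let bytes_remaining = rest.len();
    // One extra code unit if the second half of a surrogate pair is pending.
    let extra = usize::from(pending_low_surrogate);
    // Fewest code units: as many 3-byte sequences as possible, rounded up.
    // `bytes_remaining + 2` cannot overflow for a real `&str`, whose length
    // never exceeds `isize::MAX`.
    let low = (bytes_remaining + 2) / 3 + extra;
    // Most code units: every remaining byte is a 1-byte sequence.
    let high = bytes_remaining.checked_add(extra);
    (low, high)
}
```

For the two cases in this report, `proposed_hint("12345678901234", false)` gives `(5, Some(14))` and `proposed_hint("", true)` gives `(1, Some(1))`, matching the expected output.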


Incorrect `size_hint()` on `EncodeUtf16` #113897
