Condition for handling malformed UTF-8; also an interface to iconv #4837

Currently even this simple `cat` program:

```
use io::ReaderUtil;
fn main() {
    for io::stdin().each_line |line| { io::println(line); }
}
```

...fails on broken or invalid UTF-8 input (or on input in some other character encoding, as this example illustrates):

```
$ echo 깨진 글자 | iconv -f utf-8 -t cp949 | ./test
rust: task failed at 'Assertion is_utf8(vv) failed', [...]/rust/src/libcore/str.rs:50
rust: domain main @0x7fcf32815e10 root task failed
```

...because the byte sequence is assumed to be UTF-8, which it is not. But there is currently no standard way to repair broken UTF-8 strings by replacing the offending substrings with some other valid UTF-8, so it is hard to fix this kind of bug.
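For illustration, here is a minimal sketch of the "replace the offending substrings" idea, written against today's Rust standard library rather than the libcore of this issue. It assumes `String::from_utf8_lossy`, which substitutes U+FFFD for each malformed sequence; `decode_lossy` is a hypothetical helper name, not an existing API.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: decode bytes as UTF-8, replacing each malformed
/// sequence with U+FFFD (the Unicode replacement character).
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let mut input = stdin.lock();
    let stdout = io::stdout();
    let mut out = stdout.lock();
    let mut buf = Vec::new();

    // read_until works on raw bytes, so no UTF-8 check runs while reading.
    while input.read_until(b'\n', &mut buf)? != 0 {
        // Repair the line, then write it back out as valid UTF-8.
        out.write_all(decode_lossy(&buf).as_bytes())?;
        buf.clear();
    }
    Ok(())
}
```

With this version, the cp949 input from the example above would presumably print replacement characters instead of aborting the task.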

This issue is ultimately tied to character encoding handling in general (a libiconv binding, perhaps?) and to a strict distinction between byte sequences and Unicode (UTF-8) strings. I found Python's approach reasonable (bytes and str are separate types, converted to each other via the `encode` and `decode` methods; a normal file `open` reads bytes, while `codecs.open` with an encoding converts them to str), but I'm really not sure what the actual interface should look like.
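As a rough sketch of what such a bytes/string separation could look like, the following uses modern Rust's `std::str::from_utf8`, which returns an error instead of failing the task. The helper names `decode_utf8` and `encode_utf8` are hypothetical, chosen only to mirror Python's `decode` and `encode`; conversion from other encodings such as cp949 (the iconv part) is not covered here.

```rust
use std::str;

// Hypothetical helper mirroring Python's bytes.decode("utf-8"):
// validation happens here, and failure is reported as a value.
fn decode_utf8(bytes: &[u8]) -> Result<&str, str::Utf8Error> {
    str::from_utf8(bytes)
}

// Hypothetical helper mirroring Python's str.encode("utf-8").
fn encode_utf8(text: &str) -> Vec<u8> {
    text.as_bytes().to_vec()
}

fn main() {
    // Round trip: string -> bytes -> string.
    let raw = encode_utf8("깨진 글자");
    match decode_utf8(&raw) {
        Ok(text) => println!("decoded: {}", text),
        Err(e) => println!("valid only up to byte offset {}", e.valid_up_to()),
    }

    // Malformed input yields an error the caller can handle.
    let broken = [0x61u8, 0xFF];
    if let Err(e) = decode_utf8(&broken) {
        println!("valid only up to byte offset {}", e.valid_up_to());
    }
}
```

The design point is that reading produces bytes, and the caller decides at the decode step whether to reject, repair, or transcode them.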

