
io::IoError should carry info on the invalid byte sequence on non-utf8 InvalidInput #12113

Closed
@pnkfelix

Description


If you feed in a byte stream that is almost UTF-8 but contains some invalid byte sequences, a loop of calls to fn read_char will eventually return an IoError with kind == InvalidInput.

Unfortunately, the returned IoError does not say which bytes were invalid, nor how many bytes were read from the input before the error was encountered.
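For comparison, the standard library in current Rust exposes exactly this information through std::str::Utf8Error: valid_up_to() gives the number of bytes decoded successfully before the error, and error_len() gives the length of the invalid sequence (None when the input ends in the middle of a sequence). The sketch below is modern Rust, not the 2014 io API discussed here; the InvalidUtf8 struct and first_invalid function are hypothetical names used only for illustration.

```rust
use std::str;

/// Hypothetical report of the first invalid UTF-8 sequence in a buffer.
#[derive(Debug, PartialEq)]
pub struct InvalidUtf8 {
    /// Number of bytes successfully decoded before the error.
    pub offset: usize,
    /// The offending bytes (possibly a truncated sequence at end of input).
    pub bytes: Vec<u8>,
}

/// Returns None if `input` is valid UTF-8, otherwise where and what the
/// first invalid sequence is.
pub fn first_invalid(input: &[u8]) -> Option<InvalidUtf8> {
    match str::from_utf8(input) {
        Ok(_) => None,
        Err(e) => {
            let offset = e.valid_up_to();
            // error_len() is None when the input ends mid-sequence; in that
            // case the whole remaining tail is the incomplete sequence.
            let len = e.error_len().unwrap_or(input.len() - offset);
            Some(InvalidUtf8 {
                offset,
                bytes: input[offset..offset + len].to_vec(),
            })
        }
    }
}
```

For example, for the bytes "caf", 0xC3 0xA9 (é), a space, then 0xFF, the report is an offset of 6 with the single byte 0xFF, which is exactly what the IoError described above cannot tell the caller.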

It seems like it would not be hard to change IoError so that its detail field is an Option<Either<~str, ~[u8]>>, or something along those lines. Then, in this scenario, an InvalidInput kind would tell the caller to look at the detail field for the byte sequence that caused the problem, and client code would have the option of substituting a character sequence of its own choosing, specific to the byte sequence that failed.
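A minimal sketch of that shape in current Rust, under stated assumptions: std has no Either, so a two-variant Detail enum stands in for Either<~str, ~[u8]>; String and Vec<u8> replace ~str and ~[u8]; and DecodeError, Detail, ErrorKind and this byte-slice read_char are hypothetical names modeled on the proposal, not a real API. On invalid input the error carries the offending bytes in its detail field, while ErrorKind itself stays a C-like enum.

```rust
/// Hypothetical stand-in for the proposed Option<Either<~str, ~[u8]>> detail.
#[derive(Debug, Clone, PartialEq)]
pub enum Detail {
    Message(String),
    Bytes(Vec<u8>),
}

/// In this design the kind stays fieldless (C-like).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    EndOfFile,
    InvalidInput,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DecodeError {
    pub kind: ErrorKind,
    pub detail: Option<Detail>,
}

/// Hypothetical read_char over a byte slice: decodes one char from the
/// front of `input` and returns it with the number of bytes it occupied.
pub fn read_char(input: &[u8]) -> Result<(char, usize), DecodeError> {
    if input.is_empty() {
        return Err(DecodeError { kind: ErrorKind::EndOfFile, detail: None });
    }
    // A UTF-8 encoded char is at most 4 bytes long.
    let window = &input[..input.len().min(4)];
    let valid = match std::str::from_utf8(window) {
        Ok(s) => s,
        // The window starts with a valid char; a later byte is bad or cut off.
        Err(e) if e.valid_up_to() > 0 => {
            std::str::from_utf8(&window[..e.valid_up_to()]).unwrap()
        }
        // The very first sequence is invalid: report its bytes in `detail`.
        Err(e) => {
            // error_len() is None when the input ends mid-sequence.
            let len = e.error_len().unwrap_or(window.len());
            return Err(DecodeError {
                kind: ErrorKind::InvalidInput,
                detail: Some(Detail::Bytes(window[..len].to_vec())),
            });
        }
    };
    let c = valid.chars().next().unwrap();
    Ok((c, c.len_utf8()))
}
```

A caller that receives InvalidInput can match on Some(Detail::Bytes(bytes)), emit whatever substitution it wants for those bytes, and resume reading bytes.len() bytes further on.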

(Alternatively, we could change IoErrorKind so that the InvalidInput variant carries an Option<~[u8]>, but then IoErrorKind would no longer be a C-like enum.)
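The alternative can be sketched the same way, again in current Rust with hypothetical names (ErrorKind, offending_bytes). Once a variant holds a Vec<u8>, the enum can no longer derive Copy or be cast to an integer with as, which is concretely what giving up the C-like enum costs.

```rust
/// Hypothetical alternative: the kind itself carries the offending bytes.
/// Because a variant holds a Vec<u8>, this enum cannot derive Copy and
/// cannot be cast to an integer with `as`; it is no longer C-like.
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorKind {
    EndOfFile,
    InvalidInput(Option<Vec<u8>>),
}

/// Returns the invalid bytes if this kind carries any.
pub fn offending_bytes(kind: &ErrorKind) -> Option<&[u8]> {
    match kind {
        ErrorKind::InvalidInput(Some(bytes)) => Some(bytes.as_slice()),
        _ => None,
    }
}
```

Callers then match on the variant's payload instead of comparing plain kind values.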

I believe this is strictly more expressive than mapping every invalid sequence to a single replacement character, as from_utf8_lossy does (#12062).
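To illustrate the expressiveness argument, here is a hypothetical decode_with (not a std API, written in current Rust) that hands each invalid sequence to a caller-supplied closure. A closure that ignores the bytes and always returns U+FFFD should reproduce the output of String::from_utf8_lossy, while a closure that inspects the bytes can do something the lossy version cannot, such as writing a \xNN escape per byte.

```rust
/// Decodes `input`, passing each invalid byte sequence to `replace`, which
/// chooses the substitution for that specific sequence.
pub fn decode_with<F>(input: &[u8], mut replace: F) -> String
where
    F: FnMut(&[u8]) -> String,
{
    let mut out = String::new();
    let mut rest = input;
    while !rest.is_empty() {
        match std::str::from_utf8(rest) {
            Ok(s) => {
                out.push_str(s);
                break;
            }
            Err(e) => {
                let good = e.valid_up_to();
                // The prefix before valid_up_to() is guaranteed to be valid UTF-8.
                out.push_str(std::str::from_utf8(&rest[..good]).unwrap());
                // error_len() is None when the input ends mid-sequence;
                // treat the remaining tail as one invalid sequence.
                let bad = e.error_len().unwrap_or(rest.len() - good);
                out.push_str(&replace(&rest[good..good + bad]));
                rest = &rest[good + bad..];
            }
        }
    }
    out
}

/// Lossy decoding is just one particular choice of `replace`.
pub fn lossy(input: &[u8]) -> String {
    decode_with(input, |_| "\u{FFFD}".to_string())
}

/// A byte-specific substitution: escape each invalid byte as \xNN.
pub fn escaped(input: &[u8]) -> String {
    decode_with(input, |bad| bad.iter().map(|b| format!("\\x{:02X}", b)).collect())
}
```

With lossy, the single bytes 0xFF and 0xFE both decode to a lone U+FFFD and become indistinguishable; escaped keeps them apart as \xFF and \xFE.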

Metadata


Labels

C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue.)
