Description
The current behavior of `~str` is that it unconditionally rejects any invalid UTF-8 sequence (modulo #3787). Unfortunately, this opens up Rust programs to denial-of-service attacks, where maliciously crafted user input can cause unexpected task failure. Two cases that exist right now are invalid UTF-8 in the args list and in the environment. The mere presence of invalid UTF-8 causes `os::args()` and `os::env()` to immediately raise the `str::not_utf8` condition, which callers of these functions are unlikely to handle.
I've suggested this before on the IRC channel, but I think it's worth suggesting again: when parsing UTF-8, we should consider simply translating the first byte of any invalid sequence into the Replacement Character (U+FFFD) and continuing, instead of failing outright.
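
To make the proposal concrete, here is a minimal sketch of the "replace the first byte and keep going" strategy. It is an illustration only, not a proposed patch: it assumes a Rust with `String`, `std::str::from_utf8`, and `Utf8Error::valid_up_to` rather than the `~str` API discussed here, and `decode_replacing` is a hypothetical name.

```rust
/// Hypothetical helper (not an existing library function): decode `bytes`
/// as UTF-8, replacing the first byte of every invalid sequence with
/// U+FFFD and resuming parsing at the very next byte.
fn decode_replacing(mut bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len());
    loop {
        match std::str::from_utf8(bytes) {
            // Everything that remains is valid: append it and finish.
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // Bytes before `valid_up_to()` are known to be valid UTF-8.
                let (valid, rest) = bytes.split_at(err.valid_up_to());
                out.push_str(std::str::from_utf8(valid).expect("prefix was validated"));
                // Substitute the Replacement Character for one offending byte.
                out.push('\u{FFFD}');
                // `rest` is non-empty because the error lies inside the input.
                bytes = &rest[1..];
            }
        }
    }
}
```

With this sketch, `decode_replacing(b"a\xFFb")` yields `"a\u{FFFD}b"` instead of failing, and valid input passes through unchanged. Because only one byte is consumed per error, a truncated multi-byte sequence produces one U+FFFD per stray byte.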