Description
Current behavior 😯
gix-attributes (in state::ValueRef) unsafely creates a &str
from a &[u8]
containing non-UTF8 data, with the justification that so long as nothing reads the &str and relies on it being UTF-8 in the &str, there is no UB:
// SAFETY: our API makes accessing that value as `str` impossible, so illformed UTF8 is never exposed as such.
The problem is that the non-UTF8 str
is exposed to outside code: first to the kstring
crate itself, which requires UTF-8 in its documentation and may have UB as a consequence of this, but also to serde
, where it propagates to e.g. serde_json
, serde_yaml
, etc., where the same problems occur.
As far as I know, this is not sound, and either is or can cause UB down the line in these places that can view the &str
.
Expected behavior 🤔
I think gix-attributes should probably use a Vec<u8>
or smallvec
or similar, at least until kstring
can support bytes.
Absolute worst case, one could at least add an unsafe
feature, so that code that opts out of unsafe
can get the slower Vec<u8>
-based implementation which is guaranteed to be sound.
Git behavior
N/A
Steps to reproduce 🕹
N/A