Description
I tried this code:
fn main() {
dbg!(option_env!("NON_UNICODE_ENV_VAR"));
}
compiled with NON_UNICODE_ENV_VAR=$'\xFF' rustc code.rs
(using Bash shell syntax; any non-Unicode value for NON_UNICODE_ENV_VAR will work).
I expected to see this happen: a Some value, as the documentation currently states "If the named environment variable is present at compile time, this will expand into an expression of type Option<&'static str> whose value is Some of the value of the environment variable".
Instead, this happened: The program outputted:
[code.rs:2:2] option_env!("NON_UNICODE_ENV_VAR") = None
Currently, the documentation doesn't consider the possibility of a non-Unicode environment variable value (as only Unicode values can be expressed as a &'static str), and the implementation emits None, the same as for a non-existent environment variable.
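For comparison, the runtime API std::env::var keeps these cases apart through VarError::NotPresent and VarError::NotUnicode. The following is a minimal sketch of that three-way distinction; the Lookup enum and the classify and lookup helpers are hypothetical names introduced only for illustration, and are not part of the standard library or of option_env!.

```rust
use std::env::{self, VarError};
use std::ffi::OsString;

/// Hypothetical three-way outcome of an environment variable lookup.
#[derive(Debug, PartialEq)]
enum Lookup {
    /// The variable is set and its value is valid Unicode.
    Unicode(String),
    /// The variable is set, but its value is not valid Unicode.
    NotUnicode(OsString),
    /// The variable is not set at all.
    Missing,
}

/// Maps the result of `std::env::var` onto the three cases.
fn classify(result: Result<String, VarError>) -> Lookup {
    match result {
        Ok(value) => Lookup::Unicode(value),
        Err(VarError::NotUnicode(raw)) => Lookup::NotUnicode(raw),
        Err(VarError::NotPresent) => Lookup::Missing,
    }
}

/// Looks up `name` in the environment of the running process.
fn lookup(name: &str) -> Lookup {
    classify(env::var(name))
}
```

option_env! currently has only two outcomes, so the NotUnicode and Missing cases above both end up as None.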
There are three possibilities that I can think of:
1. The current behaviour is a bug, and a compilation error should be emitted as it isn't possible to represent a non-Unicode value as a &'static str, and the documentation should also be updated to reflect this. Note that this would be a breaking change to the compiler itself, but would be allowed as it would be a bugfix.
2. The current behaviour is not a bug, and the documentation should be updated to reflect the current behaviour.
3. Same as 2, but the behaviour is unexpected enough to warrant a warn/deny-by-default lint.
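The first possibility can be modelled at runtime as a function that distinguishes "unset", "set and valid" and "set but not Unicode". This is only a sketch of the proposed semantics, not the compiler's implementation: the function name expand_strict and the error message text are assumptions made for illustration.

```rust
use std::ffi::OsString;

/// Hypothetical model of possibility 1: a missing variable is `Ok(None)`,
/// a Unicode value is `Ok(Some(..))`, and a non-Unicode value is an error
/// instead of being folded into `None`.
fn expand_strict(name: &str, raw: Option<OsString>) -> Result<Option<String>, String> {
    match raw {
        // Variable not set: the one case that should map to `None`.
        None => Ok(None),
        Some(value) => match value.into_string() {
            // Variable set to valid Unicode.
            Ok(text) => Ok(Some(text)),
            // Variable set, but not representable as a `str`.
            Err(_) => Err(format!(
                "environment variable `{name}` is set but is not valid Unicode"
            )),
        },
    }
}
```

In a build script, the raw argument could come from std::env::var_os(name), which would let a project reject such values today regardless of which possibility is chosen.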
Personally, I believe 1 to be the case, as the current behaviour is unexpected and leads to code patterns like let was_defined_at_compile_time = option_env!("VAR").is_some() silently returning incorrect results. There doesn't seem to be a "correct" value to return, as None is already taken by "does not exist" and Some(&str) cannot represent a non-Unicode value.
It seems unlikely that anyone is relying on the current behaviour, given Rust's UTF-8-everywhere design, the fact that the same result can be achieved much more easily by just not setting the environment variable, and the platform-specific nature of what non-Unicode values can exist (Windows UTF-16 unpaired surrogates vs. Unix invalid UTF-8 bytes). A non-Unicode value is therefore more likely to be the result of a typo in the shell or of data corruption, and the current output makes that rather confusing to track down, as it suggests the variable was missing rather than set incorrectly.
Meta
rustc --version --verbose:
commit-hash: 07dca489ac2d933c78d3c5158e3f43beefeb02ce
commit-date: 2024-02-04
host: x86_64-unknown-linux-gnu
release: 1.76.0
LLVM version: 17.0.6
I believe the decision of what to do is down to libs-api, so
@rustbot label +T-libs-api