Description
We have encountered a miscompile where stack-slot sharing is incorrect for the results of:
void sincosf(float x, float *sin, float *cos);
I've bisected it to llvmorg-20-init-6824-g3073c3c2290a, so this is a regression in LLVM 20.
Note that a fix for an apparently related issue was made in PR 118117 (llvmorg-20-init-15653-ga7dafea384a5), but this problem still persists.
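For readers less familiar with the transformation, here is a rough source-level picture of what I understand is going on: separate cosf and sinf calls on the same argument get combined into a single sincosf-style call that writes both results through pointers to stack temporaries. This is only a sketch under that assumption, not the actual SelectionDAG lowering. The names sincosf_sketch, separate_calls and combined_call are hypothetical and do not come from LLVM or libm.

```cpp
#include <cmath>

// Hypothetical stand-in for the libm routine sincosf; defined here only so
// the sketch builds on any platform.
static void sincosf_sketch(float x, float *sin_out, float *cos_out) {
    *sin_out = std::sin(x);
    *cos_out = std::cos(x);
}

// What the source program asks for: two independent calls.
static void separate_calls(float x, float &s, float &c) {
    c = std::cos(x);
    s = std::sin(x);
}

// Source-level picture of the combined form: one call that writes both
// results into two distinct temporaries. Each temporary has to keep its
// value until its last read, so its storage cannot be reused by another
// value that is written before that read.
static void combined_call(float x, float &s, float &c) {
    float sin_tmp;
    float cos_tmp;
    sincosf_sketch(x, &sin_tmp, &cos_tmp);
    s = sin_tmp;
    c = cos_tmp;
}
```

Both forms should produce the same values for the same input. The failure reported here looks like the case where the storage for one of the two results is shared with another value while it is still needed.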
Here is a reduced test-case that shows a run-time failure:
extern "C" {
  int printf(const char *, ...);
  float cosf(float);
  float sinf(float);
  float fabsf(float);
}
int n_errors = 0;
__attribute__((noinline))
void CheckClose(float &computed, float &expected, float &tolerance, int lnum) {
  float diff = fabsf(computed - expected);
  if (diff >= tolerance) {
    printf("Expected %f +/ %f but was %f (failure on line %d)\n",
           expected, tolerance, computed, lnum);
    ++n_errors;
  } else {
    printf("Expected %f +/ %f and was %f\n", expected, tolerance, computed);
  }
}
float input_value = 0.33981341f;
float expected_pos_cos = 0.94281685f;
float expected_neg_cos = -0.94281685f;
float expected_neg_sin = -0.33331117f;
int main() {
  float value = input_value;        // 0.33981341f (input)
  float local_cos = ::cosf(value);  // 0.94281685f (computed)
  float local_sin = ::sinf(value);  // 0.33331117f (computed)
  float close = 0.000010f;
  {
    float computed = local_cos;
    float expected = expected_pos_cos;
    CheckClose(computed, expected, close, __LINE__);
  }
  {
    float computed = -local_sin;
    float expected = expected_neg_sin;
    CheckClose(computed, expected, close, __LINE__);
  }
  {
    float computed = -local_cos; // 0.33331117f is passed, expected -0.94281685f
    float expected = expected_neg_cos;
    CheckClose(computed, expected, close, __LINE__);
  }
  printf("%d error(s). %s\n", n_errors, ((n_errors == 0) ? "PASS" : "FAIL"));
}
The problem only manifests (for me at least) when cosf and sinf are turned into builtins. That happens with the PS5 target (and PS4), but not with the target x86_64-unknown-linux-gnu. As a result, a godbolt link that starts from the C++ source doesn't demonstrate the run-time failure, so I'm posting an LLVM IR godbolt link that processes the IR at -O2 and shows the failure.
Note that dropping has-predecessor-max-steps ("llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp") to a low number (the default is 8192) suppresses the transformation, and so hides the failure. Using the current head of main, dropping it to 12:
-O2 -mllvm -has-predecessor-max-steps=12
makes the problem latent.
The problem is still there in the head of main (tested with llvmorg-21-init-12399-gc78e6bbd830a).