Description
Apologies for not having a minimal reproduction, but this bug was extremely difficult even to isolate, as it occurs inside some complicated AVX2 code.
The bug is causing the wrong values to be computed. Whether or not it occurs depends on the following conditions:
- With target-cpu unset, the bug does NOT occur in debug builds, but DOES occur with --release
- With target-cpu=haswell, the bug does NOT occur in --release builds, and both debug and release builds are OK
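To tell which of the two situations above a given build is in, a small check along these lines may help. It is only a sketch and is not part of the poly1305 crate; the function names are made up for illustration. It reports whether AVX2 was enabled for the whole build at compile time (as with -Ctarget-cpu=haswell) and whether the CPU running the binary reports AVX2 support.

```rust
/// True if AVX2 was enabled for the whole build at compile time,
/// e.g. via RUSTFLAGS="-Ctarget-cpu=haswell". (Hypothetical helper name.)
fn avx2_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU running this binary reports AVX2 support.
/// (Hypothetical helper name; x86/x86_64 variant.)
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_detected_at_runtime() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On other architectures there is no AVX2 to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_detected_at_runtime() -> bool {
    false
}

fn main() {
    println!("avx2 enabled at compile time: {}", avx2_enabled_at_compile_time());
    println!("avx2 detected at runtime:     {}", avx2_detected_at_runtime());
}
```

Running this with and without RUSTFLAGS="-Ctarget-cpu=haswell" should show the compile-time line change while the runtime line stays the same on a given host.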
I can attempt to further isolate and reduce the problem, but there's a lot of spooky action at a distance going on, which makes that rather difficult.
For now, here is the best reproduction I can provide:
EDIT: I've deleted the poly1305/avx2-bug branch as there is now a much smaller repro, but so long as GitHub hasn't GC'd it, here's the original commit: RustCrypto/universal-hashes@7485010
git clone https://github.com/RustCrypto/universal-hashes
cd universal-hashes/poly1305
git checkout poly1305/avx2-bug
NOTE: if you git show from here, I've included lots of notes in the latest commit about the bug in the commit message. The commit also contains comments indicating lines you can comment or uncomment to make the tests succeed or fail.
Commands to run which DON'T trigger the bug
cargo test donna_self_test1 -- --nocapture
RUSTFLAGS="-Ctarget-cpu=haswell" cargo test donna_self_test1 --release -- --nocapture
Commands to run which DO trigger the bug
NOTE: as this is a bug in the AVX2 backend, you'll need to run it on an AVX2-capable host to trigger the bug.
cargo test donna_self_test1 --release -- --nocapture
This test fails with a miscomputed result (as do all of the tests across the board if you run the whole suite):
thread 'donna_self_test1' panicked at 'assertion failed: `(left == right)`
left: `[3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]`,
right: `[254, 255, 255, 255, 255, 255, 239, 255, 255, 63, 0, 0, 0, 254, 255, 255]`', poly1305/tests/lib.rs:47:5
Things which mysteriously make the tests pass
The aforementioned cargo test ... --release ... will pass if any one of the following changes is made. They are documented in the commit message of 74850109 (git show) and in the comments introduced in that commit:
- A commented-out dbg! statement near the first observation of the miscompilation is uncommented (heisenbug!)
- The #[target_feature(enable = "avx2")] attribute on the finalize function is commented out. This function is in a completely different module, hence my descriptions of "spooky action at a distance" (the function in which the bug is occurring is annotated #[inline(always)], but the bug still occurs if that attribute is commented out)
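For readers unfamiliar with how these attributes are laid out, here is a rough sketch of the shape described above. It is NOT the poly1305 code and is not expected to reproduce the bug. The module names, add_lanes, and sum_lanes are hypothetical, finalize merely borrows the name from the report, and the sketch assumes (which the report does not state) that the annotated function calls the inlined helper.

```rust
// Hypothetical module holding the AVX2 helper (stand-in for the function
// in which the wrong values were observed).
#[cfg(target_arch = "x86_64")]
mod backend {
    use std::arch::x86_64::*;

    /// Adds eight u32 lanes with AVX2. Marked #[inline(always)] and carrying
    /// no #[target_feature] attribute of its own.
    /// Safety: the caller must ensure the CPU supports AVX2.
    #[inline(always)]
    pub unsafe fn add_lanes(a: &[u32; 8], b: &[u32; 8]) -> [u32; 8] {
        unsafe {
            let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
            let mut out = [0u32; 8];
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi32(va, vb));
            out
        }
    }
}

// A different module: the function carrying the target_feature attribute.
#[cfg(target_arch = "x86_64")]
mod api {
    /// Safety: the caller must ensure the CPU supports AVX2.
    #[target_feature(enable = "avx2")]
    pub unsafe fn finalize(a: &[u32; 8], b: &[u32; 8]) -> [u32; 8] {
        // Assumption for illustration only: finalize calls the helper.
        unsafe { super::backend::add_lanes(a, b) }
    }
}

/// Safe wrapper: returns None when AVX2 is unavailable on this host
/// or the target is not x86_64.
pub fn sum_lanes(a: &[u32; 8], b: &[u32; 8]) -> Option<[u32; 8]> {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // AVX2 support was just checked at runtime, so the call is sound.
            return Some(unsafe { api::finalize(a, b) });
        }
    }
    None
}

fn main() {
    println!("{:?}", sum_lanes(&[1; 8], &[2; 8]));
}
```

Commenting out the #[target_feature(enable = "avx2")] line in a layout like this changes which target features the caller is compiled with, which is the kind of change that made the tests pass in the real code.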
Meta
This bug is easily reproducible and occurs on all versions of the Rust compiler and all operating systems I've tried. I've reproduced it locally on macOS and it also occurred on Linux/Ubuntu via GitHub Actions.
Here are some compiler versions I've tried:
rustc 1.48.0 (7eac88abb 2020-11-16)
Latest nightly as of opening this ticket:
rustc 1.50.0-nightly (1700ca07c 2020-12-08)
It also broke in CI which tests it under the MSRV of 1.41.0.