Possible miscompilation in >= 1.24.0 #49336

Closed
@bheisler

A user reported this to Criterion.rs - bheisler/criterion.rs#133

The short version: when the user's test crate's benchmarks are run on Rust 1.24.0 or newer, Criterion.rs calculates some values incorrectly, even though the code looks correct to me. They report reproducing this behavior on Arch Linux, Windows and Raspbian; I have only been able to confirm it on Windows. Each of us has confirmed it on multiple machines.

I was initially reluctant to call this a miscompilation, but when I started investigating, any change I made caused the bug to stop happening: inserting println! calls, or even pushing values into a vector and printing them later, made it disappear. Eventually I tried simply adding a call to test::black_box, which should have no observable effect on the code except to inhibit certain compiler optimizations. That also caused the bug to stop occurring. It may still be a bug in my code, but if so I can't find it.
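For readers unfamiliar with black_box, here is a minimal sketch of how such a call is typically placed in a timing loop. It is not the Criterion.rs code: `fib` and `time_iterations` are hypothetical names, and it uses `std::hint::black_box` (stable in recent Rust releases) as a stand-in for the nightly-only `test::black_box` mentioned above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for a benchmarked routine (iterative Fibonacci).
fn fib(n: u64) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    a
}

// Hypothetical timing loop: runs `routine` `iters` times and returns the
// elapsed wall-clock time. Passing each result through black_box asks the
// optimizer to treat it as used, so the calls are not removed as dead code.
fn time_iterations<F: FnMut() -> u64>(iters: u64, mut routine: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(routine());
    }
    start.elapsed()
}

fn main() {
    // black_box on the input keeps the argument from being constant-folded.
    let elapsed = time_iterations(1_000, || fib(black_box(20)));
    println!("1000 iterations took {:?}", elapsed);
}
```

black_box returns its argument unchanged, so adding it should not alter what the program computes. That is why a bug that vanishes when it is added points toward an optimization problem rather than a logic error.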

I've tried to create a minimal test case, but was unsuccessful. This bug is quite fragile.

Steps to reproduce:

  • Clone https://github.com/mbillingr/criterion-test.rs
  • Edit the Cargo.toml file to disable the default features for criterion (this isn't necessary but will save some compilation time)
  • With 1.23.0-x86_64-pc-windows-msvc, run cargo bench --bench my_benchmark -- --verbose.
    • Criterion.rs will run two benchmarks and report two iteration counts. The second should be significantly smaller than the first - this is the desired behavior. For example:
Benchmarking fib 1: Collecting 100 samples in estimated 1.0000 s (2469702500 iterations)
...
Benchmarking fib 2: Collecting 100 samples in estimated 1.0000 s (132784700 iterations)
  • Switch to 1.24.1 and run the benchmark command again. Note that this time, the second iteration count is roughly the same as the first (this is the unexpected behavior):
Benchmarking fib 1: Collecting 100 samples in estimated 1.0000 s (2518899600 iterations)
...
Benchmarking fib 2: Collecting 100 samples in estimated 1.0000 s (2514900000 iterations)
  • Optional: Switch to a nightly compiler and verify that the unexpected behavior persists. Clone the rustc_fix branch of https://github.com/japaric/criterion.rs and modify the Cargo.toml file of criterion-test.rs to use that instead. This patch merely enables the test feature/crate and inserts one call to test::black_box in routine.rs:warm_up. Verify that the expected behavior is restored.
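To illustrate why the second iteration count is expected to be much smaller, here is a rough sketch of how an iteration count can be derived from a warm-up measurement. This is an assumption for illustration only, not the actual Criterion.rs calculation: `estimate_iterations` and the warm-up figures are hypothetical.

```rust
use std::time::Duration;

// Hypothetical estimate: given that `warm_up_iters` iterations took
// `warm_up_elapsed`, how many iterations fit into `target`?
// Integer arithmetic in u128 avoids floating-point rounding.
fn estimate_iterations(warm_up_iters: u64, warm_up_elapsed: Duration, target: Duration) -> u64 {
    // Guard against a zero-length measurement to avoid dividing by zero.
    let elapsed_ns = warm_up_elapsed.as_nanos().max(1);
    (target.as_nanos() * warm_up_iters as u128 / elapsed_ns) as u64
}

fn main() {
    let target = Duration::from_secs(1);

    // Made-up warm-up results: a fast routine (0.5 ns per iteration)
    // and a slower one (8 ns per iteration).
    let fast = estimate_iterations(1_000_000, Duration::from_nanos(500_000), target);
    let slow = estimate_iterations(1_000_000, Duration::from_nanos(8_000_000), target);

    println!("fast routine: {} iterations", fast); // 2000000000
    println!("slow routine: {} iterations", slow); // 125000000
}
```

Under this model, two nearly identical iteration counts for routines of very different cost would mean the warm-up measurement or the arithmetic on it went wrong, which matches the symptom seen on 1.24.1.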

Labels: C-bug (Category: This is a bug.), P-high (High priority), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.), regression-from-stable-to-stable (Performance or correctness regression from one stable version to another.)