Current Architecture:
- Benchmarks are run using Bash and Ruby scripts at https://github.com/lampepfl/bench.
- Specifically, two scripts poll the GitHub API to identify newly merged pull requests or PR comments containing the string "test performance please."
- These scripts run on a manually maintained server.
- Benchmark configurations are YAML files and Bash scripts in bench/profiles.
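The trigger detection described above amounts to a substring search over comment bodies returned by the GitHub API. A minimal sketch of that step in Java (the `TriggerFilter` class and `benchmarkRequests` helper are hypothetical names for illustration, not part of the actual scripts):

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the polling step: given comment bodies fetched from the
// GitHub API, keep those that request a benchmark run. The class and
// helper names are illustrative, not from the real scripts.
public class TriggerFilter {
    static final String TRIGGER = "test performance please";

    static List<String> benchmarkRequests(List<String> commentBodies) {
        return commentBodies.stream()
                .filter(body -> body.contains(TRIGGER))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> comments = List.of(
                "LGTM!",
                "test performance please",
                "Could you test performance please after rebasing?");
        System.out.println(benchmarkRequests(comments).size()); // prints 2
    }
}
```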
Problems with the Current Architecture:
- Fragility: The scripts manually parse commit titles to determine which commits to benchmark. This is error-prone and has failed multiple times due to changes in how merge commit titles are formatted.
- Monitoring Issues: Errors often fail silently, alerts are not functional, and logs are kept separately from those of other CI jobs, making it difficult to track down issues.
- Maintenance Challenges: Scripts written in different languages are spread across various repositories and run on a stateful server, which complicates maintenance.
- Local Execution Difficulty: It is not straightforward to reproduce benchmark results locally with the current setup.
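The title-parsing fragility is easy to reproduce: a pattern written for GitHub's classic merge-commit titles silently misses other formats. A hypothetical sketch of the kind of parsing involved (the exact pattern used by the scripts may differ):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrates why parsing commit titles is brittle: a regex written for
// "Merge pull request #N from ..." titles returns nothing for squash-merge
// titles like "Fix inlining (#1234)". The pattern here is illustrative.
public class TitleParser {
    static final Pattern MERGE_TITLE =
            Pattern.compile("^Merge pull request #(\\d+) from ");

    static Optional<Integer> prNumber(String commitTitle) {
        Matcher m = MERGE_TITLE.matcher(commitTitle);
        return m.find() ? Optional.of(Integer.parseInt(m.group(1)))
                        : Optional.empty();
    }

    public static void main(String[] args) {
        // Classic merge commit: parsed as expected.
        System.out.println(prNumber("Merge pull request #1234 from user/branch")); // Optional[1234]
        // Squash-merge title: silently not matched, so the commit is skipped.
        System.out.println(prNumber("Fix inlining (#1234)")); // Optional.empty
    }
}
```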
Proposed System:
- GitHub Actions: Use GitHub Actions to run the benchmarks (with a self-hosted runner on a machine with a stable CPU frequency). This will make logs easier to access, give better visibility into failures, and enable alerting when benchmarks fail.
- JMH Benchmark Suite: Adopt a standard JMH benchmark suite to define benchmarks. This will simplify running benchmarks locally, allow the use of JMH features such as profilers and data export, and facilitate the addition of new benchmarks.
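For intuition, the core discipline JMH provides is separating warmup iterations (discarded, so the JIT can stabilize) from measurement iterations. The toy harness below is not JMH itself, just a self-contained illustration of what JMH's `@Warmup`/`@Measurement` settings automate; all names are hypothetical:

```java
// Toy measurement loop illustrating the warmup/measurement split that JMH
// automates. Not a replacement for JMH: a real harness also handles
// dead-code elimination, forking, statistics, and much more.
public class ToyHarness {
    static double averageNanos(Runnable benchmark, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) benchmark.run(); // results discarded
        long total = 0;
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            benchmark.run();
            total += System.nanoTime() - start;
        }
        return (double) total / measured;
    }

    public static void main(String[] args) {
        double avg = averageNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000; i++) sum += i;
        }, 5, 20);
        System.out.println(avg >= 0);
    }
}
```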
I have implemented a proof of concept for the proposed system at mbovel#10 and am currently testing it.
Other things:
- I don't plan to change how benchmark data is stored (https://github.com/lampepfl/bench-data). CSV files are easy to process, store, and read, and will remain a manageable size.
- I also plan to improve the benchmarks visualizer by providing different data views. This is tracked in #21720 (Improve the benchmarks visualizer).
- We plan to add more benchmarks in the future.