ci: Bring back ninja for dist builders #103295
Conversation
The primary reason for this is that make can result in substantial underutilization of parallelism, mostly because the submake structure prevents good dependency tracking and scheduling. In f758c7b ("Debian 6 doesn't have ninja, so use make for the dist builds"), llvm.ninja was disabled due to the lack of a distro package. This is no longer the case with the CentOS 7 base, so bring ninja back for a performance boost.
Now that we have brought ninja back to the installed base packages, we can use it for the initial Clang build as well.
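For context, rustbuild already exposes the generator choice as a config knob; a sketch of the relevant `config.toml` setting (illustrative only — this PR's actual diff touches the CI Dockerfiles and builder config, not this file directly):

```toml
# Illustrative config.toml fragment for rust-lang/rust's bootstrap.
# ninja = true asks bootstrap to drive LLVM's CMake build with the
# Ninja generator instead of Unix Makefiles.
[llvm]
ninja = true
```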
(rust-highfive has picked a reviewer for you, use r? to override)
⌛ Trying commit 354e95a with merge 431cb4d2a7b345b52167aa0f2d16473c8e0fb01b...
If that's the case, I suppose it won't be much different this time. One benefit, though, is that it's quite handy when testing CI builds on a high-core workstation (make can only achieve a load average of ~32 on 128 cores).
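The ~32-of-128 load average above fits the general pattern. As a toy illustration (invented task counts, not measurements from this build), directory-level barriers in recursive make leave workers idle at the tail of each directory, while a single global task graph, as ninja uses, keeps them busy:

```python
# Toy model (invented numbers): recursive make, as modeled here, finishes one
# directory's jobs before starting the next, so late in each directory most
# workers sit idle. A single global task graph can schedule every ready task.
import math

WORKERS = 8
dirs = [5, 3, 12]  # independent unit-time tasks per "directory"

# Per-directory barrier: each directory takes ceil(tasks / workers) rounds.
submake_rounds = sum(math.ceil(n / WORKERS) for n in dirs)

# Global scheduler: all tasks schedulable at once.
global_rounds = math.ceil(sum(dirs) / WORKERS)

print(f"per-directory barriers: {submake_rounds} rounds")  # 4
print(f"global task graph:      {global_rounds} rounds")   # 3
```

With more and smaller directories the gap widens, which is why deep submake trees tend to hurt most on many-core machines.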
☀️ Try build successful - checks-actions
Comparing the final build of LLVM (which doesn't hit the cache due to PGO) in this PR to a recent master build (https://github.com/rust-lang-ci/rust/actions/runs/3301120227/jobs/5447282596): this PR took 15m11s while the master build took 12m58s. The CI machines seem to suffer a lot from noisy neighbors, though, so I don't think we can draw a statistically significant conclusion from the timings. Given the simplicity of doing this (we don't need extra build-from-source scripts, unlike #96969), I still think it's worth doing, mainly for the convenience of testing on many-core machines.
@bors try (we should do multiple builds, preferably in rapid succession, to make sure that caches are primed)
⌛ Trying commit 354e95a with merge 2bfc8954ac2c8143b4f5c0c1fcddffdb7101fc84...
Sure, a subsequent build would hit the Docker and first-stage LLVM caches, but what are we trying to measure here? Given the variance of CI machines, we'd need hundreds of builds to get a statistically significant comparison, which I don't think is worth the effort. The timing I measured above is from the PGOed build, whose --instr-use flag explicitly disables sccache. I doubt caching is what matters here.
Yeah, it's not about statistical significance, just to see the final timing.
While comparing times you should also check the CPU of the particular runner (I think it's still logged). The discrepancy between the slowest and fastest machines is rather big on GHA.
☀️ Try build successful - checks-actions
OK, sccache is hitting now, but the Docker cache isn't. Maybe the cache key is different for try builds?
@bors try
From my experience it's just a matter of running the build 2-3 times in quick succession to make all caches work.
⌛ Trying commit 354e95a with merge 0c08aa9071086a229d1b9bd610d0f21617610848...
☀️ Try build successful - checks-actions
2h11m on Skylake, time looks fine.
Any opinions on whether this should be pushed forward?
We use ninja on every other dist builder, so I see no reason why we shouldn't here. @bors r+ rollup=never (because of the caching effects)
⌛ Testing commit 354e95a with merge d3f9aa5c6e20bfc58af72d7b30457e4f5de97101...
💥 Test timed out |
Should be faster with the Docker image now primed... @bors retry
☀️ Test successful - checks-actions |
Finished benchmarking commit (5ab7445): comparison URL.
Overall result: ❌ regressions - ACTION NEEDED
Next steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count: a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): a less reliable metric that may be of interest but was not used to determine the overall result.
Cycles: a less reliable metric that may be of interest but was not used to determine the overall result.
Any idea what happened to perf? Perhaps this disrupted LTO/PGO? |
Could be. @Kobzol, what do you think? |
I have no idea why there should be regressions from switching to ninja.
Looks like the perf run of the revert shows the regressions being reverted as well. Would it make sense to revert until we can figure out why?
Yes, I think we should go for the revert. I did look into the CI logs and tried to figure out whether it's LTO or PGO, but I'm completely clueless, and it seems unlikely I'll be able to reland this in the foreseeable future.
Revert "ci: Bring back ninja for dist builders" Reverts rust-lang#103295 because of the perf regression. r? `@cuviper`