Description
Right now the cycle time for a passing pull request is 3+ hours, up from about 2 hours a week ago and, if I remember correctly, from about 1 hour a few months ago. This isn't really an artifact of rust getting slower, but rather of @bors working even harder at testing. There does appear to be some low-hanging fruit which could cut the cycle time by at least an hour or two.
Currently the slowest builds are *-nopt (3+ hours), *-all (~2 hours), and *-vg (~1.5 hours). The baseline build time seems to be ~1 hour, and I think that we can get back to that level of speed with these suggestions:
- For *-nopt builds, build the compiler with optimizations, but build and run the tests without optimizations. This could be enabled by #8450 ("Allow disabling optimizations in tests only"). I believe that with the runtime now unoptimized for *-nopt builds we're seeing a huge slowdown.
- Parallelize the *-all builds. From what I understand, these builders essentially run 4 separate builds on linux/mac: H64 -> T32, H64 -> T64, H32 -> T64, H32 -> T32 (where H == host word size, T == target word size). Two of these are already tested by other builders (H64 -> T64, H32 -> T32), and the remaining two don't really need to run on the same builder, so they could be built in parallel.
  Basically, I think we could get the same test coverage by removing the *-all builders and introducing *-cross32 and *-cross64 builders instead (where the host is different from the target architecture).
- Either resolve #749 ("Investigate running tests under Address Sanitizer"), which should run a lot faster than valgrind, or run only a subset of the tests under valgrind. Both options are somewhat tricky, but I think that in the immediate future it's more reasonable to run a subset of the tests in valgrind. I'm not sure if others would agree, but I'd think that we could run only the run-pass, libXtest, and run-fail suites on valgrind bots and get mostly the same signal as running valgrind over the entire build.
  Perhaps the snapshot builders could run valgrind for the entire suite. Inevitably we'd then get a failure that is difficult to debug, but I think it would be worth it.
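To make the *-all suggestion concrete, here is a minimal sketch of the host/target matrix and how it could be split. It only illustrates the reasoning above and is not the actual buildbot configuration. Naming each cross builder after its target word size is my assumption, as is the `builder_for` helper.

```python
from itertools import product

WORD_SIZES = (32, 64)

# Every (host, target) word-size pair that a *-all builder covers today.
all_pairs = set(product(WORD_SIZES, WORD_SIZES))

# Native pairs (host == target) are already tested by the other builders.
native = {(h, t) for h, t in all_pairs if h == t}

# Only the cross pairs still need a builder of their own.
cross = sorted(all_pairs - native)


def builder_for(host, target):
    # Hypothetical naming: the builder is named after the target word size.
    return f"cross{target}"


# One pair per builder, so the two cross builds can run in parallel.
assignment = {builder_for(h, t): f"H{h} -> T{t}" for h, t in cross}

for name, pair in sorted(assignment.items()):
    print(f"*-{name}: {pair}")
```

Each cross builder then handles a single host/target pair instead of all four, which is where the time saving would come from.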
I think that with these three changes combined, we could get the cycle time back down to about an hour and process many more builds per day.
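As a rough back-of-the-envelope check, the sketch below uses the approximate durations quoted earlier. It assumes that the builders run in parallel, so the slowest one sets the cycle time. It also assumes that pull requests are tested one at a time, back to back, with no failures. The proposed durations are the hoped-for target, not measurements.

```python
# Approximate builder durations in hours, from the figures quoted above.
# "3+ hours" is entered as 3.0, so the current cycle time is a lower bound.
current = {"nopt": 3.0, "all": 2.0, "vg": 1.5, "baseline": 1.0}

# Hypothetical target: every builder is back at the ~1 hour baseline.
proposed = {"nopt": 1.0, "cross32": 1.0, "cross64": 1.0, "vg": 1.0, "baseline": 1.0}


def cycle_time(durations):
    # Assumption: builders run in parallel, so the slowest one gates the PR.
    return max(durations.values())


def builds_per_day(durations):
    # Assumption: one pull request is tested at a time, back to back.
    return 24 / cycle_time(durations)


print(cycle_time(current), builds_per_day(current))
print(cycle_time(proposed), builds_per_day(proposed))
```

Under those assumptions, going from a 3 hour cycle to a 1 hour cycle would triple the number of pull requests that can land in a day.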
@graydon, @brson, would it be possible to configure the bots this way?