Reduce builder size of jobs that take less than an hour #124985

dpaoliello · 2024-05-10T19:40:19Z

The longest build is currently dist-x86_64-linux-alt at ~2 hours. It already runs on a 16-core builder, so we can't make it any faster by throwing more hardware at it.

Given that the overall build time will be at least 2 hours, we can reduce build costs by shrinking the builder for any job that takes less than 1 hour: even after slowing down on the smaller builder, such a job will still finish before dist-x86_64-linux-alt does.

Note that scaling isn't linear: halving the core count increases end-to-end build times by about 25-50%, not 100%. In this sample build, arm-android went from ~52m to 1h 5m and dist-arm-linux went from ~55m to 1h 17m (then failed due to missing metrics).
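As a quick check of the 25-50% figure, here is a short Python sketch that computes the percentage increase from the two sample-build timings. The minute values are the rounded figures quoted above; `slowdown_pct` is a hypothetical helper name.

```python
def slowdown_pct(before_min: float, after_min: float) -> float:
    """Percentage increase in wall-clock time after halving the core count."""
    return (after_min - before_min) / before_min * 100


# Rounded sample-build timings from the paragraph above, in minutes.
samples = {
    "arm-android": (52, 65),     # ~52m -> 1h 5m
    "dist-arm-linux": (55, 77),  # ~55m -> 1h 17m
}

for job, (before, after) in samples.items():
    print(f"{job}: +{slowdown_pct(before, after):.0f}%")
```

The two samples land at roughly +25% and +40%, inside the quoted range.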

Current builder sizes and run times for each job, with the proposed new sizes (a blank Proposed cell means the builder is unchanged):

Job	Current size	Proposed size	Run 1	Run 2	Run 3	Run 4
aarch64-gnu	-		1h 9m 1s	1h 8m 47s	1h 8m 45s	1h 9m 6s
arm-android	8c	4c	52m 32s	52m 38s	51m 30s	53m 13s
armhf-gnu	8c	4c	37m 30s	37m 40s	38m 41s	37m 56s
dist-aarch64-linux	8c	4c	57m 11s	56m 48s	55m 53s	56m 19s
dist-android	8c	4c	24m 37s	25m 13s	25m 15s	24m 17s
dist-arm-linux	16c	8c	53m 34s	55m 11s	56m 1s	54m 29s
dist-armhf-linux	8c	4c	42m 1s	43m 32s	43m 27s	41m 55s
dist-armv7-linux	8c	4c	44m 51s	44m 35s	43m 34s	46m 2s
dist-i586-gnu-i586-i686-musl	8c	4c	37m 59s	37m 56s	38m 4s	38m 24s
dist-i686-linux	8c	4c	52m 20s	51m 3s	52m 53s	50m 38s
dist-loongarch64-linux	8c	4c	40m 39s	40m 20s	41m 6s	40m 44s
dist-ohos	8c	4c	25m 5s	24m 34s	25m 18s	23m 40s
dist-powerpc-linux	8c	4c	42m 31s	43m 53s	42m 35s	41m 56s
dist-powerpc64-linux	8c	4c	42m 52s	44m 36s	45m 32s	43m 51s
dist-powerpc64le-linux	8c	4c	43m 41s	44m 11s	43m 2s	44m 21s
dist-riscv64-linux	8c	4c	41m 25s	42m 41s	41m 52s	43m 47s
dist-s390x-linux	8c	4c	46m 48s	47m 18s	47m 27s	46m 49s
dist-various-1	8c	4c	42m 14s	43m 20s	43m 20s	41m 41s
dist-various-2	8c	4c	36m 18s	38m 15s	37m 41s	39m 28s
dist-x86_64-freebsd	8c	4c	39m 21s	39m 40s	40m 1s	40m 2s
dist-x86_64-illumos	8c	4c	45m 35s	46m 43s	46m 2s	46m 4s
dist-x86_64-linux	16c		1h 53m 10s	1h 51m 15s	1h 52m 18s	1h 52m 26s
dist-x86_64-linux-alt	16c		2h 3m 33s	2h 3m 31s	2h 4m 12s	2h 2m 21s
dist-x86_64-musl	8c		1h 5m 42s	1h 6m 13s	1h 7m 49s	1h 6m 6s
dist-x86_64-netbsd	8c	4c	40m 4s	39m 48s	40m 16s	39m 43s
i686-gnu	8c		1h 13m 38s	1h 13m 39s	1h 13m 48s	1h 13m 12s
i686-gnu-nopt	8c		1h 17m 44s	1h 18m 14s	1h 19m 55s	1h 18m 44s
mingw-check	4c		28m 15s	27m 39s	28m 36s	28m 38s
test-various	8c	4c	37m 45s	37m 17s	38m 26s	38m 11s
x86_64-gnu	4c		1h 34m 1s	1h 31m 51s	1h 30m 35s	1h 32m 53s
x86_64-gnu-stable	4c		1h 28m 26s	1h 28m 11s	1h 29m 40s	1h 46m 28s
x86_64-gnu-aux	4c		1h 33m 32s	1h 31m 57s	1h 34m 8s	1h 32m 57s
x86_64-gnu-integration	8c		1h 22m 2s	1h 20m 14s	1h 19m 46s	1h 21m 24s
x86_64-gnu-debug	8c	4c	52m 41s	53m 40s	51m 51s	56m 9s
x86_64-gnu-distcheck	8c		1h 9m 14s	1h 5m 31s	1h 6m 29s	1h 5m 50s
x86_64-gnu-llvm-18	8c		1h 39m 47s	1h 37m 57s	1h 38m 40s	1h 37m 38s
x86_64-gnu-llvm-17	8c		1h 41m 50s	1h 45m 43s	1h 45m 4s	1h 43m 4s
x86_64-gnu-nopt	4c		1h 20m 42s	1h 21m 38s	1h 20m 4s	1h 22m 11s
x86_64-gnu-tools	8c		1h 5m 0s	1h 5m 30s	1h 3m 1s	1h 3m 20s
dist-x86_64-apple	xl		1h 35m 1s	1h 39m 57s	2h 2m 31s	1h 47m 37s
dist-apple-various	xl		1h 18m 54s	1h 22m 31s	1h 13m 19s	1h 38m 18s
x86_64-apple-1	xl		1h 32m 8s	1h 40m 12s	1h 51m 28s	1h 40m 26s
x86_64-apple-2	xl		1h 0m 32s	1h 4m 5s	1h 9m 0s	1h 7m 17s
dist-aarch64-apple	m1		1h 3m 9s	1h 1m 14s	1h 2m 6s	1h 2m 24s
aarch64-apple	m1		53m 38s	1h 1m 5s	1h 3m 15s	1h 6m 11s
x86_64-msvc	8c		1h 27m 48s	1h 29m 38s	1h 29m 55s	1h 28m 4s
i686-msvc	8c		1h 38m 28s	1h 34m 7s	1h 39m 19s	1h 39m 28s
x86_64-msvc-ext	8c		1h 44m 5s	1h 38m 40s	1h 45m 21s	1h 44m 19s
i686-mingw	8c		1h 49m 57s	1h 45m 1s	1h 52m 4s	1h 51m 4s
x86_64-mingw	8c		1h 44m 2s	1h 37m 36s	1h 49m 58s	1h 47m 5s
dist-x86_64-msvc	8c		1h 57m 14s	1h 49m 43s	1h 52m 53s	1h 52m 35s
dist-i686-msvc	8c		1h 8m 5s	1h 4m 9s	1h 9m 26s	1h 12m 0s
dist-aarch64-msvc	8c		1h 18m 40s	1h 14m 4s	1h 22m 1s	1h 19m 6s
dist-i686-mingw	8c		1h 15m 36s	1h 14m 36s	1h 16m 38s	1h 16m 2s
dist-x86_64-mingw	8c		1h 11m 54s	1h 16m 12s	1h 16m 54s	1h 18m 2s
dist-x86_64-msvc-alt	8c		1h 11m 17s	1h 10m 0s	1h 11m 8s	1h 13m 14s
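The proposal relies on the downsized jobs still finishing before dist-x86_64-linux-alt. Below is a minimal Python sketch of that check for a few rows copied from the table. It assumes the worst observed slowdown (50%) applies uniformly to each job's mean run time. dist-arm-linux goes from 16c to 8c while the others go from 8c to 4c. The names `parse_duration`, `check` and `WORST_CASE_SLOWDOWN` are hypothetical.

```python
import re
from statistics import mean

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a CI duration such as '1h 9m 1s' or '52m 32s' to seconds."""
    return sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([hms])", text))


# A few rows copied from the table above (job -> the four run times).
RUNS = {
    "arm-android": ["52m 32s", "52m 38s", "51m 30s", "53m 13s"],
    "dist-aarch64-linux": ["57m 11s", "56m 48s", "55m 53s", "56m 19s"],
    "dist-arm-linux": ["53m 34s", "55m 11s", "56m 1s", "54m 29s"],
    "x86_64-gnu-debug": ["52m 41s", "53m 40s", "51m 51s", "56m 9s"],
}
# dist-x86_64-linux-alt, the longest job, which stays on its 16-core builder.
CRITICAL_PATH = ["2h 3m 33s", "2h 3m 31s", "2h 4m 12s", "2h 2m 21s"]

# Assumption: a downsized job is at most 50% slower (top of the observed range).
WORST_CASE_SLOWDOWN = 1.5


def check(runs, critical_path, slowdown=WORST_CASE_SLOWDOWN):
    """Return job -> (projected mean minutes, finishes before the longest job?)."""
    limit = mean(parse_duration(t) for t in critical_path)
    result = {}
    for job, times in runs.items():
        projected = mean(parse_duration(t) for t in times) * slowdown
        result[job] = (projected / 60, projected < limit)
    return result


if __name__ == "__main__":
    longest = mean(parse_duration(t) for t in CRITICAL_PATH) / 60
    print(f"longest job: {longest:.1f} min")
    for job, (minutes, ok) in check(RUNS, CRITICAL_PATH).items():
        print(f"{job}: ~{minutes:.1f} min on the smaller builder, fits={ok}")
```

Comparing means is a simplification. Individual runs vary (for example, x86_64-gnu-debug's fourth run), but even the slowest of these jobs projects to well under the ~2h critical path.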

rustbot · 2024-05-10T19:40:27Z

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

matthiaskrgr · 2024-05-10T21:28:35Z

Hmm, I would be careful.
Let's say we have an 8c runner that fails after 30 minutes now (a regular CI failure) and all jobs get cancelled.

If we make it slower by reducing it to 4 cores, we only learn about the same test failure 50 or 60 minutes after the start instead of 30, which means CI now runs $num_jobs x 30 minutes longer in total. Might that end up being more expensive?

dpaoliello · 2024-05-10T22:05:18Z


That's a reasonable concern.

One thing to note is that halving the core count doesn't double the time it takes to run the build; I was seeing regressions closer to 25-50% (although if you want to do a try build, we can get better numbers).

But, yes, it's reasonable to expect that if one of these jobs failed, we might see the failure 5-20 minutes later than before.
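The two positions can be compared with a rough model, sketched here in Python. It assumes the failing job's time scales uniformly by the slowdown factor, and it uses the simplification from the comment above: every other job keeps its runner busy for the extra delay before being cancelled. The function names and `NUM_JOBS` are hypothetical and purely illustrative.

```python
def failure_delay(fail_after_min: float, slowdown: float) -> float:
    """Extra minutes before a failure surfaces when the failing job runs `slowdown` times slower."""
    return fail_after_min * (slowdown - 1)


def extra_runner_minutes(num_jobs: int, delay_min: float) -> float:
    """Simplified cost model: each job keeps running `delay_min` minutes longer before cancellation."""
    return num_jobs * delay_min


NUM_JOBS = 10  # hypothetical job count, not taken from the CI config

# 2.0 is the "twice as slow" case; 1.25 and 1.5 bound the observed range.
for slowdown in (1.25, 1.5, 2.0):
    delay = failure_delay(30, slowdown)
    cost = extra_runner_minutes(NUM_JOBS, delay)
    print(f"x{slowdown}: failure seen {delay:.1f} min later, ~{cost:.0f} extra runner-minutes")
```

With a 25-50% slowdown, a failure at the 30-minute mark surfaces 7.5-15 minutes later rather than 30.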

Looking at the rust-lang-ci queue for the auto and try branches, there are currently 6,071 successful runs and 2,268 failures, a success-to-failure ratio of roughly 8:3. Of the most recent 25 failures, 8 failed in under 10 minutes and 14 (including those 8) in under 30 minutes, so failures usually happen early. So the question is whether the 11 late failures (assuming they were all in these particular jobs) would have eaten into the savings from the ~29 successful builds that the success:failure ratio implies for the same period.
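The counts above combine as follows. This Python sketch only re-derives the figures quoted in the comment and measures nothing new; the variable names are illustrative.

```python
successes, failures = 6071, 2268  # auto + try runs in the rust-lang-ci queue
recent_failures = 25
failed_under_30_min = 14          # includes the 8 that failed in under 10 minutes

ratio = successes / failures                        # ~2.68, i.e. roughly 8:3
late_failures = recent_failures - failed_under_30_min  # failures after 30+ minutes
successes_for_late_failures = late_failures * ratio

print(f"success:failure = {ratio:.2f} (8/3 = {8 / 3:.2f})")
print(f"late failures: {late_failures}")
print(f"successful builds at the same ratio: ~{successes_for_late_failures:.0f}")
```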

Mark-Simulacrum · 2024-05-19T23:47:51Z

@bors r+

This seems broadly reasonable, and we can always revert if it gives us significant problems.

bors · 2024-05-19T23:47:54Z

📌 Commit 1b5d91b has been approved by Mark-Simulacrum

It is now in the queue for this repository.

bors · 2024-05-20T11:31:01Z

⌛ Testing commit 1b5d91b with merge 44d679b...

bors · 2024-05-20T13:36:28Z

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 44d679b to master...

rust-timer · 2024-05-20T15:43:24Z

Finished benchmarking commit (44d679b): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -4.4%, secondary -2.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-4.4%	[-4.4%, -4.4%]	1
Improvements ✅ (secondary)	-2.1%	[-2.1%, -2.1%]	1
All ❌✅ (primary)	-4.4%	[-4.4%, -4.4%]	1

Cycles

Results (secondary -3.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.5%	[-5.1%, -1.9%]	2
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 670.321s -> 671.145s (0.12%)
Artifact size: 316.18 MiB -> 316.05 MiB (-0.04%)

rustbot assigned Mark-Simulacrum May 10, 2024

rustbot added A-testsuite (Area: The testsuite used to check the correctness of rustc), S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-infra (Relevant to the infrastructure team, which will review and decide on the PR/issue.) labels May 10, 2024


Reduce size of builders that take less than an hour

1b5d91b

dpaoliello force-pushed the rebalance branch from 3f98796 to 1b5d91b May 10, 2024 21:19

dpaoliello marked this pull request as ready for review May 10, 2024 21:19

bors added S-waiting-on-bors (Status: Waiting on bors to run and complete tests. Bors will change the label on completion.) and removed S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) labels May 19, 2024

bors added the merged-by-bors (This PR was explicitly merged by bors.) label May 20, 2024

bors merged commit 44d679b into rust-lang:master May 20, 2024
7 checks passed

rustbot added this to the 1.80.0 milestone May 20, 2024

dpaoliello deleted the rebalance branch May 20, 2024 14:47
