Skip to content

Commit b42b9df

Browse files
authored
Merge pull request #826 from kinnison/rustup-incident-report
insiders: Add incident report for rustup 1.24.0
2 parents 47143b4 + 25ec5f0 commit b42b9df

File tree

1 file changed

+95
-0
lines changed

1 file changed

+95
-0
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
---
2+
layout: post
3+
title: "Rustup 1.24.0 release incident report for 2021-04-27"
4+
author: Daniel Silverstone
5+
team: the Rustup team <https://www.rust-lang.org/governance/teams/dev-tools#wg-rustup>
6+
---
7+
8+
On 2021-04-27 at 15:09 UTC we released a new version of Rustup (1.24.0). At
9+
15:23 UTC we received a report that we had introduced a regression in the part
10+
of the code which is responsible for proxying the `rustfmt` and `cargo-fmt`
11+
portions of Rust toolchains. At 15:27 UTC we had confirmed and identified the
12+
cause of the problem, and while we worked on a fix, we reverted the released
13+
Rustup to version 1.23.1 - an action completed by 15:56 UTC.
14+
15+
This means that for approximately 47 minutes, CI jobs which used the code
16+
formatting features of Rust toolchains may have had spurious failures, and users
17+
who upgraded will have had to revert to 1.23.1. The revert process is designed
18+
to be as easy as upgrading was, meaning that users should not have had lingering
19+
issues.
20+
21+
## Root cause of the issue
22+
23+
In an effort to
24+
[reduce confusion when downloaded copies of `rustup-init.exe`are renamed][rcon]
25+
we merged a change which causes Rustup to report an error if an unknown name is
26+
used when invoking the binary.
27+
28+
[rcon]: https://github.com/rust-lang/rustup/issues/2286
29+
30+
Due to past complexities with `rustfmt` and `cargo-fmt` being binaries which
31+
tended to be distributed by `cargo install` rather than via
32+
`rustup component add` there is some intricate handling in Rustup's proxy
33+
management for those specific binaries. The fix for the above issue failed to
34+
include these corner case proxies in the check it undertook to ensure the caller
35+
hadn't used an unexpected binary name.
36+
37+
The 1.24.0 release had been staged at this point, however there was a problem
38+
with the low-memory installation pathways which required a fix, and at the time
39+
we incorrectly assessed that it was low-impact to rebase the release onto the
40+
new master branch which had not only the fix for the low-memory installation
41+
pathway but also the "refuse bad names" change for the above issue.
42+
43+
Subsequent testing of the release focussed almost entirely on the installation
44+
pathways, omitting to validate the proxy name verification code we had also
45+
acquired into the release. As a result, this regression slipped in.
46+
47+
## Resolution
48+
49+
The author of the validation PR correctly identified it as the root-cause of
50+
the regression, and the team discussed and decided that it was better to fix
51+
the problem properly than to simply revert the change out of the release.
52+
53+
The release team member who was helping with the release process began the
54+
revert to Rustup 1.23.1 while the fixes were developed. In addition an issue
55+
was filed around adding some tests around all the proxies (we currently test a
56+
subset which we believed to be representative). We subsequently staged a
57+
proposed 1.24.1 release to Rust's development stage and we have issued a [call
58+
for beta testers][beta] to confirm that we have not introduced any other
59+
regressions.
60+
61+
[beta]: https://internals.rust-lang.org/t/seeking-beta-testers-for-rustup-1-24-1/14582
62+
63+
## Lessons learned
64+
65+
The big lesson here is that while we've taken similar notes away from past
66+
releases of Rustup and other tooling, we've not yet managed to set up a proper
67+
beta-testing process for Rustup. As such we will be making changes to the
68+
Rustup release process to codify testing phases with the wider community.
69+
70+
## Long term changes to Rustup releases
71+
72+
In order to try and reduce the chance of this happening again, the [release
73+
process][rp] will be updated to include a public beta-testing phase for any non-
74+
purely-bugfix release and we intend to look into the possibility of a "nightly"
75+
Rustup release for a _small_ subset of platforms.
76+
77+
[rp]: https://github.com/rust-lang/rustup/blob/master/CONTRIBUTING.md#making-a-release
78+
79+
Finally we are hoping to work with the [release team][rt] to do what we can to
80+
unify the Rustup release process with the well oiled Rust release process
81+
though, due to the historical differences in how Rustup has been released, this
82+
will likely be a long term effort.
83+
84+
[rt]: https://www.rust-lang.org/governance/teams/release
85+
86+
## Action items
87+
88+
- [#2739]: Testing for proxying, including TOOLS and DUP_TOOLS
89+
- [#2741]: Release process should include explicit beta test period
90+
91+
[#2739]: https://github.com/rust-lang/rustup/issues/2739
92+
[#2741]: https://github.com/rust-lang/rustup/issues/2741
93+
94+
And as mentioned above, longer term we shall look to see what unification we can
95+
do between releasing Rustup and how the Rust release train runs.

0 commit comments

Comments
 (0)