|
| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Rustup 1.24.0 release incident report for 2021-04-27" |
| 4 | +author: Daniel Silverstone |
| 5 | +team: the Rustup team <https://www.rust-lang.org/governance/teams/dev-tools#wg-rustup> |
| 6 | +--- |
| 7 | + |
| 8 | +On 2021-04-27 at 15:09 UTC we released a new version of Rustup (1.24.0). At |
| 9 | +15:23 UTC we received a report that we had introduced a regression in the part |
| 10 | +of the code which is responsible for proxying the `rustfmt` and `cargo-fmt` |
| 11 | +portions of Rust toolchains. At 15:27 UTC we had confirmed and identified the |
| 12 | +cause of the problem, and while we worked on a fix, we reverted the released |
| 13 | +Rustup to version 1.23.1 - an action completed by 15:56 UTC. |
| 14 | + |
| 15 | +This means that for approximately 47 minutes, CI jobs which used the code |
| 16 | +formatting features of Rust toolchains may have had spurious failures, and users |
| 17 | +who upgraded will have had to revert to 1.23.1. The revert process is designed |
| 18 | +to be as easy as upgrading was, meaning that users should not have had lingering |
| 19 | +issues. |
| 20 | + |
| 21 | +## Root cause of the issue |
| 22 | + |
| 23 | +In an effort to |
| 24 | +[reduce confusion when downloaded copies of `rustup-init.exe`are renamed][rcon] |
| 25 | +we merged a change which causes Rustup to report an error if an unknown name is |
| 26 | +used when invoking the binary. |
| 27 | + |
| 28 | +[rcon]: https://github.com/rust-lang/rustup/issues/2286 |
| 29 | + |
| 30 | +Due to past complexities with `rustfmt` and `cargo-fmt` being binaries which |
| 31 | +tended to be distributed by `cargo install` rather than via |
| 32 | +`rustup component add` there is some intricate handling in Rustup's proxy |
| 33 | +management for those specific binaries. The fix for the above issue failed to |
| 34 | +include these corner case proxies in the check it undertook to ensure the caller |
| 35 | +hadn't used an unexpected binary name. |
| 36 | + |
| 37 | +The 1.24.0 release had been staged at this point, however there was a problem |
| 38 | +with the low-memory installation pathways which required a fix, and at the time |
| 39 | +we incorrectly assessed that it was low-impact to rebase the release onto the |
| 40 | +new master branch which had not only the fix for the low-memory installation |
| 41 | +pathway but also the "refuse bad names" change for the above issue. |
| 42 | + |
| 43 | +Subsequent testing of the release focussed almost entirely on the installation |
| 44 | +pathways, omitting to validate the proxy name verification code we had also |
| 45 | +acquired into the release. As a result, this regression slipped in. |
| 46 | + |
| 47 | +## Resolution |
| 48 | + |
| 49 | +The author of the validation PR correctly identified it as the root-cause of |
| 50 | +the regression, and the team discussed and decided that it was better to fix |
| 51 | +the problem properly than to simply revert the change out of the release. |
| 52 | + |
| 53 | +The release team member who was helping with the release process began the |
| 54 | +revert to Rustup 1.23.1 while the fixes were developed. In addition an issue |
| 55 | +was filed around adding some tests around all the proxies (we currently test a |
| 56 | +subset which we believed to be representative). We subsequently staged a |
| 57 | +proposed 1.24.1 release to Rust's development stage and we have issued a [call |
| 58 | +for beta testers][beta] to confirm that we have not introduced any other |
| 59 | +regressions. |
| 60 | + |
| 61 | +[beta]: https://internals.rust-lang.org/t/seeking-beta-testers-for-rustup-1-24-1/14582 |
| 62 | + |
| 63 | +## Lessons learned |
| 64 | + |
| 65 | +The big lesson here is that while we've taken similar notes away from past |
| 66 | +releases of Rustup and other tooling, we've not yet managed to set up a proper |
| 67 | +beta-testing process for Rustup. As such we will be making changes to the |
| 68 | +Rustup release process to codify testing phases with the wider community. |
| 69 | + |
| 70 | +## Long term changes to Rustup releases |
| 71 | + |
| 72 | +In order to try and reduce the chance of this happening again, the [release |
| 73 | +process][rp] will be updated to include a public beta-testing phase for any non- |
| 74 | +purely-bugfix release and we intend to look into the possibility of a "nightly" |
| 75 | +Rustup release for a _small_ subset of platforms. |
| 76 | + |
| 77 | +[rp]: https://github.com/rust-lang/rustup/blob/master/CONTRIBUTING.md#making-a-release |
| 78 | + |
| 79 | +Finally we are hoping to work with the [release team][rt] to do what we can to |
| 80 | +unify the Rustup release process with the well oiled Rust release process |
| 81 | +though, due to the historical differences in how Rustup has been released, this |
| 82 | +will likely be a long term effort. |
| 83 | + |
| 84 | +[rt]: https://www.rust-lang.org/governance/teams/release |
| 85 | + |
| 86 | +## Action items |
| 87 | + |
| 88 | +- [#2739]: Testing for proxying, including TOOLS and DUP_TOOLS |
| 89 | +- [#2741]: Release process should include explicit beta test period |
| 90 | + |
| 91 | +[#2739]: https://github.com/rust-lang/rustup/issues/2739 |
| 92 | +[#2741]: https://github.com/rust-lang/rustup/issues/2741 |
| 93 | + |
| 94 | +And as mentioned above, longer term we shall look to see what unification we can |
| 95 | +do between releasing Rustup and how the Rust release train runs. |
0 commit comments