Skip to content

Triage 2024 01 16 #1803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 17, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
202 changes: 202 additions & 0 deletions triage/2024-01-16.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
# 2024-01-16 Triage Log

This week had some small regressions that did not warrant further investigation,
several of which were dismissed as being noise/blips in the data. There were
also a number of gains. (Don't get exicited about that 20.6% improvement, its an
measurement artifact from a temporary blip in the PR that immediately preceded
this week's triage.)

Triage done by **@pnkfelix**.
Revision range: [76101eec..f9c2421a](https://perf.rust-lang.org/?start=76101eecbe9aa80753664bbe637ad06d1925f315&end=f9c2421a2a6e34f3756900dd7d600704c08bfccb&absolute=false&stat=instructions%3Au)

**Summary**:

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:---------------:|:-----:|
| Regressions ❌ <br /> (primary) | 0.7% | [0.6%, 0.7%] | 2 |
| Regressions ❌ <br /> (secondary) | 3.1% | [0.8%, 4.1%] | 9 |
| Improvements ✅ <br /> (primary) | -1.2% | [-20.6%, -0.2%] | 133 |
| Improvements ✅ <br /> (secondary) | -0.8% | [-7.3%, -0.1%] | 31 |
| All ❌✅ (primary) | -1.2% | [-20.6%, 0.7%] | 135 |


3 Regressions, 5 Improvements, 5 Mixed; 3 of them in rollups
55 artifact comparisons made in total

#### Regressions

Rollup of 10 pull requests [#119754](https://github.com/rust-lang/rust/pull/119754) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=ca663b06c5492ac2dde5e53cd11579fa8e4d68bd&end=d6affcf520091fd0f48df1a2b6bfcb9ef48e0f40&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:----:|:------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | 2.4% | [2.4%, 2.4%] | 2 |
| Improvements ✅ <br /> (primary) | - | - | 0 |
| Improvements ✅ <br /> (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

* The 2 regressing (and secondary) benchmarks are tt-muncher debug {incr-full, full}. Its not transient.
* I've skimmed over the list of PR's in the rollup. None of them are obvious culprits here. I looked at the ones related to debuginfo (#118903) and to code-coverage (#119033 and #119681), but none of those seem likely to be to blame here
* Since this only affects a secondary benchmark, and only the instruction count (e.g. not cpu-clock:u nor wall-time for these two benchmarks), I do not think its worth further investigation and I'm going to mark it as triaged.

Exhaustiveness: use an `Option` instead of allocating fictitious patterns [#119688](https://github.com/rust-lang/rust/pull/119688) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=a2d9d73e608f1b24eba840c4fd2d68dbe3b65e01&end=0a8923361ec2a37fa341292c029ef7c6d0405d4b&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:----:|:------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | 3.8% | [3.6%, 4.1%] | 6 |
| Improvements ✅ <br /> (primary) | - | - | 0 |
| Improvements ✅ <br /> (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

* This impacted the 6 variants of match-stress {incr-full,full} x {check,debug,opt}
* I think the impact on match-stress was probably well-anticipated, and within a reasonable range for a stress-test benchmark.
* Note that #119688 was a precursor to some further cleanup code (namely to remove the use of a local-arena within exhaustiveness checking).
* Marking as triaged.

never patterns: Check bindings wrt never patterns [#119610](https://github.com/rust-lang/rust/pull/119610) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=bfcc027a751595ba290c554f47907eaa3779f798&end=714b29a17ff5fa727c794bbb60bfd335f8e75d42&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:----:|:------------:|:-----:|
| Regressions ❌ <br /> (primary) | 0.3% | [0.3%, 0.4%] | 3 |
| Regressions ❌ <br /> (secondary) | - | - | 0 |
| Improvements ✅ <br /> (primary) | - | - | 0 |
| Improvements ✅ <br /> (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.3%, 0.4%] | 3 |

* This impacted 3 variants of unicode-normalization-0.1.19: debug incr-unchanged and check {incr-unchanged, incr-patched:println}.
* Interestingly, during two different try runs, those three variants were found to have improved by similar amounts by this PR.
* there's some weird interaction between that benchmark and the code paths impacted by this PR, and I do not think its worth investing effort in further investigation.
* marking as triaged.

#### Improvements

macro_rules: Add an expansion-local cache to span marker [#119693](https://github.com/rust-lang/rust/pull/119693) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=76101eecbe9aa80753664bbe637ad06d1925f315&end=0ee9cfd54db7b5f4be35f026588904500c866196&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:---------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | - | - | 0 |
| Improvements ✅ <br /> (primary) | -1.4% | [-20.5%, -0.2%] | 80 |
| Improvements ✅ <br /> (secondary) | -0.8% | [-1.9%, -0.3%] | 16 |
| All ❌✅ (primary) | -1.4% | [-20.5%, -0.2%] | 80 |

* the bitmaps changes (-20.5%, -17.9%, -13.1%) are all artifacts of returning to normal after a blip in the previous PR.

A more efficient slice comparison implementation for T: !BytewiseEq [#116846](https://github.com/rust-lang/rust/pull/116846) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=ae9d24de80b00b4158d1a29a212a6b02aeda0e75&end=190f4c96116a3b59b7de4881cfec544be0246d84&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | - | - | 0 |
| Improvements ✅ <br /> (primary) | -0.5% | [-0.9%, -0.2%] | 15 |
| Improvements ✅ <br /> (secondary) | -0.6% | [-0.6%, -0.6%] | 1 |
| All ❌✅ (primary) | -0.5% | [-0.9%, -0.2%] | 15 |

* it is too bad that work in PR #100124 stalled.

Remove a large amount of leb128-coded integers [#119791](https://github.com/rust-lang/rust/pull/119791) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=d73bd3fb3ba312f3e6b5af4d56d1161d37b71620&end=68acb393c5d2cff049b41981e35217a7e630f63a&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | 1.1% | [1.1%, 1.1%] | 1 |
| Improvements ✅ <br /> (primary) | -0.3% | [-0.3%, -0.2%] | 5 |
| Improvements ✅ <br /> (secondary) | -0.3% | [-0.5%, -0.1%] | 12 |
| All ❌✅ (primary) | -0.3% | [-0.3%, -0.2%] | 5 |

* the 1.1% hit is to deep-vector debug full. It may be transient; the history is pretty up-and-down at the time of this PR, and has settled at a lower level than where it was when this PR landed.
* in any case, the gains elsewhere, especially bootstrap, outweigh the loss to that one secondary benchmark. (Which ... I guess is what the rustc-perf bot now computes as well, since it categorized this as an Improvement rather than Mixed?)

Exhaustiveness: track overlapping ranges precisely [#119396](https://github.com/rust-lang/rust/pull/119396) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=bfd799f1a5a86d16e6b8caa2857bcb4aac6e0174&end=174e73a3f6df6f96ab453493796e461164dea94a&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | - | - | 0 |
| Improvements ✅ <br /> (primary) | -0.4% | [-1.7%, -0.2%] | 32 |
| Improvements ✅ <br /> (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.4% | [-1.7%, -0.2%] | 32 |


Rollup of 6 pull requests [#119889](https://github.com/rust-lang/rust/pull/119889) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=174e73a3f6df6f96ab453493796e461164dea94a&end=ce1f2ccf5a5ac9343623bd115a05e4151d93af0d&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | - | - | 0 |
| Improvements ✅ <br /> (primary) | -1.8% | [-1.9%, -1.7%] | 4 |
| Improvements ✅ <br /> (secondary) | -4.3% | [-7.4%, -1.3%] | 2 |
| All ❌✅ (primary) | -1.8% | [-1.9%, -1.7%] | 4 |


#### Mixed

Support async recursive calls (as long as they have indirection) [#117703](https://github.com/rust-lang/rust/pull/117703) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=387e7a5e42ac074e79a14361e82702a229a6aac8&end=dc641039d2b3f5c0894694e4b45f7c3951030685&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | - | - | 0 |
| Regressions ❌ <br /> (secondary) | 0.2% | [0.2%, 0.2%] | 1 |
| Improvements ✅ <br /> (primary) | -0.3% | [-0.4%, -0.3%] | 3 |
| Improvements ✅ <br /> (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.3% | [-0.4%, -0.3%] | 3 |

* this is weird, it looks like an inverse blip occurred on the preceding PR, where tt-muncher check incr-unchanged had a single point with -0.2% instruction-count, and then it preceding to "return to normal" on the succeeding PRs.
* (Its harder for me to explain away "inverse blips" ...)
* but at the same time, this does not seem like a significant regression by our usual metrics.
* marking as triaged.

Rollup of 9 pull requests [#119767](https://github.com/rust-lang/rust/pull/119767) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=be00c5a9b89161b7f45ba80340f709e8e41122f9&end=5876c8cdfd3df742c334d6447d44d760c77103b6&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | 1.3% | [0.4%, 2.3%] | 2 |
| Regressions ❌ <br /> (secondary) | 0.9% | [0.5%, 1.2%] | 2 |
| Improvements ✅ <br /> (primary) | - | - | 0 |
| Improvements ✅ <br /> (secondary) | -0.3% | [-0.3%, -0.3%] | 1 |
| All ❌✅ (primary) | 1.3% | [0.4%, 2.3%] | 2 |

* primary regressions: syn opt-full regressed by 2.3%, bitmaps check-incr-full by 0.35%. secondary regressions: coercions debug-full by 1.23%, ctfe-stress check-full by 0.51%
* from the overall history, it seems like syn opt-full returned to "normal" with later PRs that don't necessarily seem like they would have affected syn (e.g. PR #117449). bitmap check-incr-full's trend is likewise downward after this point.
* marking as triaged

Add assume into `NonZeroIntX::get` [#119452](https://github.com/rust-lang/rust/pull/119452) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=ce1f2ccf5a5ac9343623bd115a05e4151d93af0d&end=2319be8e265dd19973574eb581d28297baf44b11&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | 0.5% | [0.4%, 0.9%] | 4 |
| Regressions ❌ <br /> (secondary) | - | - | 0 |
| Improvements ✅ <br /> (primary) | -0.7% | [-0.7%, -0.7%] | 1 |
| Improvements ✅ <br /> (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [-0.7%, 0.9%] | 5 |

* scottmcm writes: "Instructions have a couple red in instruction counts for opt, but that's entirely reasonable for something intended to enable optimizations. Notably, the cycles are green, with no regressions. So I think this is fine."
* marking as triaged

Avoid some redundant work in GVN [#119439](https://github.com/rust-lang/rust/pull/119439) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=714b29a17ff5fa727c794bbb60bfd335f8e75d42&end=f9c2421a2a6e34f3756900dd7d600704c08bfccb&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | 0.8% | [0.6%, 0.9%] | 4 |
| Regressions ❌ <br /> (secondary) | - | - | 0 |
| Improvements ✅ <br /> (primary) | -0.7% | [-1.1%, -0.5%] | 4 |
| Improvements ✅ <br /> (secondary) | -0.4% | [-0.4%, -0.4%] | 1 |
| All ❌✅ (primary) | 0.0% | [-1.1%, 0.9%] | 8 |

* primary regressions are regex-1.5.5 debug-full, opt-incr-patched:Job, incr-full, and exa opt-full.
* the exa regression looks like a blip. The regex ones were predicted during a try run for the PR. I assume they were deemed acceptable as they are offset improvements elsewhere (or dismissed as noise?)
* marking as triaged.

Sandwich MIR optimizations between DSE. [#119672](https://github.com/rust-lang/rust/pull/119672) [(Comparison Link)](https://perf.rust-lang.org/compare.html?start=f9c2421a2a6e34f3756900dd7d600704c08bfccb&end=fa0dc208d0a34027c1d3cca7d47975d8238bcfde&stat=instructions:u)

| (instructions:u) | mean | range | count |
|:----------------------------------:|:-----:|:--------------:|:-----:|
| Regressions ❌ <br /> (primary) | 0.7% | [0.2%, 1.4%] | 14 |
| Regressions ❌ <br /> (secondary) | 0.5% | [0.2%, 2.7%] | 14 |
| Improvements ✅ <br /> (primary) | -1.0% | [-2.2%, -0.2%] | 31 |
| Improvements ✅ <br /> (secondary) | -0.9% | [-2.2%, -0.2%] | 10 |
| All ❌✅ (primary) | -0.4% | [-2.2%, 1.4%] | 45 |

* already marked as triaged by @lqd with the comment "As seen in the previous runs: some nice wins on bigger benchmarks, and overall gains outweigh the few losses."