Skip to content

TypeFoldable/TypeVisitable tidy up #117896

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

eggyal
Copy link
Contributor

@eggyal eggyal commented Nov 14, 2023

I'm breaking #117726 into more manageable, logical chunks. This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow.

r? types

Rather than being concerned with internals of ExternalConstraintsData
and PredefinedOpaquesData, we derive traversable impls for those types
to which we then delegate.
UnevaluatedConst is not a type of any interest to folders or visitors,
and therefore should not be "super visitable" at all.
Previously only enforced via comment, now via negative impls.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver) labels Nov 14, 2023
@eggyal
Copy link
Contributor Author

eggyal commented Nov 14, 2023

r? lcnr

@rustbot rustbot assigned lcnr and unassigned BoxyUwU Nov 14, 2023
@lcnr
Copy link
Contributor

lcnr commented Nov 14, 2023

r=me after perf

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 14, 2023
@bors
Copy link
Collaborator

bors commented Nov 14, 2023

⌛ Trying commit dce79e4 with merge a5f77ae...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 14, 2023
TypeFoldable/TypeVisitable tidy up

I'm breaking rust-lang#117726 into more manageable, logical chunks.  This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow.

r? types
@bors
Copy link
Collaborator

bors commented Nov 14, 2023

☀️ Try build successful - checks-actions
Build commit: a5f77ae (a5f77ae56cf37de2b9c2f162d50cbe34a0623e7b)

1 similar comment
@bors
Copy link
Collaborator

bors commented Nov 14, 2023

☀️ Try build successful - checks-actions
Build commit: a5f77ae (a5f77ae56cf37de2b9c2f162d50cbe34a0623e7b)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (a5f77ae): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.3% [0.2%, 0.3%] 9
Regressions ❌
(secondary)
0.7% [0.2%, 1.3%] 15
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.7% [-0.7%, -0.7%] 2
All ❌✅ (primary) 0.3% [0.2%, 0.3%] 9

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.6% [1.1%, 5.7%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 674.599s -> 674.341s (-0.04%)
Artifact size: 311.12 MiB -> 311.37 MiB (0.08%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 14, 2023
@eggyal
Copy link
Contributor Author

eggyal commented Nov 22, 2023

I've had a good play with cachegrind, and think this latest commit should resolve the perf regression.

@lcnr
Copy link
Contributor

lcnr commented Nov 23, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 23, 2023
@bors
Copy link
Collaborator

bors commented Nov 23, 2023

⌛ Trying commit 35e3019 with merge b57fd59...

bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 23, 2023
TypeFoldable/TypeVisitable tidy up

I'm breaking rust-lang#117726 into more manageable, logical chunks.  This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow.

r? types
@bors
Copy link
Collaborator

bors commented Nov 23, 2023

☀️ Try build successful - checks-actions
Build commit: b57fd59 (b57fd597f8ebdc1056806885939796c4be726cda)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (b57fd59): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 0.9%] 34
Regressions ❌
(secondary)
0.6% [0.3%, 1.5%] 29
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [0.2%, 0.9%] 34

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.9% [-1.9%, -1.9%] 1
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.7% [-2.2%, -0.7%] 3
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 677.592s -> 678.36s (0.11%)
Artifact size: 313.72 MiB -> 313.81 MiB (0.03%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 23, 2023
Copy link
Contributor

@lcnr lcnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

realized i have a pending review here 😅

do you feel happy with the perf changes and want me to review? I already looked at this before and was happy with the changes I think?

/// respectively folded (in definition order) using the `TypeFoldable` implementation for its
/// type. However, if a field of a struct or of an enum variant is annotated with
/// `#[type_foldable(identity)]` then that field will retain its incumbent value (and its type
/// is not required to implement `TypeFoldable`). However use of this attribute is dangerous
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// is not required to implement `TypeFoldable`). However use of this attribute is dangerous
/// is not required to implement `TypeFoldable`). Use of this attribute is dangerous

/// Each field of the struct or enum variant will be visited (in definition order) using the
/// `TypeVisitable` implementation for its type. However, if a field of a struct or of an enum
/// variant is annotated with `#[type_visitable(ignore)]` then that field will not be visited
/// (and its type is not required to implement `TypeVisitable`). However use of this attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// (and its type is not required to implement `TypeVisitable`). However use of this attribute
/// (and its type is not required to implement `TypeVisitable`). Use of this attribute

@lcnr lcnr added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 29, 2023
@lcnr
Copy link
Contributor

lcnr commented Nov 29, 2023

looking over it I would expect d0c9ba3 to be the cause of the regression 🤔 incredibly hot code so small changes can have a stupid impact here. I would expect that the additional match in GenericArgKind::pack is causing the regression 🤔

@eggyal
Copy link
Contributor Author

eggyal commented Nov 29, 2023

That was my thought too, hence the latest commit tried to ensure the delegation is inlined so that the match can get optimised away. That's what I appeared to have achieved locally, at least - doesn't appear to have the same result in CI though.

Sorry, hadn't spent much more time thinking about it yet - but still do intend to.

@lcnr
Copy link
Contributor

lcnr commented Nov 29, 2023

I personally think it's fine to keep the existing GenericArg impl, is there a specific reason you want to use the derived one here?

@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 11, 2024
@JohnCSimon
Copy link
Member

@eggyal

Ping from triage: I'm closing this due to inactivity, Please reopen when you are ready to continue with this.
Note: if you are going to continue please open the PR BEFORE you push to it, else you won't be able to reopen - this is a quirk of github.
Thanks for your contribution.

@rustbot label: +S-inactive

@JohnCSimon JohnCSimon closed this Apr 14, 2024
@rustbot rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Apr 14, 2024
@eggyal eggyal deleted the traversables-1-tidy-up branch April 16, 2024 13:52
@eggyal eggyal restored the traversables-1-tidy-up branch April 16, 2024 16:18
@eggyal
Copy link
Contributor Author

eggyal commented Apr 19, 2024

I finally managed to do some real investigation into this performance hit and am dumping here my interim findings. Yes, d0c9ba3 resulted in different inlining decisions, but the real impact appears to be from the loss of a tail call optimisation in <GenericArg as TypeVisitable<TyCtxt<'tcx>>::visit_with::<HasEscapingVarsVisitor>. I verified this in the disassembly, and then inspected the responsible llvm-ir (paths abbreviated):

before

; Function Attrs: uwtable
define hidden noundef zeroext i1 @<GenericArg as TypeVisitable<TyCtxt>>::visit_with::<HasEscapingVarsVisitor>(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %0, ptr noalias noundef align 4 dereferenceable(4) %1) unnamed_addr #4 !dbg !11388 {
  %3 = load i64, ptr %0, align 8, !dbg !12456, !range !12457, !noundef !32
  %4 = and i64 %3, 3, !dbg !12458
  switch i64 %4, label %5 [
    i64 1, label %6
    i64 0, label %17
    i64 2, label %25
  ], !dbg !12460

5: ; preds = %2
unreachable, !dbg !12461

6: ; preds = %2
%7 = and i64 %3, -4, !dbg !12462
%8 = icmp ne i64 %7, 0
tail call void @llvm.assume(i1 %8)
%9 = inttoptr i64 %7 to ptr, !dbg !12462
%10 = load i32, ptr %1, align 4, !dbg !12463, !range !5545, !noundef !32
%11 = load i32, ptr %9, align 4, !dbg !12463, !range !11440, !noundef !32
%12 = getelementptr i8, ptr %9, i64 4, !dbg !12463
%13 = load i32, ptr %12, align 4, !dbg !12463
%14 = icmp eq i32 %11, 1, !dbg !12465
%15 = icmp uge i32 %13, %10
%16 = select i1 %14, i1 %15, i1 false, !dbg !12465
br label %30, !dbg !12468

17: ; preds = %2
%18 = and i64 %3, -4, !dbg !12462
%19 = icmp ne i64 %18, 0
tail call void @llvm.assume(i1 %19)
%20 = inttoptr i64 %18 to ptr, !dbg !12462
%21 = load i32, ptr %1, align 4, !dbg !12469, !noundef !32
%22 = getelementptr i8, ptr %20, i64 52, !dbg !12469
%23 = load i32, ptr %22, align 4, !dbg !12469, !range !5545, !noundef !32
%24 = icmp ugt i32 %23, %21, !dbg !12471
br label %30, !dbg !12477

25: ; preds = %2
%26 = and i64 %3, -4, !dbg !12462
%27 = icmp ne i64 %26, 0
tail call void @llvm.assume(i1 %27)
%28 = inttoptr i64 %26 to ptr, !dbg !12462
%29 = tail call noundef zeroext i1 @<HasEscapingVarsVisitor as TypeVisitor<TyCtxt>>::visit_const(ptr noalias noundef nonnull align 4 dereferenceable(4) %1, ptr noalias noundef nonnull readonly align 8 dereferenceable(40) %28), !dbg !12478
br label %30, !dbg !12480

30: ; preds = %25, %17, %6
%31 = phi i1 [ %29, %25 ], [ %24, %17 ], [ %16, %6 ]
ret i1 %31, !dbg !12481
}

after

; Function Attrs: uwtable
define hidden noundef zeroext i1 @<GenericArg as TypeVisitable<TyCtxt>>::visit_with::<HasEscapingVarsVisitor>(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %0, ptr noalias noundef align 4 dereferenceable(4) %1) unnamed_addr #4 !dbg !11312 {
  %3 = load i64, ptr %0, align 8, !dbg !12629, !range !11944, !noundef !28
  %4 = and i64 %3, 3, !dbg !12630
  switch i64 %4, label %5 [
    i64 1, label %6
    i64 0, label %17
    i64 2, label %25
  ], !dbg !12632

5: ; preds = %2
unreachable, !dbg !12633

6: ; preds = %2
%7 = and i64 %3, -4, !dbg !12634
%8 = icmp ne i64 %7, 0
tail call void @llvm.assume(i1 %8)
%9 = inttoptr i64 %7 to ptr, !dbg !12634
%10 = load i32, ptr %1, align 4, !dbg !12635, !range !5468, !alias.scope !12638, !noundef !28
%11 = load i32, ptr %9, align 4, !dbg !12635, !range !11329, !noalias !12638, !noundef !28
%12 = getelementptr i8, ptr %9, i64 4, !dbg !12635
%13 = load i32, ptr %12, align 4, !dbg !12635, !noalias !12638
%14 = icmp ne i32 %11, 1, !dbg !12641
%15 = icmp ult i32 %13, %10
%16 = select i1 %14, i1 true, i1 %15, !dbg !12644
br i1 %16, label %30, label %31, !dbg !12644

17: ; preds = %2
%18 = and i64 %3, -4, !dbg !12634
%19 = icmp ne i64 %18, 0
tail call void @llvm.assume(i1 %19)
%20 = inttoptr i64 %18 to ptr, !dbg !12634
%21 = load i32, ptr %1, align 4, !dbg !12645, !alias.scope !12638, !noundef !28
%22 = getelementptr i8, ptr %20, i64 52, !dbg !12645
%23 = load i32, ptr %22, align 4, !dbg !12645, !range !5468, !noalias !12638, !noundef !28
%24 = icmp ugt i32 %23, %21, !dbg !12647
br i1 %24, label %31, label %30, !dbg !12653

25: ; preds = %2
%26 = and i64 %3, -4, !dbg !12634
%27 = icmp ne i64 %26, 0
tail call void @llvm.assume(i1 %27)
%28 = inttoptr i64 %26 to ptr, !dbg !12634
tail call void @llvm.experimental.noalias.scope.decl(metadata !12638), !dbg !12629
%29 = tail call noundef zeroext i1 @<HasEscapingVarsVisitor as TypeVisitor<TyCtxt>>::visit_const(ptr noalias noundef nonnull align 4 dereferenceable(4) %1, ptr noalias noundef nonnull readonly align 8 dereferenceable(40) %28), !dbg !12654
br i1 %29, label %31, label %30, !dbg !12656

30: ; preds = %25, %17, %6
br label %31, !dbg !12657

31: ; preds = %30, %25, %17, %6
%32 = phi i1 [ false, %30 ], [ true, %6 ], [ true, %17 ], [ true, %25 ]
ret i1 %32, !dbg !12658
}

That's as far as I've got right now. I'm not hugely familiar with LLVM IR so brushing up on that is my next port of call.

Of immediate note however:

  • %10and %21 now !alias.scope
  • %11, %13 and %23 now !noalias
  • tail call void @llvm.experimental.noalias.scope.decl now present
  • %29 now followed by an additional br and block
  • final phi now uses immediate boolean values rather than referring to temporaries

I suspect It appears that these last two differences, perhaps enabled by the earlier ones, have made it too difficult for LLVM to spot that TCO is possible. Not sure whether rustc ought to preempt that and deliver IR that's easier on LLVM or whether LLVM could be smarter here. In any event, I'll try and find a reduced test case for which I'll open an issue, referencing here.

@eggyal
Copy link
Contributor Author

eggyal commented Apr 19, 2024

Okay, I got to a reduced test case. The problem appears to originate in the early return (via ? operator in the real code) upon handling each enum variant, whereas the original code just returned the match expression (which we can change the macro to output, albeit with greater complexity as less code is able to be shared with derives on structs).

@eggyal
Copy link
Contributor Author

eggyal commented Apr 20, 2024

Having made that fix, I'm (locally, on aarch64-unknown-linux-gnu without PGO) broadly seeing quite encouraging results: but still a few benchmarks (primarily full builds of bitmaps and typenum) that are slightly regressed:

Summary

Range Mean Count
Regressions 0.32%, 2.52% 0.97% 27
Improvements -1.25%, -0.30% -0.53% 98
All -1.25%, 2.52% -0.20% 125

Aggregations

Profile

check

[ +0.32%,  +2.52%]
+0.93%15 (7)
[ -1.15%,  -0.30%]
-0.54%31 (18)
[ -1.15%,  +2.52%]
-0.06%46 (22)

debug

[ +0.40%,  +2.26%]
+1.18%5 (3)
[ -1.16%,  -0.30%]
-0.52%34 (22)
[ -1.16%,  +2.26%]
-0.30%39 (22)

opt

[ +0.33%,  +1.95%]
+0.90%7 (4)
[ -1.25%,  -0.32%]
-0.52%33 (22)
[ -1.25%,  +1.95%]
-0.27%40 (23)
Scenarios

full

[ +0.32%,  +2.52%]
+1.07%13 (7)
[ -0.66%,  -0.43%]
-0.58%4 (2)
[ -0.66%,  +2.52%]
+0.68%17 (9)

incr-full

[ +0.35%,  +1.65%]
+0.86%11 (6)
[ -0.54%,  -0.39%]
-0.50%6 (2)
[ -0.54%,  +1.65%]
+0.38%17 (8)

incr-unchanged

No regressions
[ -1.25%,  -0.30%]
-0.54%57 (23)
[ -1.25%,  -0.30%]
-0.54%57 (23)

incr-patched

[ +0.66%,  +1.13%]
+0.93%3 (1)
[ -1.13%,  -0.30%]
-0.50%31 (14)
[ -1.13%,  +1.13%]
-0.37%34 (15)
Category

primary

[ +0.38%,  +2.52%]
+1.17%16 (4)
[ -1.25%,  -0.30%]
-0.51%63 (15)
[ -1.25%,  +2.52%]
-0.17%79 (16)

secondary

[ +0.32%,  +1.21%]
+0.67%11 (3)
[ -1.02%,  -0.30%]
-0.56%35 (9)
[ -1.02%,  +1.21%]
-0.27%46 (11)

Details

Primary benchmarks
Benchmark Profile Scenario % Change Significance Factor
bitmaps-3.1.0 check full 2.52% 12.62x
bitmaps-3.1.0 debug full 2.26% 11.29x
bitmaps-3.1.0 opt full 1.95% 9.73x
bitmaps-3.1.0 check incr-full 1.65% 8.25x
bitmaps-3.1.0 debug incr-full 1.43% 7.15x
bitmaps-3.1.0 opt incr-full 1.26% 6.30x
typenum-1.17.0 opt incr-unchanged -1.25% 6.27x
typenum-1.17.0 debug incr-unchanged -1.16% 5.80x
typenum-1.17.0 check incr-unchanged -1.15% 5.73x
typenum-1.17.0 check incr-patched: add fn -1.13% 5.65x
typenum-1.17.0 opt full 1.10% 5.52x
typenum-1.17.0 debug full 1.09% 5.45x
typenum-1.17.0 check full 1.07% 5.36x
typenum-1.17.0 debug incr-patched: add fn -1.04% 5.21x
typenum-1.17.0 opt incr-patched: add fn -1.03% 5.13x
cranelift-codegen-0.82.1 check full 0.86% 4.28x
unicode-normalization-0.1.19 check incr-unchanged -0.77% 3.83x
unicode-normalization-0.1.19 opt incr-unchanged -0.76% 3.78x
unicode-normalization-0.1.19 debug incr-unchanged -0.74% 3.72x
cranelift-codegen-0.82.1 check incr-full 0.73% 3.64x
typenum-1.17.0 check incr-full 0.72% 3.61x
typenum-1.17.0 debug incr-full 0.72% 3.58x
unicode-normalization-0.1.19 check incr-patched: println -0.71% 3.54x
typenum-1.17.0 opt incr-full 0.68% 3.41x
webrender-2022 check incr-unchanged -0.60% 3.00x
webrender-2022 opt incr-unchanged -0.60% 2.98x
stm32f4-0.14.0 opt incr-unchanged -0.56% 2.82x
webrender-2022 check incr-patched: println -0.56% 2.81x
stm32f4-0.14.0 debug incr-unchanged -0.56% 2.78x
serde-1.0.136 opt incr-unchanged -0.55% 2.74x
webrender-2022 debug incr-unchanged -0.55% 2.74x
diesel-1.4.8 opt incr-unchanged -0.54% 2.70x
diesel-1.4.8 opt incr-patched: println -0.54% 2.68x
serde-1.0.136 opt incr-patched: println -0.53% 2.64x
stm32f4-0.14.0 debug incr-patched: negate -0.51% 2.57x
diesel-1.4.8 debug incr-unchanged -0.51% 2.56x
serde-1.0.136 debug incr-unchanged -0.50% 2.50x
serde-1.0.136 debug incr-patched: println -0.50% 2.48x
webrender-2022 debug incr-patched: println -0.50% 2.48x
cranelift-codegen-0.82.1 opt incr-unchanged -0.49% 2.46x
stm32f4-0.14.0 opt incr-patched: negate -0.48% 2.39x
stm32f4-0.14.0 check incr-unchanged -0.47% 2.36x
cranelift-codegen-0.82.1 debug incr-unchanged -0.46% 2.31x
diesel-1.4.8 debug incr-patched: println -0.46% 2.28x
bitmaps-3.1.0 debug incr-unchanged -0.45% 2.24x
stm32f4-0.14.0 check incr-patched: negate -0.45% 2.24x
bitmaps-3.1.0 opt incr-unchanged -0.44% 2.18x
exa-0.10.1 check incr-unchanged -0.43% 2.17x
bitmaps-3.1.0 check incr-unchanged -0.41% 2.06x
syn-1.0.89 opt incr-unchanged -0.40% 1.98x
serde-1.0.136 check incr-unchanged -0.39% 1.95x
syn-1.0.89 debug incr-patched: println -0.39% 1.95x
diesel-1.4.8 check incr-unchanged -0.39% 1.94x
syn-1.0.89 check incr-patched: println -0.39% 1.94x
serde-1.0.136 check incr-patched: println -0.38% 1.92x
cargo-0.60.0 opt incr-unchanged -0.38% 1.90x
serde_derive-1.0.136 check incr-full 0.38% 1.90x
syn-1.0.89 check incr-unchanged -0.38% 1.90x
serde_derive-1.0.136 check full 0.38% 1.88x
image-0.24.1 opt incr-unchanged -0.37% 1.87x
diesel-1.4.8 check incr-patched: println -0.37% 1.84x
bitmaps-3.1.0 opt incr-patched: println -0.37% 1.83x
cranelift-codegen-0.82.1 check incr-unchanged -0.36% 1.81x
image-0.24.1 debug incr-unchanged -0.36% 1.80x
cargo-0.60.0 debug incr-unchanged -0.36% 1.78x
hyper-0.14.18 opt incr-unchanged -0.35% 1.73x
unicode-normalization-0.1.19 debug incr-patched: println -0.34% 1.70x
cranelift-codegen-0.82.1 debug incr-patched: println -0.34% 1.68x
libc-0.2.124 opt incr-unchanged -0.33% 1.67x
cranelift-codegen-0.82.1 check incr-patched: println -0.33% 1.65x
regex-1.5.5 opt incr-patched: byte frequencies -0.33% 1.65x
regex-1.5.5 debug incr-patched: byte frequencies -0.33% 1.64x
unicode-normalization-0.1.19 opt incr-patched: println -0.33% 1.63x
regex-1.5.5 opt incr-unchanged -0.32% 1.58x
hyper-0.14.18 debug incr-unchanged -0.31% 1.55x
syn-1.0.89 debug incr-unchanged -0.31% 1.54x
exa-0.10.1 check incr-patched: printlns -0.31% 1.54x
image-0.24.1 debug incr-patched: println -0.30% 1.52x
libc-0.2.124 check incr-unchanged -0.30% 1.52x
Secondary benchmarks
Benchmark Profile Scenario % Change Significance Factor
deep-vector check full 1.21% 6.06x
deep-vector check incr-full 1.15% 5.74x
deep-vector check incr-patched: println 1.13% 5.65x
ucd check incr-unchanged -1.02% 5.09x
deep-vector check incr-patched: add vec item 1.00% 5.00x
coercions debug incr-patched: println -0.99% 4.95x
tuple-stress check incr-unchanged -0.97% 4.86x
tuple-stress opt incr-unchanged -0.92% 4.59x
tuple-stress debug incr-unchanged -0.87% 4.34x
ucd opt incr-unchanged -0.80% 4.01x
many-assoc-items check incr-unchanged -0.67% 3.36x
deep-vector opt incr-patched: add vec item 0.66% 3.32x
ucd debug incr-unchanged -0.66% 3.32x
wf-projection-stress-65510 check full -0.66% 3.31x
many-assoc-items opt incr-unchanged -0.66% 3.28x
many-assoc-items debug incr-unchanged -0.61% 3.06x
wf-projection-stress-65510 opt full -0.61% 3.04x
wf-projection-stress-65510 debug full -0.60% 3.01x
match-stress opt incr-unchanged -0.58% 2.88x
match-stress debug incr-unchanged -0.55% 2.77x
tuple-stress check incr-full -0.54% 2.69x
match-stress check incr-unchanged -0.54% 2.68x
tuple-stress debug incr-full -0.52% 2.59x
wf-projection-stress-65510 debug incr-full -0.51% 2.56x
wf-projection-stress-65510 opt incr-full -0.51% 2.54x
wf-projection-stress-65510 check incr-full -0.50% 2.51x
tuple-stress opt full -0.43% 2.17x
derive opt incr-unchanged -0.42% 2.11x
tuple-stress check incr-patched: new row -0.42% 2.10x
unify-linearly check incr-full 0.42% 2.08x
coercions check incr-unchanged -0.41% 2.07x
unify-linearly check full 0.41% 2.04x
tuple-stress opt incr-patched: new row -0.40% 2.00x
deep-vector debug full 0.40% 1.99x
tuple-stress debug incr-patched: new row -0.40% 1.99x
tuple-stress opt incr-full -0.39% 1.97x
derive debug incr-unchanged -0.39% 1.93x
derive check incr-unchanged -0.37% 1.86x
coercions check incr-patched: println -0.36% 1.79x
deep-vector opt incr-full 0.35% 1.74x
wg-grammar opt incr-unchanged -0.34% 1.69x
deep-vector opt incr-unchanged -0.34% 1.68x
tt-muncher opt full 0.33% 1.65x
tt-muncher check full 0.32% 1.58x
wg-grammar debug incr-unchanged -0.31% 1.56x
deep-vector debug incr-unchanged -0.30% 1.52x

I'm now examining the cachegrind diff for the most regressed benchmark (bitmaps-3.1.0 check full).

@eggyal
Copy link
Contributor Author

eggyal commented Apr 21, 2024

I managed to build locally with PGO (but not BOLT owing to llvm/llvm-project#89549) and, whilst different, I think the results are similarly encouraging:

Summary

Range Mean Count
Regressions 0.30%, 5.06% 0.59% 81
Improvements -3.88%, -0.35% -1.15% 10
All -3.88%, 5.06% 0.40% 91

Primary benchmarks

Benchmark Profile Scenario % Change Significance Factor
syn-1.0.89 opt full -2.56% 12.82x
serde_derive-1.0.136 opt incr-patched: println 1.21% 6.05x
serde_derive-1.0.136 opt full 0.90% 4.49x
diesel-1.4.8 check full 0.80% 3.98x
diesel-1.4.8 debug full 0.74% 3.69x
diesel-1.4.8 opt full 0.70% 3.51x
diesel-1.4.8 check incr-full 0.70% 3.48x
diesel-1.4.8 debug incr-full 0.64% 3.18x
diesel-1.4.8 opt incr-full 0.62% 3.08x
webrender-2022 debug full -0.59% 2.93x
unicode-normalization-0.1.19 debug incr-patched: println -0.40% 2.01x
cargo-0.60.0 debug full 0.39% 1.96x
serde-1.0.136 debug full 0.39% 1.96x
serde-1.0.136 check full 0.37% 1.85x
image-0.24.1 check full 0.37% 1.85x
regex-1.5.5 debug incr-unchanged -0.37% 1.85x
unicode-normalization-0.1.19 debug incr-unchanged -0.36% 1.79x
serde-1.0.136 check incr-full 0.32% 1.58x
image-0.24.1 check incr-full 0.31% 1.55x
regex-1.5.5 check full 0.31% 1.54x

Secondary benchmarks

Benchmark Profile Scenario % Change Significance Factor
coercions debug incr-patched: println 5.06% 25.29x
coercions debug incr-patched: add static arr item -3.88% 19.42x
coercions opt incr-patched: println -1.88% 9.42x
projection-caching check full 1.01% 5.06x
wf-projection-stress-65510 check full 0.94% 4.72x
wf-projection-stress-65510 debug full 0.93% 4.63x
projection-caching check incr-full 0.90% 4.52x
wf-projection-stress-65510 opt full 0.90% 4.48x
projection-caching opt full 0.82% 4.09x
deep-vector debug incr-patched: add vec item -0.78% 3.88x
wf-projection-stress-65510 check incr-full 0.73% 3.65x
wf-projection-stress-65510 debug incr-full 0.71% 3.55x
wg-grammar check full 0.70% 3.50x
wf-projection-stress-65510 opt incr-full 0.70% 3.49x
projection-caching opt incr-full 0.69% 3.47x
projection-caching debug full 0.69% 3.43x
regression-31157 check full 0.68% 3.42x
wg-grammar debug full 0.68% 3.40x
wg-grammar check incr-full 0.66% 3.29x
wg-grammar debug incr-full 0.64% 3.20x
wg-grammar opt incr-full 0.63% 3.16x
wg-grammar opt full 0.63% 3.13x
deep-vector check incr-patched: println 0.59% 2.95x
projection-caching debug incr-full 0.57% 2.87x
deep-vector check full 0.57% 2.86x
issue-58319 debug incr-unchanged 0.57% 2.83x
issue-46449 check full 0.57% 2.83x
deep-vector check incr-full 0.56% 2.82x
unify-linearly debug incr-unchanged 0.55% 2.76x
issue-46449 check incr-patched: u32 3072 0.55% 2.73x
deep-vector check incr-patched: add vec item 0.54% 2.68x
issue-46449 check incr-patched: empty 3072 0.53% 2.66x
unify-linearly debug incr-full 0.52% 2.60x
unify-linearly debug incr-patched: dummy fn 0.52% 2.60x
issue-46449 check incr-full 0.52% 2.59x
issue-46449 check incr-patched: io error 6144 0.50% 2.51x
regression-31157 check incr-full 0.50% 2.51x
coercions opt full 0.49% 2.43x
ucd opt full 0.48% 2.41x
issue-58319 debug incr-full 0.48% 2.40x
deep-vector debug incr-patched: println 0.46% 2.32x
tuple-stress check full 0.45% 2.24x
unify-linearly check full 0.45% 2.23x
tuple-stress check incr-patched: new row 0.44% 2.20x
issue-46449 check incr-patched: static str 6144 0.43% 2.17x
tuple-stress opt full 0.43% 2.14x
ucd debug full 0.42% 2.12x
tuple-stress debug full 0.41% 2.07x
tt-muncher opt incr-full 0.41% 2.06x
token-stream-stress debug incr-unchanged 0.40% 2.02x
deeply-nested-multi check full 0.40% 2.02x
deep-vector opt incr-full 0.39% 1.94x
tuple-stress check incr-full 0.39% 1.93x
unify-linearly check incr-full 0.38% 1.91x
helloworld-tiny opt full 0.38% 1.89x
tuple-stress opt incr-patched: new row 0.38% 1.88x
token-stream-stress debug incr-full 0.37% 1.86x
coercions check incr-unchanged 0.37% 1.85x
coercions debug incr-full -0.37% 1.84x
tuple-stress debug incr-full 0.37% 1.84x
tuple-stress debug incr-patched: new row 0.37% 1.84x
tuple-stress opt incr-full 0.37% 1.83x
ucd check full 0.36% 1.81x
deeply-nested-multi opt incr-unchanged 0.35% 1.74x
coercions opt incr-patched: add static arr item -0.35% 1.73x
tuple-stress check incr-unchanged 0.34% 1.69x
projection-caching debug incr-unchanged 0.34% 1.69x
deeply-nested-multi check incr-full 0.33% 1.67x
ucd check incr-full 0.33% 1.63x
deeply-nested-multi debug incr-unchanged 0.31% 1.55x
coercions opt incr-full 0.30% 1.52x

Seeing the following in cachegrind diff for the most regressed primary benchmark:

<   2,428,649   (5.1%,  93.6%)  compiler/rustc_middle/src/ty/generic_args.rs:
    3,723,192   (7.8%)            <&rustc_middle::ty::list::List<rustc_middle::ty::generic_args::GenericArg> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_infer::infer::freshen::TypeFreshener>
   -2,463,845  (-5.1%)            <rustc_trait_selection::traits::select::SelectionContext>::poly_select
    1,179,357   (2.5%)            <rustc_trait_selection::traits::select::SelectionContext>::match_impl
     -410,052  (-0.9%)            <rustc_middle::ty::Ty as rustc_type_ir::fold::TypeSuperFoldable<rustc_middle::ty::context::TyCtxt>>::try_super_fold_with::<rustc_infer::infer::freshen::TypeFreshener>
       95,950   (0.2%)            <&rustc_middle::ty::list::List<rustc_middle::ty::Ty> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_infer::infer::freshen::TypeFreshener>

Which I think again is down to inlining, but at least now narrowing down to a likely mono item...

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 22, 2024
Prefer not to early return from derived visitables

LLVM is not able to tail-call optimise the final early return, as there instructions that follow it.  Making that call the returned expression resolves this. (As identified in rust-lang#117896 (comment)).

r? wg-compiler-performance
@eggyal eggyal deleted the traversables-1-tidy-up branch July 11, 2024 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants