TypeFoldable/TypeVisitable tidy up #117896

eggyal · 2023-11-14T01:16:56Z

I'm breaking #117726 into more manageable, logical chunks. This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow.

r? types

Rather than being concerned with internals of ExternalConstraintsData and PredefinedOpaquesData, we derive traversable impls for those types to which we then delegate.

UnevaluatedConst is not a type of any interest to folders or visitors, and therefore should not be "super visitable" at all.

Previously only enforced via comment, now via negative impls.

eggyal · 2023-11-14T11:35:47Z

r? lcnr

lcnr · 2023-11-14T13:20:52Z

r=me after perf

@bors try @rust-timer queue

bors · 2023-11-14T13:22:02Z

⌛ Trying commit dce79e4 with merge a5f77ae...

TypeFoldable/TypeVisitable tidy up I'm breaking rust-lang#117726 into more manageable, logical chunks. This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow. r? types

bors · 2023-11-14T14:49:04Z

☀️ Try build successful - checks-actions
Build commit: a5f77ae (a5f77ae56cf37de2b9c2f162d50cbe34a0623e7b)

bors · 2023-11-14T14:49:05Z

☀️ Try build successful - checks-actions
Build commit: a5f77ae (a5f77ae56cf37de2b9c2f162d50cbe34a0623e7b)

rust-timer · 2023-11-14T16:12:30Z

Finished benchmarking commit (a5f77ae): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.3%]	9
Regressions ❌ (secondary)	0.7%	[0.2%, 1.3%]	15
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.7%	[-0.7%, -0.7%]	2
All ❌✅ (primary)	0.3%	[0.2%, 0.3%]	9

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.6%	[1.1%, 5.7%]	4
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 674.599s -> 674.341s (-0.04%)
Artifact size: 311.12 MiB -> 311.37 MiB (0.08%)

eggyal · 2023-11-22T12:45:20Z

I've had a good play with cachegrind, and think this latest commit should resolve the perf regression.

lcnr · 2023-11-23T10:39:08Z

@bors try @rust-timer queue

bors · 2023-11-23T10:40:20Z

⌛ Trying commit 35e3019 with merge b57fd59...

TypeFoldable/TypeVisitable tidy up I'm breaking rust-lang#117726 into more manageable, logical chunks. This is the first of those, and just makes some minor/tidying refactors in preparation for the chunks to follow. r? types

bors · 2023-11-23T12:09:46Z

☀️ Try build successful - checks-actions
Build commit: b57fd59 (b57fd597f8ebdc1056806885939796c4be726cda)

rust-timer · 2023-11-23T18:32:32Z

Finished benchmarking commit (b57fd59): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.4%	[0.2%, 0.9%]	34
Regressions ❌ (secondary)	0.6%	[0.3%, 1.5%]	29
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.4%	[0.2%, 0.9%]	34

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.9%	[-1.9%, -1.9%]	1
All ❌✅ (primary)	-	-	0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.7%	[-2.2%, -0.7%]	3
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 677.592s -> 678.36s (0.11%)
Artifact size: 313.72 MiB -> 313.81 MiB (0.03%)

lcnr

realized i have a pending review here 😅

do you feel happy with the perf changes and want me to review? I already looked at this before and was happy with the changes I think?

lcnr · 2023-11-14T13:18:20Z

compiler/rustc_macros/src/lib.rs

+    /// respectively folded (in definition order) using the `TypeFoldable` implementation for its
+    /// type. However, if a field of a struct or of an enum variant is annotated with
+    /// `#[type_foldable(identity)]` then that field will retain its incumbent value (and its type
+    /// is not required to implement `TypeFoldable`). However use of this attribute is dangerous


Suggested change

/// is not required to implement `TypeFoldable`). However use of this attribute is dangerous

/// is not required to implement `TypeFoldable`). Use of this attribute is dangerous

lcnr · 2023-11-14T13:19:20Z

compiler/rustc_macros/src/lib.rs

+    /// Each field of the struct or enum variant will be visited (in definition order) using the
+    /// `TypeVisitable` implementation for its type. However, if a field of a struct or of an enum
+    /// variant is annotated with `#[type_visitable(ignore)]` then that field will not be visited
+    /// (and its type is not required to implement `TypeVisitable`). However use of this attribute


Suggested change

/// (and its type is not required to implement `TypeVisitable`). However use of this attribute

/// (and its type is not required to implement `TypeVisitable`). Use of this attribute

lcnr · 2023-11-29T16:19:59Z

looking over it I would expect d0c9ba3 to be the cause of the regression 🤔 incredibly hot code so small changes can have a stupid impact here. I would expect that the additional match in GenericArgKind::pack is causing the regression 🤔

eggyal · 2023-11-29T16:23:50Z

That was my thought too, hence the latest commit tried to ensure the delegation is inlined so that the match can get optimised away. That's what I appeared to have achieved locally, at least - doesn't appear to have the same result in CI though.

Sorry, hadn't spent much more time thinking about it yet - but still do intend to.

lcnr · 2023-11-29T16:30:47Z

I personally think it's fine to keep the existing GenericArg impl, is there a specific reason you want to use the derived one here?

JohnCSimon · 2024-04-14T17:17:18Z

@eggyal

Ping from triage: I'm closing this due to inactivity, Please reopen when you are ready to continue with this.
Note: if you are going to continue please open the PR BEFORE you push to it, else you won't be able to reopen - this is a quirk of github.
Thanks for your contribution.

@rustbot label: +S-inactive

eggyal · 2024-04-19T09:48:13Z

I finally managed to do some real investigation into this performance hit and am dumping here my interim findings. Yes, d0c9ba3 resulted in different inlining decisions, but the real impact appears to be from the loss of a tail call optimisation in <GenericArg as TypeVisitable<TyCtxt<'tcx>>::visit_with::<HasEscapingVarsVisitor>. I verified this in the disassembly, and then inspected the responsible llvm-ir (paths abbreviated):

before


; Function Attrs: uwtable
define hidden noundef zeroext i1 @<GenericArg as TypeVisitable<TyCtxt>>::visit_with::<HasEscapingVarsVisitor>(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %0, ptr noalias noundef align 4 dereferenceable(4) %1) unnamed_addr #4 !dbg !11388 {
  %3 = load i64, ptr %0, align 8, !dbg !12456, !range !12457, !noundef !32
  %4 = and i64 %3, 3, !dbg !12458
  switch i64 %4, label %5 [
    i64 1, label %6
    i64 0, label %17
    i64 2, label %25
  ], !dbg !12460
5:                                                ; preds = %2

unreachable, !dbg !12461
6:                                                ; preds = %2

%7 = and i64 %3, -4, !dbg !12462

%8 = icmp ne i64 %7, 0

tail call void @llvm.assume(i1 %8)

%9 = inttoptr i64 %7 to ptr, !dbg !12462

%10 = load i32, ptr %1, align 4, !dbg !12463, !range !5545, !noundef !32

%11 = load i32, ptr %9, align 4, !dbg !12463, !range !11440, !noundef !32

%12 = getelementptr i8, ptr %9, i64 4, !dbg !12463

%13 = load i32, ptr %12, align 4, !dbg !12463

%14 = icmp eq i32 %11, 1, !dbg !12465

%15 = icmp uge i32 %13, %10

%16 = select i1 %14, i1 %15, i1 false, !dbg !12465

br label %30, !dbg !12468
17:                                               ; preds = %2

%18 = and i64 %3, -4, !dbg !12462

%19 = icmp ne i64 %18, 0

tail call void @llvm.assume(i1 %19)

%20 = inttoptr i64 %18 to ptr, !dbg !12462

%21 = load i32, ptr %1, align 4, !dbg !12469, !noundef !32

%22 = getelementptr i8, ptr %20, i64 52, !dbg !12469

%23 = load i32, ptr %22, align 4, !dbg !12469, !range !5545, !noundef !32

%24 = icmp ugt i32 %23, %21, !dbg !12471

br label %30, !dbg !12477
25:                                               ; preds = %2

%26 = and i64 %3, -4, !dbg !12462

%27 = icmp ne i64 %26, 0

tail call void @llvm.assume(i1 %27)

%28 = inttoptr i64 %26 to ptr, !dbg !12462

%29 = tail call noundef zeroext i1 @<HasEscapingVarsVisitor as TypeVisitor<TyCtxt>>::visit_const(ptr noalias noundef nonnull align 4 dereferenceable(4) %1, ptr noalias noundef nonnull readonly align 8 dereferenceable(40) %28), !dbg !12478

br label %30, !dbg !12480
30:                                               ; preds = %25, %17, %6

%31 = phi i1 [ %29, %25 ], [ %24, %17 ], [ %16, %6 ]

ret i1 %31, !dbg !12481

}

after


; Function Attrs: uwtable
define hidden noundef zeroext i1 @<GenericArg as TypeVisitable<TyCtxt>>::visit_with::<HasEscapingVarsVisitor>(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %0, ptr noalias noundef align 4 dereferenceable(4) %1) unnamed_addr #4 !dbg !11312 {
  %3 = load i64, ptr %0, align 8, !dbg !12629, !range !11944, !noundef !28
  %4 = and i64 %3, 3, !dbg !12630
  switch i64 %4, label %5 [
    i64 1, label %6
    i64 0, label %17
    i64 2, label %25
  ], !dbg !12632
5:                                                ; preds = %2

unreachable, !dbg !12633
6:                                                ; preds = %2

%7 = and i64 %3, -4, !dbg !12634

%8 = icmp ne i64 %7, 0

tail call void @llvm.assume(i1 %8)

%9 = inttoptr i64 %7 to ptr, !dbg !12634

%10 = load i32, ptr %1, align 4, !dbg !12635, !range !5468, !alias.scope !12638, !noundef !28

%11 = load i32, ptr %9, align 4, !dbg !12635, !range !11329, !noalias !12638, !noundef !28

%12 = getelementptr i8, ptr %9, i64 4, !dbg !12635

%13 = load i32, ptr %12, align 4, !dbg !12635, !noalias !12638

%14 = icmp ne i32 %11, 1, !dbg !12641

%15 = icmp ult i32 %13, %10

%16 = select i1 %14, i1 true, i1 %15, !dbg !12644

br i1 %16, label %30, label %31, !dbg !12644
17:                                               ; preds = %2

%18 = and i64 %3, -4, !dbg !12634

%19 = icmp ne i64 %18, 0

tail call void @llvm.assume(i1 %19)

%20 = inttoptr i64 %18 to ptr, !dbg !12634

%21 = load i32, ptr %1, align 4, !dbg !12645, !alias.scope !12638, !noundef !28

%22 = getelementptr i8, ptr %20, i64 52, !dbg !12645

%23 = load i32, ptr %22, align 4, !dbg !12645, !range !5468, !noalias !12638, !noundef !28

%24 = icmp ugt i32 %23, %21, !dbg !12647

br i1 %24, label %31, label %30, !dbg !12653
25:                                               ; preds = %2

%26 = and i64 %3, -4, !dbg !12634

%27 = icmp ne i64 %26, 0

tail call void @llvm.assume(i1 %27)

%28 = inttoptr i64 %26 to ptr, !dbg !12634

tail call void @llvm.experimental.noalias.scope.decl(metadata !12638), !dbg !12629

%29 = tail call noundef zeroext i1 @<HasEscapingVarsVisitor as TypeVisitor<TyCtxt>>::visit_const(ptr noalias noundef nonnull align 4 dereferenceable(4) %1, ptr noalias noundef nonnull readonly align 8 dereferenceable(40) %28), !dbg !12654

br i1 %29, label %31, label %30, !dbg !12656
30:                                               ; preds = %25, %17, %6

br label %31, !dbg !12657
31:                                               ; preds = %30, %25, %17, %6

%32 = phi i1 [ false, %30 ], [ true, %6 ], [ true, %17 ], [ true, %25 ]

ret i1 %32, !dbg !12658

}

~~That's as far as I've got right now. I'm not hugely familiar with LLVM IR so brushing up on that is my next port of call.~~

Of immediate note however:

%10and %21 now !alias.scope
%11, %13 and %23 now !noalias
tail call void @llvm.experimental.noalias.scope.decl now present
%29 now followed by an additional br and block
final phi now uses immediate boolean values rather than referring to temporaries

~~I suspect~~ It appears that these last two differences, ~~perhaps enabled by the earlier ones,~~ have made it too difficult for LLVM to spot that TCO is possible. Not sure whether rustc ought to preempt that and deliver IR that's easier on LLVM or whether LLVM could be smarter here. In any event, I'll try and find a reduced test case for which I'll open an issue, referencing here.

eggyal · 2024-04-19T23:31:18Z

Okay, I got to a reduced test case. The problem appears to originate in the early return (via ? operator in the real code) upon handling each enum variant, whereas the original code just returned the match expression (which we can change the macro to output, albeit with greater complexity as less code is able to be shared with derives on structs).

eggyal · 2024-04-20T15:31:48Z

Having made that fix, I'm (locally, on aarch64-unknown-linux-gnu without PGO) broadly seeing quite encouraging results: but still a few benchmarks (primarily full builds of bitmaps and typenum) that are slightly regressed:

Summary

	Range	Mean	Count
Regressions	0.32%, 2.52%	0.97%	27
Improvements	-1.25%, -0.30%	-0.53%	98
All	-1.25%, 2.52%	-0.20%	125

Aggregations

Profile

check

[ +0.32%, +2.52%]	+0.93%	15 (7)
[ -1.15%, -0.30%]	-0.54%	31 (18)
[ -1.15%, +2.52%]	-0.06%	46 (22)

debug

[ +0.40%, +2.26%]	+1.18%	5 (3)
[ -1.16%, -0.30%]	-0.52%	34 (22)
[ -1.16%, +2.26%]	-0.30%	39 (22)

opt

[ +0.33%, +1.95%]	+0.90%	7 (4)
[ -1.25%, -0.32%]	-0.52%	33 (22)
[ -1.25%, +1.95%]	-0.27%	40 (23)

Scenarios

full

[ +0.32%, +2.52%]	+1.07%	13 (7)
[ -0.66%, -0.43%]	-0.58%	4 (2)
[ -0.66%, +2.52%]	+0.68%	17 (9)

incr-full

[ +0.35%, +1.65%]	+0.86%	11 (6)
[ -0.54%, -0.39%]	-0.50%	6 (2)
[ -0.54%, +1.65%]	+0.38%	17 (8)

incr-unchanged

No regressions
[ -1.25%, -0.30%]	-0.54%	57 (23)
[ -1.25%, -0.30%]	-0.54%	57 (23)

incr-patched

[ +0.66%, +1.13%]	+0.93%	3 (1)
[ -1.13%, -0.30%]	-0.50%	31 (14)
[ -1.13%, +1.13%]	-0.37%	34 (15)

primary

[ +0.38%, +2.52%]	+1.17%	16 (4)
[ -1.25%, -0.30%]	-0.51%	63 (15)
[ -1.25%, +2.52%]	-0.17%	79 (16)

secondary

[ +0.32%, +1.21%]	+0.67%	11 (3)
[ -1.02%, -0.30%]	-0.56%	35 (9)
[ -1.02%, +1.21%]	-0.27%	46 (11)

Details

Primary benchmarks

Benchmark	Profile	Scenario	% Change	Significance Factor
bitmaps-3.1.0	check	full	2.52%	12.62x
bitmaps-3.1.0	debug	full	2.26%	11.29x
bitmaps-3.1.0	opt	full	1.95%	9.73x
bitmaps-3.1.0	check	incr-full	1.65%	8.25x
bitmaps-3.1.0	debug	incr-full	1.43%	7.15x
bitmaps-3.1.0	opt	incr-full	1.26%	6.30x
typenum-1.17.0	opt	incr-unchanged	-1.25%	6.27x
typenum-1.17.0	debug	incr-unchanged	-1.16%	5.80x
typenum-1.17.0	check	incr-unchanged	-1.15%	5.73x
typenum-1.17.0	check	incr-patched: add fn	-1.13%	5.65x
typenum-1.17.0	opt	full	1.10%	5.52x
typenum-1.17.0	debug	full	1.09%	5.45x
typenum-1.17.0	check	full	1.07%	5.36x
typenum-1.17.0	debug	incr-patched: add fn	-1.04%	5.21x
typenum-1.17.0	opt	incr-patched: add fn	-1.03%	5.13x
cranelift-codegen-0.82.1	check	full	0.86%	4.28x
unicode-normalization-0.1.19	check	incr-unchanged	-0.77%	3.83x
unicode-normalization-0.1.19	opt	incr-unchanged	-0.76%	3.78x
unicode-normalization-0.1.19	debug	incr-unchanged	-0.74%	3.72x
cranelift-codegen-0.82.1	check	incr-full	0.73%	3.64x
typenum-1.17.0	check	incr-full	0.72%	3.61x
typenum-1.17.0	debug	incr-full	0.72%	3.58x
unicode-normalization-0.1.19	check	incr-patched: println	-0.71%	3.54x
typenum-1.17.0	opt	incr-full	0.68%	3.41x
webrender-2022	check	incr-unchanged	-0.60%	3.00x
webrender-2022	opt	incr-unchanged	-0.60%	2.98x
stm32f4-0.14.0	opt	incr-unchanged	-0.56%	2.82x
webrender-2022	check	incr-patched: println	-0.56%	2.81x
stm32f4-0.14.0	debug	incr-unchanged	-0.56%	2.78x
serde-1.0.136	opt	incr-unchanged	-0.55%	2.74x
webrender-2022	debug	incr-unchanged	-0.55%	2.74x
diesel-1.4.8	opt	incr-unchanged	-0.54%	2.70x
diesel-1.4.8	opt	incr-patched: println	-0.54%	2.68x
serde-1.0.136	opt	incr-patched: println	-0.53%	2.64x
stm32f4-0.14.0	debug	incr-patched: negate	-0.51%	2.57x
diesel-1.4.8	debug	incr-unchanged	-0.51%	2.56x
serde-1.0.136	debug	incr-unchanged	-0.50%	2.50x
serde-1.0.136	debug	incr-patched: println	-0.50%	2.48x
webrender-2022	debug	incr-patched: println	-0.50%	2.48x
cranelift-codegen-0.82.1	opt	incr-unchanged	-0.49%	2.46x
stm32f4-0.14.0	opt	incr-patched: negate	-0.48%	2.39x
stm32f4-0.14.0	check	incr-unchanged	-0.47%	2.36x
cranelift-codegen-0.82.1	debug	incr-unchanged	-0.46%	2.31x
diesel-1.4.8	debug	incr-patched: println	-0.46%	2.28x
bitmaps-3.1.0	debug	incr-unchanged	-0.45%	2.24x
stm32f4-0.14.0	check	incr-patched: negate	-0.45%	2.24x
bitmaps-3.1.0	opt	incr-unchanged	-0.44%	2.18x
exa-0.10.1	check	incr-unchanged	-0.43%	2.17x
bitmaps-3.1.0	check	incr-unchanged	-0.41%	2.06x
syn-1.0.89	opt	incr-unchanged	-0.40%	1.98x
serde-1.0.136	check	incr-unchanged	-0.39%	1.95x
syn-1.0.89	debug	incr-patched: println	-0.39%	1.95x
diesel-1.4.8	check	incr-unchanged	-0.39%	1.94x
syn-1.0.89	check	incr-patched: println	-0.39%	1.94x
serde-1.0.136	check	incr-patched: println	-0.38%	1.92x
cargo-0.60.0	opt	incr-unchanged	-0.38%	1.90x
serde_derive-1.0.136	check	incr-full	0.38%	1.90x
syn-1.0.89	check	incr-unchanged	-0.38%	1.90x
serde_derive-1.0.136	check	full	0.38%	1.88x
image-0.24.1	opt	incr-unchanged	-0.37%	1.87x
diesel-1.4.8	check	incr-patched: println	-0.37%	1.84x
bitmaps-3.1.0	opt	incr-patched: println	-0.37%	1.83x
cranelift-codegen-0.82.1	check	incr-unchanged	-0.36%	1.81x
image-0.24.1	debug	incr-unchanged	-0.36%	1.80x
cargo-0.60.0	debug	incr-unchanged	-0.36%	1.78x
hyper-0.14.18	opt	incr-unchanged	-0.35%	1.73x
unicode-normalization-0.1.19	debug	incr-patched: println	-0.34%	1.70x
cranelift-codegen-0.82.1	debug	incr-patched: println	-0.34%	1.68x
libc-0.2.124	opt	incr-unchanged	-0.33%	1.67x
cranelift-codegen-0.82.1	check	incr-patched: println	-0.33%	1.65x
regex-1.5.5	opt	incr-patched: byte frequencies	-0.33%	1.65x
regex-1.5.5	debug	incr-patched: byte frequencies	-0.33%	1.64x
unicode-normalization-0.1.19	opt	incr-patched: println	-0.33%	1.63x
regex-1.5.5	opt	incr-unchanged	-0.32%	1.58x
hyper-0.14.18	debug	incr-unchanged	-0.31%	1.55x
syn-1.0.89	debug	incr-unchanged	-0.31%	1.54x
exa-0.10.1	check	incr-patched: printlns	-0.31%	1.54x
image-0.24.1	debug	incr-patched: println	-0.30%	1.52x
libc-0.2.124	check	incr-unchanged	-0.30%	1.52x

Secondary benchmarks

Benchmark	Profile	Scenario	% Change	Significance Factor
deep-vector	check	full	1.21%	6.06x
deep-vector	check	incr-full	1.15%	5.74x
deep-vector	check	incr-patched: println	1.13%	5.65x
ucd	check	incr-unchanged	-1.02%	5.09x
deep-vector	check	incr-patched: add vec item	1.00%	5.00x
coercions	debug	incr-patched: println	-0.99%	4.95x
tuple-stress	check	incr-unchanged	-0.97%	4.86x
tuple-stress	opt	incr-unchanged	-0.92%	4.59x
tuple-stress	debug	incr-unchanged	-0.87%	4.34x
ucd	opt	incr-unchanged	-0.80%	4.01x
many-assoc-items	check	incr-unchanged	-0.67%	3.36x
deep-vector	opt	incr-patched: add vec item	0.66%	3.32x
ucd	debug	incr-unchanged	-0.66%	3.32x
wf-projection-stress-65510	check	full	-0.66%	3.31x
many-assoc-items	opt	incr-unchanged	-0.66%	3.28x
many-assoc-items	debug	incr-unchanged	-0.61%	3.06x
wf-projection-stress-65510	opt	full	-0.61%	3.04x
wf-projection-stress-65510	debug	full	-0.60%	3.01x
match-stress	opt	incr-unchanged	-0.58%	2.88x
match-stress	debug	incr-unchanged	-0.55%	2.77x
tuple-stress	check	incr-full	-0.54%	2.69x
match-stress	check	incr-unchanged	-0.54%	2.68x
tuple-stress	debug	incr-full	-0.52%	2.59x
wf-projection-stress-65510	debug	incr-full	-0.51%	2.56x
wf-projection-stress-65510	opt	incr-full	-0.51%	2.54x
wf-projection-stress-65510	check	incr-full	-0.50%	2.51x
tuple-stress	opt	full	-0.43%	2.17x
derive	opt	incr-unchanged	-0.42%	2.11x
tuple-stress	check	incr-patched: new row	-0.42%	2.10x
unify-linearly	check	incr-full	0.42%	2.08x
coercions	check	incr-unchanged	-0.41%	2.07x
unify-linearly	check	full	0.41%	2.04x
tuple-stress	opt	incr-patched: new row	-0.40%	2.00x
deep-vector	debug	full	0.40%	1.99x
tuple-stress	debug	incr-patched: new row	-0.40%	1.99x
tuple-stress	opt	incr-full	-0.39%	1.97x
derive	debug	incr-unchanged	-0.39%	1.93x
derive	check	incr-unchanged	-0.37%	1.86x
coercions	check	incr-patched: println	-0.36%	1.79x
deep-vector	opt	incr-full	0.35%	1.74x
wg-grammar	opt	incr-unchanged	-0.34%	1.69x
deep-vector	opt	incr-unchanged	-0.34%	1.68x
tt-muncher	opt	full	0.33%	1.65x
tt-muncher	check	full	0.32%	1.58x
wg-grammar	debug	incr-unchanged	-0.31%	1.56x
deep-vector	debug	incr-unchanged	-0.30%	1.52x

I'm now examining the cachegrind diff for the most regressed benchmark (bitmaps-3.1.0 check full).

eggyal · 2024-04-21T20:23:46Z

I managed to build locally with PGO (but not BOLT owing to llvm/llvm-project#89549) and, whilst different, I think the results are similarly encouraging:

Summary

	Range	Mean	Count
Regressions	0.30%, 5.06%	0.59%	81
Improvements	-3.88%, -0.35%	-1.15%	10
All	-3.88%, 5.06%	0.40%	91

Primary benchmarks

Benchmark	Profile	Scenario	% Change	Significance Factor
syn-1.0.89	opt	full	-2.56%	12.82x
serde_derive-1.0.136	opt	incr-patched: println	1.21%	6.05x
serde_derive-1.0.136	opt	full	0.90%	4.49x
diesel-1.4.8	check	full	0.80%	3.98x
diesel-1.4.8	debug	full	0.74%	3.69x
diesel-1.4.8	opt	full	0.70%	3.51x
diesel-1.4.8	check	incr-full	0.70%	3.48x
diesel-1.4.8	debug	incr-full	0.64%	3.18x
diesel-1.4.8	opt	incr-full	0.62%	3.08x
webrender-2022	debug	full	-0.59%	2.93x
unicode-normalization-0.1.19	debug	incr-patched: println	-0.40%	2.01x
cargo-0.60.0	debug	full	0.39%	1.96x
serde-1.0.136	debug	full	0.39%	1.96x
serde-1.0.136	check	full	0.37%	1.85x
image-0.24.1	check	full	0.37%	1.85x
regex-1.5.5	debug	incr-unchanged	-0.37%	1.85x
unicode-normalization-0.1.19	debug	incr-unchanged	-0.36%	1.79x
serde-1.0.136	check	incr-full	0.32%	1.58x
image-0.24.1	check	incr-full	0.31%	1.55x
regex-1.5.5	check	full	0.31%	1.54x

Secondary benchmarks

Benchmark	Profile	Scenario	% Change	Significance Factor
coercions	debug	incr-patched: println	5.06%	25.29x
coercions	debug	incr-patched: add static arr item	-3.88%	19.42x
coercions	opt	incr-patched: println	-1.88%	9.42x
projection-caching	check	full	1.01%	5.06x
wf-projection-stress-65510	check	full	0.94%	4.72x
wf-projection-stress-65510	debug	full	0.93%	4.63x
projection-caching	check	incr-full	0.90%	4.52x
wf-projection-stress-65510	opt	full	0.90%	4.48x
projection-caching	opt	full	0.82%	4.09x
deep-vector	debug	incr-patched: add vec item	-0.78%	3.88x
wf-projection-stress-65510	check	incr-full	0.73%	3.65x
wf-projection-stress-65510	debug	incr-full	0.71%	3.55x
wg-grammar	check	full	0.70%	3.50x
wf-projection-stress-65510	opt	incr-full	0.70%	3.49x
projection-caching	opt	incr-full	0.69%	3.47x
projection-caching	debug	full	0.69%	3.43x
regression-31157	check	full	0.68%	3.42x
wg-grammar	debug	full	0.68%	3.40x
wg-grammar	check	incr-full	0.66%	3.29x
wg-grammar	debug	incr-full	0.64%	3.20x
wg-grammar	opt	incr-full	0.63%	3.16x
wg-grammar	opt	full	0.63%	3.13x
deep-vector	check	incr-patched: println	0.59%	2.95x
projection-caching	debug	incr-full	0.57%	2.87x
deep-vector	check	full	0.57%	2.86x
issue-58319	debug	incr-unchanged	0.57%	2.83x
issue-46449	check	full	0.57%	2.83x
deep-vector	check	incr-full	0.56%	2.82x
unify-linearly	debug	incr-unchanged	0.55%	2.76x
issue-46449	check	incr-patched: u32 3072	0.55%	2.73x
deep-vector	check	incr-patched: add vec item	0.54%	2.68x
issue-46449	check	incr-patched: empty 3072	0.53%	2.66x
unify-linearly	debug	incr-full	0.52%	2.60x
unify-linearly	debug	incr-patched: dummy fn	0.52%	2.60x
issue-46449	check	incr-full	0.52%	2.59x
issue-46449	check	incr-patched: io error 6144	0.50%	2.51x
regression-31157	check	incr-full	0.50%	2.51x
coercions	opt	full	0.49%	2.43x
ucd	opt	full	0.48%	2.41x
issue-58319	debug	incr-full	0.48%	2.40x
deep-vector	debug	incr-patched: println	0.46%	2.32x
tuple-stress	check	full	0.45%	2.24x
unify-linearly	check	full	0.45%	2.23x
tuple-stress	check	incr-patched: new row	0.44%	2.20x
issue-46449	check	incr-patched: static str 6144	0.43%	2.17x
tuple-stress	opt	full	0.43%	2.14x
ucd	debug	full	0.42%	2.12x
tuple-stress	debug	full	0.41%	2.07x
tt-muncher	opt	incr-full	0.41%	2.06x
token-stream-stress	debug	incr-unchanged	0.40%	2.02x
deeply-nested-multi	check	full	0.40%	2.02x
deep-vector	opt	incr-full	0.39%	1.94x
tuple-stress	check	incr-full	0.39%	1.93x
unify-linearly	check	incr-full	0.38%	1.91x
helloworld-tiny	opt	full	0.38%	1.89x
tuple-stress	opt	incr-patched: new row	0.38%	1.88x
token-stream-stress	debug	incr-full	0.37%	1.86x
coercions	check	incr-unchanged	0.37%	1.85x
coercions	debug	incr-full	-0.37%	1.84x
tuple-stress	debug	incr-full	0.37%	1.84x
tuple-stress	debug	incr-patched: new row	0.37%	1.84x
tuple-stress	opt	incr-full	0.37%	1.83x
ucd	check	full	0.36%	1.81x
deeply-nested-multi	opt	incr-unchanged	0.35%	1.74x
coercions	opt	incr-patched: add static arr item	-0.35%	1.73x
tuple-stress	check	incr-unchanged	0.34%	1.69x
projection-caching	debug	incr-unchanged	0.34%	1.69x
deeply-nested-multi	check	incr-full	0.33%	1.67x
ucd	check	incr-full	0.33%	1.63x
deeply-nested-multi	debug	incr-unchanged	0.31%	1.55x
coercions	opt	incr-full	0.30%	1.52x

Seeing the following in cachegrind diff for the most regressed primary benchmark:

<   2,428,649   (5.1%,  93.6%)  compiler/rustc_middle/src/ty/generic_args.rs:
    3,723,192   (7.8%)            <&rustc_middle::ty::list::List<rustc_middle::ty::generic_args::GenericArg> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_infer::infer::freshen::TypeFreshener>
   -2,463,845  (-5.1%)            <rustc_trait_selection::traits::select::SelectionContext>::poly_select
    1,179,357   (2.5%)            <rustc_trait_selection::traits::select::SelectionContext>::match_impl
     -410,052  (-0.9%)            <rustc_middle::ty::Ty as rustc_type_ir::fold::TypeSuperFoldable<rustc_middle::ty::context::TyCtxt>>::try_super_fold_with::<rustc_infer::infer::freshen::TypeFreshener>
       95,950   (0.2%)            <&rustc_middle::ty::list::List<rustc_middle::ty::Ty> as rustc_type_ir::fold::TypeFoldable<rustc_middle::ty::context::TyCtxt>>::try_fold_with::<rustc_infer::infer::freshen::TypeFreshener>

Which I think again is down to inlining, but at least now narrowing down to a likely mono item...

Prefer not to early return from derived visitables LLVM is not able to tail-call optimise the final early return, as there instructions that follow it. Making that call the returned expression resolves this. (As identified in rust-lang#117896 (comment)). r? wg-compiler-performance

eggyal added 5 commits November 14, 2023 01:02

Derive traversable impls for GenericArgKind

d0c9ba3

Simplify traversable impls in solver wrapper types

674b677

Rather than being concerned with internals of ExternalConstraintsData and PredefinedOpaquesData, we derive traversable impls for those types to which we then delegate.

Unimpl TypeSuperVisitable for UnevaluatedConst

3c332fe

UnevaluatedConst is not a type of any interest to folders or visitors, and therefore should not be "super visitable" at all.

Enforce ClosureOutlivesSubjectTy not traversable

e26386f

Previously only enforced via comment, now via negative impls.

Improve docs of traversable derivation macros

d6fcafe

rustbot assigned BoxyUwU Nov 14, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative (-Znext-solver) labels Nov 14, 2023

Unify traversable derive macro implementations

dce79e4

rustbot assigned lcnr and unassigned BoxyUwU Nov 14, 2023

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 14, 2023

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 14, 2023

Inline visiting of GenericArgs

35e3019

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 23, 2023

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 23, 2023

lcnr reviewed Nov 29, 2023

View reviewed changes

lcnr added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 29, 2023

JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 11, 2024

Dylan-DPC mentioned this pull request Feb 17, 2024

Automatically make trivial types noop-traversable #117726

Closed

JohnCSimon closed this Apr 14, 2024

rustbot added the S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. label Apr 14, 2024

eggyal deleted the traversables-1-tidy-up branch April 16, 2024 13:52

eggyal restored the traversables-1-tidy-up branch April 16, 2024 16:18

eggyal mentioned this pull request Apr 22, 2024

Prefer not to early return from derived visitables #124263

Closed

eggyal deleted the traversables-1-tidy-up branch July 11, 2024 11:14

	/// is not required to implement `TypeFoldable`). However use of this attribute is dangerous
	/// is not required to implement `TypeFoldable`). Use of this attribute is dangerous

	/// (and its type is not required to implement `TypeVisitable`). However use of this attribute
	/// (and its type is not required to implement `TypeVisitable`). Use of this attribute

TypeFoldable/TypeVisitable tidy up #117896

TypeFoldable/TypeVisitable tidy up #117896

Conversation

eggyal commented Nov 14, 2023

eggyal commented Nov 14, 2023

lcnr commented Nov 14, 2023

This comment has been minimized.

bors commented Nov 14, 2023

bors commented Nov 14, 2023

bors commented Nov 14, 2023

This comment has been minimized.

rust-timer commented Nov 14, 2023

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

eggyal commented Nov 22, 2023

lcnr commented Nov 23, 2023

This comment has been minimized.

bors commented Nov 23, 2023

bors commented Nov 23, 2023

This comment has been minimized.

rust-timer commented Nov 23, 2023

Overall result: ❌ regressions - ACTION NEEDED

lcnr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lcnr commented Nov 29, 2023 • edited Loading

eggyal commented Nov 29, 2023

lcnr commented Nov 29, 2023

JohnCSimon commented Apr 14, 2024

eggyal commented Apr 19, 2024 • edited Loading

eggyal commented Apr 19, 2024 • edited Loading

eggyal commented Apr 20, 2024 • edited Loading

Summary

Aggregations

check

debug

opt

full

incr-full

incr-unchanged

incr-patched

primary

secondary

Details

eggyal commented Apr 21, 2024

Summary

Primary benchmarks

Secondary benchmarks

lcnr commented Nov 29, 2023 •

edited

Loading

eggyal commented Apr 19, 2024 •

edited

Loading

eggyal commented Apr 19, 2024 •

edited

Loading

eggyal commented Apr 20, 2024 •

edited

Loading