transmute: Mark edges by byte sets, not byte values #140168

joshlf · 2025-04-22T18:33:00Z

This leads to drastic performance improvements. For example, on the author's 2024 MacBook Pro, the time to convert the Tree representation of a u64 to its equivalent DFA representation drops from ~8.5ms to ~1us, a reduction of ~8,500x. See bench_dfa_from_tree.

Similarly, the time to execute a transmutability query from u64 to u64 drops from ~35us to ~1.7us, a reduction of ~20x. See bench_transmute.

r? @jswrenn

matthiaskrgr · 2025-04-22T18:41:44Z

@bors try @rust-timer queue

transmute: Mark edges by byte sets, not byte values This leads to drastic performance improvements. For example, on the author's 2024 MacBook Pro, the time to convert the `Tree` representation of a `u64` to its equivalent DFA representation drops from ~8.5ms to ~1us, a reduction of ~8,500x. See `bench_dfa_from_tree`. Similarly, the time to execute a transmutability query from `u64` to `u64` drops from ~35us to ~1.7us, a reduction of ~20x. See `bench_transmute`. r? `@jswrenn`

bors · 2025-04-22T18:42:55Z

⌛ Testing commit 4198021 with merge 4bd3736...

matthiaskrgr · 2025-04-23T06:09:29Z

@bors try @rust-timer queue

bors · 2025-04-23T06:10:39Z

⌛ Trying commit 2431d88 with merge a0d017e...

transmute: Mark edges by byte sets, not byte values This leads to drastic performance improvements. For example, on the author's 2024 MacBook Pro, the time to convert the `Tree` representation of a `u64` to its equivalent DFA representation drops from ~8.5ms to ~1us, a reduction of ~8,500x. See `bench_dfa_from_tree`. Similarly, the time to execute a transmutability query from `u64` to `u64` drops from ~35us to ~1.7us, a reduction of ~20x. See `bench_transmute`. r? `@jswrenn`

bors · 2025-04-23T08:17:26Z

☀️ Try build successful - checks-actions
Build commit: a0d017e (a0d017e8bbf248582473ad0c1a931dbb8755d47d)

rust-timer · 2025-04-23T09:48:15Z

Finished benchmarking commit (a0d017e): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.1%, 0.3%]	21
Regressions ❌ (secondary)	0.3%	[0.1%, 0.6%]	58
Improvements ✅ (primary)	-0.2%	[-0.2%, -0.2%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.2%	[-0.2%, 0.3%]	22

Max RSS (memory usage)

Results (primary 0.0%, secondary 0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.1%	[0.4%, 2.5%]	3
Regressions ❌ (secondary)	2.3%	[1.9%, 2.5%]	6
Improvements ✅ (primary)	-1.6%	[-2.4%, -0.7%]	2
Improvements ✅ (secondary)	-6.1%	[-7.9%, -4.3%]	2
All ❌✅ (primary)	0.0%	[-2.4%, 2.5%]	5

Cycles

Results (secondary -2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.3%	[-2.8%, -2.0%]	3
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 774.578s -> 775.487s (0.12%)
Artifact size: 365.08 MiB -> 365.07 MiB (-0.00%)

compiler/rustc_transmute/src/layout/dfa.rs

jswrenn · 2025-04-23T17:38:53Z

compiler/rustc_transmute/src/layout/dfa.rs

    }

-    pub(crate) fn refs_from(&self, start: State) -> Option<&Map<R, State>> {
-        Some(&self.transitions.get(&start)?.ref_transitions)
+    pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {


Would lightly prefer dropping the iter_ prefix, so long as there's no conflict between this and a hypothetical bytes_from method that doesn't return an iter:

Suggested change

pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {

pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {

jswrenn · 2025-04-23T17:39:40Z

compiler/rustc_transmute/src/layout/dfa.rs

+        use crate::layout::Byte;
+
+        #[derive(Eq, PartialEq, Copy, Clone, Debug)]
+        pub(super) struct Run<S> {


Doc comment?

jswrenn · 2025-04-23T17:41:27Z

compiler/rustc_transmute/src/layout/dfa.rs

+                Self { start: *range.start(), end: *range.end(), dst }
+            }
+
+            pub(super) fn from_closed_open(range: Range<u16>, dst: S) -> Self {


from_inclusive_exclusive, perhaps? that nomenclature is consistent with https://doc.rust-lang.org/stable/std/ops/enum.Bound.html

jswrenn · 2025-04-23T17:44:11Z

compiler/rustc_transmute/src/layout/dfa.rs

+                idx >= u16::from(self.start) && idx <= u16::from(self.end)
+            }
+
+            pub(super) fn as_closed_open(&self) -> (u16, u16) {


Same naming note here.

jswrenn · 2025-04-23T17:51:45Z

compiler/rustc_transmute/src/layout/dfa.rs

+            self.uninit
+        }
+
+        pub(crate) fn map<SS>(self, mut f: impl FnMut(S) -> SS) -> EdgeSet<SS> {


map_states

jswrenn · 2025-04-23T17:57:52Z

compiler/rustc_transmute/src/layout/mod.rs

@@ -62,6 +101,21 @@ impl Ref for ! {
    }
 }

+#[cfg(test)]
+impl<const N: usize> Ref for [u8; N] {


Can you make this [(); N], since the bytes do not currently encode anything?

Or, alternatively, do what we do for Def: define a named type, Ref, that stands in for references in tests.

Went with the [(); N] approach.

rustbot · 2025-04-23T18:05:14Z

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

jswrenn · 2025-04-23T18:11:53Z

Thanks!

@matthiaskrgr these changes are limited to the performance of the experimental TransmuteFrom trait. This trait is currently only used in ui tests of the TransmuteFrom trait. The integration of TransmuteFrom into the trait solver is unmodified here, so this PR should not have any impact on broader rustc performance.

@bors r+

bors · 2025-04-23T18:11:56Z

📌 Commit 048863e has been approved by jswrenn

It is now in the queue for this repository.

In the `Tree` and `Dfa` representations of a type's layout, store byte ranges rather than needing to separately store each byte value. This permits us to, for example, represent a `u8` using a single 0..=255 edge in the DFA rather than using 256 separate edges. This leads to drastic performance improvements. For example, on the author's 2024 MacBook Pro, the time to convert the `Tree` representation of a `u64` to its equivalent DFA representation drops from ~8.5ms to ~1us, a reduction of ~8,500x. See `bench_dfa_from_tree`. Similarly, the time to execute a transmutability query from `u64` to `u64` drops from ~35us to ~1.7us, a reduction of ~20x. See `bench_transmute`.

jswrenn · 2025-04-23T18:46:40Z

@bors r+

bors · 2025-04-23T18:46:42Z

📌 Commit 4326a44 has been approved by jswrenn

It is now in the queue for this repository.

Zalathar · 2025-04-24T03:24:11Z

Scheduling: Prefer rollup=never PRs over non-rolled-up iffy PRs.

@bors p=1

bors · 2025-04-24T05:36:42Z

⌛ Testing commit 4326a44 with merge 7f69523...

panstromek · 2025-04-24T05:55:59Z

@matthiaskrgr these changes are limited to the performance of the experimental TransmuteFrom trait. This trait is currently only used in ui tests of the TransmuteFrom trait. The integration of TransmuteFrom into the trait solver is unmodified here, so this PR should not have any impact on broader rustc performance.

Regressions from the perf run are all real, so this doesn't seem to inluence only TransmuteFrom (various unrelated benchmarks regressed). This will definitely come up in next week perf triage, do we have some explanation for what happened there and if it's justified before this lands?

bors · 2025-04-24T08:49:40Z

☀️ Test successful - checks-actions
Approved by: jswrenn
Pushing 7f69523 to master...

github-actions · 2025-04-24T08:52:49Z

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing c02a4f0 (parent) -> 7f69523 (this PR)

Test differences

Show 7 test diffs

Stage 1

maybe_transmutable::tests::alt::should_permit_identity_transmutation: [missing] -> pass (J0)
maybe_transmutable::tests::benches::bench_dfa_from_tree: [missing] -> pass (J0)
maybe_transmutable::tests::benches::bench_transmute: [missing] -> pass (J0)
maybe_transmutable::tests::bool::transmute_u8: [missing] -> pass (J0)
maybe_transmutable::tests::r#ref::should_permit_identity_transmutation: [missing] -> pass (J0)
maybe_transmutable::tests::size::size: [missing] -> pass (J0)
maybe_transmutable::tests::uninit::size: [missing] -> pass (J0)

Job group index

J0: aarch64-apple, aarch64-gnu, i686-gnu-2, i686-gnu-nopt-2, i686-msvc-2, x86_64-apple-1, x86_64-gnu, x86_64-gnu-llvm-19-3, x86_64-gnu-llvm-20-3, x86_64-gnu-nopt, x86_64-gnu-stable, x86_64-mingw-2, x86_64-msvc-2

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 7f695232a80fa1833e2282f2577c5e1ff066bf39 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

dist-aarch64-linux: 5460.7s -> 7562.7s (38.5%)
dist-apple-various: 6406.0s -> 7955.6s (24.2%)
aarch64-apple: 4239.3s -> 3691.7s (-12.9%)
x86_64-apple-2: 5451.5s -> 4825.3s (-11.5%)
x86_64-apple-1: 7714.6s -> 7103.8s (-7.9%)
x86_64-msvc-2: 6765.3s -> 7184.7s (6.2%)
i686-gnu-2: 6739.7s -> 6334.7s (-6.0%)
x86_64-msvc-ext3: 7891.1s -> 7430.1s (-5.8%)
dist-ohos: 10561.0s -> 11037.3s (4.5%)
dist-x86_64-apple: 9242.5s -> 9631.3s (4.2%)

How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

rust-timer · 2025-04-24T10:20:54Z

Finished benchmarking commit (7f69523): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 0.1%, secondary 3.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.7%	[0.7%, 0.7%]	1
Regressions ❌ (secondary)	3.3%	[2.0%, 4.6%]	2
Improvements ✅ (primary)	-0.5%	[-0.5%, -0.5%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.5%, 0.7%]	2

Cycles

Results (primary -0.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.4%	[-0.4%, -0.4%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.4%	[-0.4%, -0.4%]	1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 774.94s -> 776.02s (0.14%)
Artifact size: 365.09 MiB -> 365.11 MiB (0.00%)

panstromek · 2025-04-24T10:43:40Z

Looks like the regression is gone now, so nevermind then.

tmiasko · 2025-04-26T20:56:27Z

compiler/rustc_transmute/src/maybe_transmutable/mod.rs

+                                // - If `assume.validity`, then we should
+                                //   succeed because the user is responsible for
+                                //   ensuring that the *specific* byte value
+                                //   appearing at runtime is valid for the
+                                //   destination type. When `assume.validity`,
+                                //   `src_quantifier ==
+                                //   Quantifier::ThereExists`, so adding an
+                                //   `Answer::Yes` has the effect of ensuring
+                                //   that the "there exists" is always
+                                //   satisfied.


Previously, when assuming validity, query nevertheless ensured that there exists a value that is valid for both src and dst:

#![feature(transmutability)] use std::mem::{Assume, TransmuteFrom}; pub fn is_transmutable<Src, Dst>() where Dst: TransmuteFrom<Src, { Assume::VALIDITY }>, { } pub union MaybeUninitByte { pub uninit: (), pub byte: u8, } fn main() { is_transmutable::<MaybeUninitByte, [u8; 2]>(); // previously rejected, now accepted }

I'd expect this to hit the Answer::No(Reason::DstIsTooBig) branch here:

rust/compiler/rustc_transmute/src/maybe_transmutable/mod.rs

Lines 180 to 187 in 4326a44

} else if src_state == self.src.accept {

// extension: `size_of(Src) <= size_of(Dst)`

if let Some(dst_state_prime) = self.dst.get_uninit_edge_dst(dst_state) {

self.answer_memo(cache, src_state, dst_state_prime)

} else {

Answer::No(Reason::DstIsTooBig)

}

} else {

...but that's clearly not happening. cc @joshlf

transmutability: uninit transition matches unit byte only The previous implementation was inconsistent about transitions that apply for an init byte. For example, when answering a query, an init byte could use corresponding init transition. Init byte could also use uninit transition, but only when the corresponding init transition was absent. This behaviour was incompatible with DFA union construction. Define an uninit transition to match an uninit byte only and update implementation accordingly. To describe that `Tree::uninit` is valid for any value, build an automaton that accepts any byte value. Additionally, represent byte ranges uniformly as a pair of integers to avoid special case for uninit byte. Fixes rust-lang#140337. Fixes rust-lang#140168 (comment). r? `@jswrenn` `@joshlf`

transmutability: uninit transition matches unit byte only The previous implementation was inconsistent about transitions that apply for an init byte. For example, when answering a query, an init byte could use corresponding init transition. Init byte could also use uninit transition, but only when the corresponding init transition was absent. This behaviour was incompatible with DFA union construction. Define an uninit transition to match an uninit byte only and update implementation accordingly. To describe that `Tree::uninit` is valid for any value, build an automaton that accepts any byte value. Additionally, represent byte ranges uniformly as a pair of integers to avoid special case for uninit byte. Fixes rust-lang#140337. Fixes rust-lang#140168 (comment). r? ``@jswrenn`` ``@joshlf``

transmutability: uninit transition matches unit byte only The previous implementation was inconsistent about transitions that apply for an init byte. For example, when answering a query, an init byte could use corresponding init transition. Init byte could also use uninit transition, but only when the corresponding init transition was absent. This behaviour was incompatible with DFA union construction. Define an uninit transition to match an uninit byte only and update implementation accordingly. To describe that `Tree::uninit` is valid for any value, build an automaton that accepts any byte value. Additionally, represent byte ranges uniformly as a pair of integers to avoid special case for uninit byte. Fixes rust-lang#140337. Fixes rust-lang#140168 (comment). r? ```@jswrenn``` ```@joshlf```

rustbot assigned jswrenn Apr 22, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 22, 2025

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 22, 2025

joshlf force-pushed the no-nfas branch from 334574d to 4198021 Compare April 22, 2025 18:41

This comment has been minimized.

Sign in to view

joshlf force-pushed the no-nfas branch from 4198021 to ae823fd Compare April 22, 2025 20:07

This comment has been minimized.

Sign in to view

joshlf force-pushed the no-nfas branch from ae823fd to ab09494 Compare April 22, 2025 20:54

joshlf marked this pull request as ready for review April 22, 2025 20:54

This comment has been minimized.

Sign in to view

joshlf force-pushed the no-nfas branch from ab09494 to 2431d88 Compare April 22, 2025 21:08

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 23, 2025

jswrenn reviewed Apr 23, 2025

View reviewed changes

compiler/rustc_transmute/src/layout/dfa.rs Show resolved Hide resolved

jswrenn reviewed Apr 23, 2025

View reviewed changes

joshlf force-pushed the no-nfas branch from 2431d88 to 048863e Compare April 23, 2025 18:05

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 23, 2025

This comment has been minimized.

Sign in to view

joshlf force-pushed the no-nfas branch from 048863e to 4326a44 Compare April 23, 2025 18:45

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 24, 2025

bors merged commit 7f69523 into rust-lang:master Apr 24, 2025
7 checks passed

rustbot added this to the 1.88.0 milestone Apr 24, 2025

bors mentioned this pull request Apr 24, 2025

transmutability: Support char, NonZeroXxx #140215

Merged

rustbot removed the perf-regression Performance regression. label Apr 24, 2025

tmiasko reviewed Apr 26, 2025

View reviewed changes

tmiasko mentioned this pull request Apr 28, 2025

transmutability: uninit transition matches unit byte only #140380

Merged

	pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {
	pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {

	} else if src_state == self.src.accept {
	// extension: `size_of(Src) <= size_of(Dst)`
	if let Some(dst_state_prime) = self.dst.get_uninit_edge_dst(dst_state) {
	self.answer_memo(cache, src_state, dst_state_prime)
	} else {
	Answer::No(Reason::DstIsTooBig)
	}
	} else {

transmute: Mark edges by byte sets, not byte values #140168

transmute: Mark edges by byte sets, not byte values #140168

Conversation

joshlf commented Apr 22, 2025

matthiaskrgr commented Apr 22, 2025

This comment has been minimized.

bors commented Apr 22, 2025

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

matthiaskrgr commented Apr 23, 2025

This comment has been minimized.

bors commented Apr 23, 2025

bors commented Apr 23, 2025

This comment has been minimized.

rust-timer commented Apr 23, 2025

Overall result: ❌ regressions - please read the text below

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rustbot commented Apr 23, 2025

jswrenn commented Apr 23, 2025

bors commented Apr 23, 2025

This comment has been minimized.

jswrenn commented Apr 23, 2025

bors commented Apr 23, 2025

Zalathar commented Apr 24, 2025

bors commented Apr 24, 2025

panstromek commented Apr 24, 2025

bors commented Apr 24, 2025

github-actions bot commented Apr 24, 2025

Test differences

Stage 1

Job duration changes

rust-timer commented Apr 24, 2025

Overall result: no relevant changes - no action needed

panstromek commented Apr 24, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment