Skip to content

transmute: Mark edges by byte sets, not byte values #140168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 24, 2025

Conversation

joshlf
Copy link
Contributor

@joshlf joshlf commented Apr 22, 2025

This leads to drastic performance improvements. For example, on the author's 2024 MacBook Pro, the time to convert the Tree representation of a u64 to its equivalent DFA representation drops from ~8.5ms to ~1us, a reduction of ~8,500x. See bench_dfa_from_tree.

Similarly, the time to execute a transmutability query from u64 to u64 drops from ~35us to ~1.7us, a reduction of ~20x. See bench_transmute.

r? @jswrenn

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 22, 2025
@matthiaskrgr
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 22, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 22, 2025
transmute: Mark edges by byte sets, not byte values

This leads to drastic performance improvements. For example, on the author's 2024 MacBook Pro, the time to convert the `Tree` representation of a `u64` to its equivalent DFA representation drops from ~8.5ms to ~1us, a reduction of ~8,500x. See `bench_dfa_from_tree`.

Similarly, the time to execute a transmutability query from `u64` to `u64` drops from ~35us to ~1.7us, a reduction of ~20x. See `bench_transmute`.

r? `@jswrenn`
@bors
Copy link
Collaborator

bors commented Apr 22, 2025

⌛ Testing commit 4198021 with merge 4bd3736...

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@joshlf joshlf marked this pull request as ready for review April 22, 2025 20:54
@rust-log-analyzer

This comment has been minimized.

@matthiaskrgr
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Collaborator

bors commented Apr 23, 2025

⌛ Trying commit 2431d88 with merge a0d017e...

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 23, 2025
transmute: Mark edges by byte sets, not byte values

This leads to drastic performance improvements. For example, on the author's 2024 MacBook Pro, the time to convert the `Tree` representation of a `u64` to its equivalent DFA representation drops from ~8.5ms to ~1us, a reduction of ~8,500x. See `bench_dfa_from_tree`.

Similarly, the time to execute a transmutability query from `u64` to `u64` drops from ~35us to ~1.7us, a reduction of ~20x. See `bench_transmute`.

r? `@jswrenn`
@bors
Copy link
Collaborator

bors commented Apr 23, 2025

☀️ Try build successful - checks-actions
Build commit: a0d017e (a0d017e8bbf248582473ad0c1a931dbb8755d47d)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (a0d017e): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
0.2% [0.1%, 0.3%] 21
Regressions ❌
(secondary)
0.3% [0.1%, 0.6%] 58
Improvements ✅
(primary)
-0.2% [-0.2%, -0.2%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.2% [-0.2%, 0.3%] 22

Max RSS (memory usage)

Results (primary 0.0%, secondary 0.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.1% [0.4%, 2.5%] 3
Regressions ❌
(secondary)
2.3% [1.9%, 2.5%] 6
Improvements ✅
(primary)
-1.6% [-2.4%, -0.7%] 2
Improvements ✅
(secondary)
-6.1% [-7.9%, -4.3%] 2
All ❌✅ (primary) 0.0% [-2.4%, 2.5%] 5

Cycles

Results (secondary -2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.3% [-2.8%, -2.0%] 3
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 774.578s -> 775.487s (0.12%)
Artifact size: 365.08 MiB -> 365.07 MiB (-0.00%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 23, 2025
}

pub(crate) fn refs_from(&self, start: State) -> Option<&Map<R, State>> {
Some(&self.transitions.get(&start)?.ref_transitions)
pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would lightly prefer dropping the iter_ prefix, so long as there's no conflict between this and a hypothetical bytes_from method that doesn't return an iter:

Suggested change
pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {
pub(crate) fn iter_bytes_from(&self, start: State) -> impl Iterator<Item = (Byte, State)> {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

use crate::layout::Byte;

#[derive(Eq, PartialEq, Copy, Clone, Debug)]
pub(super) struct Run<S> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Self { start: *range.start(), end: *range.end(), dst }
}

pub(super) fn from_closed_open(range: Range<u16>, dst: S) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_inclusive_exclusive, perhaps? that nomenclature is consistent with https://doc.rust-lang.org/stable/std/ops/enum.Bound.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

idx >= u16::from(self.start) && idx <= u16::from(self.end)
}

pub(super) fn as_closed_open(&self) -> (u16, u16) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same naming note here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

self.uninit
}

pub(crate) fn map<SS>(self, mut f: impl FnMut(S) -> SS) -> EdgeSet<SS> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map_states

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -62,6 +101,21 @@ impl Ref for ! {
}
}

#[cfg(test)]
impl<const N: usize> Ref for [u8; N] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this [(); N], since the bytes do not currently encode anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, alternatively, do what we do for Def: define a named type, Ref, that stands in for references in tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with the [(); N] approach.

@rustbot
Copy link
Collaborator

rustbot commented Apr 23, 2025

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@jswrenn
Copy link
Member

jswrenn commented Apr 23, 2025

Thanks!

@matthiaskrgr these changes are limited to the performance of the experimental TransmuteFrom trait. This trait is currently only used in ui tests of the TransmuteFrom trait. The integration of TransmuteFrom into the trait solver is unmodified here, so this PR should not have any impact on broader rustc performance.

@bors r+

@bors
Copy link
Collaborator

bors commented Apr 23, 2025

📌 Commit 048863e has been approved by jswrenn

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 23, 2025
@rust-log-analyzer

This comment has been minimized.

In the `Tree` and `Dfa` representations of a type's layout, store byte
ranges rather than needing to separately store each byte value. This
permits us to, for example, represent a `u8` using a single 0..=255 edge
in the DFA rather than using 256 separate edges.

This leads to drastic performance improvements. For example, on the
author's 2024 MacBook Pro, the time to convert the `Tree` representation
of a `u64` to its equivalent DFA representation drops from ~8.5ms to
~1us, a reduction of ~8,500x. See `bench_dfa_from_tree`.

Similarly, the time to execute a transmutability query from `u64` to
`u64` drops from ~35us to ~1.7us, a reduction of ~20x. See
`bench_transmute`.
@jswrenn
Copy link
Member

jswrenn commented Apr 23, 2025

@bors r+

@bors
Copy link
Collaborator

bors commented Apr 23, 2025

📌 Commit 4326a44 has been approved by jswrenn

It is now in the queue for this repository.

@Zalathar
Copy link
Contributor

Scheduling: Prefer rollup=never PRs over non-rolled-up iffy PRs.

@bors p=1

@bors
Copy link
Collaborator

bors commented Apr 24, 2025

⌛ Testing commit 4326a44 with merge 7f69523...

@panstromek
Copy link
Contributor

@matthiaskrgr these changes are limited to the performance of the experimental TransmuteFrom trait. This trait is currently only used in ui tests of the TransmuteFrom trait. The integration of TransmuteFrom into the trait solver is unmodified here, so this PR should not have any impact on broader rustc performance.

Regressions from the perf run are all real, so this doesn't seem to inluence only TransmuteFrom (various unrelated benchmarks regressed). This will definitely come up in next week perf triage, do we have some explanation for what happened there and if it's justified before this lands?

@bors
Copy link
Collaborator

bors commented Apr 24, 2025

☀️ Test successful - checks-actions
Approved by: jswrenn
Pushing 7f69523 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 24, 2025
@bors bors merged commit 7f69523 into rust-lang:master Apr 24, 2025
7 checks passed
@rustbot rustbot added this to the 1.88.0 milestone Apr 24, 2025
Copy link

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing c02a4f0 (parent) -> 7f69523 (this PR)

Test differences

Show 7 test diffs

Stage 1

  • maybe_transmutable::tests::alt::should_permit_identity_transmutation: [missing] -> pass (J0)
  • maybe_transmutable::tests::benches::bench_dfa_from_tree: [missing] -> pass (J0)
  • maybe_transmutable::tests::benches::bench_transmute: [missing] -> pass (J0)
  • maybe_transmutable::tests::bool::transmute_u8: [missing] -> pass (J0)
  • maybe_transmutable::tests::r#ref::should_permit_identity_transmutation: [missing] -> pass (J0)
  • maybe_transmutable::tests::size::size: [missing] -> pass (J0)
  • maybe_transmutable::tests::uninit::size: [missing] -> pass (J0)

Job group index

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 7f695232a80fa1833e2282f2577c5e1ff066bf39 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-linux: 5460.7s -> 7562.7s (38.5%)
  2. dist-apple-various: 6406.0s -> 7955.6s (24.2%)
  3. aarch64-apple: 4239.3s -> 3691.7s (-12.9%)
  4. x86_64-apple-2: 5451.5s -> 4825.3s (-11.5%)
  5. x86_64-apple-1: 7714.6s -> 7103.8s (-7.9%)
  6. x86_64-msvc-2: 6765.3s -> 7184.7s (6.2%)
  7. i686-gnu-2: 6739.7s -> 6334.7s (-6.0%)
  8. x86_64-msvc-ext3: 7891.1s -> 7430.1s (-5.8%)
  9. dist-ohos: 10561.0s -> 11037.3s (4.5%)
  10. dist-x86_64-apple: 9242.5s -> 9631.3s (4.2%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (7f69523): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 0.1%, secondary 3.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.7% [0.7%, 0.7%] 1
Regressions ❌
(secondary)
3.3% [2.0%, 4.6%] 2
Improvements ✅
(primary)
-0.5% [-0.5%, -0.5%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.5%, 0.7%] 2

Cycles

Results (primary -0.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.4% [-0.4%, -0.4%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.4% [-0.4%, -0.4%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 774.94s -> 776.02s (0.14%)
Artifact size: 365.09 MiB -> 365.11 MiB (0.00%)

@rustbot rustbot removed the perf-regression Performance regression. label Apr 24, 2025
@panstromek
Copy link
Contributor

Looks like the regression is gone now, so nevermind then.

Comment on lines +233 to +242
// - If `assume.validity`, then we should
// succeed because the user is responsible for
// ensuring that the *specific* byte value
// appearing at runtime is valid for the
// destination type. When `assume.validity`,
// `src_quantifier ==
// Quantifier::ThereExists`, so adding an
// `Answer::Yes` has the effect of ensuring
// that the "there exists" is always
// satisfied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, when assuming validity, query nevertheless ensured that there exists a value that is valid for both src and dst:

#![feature(transmutability)]
use std::mem::{Assume, TransmuteFrom};

pub fn is_transmutable<Src, Dst>()
where
    Dst: TransmuteFrom<Src, { Assume::VALIDITY }>,
{
}

pub union MaybeUninitByte {
    pub uninit: (),
    pub byte: u8,
}

fn main() {
    is_transmutable::<MaybeUninitByte, [u8; 2]>(); // previously rejected, now accepted
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this to hit the Answer::No(Reason::DstIsTooBig) branch here:

} else if src_state == self.src.accept {
// extension: `size_of(Src) <= size_of(Dst)`
if let Some(dst_state_prime) = self.dst.get_uninit_edge_dst(dst_state) {
self.answer_memo(cache, src_state, dst_state_prime)
} else {
Answer::No(Reason::DstIsTooBig)
}
} else {

...but that's clearly not happening. cc @joshlf

Zalathar added a commit to Zalathar/rust that referenced this pull request Apr 30, 2025
transmutability: uninit transition matches unit byte only

The previous implementation was inconsistent about transitions that
apply for an init byte. For example, when answering a query, an init
byte could use corresponding init transition. Init byte could also use
uninit transition, but only when the corresponding init transition was
absent. This behaviour was incompatible with DFA union construction.

Define an uninit transition to match an uninit byte only and update
implementation accordingly. To describe that `Tree::uninit` is valid
for any value, build an automaton that accepts any byte value.

Additionally, represent byte ranges uniformly as a pair of integers to
avoid special case for uninit byte.

Fixes rust-lang#140337.
Fixes rust-lang#140168 (comment).

r? `@jswrenn` `@joshlf`
Zalathar added a commit to Zalathar/rust that referenced this pull request Apr 30, 2025
transmutability: uninit transition matches unit byte only

The previous implementation was inconsistent about transitions that
apply for an init byte. For example, when answering a query, an init
byte could use corresponding init transition. Init byte could also use
uninit transition, but only when the corresponding init transition was
absent. This behaviour was incompatible with DFA union construction.

Define an uninit transition to match an uninit byte only and update
implementation accordingly. To describe that `Tree::uninit` is valid
for any value, build an automaton that accepts any byte value.

Additionally, represent byte ranges uniformly as a pair of integers to
avoid special case for uninit byte.

Fixes rust-lang#140337.
Fixes rust-lang#140168 (comment).

r? ``@jswrenn`` ``@joshlf``
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Apr 30, 2025
transmutability: uninit transition matches unit byte only

The previous implementation was inconsistent about transitions that
apply for an init byte. For example, when answering a query, an init
byte could use corresponding init transition. Init byte could also use
uninit transition, but only when the corresponding init transition was
absent. This behaviour was incompatible with DFA union construction.

Define an uninit transition to match an uninit byte only and update
implementation accordingly. To describe that `Tree::uninit` is valid
for any value, build an automaton that accepts any byte value.

Additionally, represent byte ranges uniformly as a pair of integers to
avoid special case for uninit byte.

Fixes rust-lang#140337.
Fixes rust-lang#140168 (comment).

r? ```@jswrenn``` ```@joshlf```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants